This particular deflection is telling. Sundaresan, GM of Automation and AI at IBM Software, a founding engineer of GitHub Copilot at Microsoft, and a former researcher at IBM, is not a product marketing executive. He is a researcher who evolved into a builder, and subsequently an executive. The consistent thread through all these roles is a singular obsession: what is required to enhance software developer productivity, and what are the persistent obstacles? This question has occupied his professional life since 2000, predating the widespread recognition of transformers, large language models, and the notion that AI and developer tooling could be integrated. The trajectory from those early days to the development of IBM Bob, an agentic development tool announced this week and already in use by 80,000 IBM employees, is a narrative far richer than a brief press release can convey.
The Genesis of Developer Productivity Tools: Early Innovations
Sundaresan’s initial foray into developer productivity systems was far removed from the sophisticated AI coding assistants prevalent today. It began with a recommender system for API calls. "Thirty percent of developer code is API calls," he explained in an extensive interview with The New Stack. "When you type a class name followed by a dot, you are presented with a long list of functions to choose from. This itself represents a point of friction." The objective was not to generate code, but to intelligently surface the most relevant function call at the precise moment it was needed – essentially, applying a search ranking paradigm to a developer’s autocomplete experience.
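The ranking idea he describes can be sketched in a few lines: score each candidate completion by how often it is actually used, and surface the most likely call first. The corpus counts, class names, and scoring below are illustrative assumptions, not the actual system he built.

```python
# Illustrative sketch of frequency-ranked autocomplete, in the spirit of
# the API-call recommender described above. The usage counts here are
# hypothetical values standing in for statistics mined from a code corpus.
from collections import Counter

# Hypothetical (class, method) -> observed call count
usage_counts = Counter({
    ("List", "append"): 9200,
    ("List", "extend"): 1100,
    ("List", "insert"): 640,
    ("List", "index"): 310,
})

def rank_completions(class_name: str, prefix: str = "") -> list[str]:
    """Return methods of class_name matching prefix, most-used first."""
    candidates = [
        (method, count)
        for (cls, method), count in usage_counts.items()
        if cls == class_name and method.startswith(prefix)
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [method for method, _ in candidates]

print(rank_completions("List"))        # most relevant call surfaces first
print(rank_completions("List", "in"))  # prefix narrows, ranking still applies
```

The point of the sketch is the framing, not the model: treating the dropdown after the dot as a ranking problem, so the developer's most probable next call is one keystroke away instead of buried in an alphabetical list.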
The underlying model was not a transformer, nor was it a deep learning model in the contemporary sense. Nevertheless, developers found it highly beneficial. This early insight – that reducing friction in a specific, minor aspect of the development workflow could yield disproportionately high user satisfaction – has profoundly influenced Sundaresan’s approach to the problem. "Coding is an analytical task, distinct from online shopping," he posited. "If the system offers an incorrect recommendation, or if a suggestion disrupts my thought process, that carries significant weight." He argues that user experience is a separate axis from the underlying AI technology: a superior model can still result in a suboptimal product if the user interface is poorly designed.
He closely observed the evolution of the model landscape, from Long Short-Term Memory (LSTM) models and early encoder-decoder architectures to the pivotal Google transformer paper and the initial GPT releases. At each stage, his team had already identified the problems these advancements aimed to solve, but the models lacked the requisite power. "If you review our publications from that era, you’ll find research on all these developments," Sundaresan stated. "Each paper would present a model for a specific application."
During his tenure at Microsoft, Sundaresan and his team encountered significant constraints when developing tools like GitHub Copilot. "Even our customers were hesitant to send their data to our cloud infrastructure. They preferred to keep the data on their local machines. Consequently, we had to engineer the models to run directly on laptops, which involved considerable technical effort to ensure performance within those limited environments." This experience underscored the critical importance of on-device processing and data privacy for enterprise adoption.
The IBM Advantage: Leveraging Legacy and Scale
The question naturally arises: why did Sundaresan bring his extensive experience to IBM, a company not typically associated with the cutting edge of AI startups? He attributes this decision to a desire for change after a decade at Microsoft, coupled with a compelling proposition from IBM. However, a less obvious but significant factor is that IBM’s perceived liabilities are, in fact, assets for his specific mission.
"Within IBM Software, we have nearly 20,000 employees, supported by robust infrastructure and consulting services," Sundaresan elaborated. "The potential to create something that could benefit this large internal user base is, in itself, a substantial product opportunity." This internal deployment, referred to by IBM as "client zero," provided a massive, diverse, and captive user base. These users were willing to tolerate early-stage imperfections in exchange for tangible productivity enhancements – an advantage that no external product launch could replicate.

Furthermore, IBM’s diverse workload mix presented a unique testing ground. Its internal developers work not only with modern languages like Python and Rust but also with legacy systems such as PL/I, COBOL, and mainframe JCL, alongside what Sundaresan described as "custom languages, akin to slang." If IBM Bob could effectively support this broad spectrum of development environments, it could confidently handle the demands of any enterprise customer. "Before we even approach a client, we have a compelling narrative to share," he remarked.
Sundaresan was explicit about the challenges his new endeavor aimed to address. Rather than building a generic tool for all developers, he focused on a system optimized for enterprise-specific conditions that most AI coding tools treat as edge cases: legacy codebases, stringent compliance requirements, hybrid computing environments, and the significant cost implications of AI-generated code that appears production-ready but is not.
Addressing the Unspoken Cost of AI Development
One of the most candid insights from Sundaresan concerns the typical usage patterns of AI coding tools when left to users’ discretion. "Developers will simply ask, ‘Which model do you want to use?’ and often select the latest high-end model, such as Sonnet 4.7. They might be executing a simple prompt, but the cost could escalate to $40 per million tokens," he stated. He likened this to "taking your Ferrari to buy milk; you don’t need to."
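The cost gap behind the Ferrari analogy is easy to quantify. In the back-of-the-envelope sketch below, only the $40-per-million-token figure comes from the quote; the small-model price and the prompt volumes are illustrative assumptions.

```python
# Back-of-the-envelope comparison of routing simple prompts to a frontier
# model versus a small model. Only the $40/M-token figure comes from the
# article; the small-model price and workload numbers are assumptions.
FRONTIER_PRICE_PER_M = 40.00  # $ per million tokens (from the quote)
SMALL_PRICE_PER_M = 0.40      # $ per million tokens (assumed, ~100x cheaper)

def cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of processing `tokens` at a given per-million-token price."""
    return tokens / 1_000_000 * price_per_million

# An assumed team workload: 50,000 simple prompts/day at ~2,000 tokens each.
daily_tokens = 50_000 * 2_000
print(f"frontier: ${cost(daily_tokens, FRONTIER_PRICE_PER_M):,.2f}/day")
print(f"small:    ${cost(daily_tokens, SMALL_PRICE_PER_M):,.2f}/day")
```

Under these assumptions the same day's worth of trivial prompts costs thousands of dollars on the frontier model and tens of dollars on the small one, which is the waste Bob's routing is designed to eliminate.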
IBM Bob circumvents this issue by not exposing the underlying model selection to users. Instead, it intelligently routes tasks to the most appropriate model based on the task’s actual requirements. This could involve leveraging models from Anthropic (Claude), open-source options like Mistral, IBM’s own Granite models, or proprietary fine-tuned models specifically developed for Bob’s environment. This routing intelligence, Sundaresan believes, represents the core architectural innovation. "It’s not merely about integrating a model into the system," he emphasized. "It involves bringing together the model, the user experience, and the underlying architecture that delivers that experience. The model is only one component of this complex equation."
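The routing idea can be sketched as a small classifier sitting in front of a pool of models ordered by price. Everything below – the tier names, prices, and keyword heuristic – is an illustrative assumption, not Bob's actual routing logic.

```python
# Minimal sketch of cost-aware model routing: estimate a task's
# difficulty, then pick the cheapest model tier that can handle it.
# Tiers, prices, and the difficulty heuristic are all assumptions.
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    price_per_m_tokens: float  # $ per million tokens
    max_difficulty: int        # highest difficulty this tier handles

# Ordered cheapest-first so the first match is the least expensive option.
TIERS = [
    ModelTier("small-granite-style", 0.40, max_difficulty=1),
    ModelTier("mid-open-weights", 3.00, max_difficulty=2),
    ModelTier("frontier", 40.00, max_difficulty=3),
]

def estimate_difficulty(task: str) -> int:
    """Crude keyword heuristic standing in for a learned task classifier."""
    hard = ("refactor", "architecture", "concurrency", "migrate")
    medium = ("debug", "test", "review")
    text = task.lower()
    if any(word in text for word in hard):
        return 3
    if any(word in text for word in medium):
        return 2
    return 1

def route(task: str) -> ModelTier:
    difficulty = estimate_difficulty(task)
    # First tier whose capability meets the requirement wins.
    return next(t for t in TIERS if t.max_difficulty >= difficulty)

print(route("rename this variable").name)            # cheap tier
print(route("debug the failing unit test").name)     # mid tier
print(route("refactor the concurrency model").name)  # frontier tier
```

A production router would replace the keyword heuristic with telemetry-driven classification, but the structural point matches Sundaresan's description: the user states the task, and the system – not the user – decides which model is worth paying for.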
He described conducting A/B experiments on IBM’s internal user base, comparing different frontier model variants, monitoring usage patterns, and identifying instances where expensive models were used for tasks that cheaper alternatives could handle just as effectively. This internal deployment facilitated experimentation at a scale that would be prohibitive for an early-stage external product.
The Evolving Landscape of Agentic AI
When discussing the prevailing hype surrounding agentic AI, Sundaresan offers a perspective rooted in research rather than pure executive enthusiasm. "There’s no smoke without fire," he told The New Stack. "If hype is the smoke, there is a fire somewhere, even if it’s not as large as the smoke suggests. But the fire is real." His analysis is that agent-based development is indeed a significant trend, but its conceptual underpinnings are not new: service-based and API-based development came before it, and agent-based development extends that lineage. What has fundamentally changed is the interface, which has transitioned from deterministic and programmatic to probabilistic and conversational. This shift introduces genuine new capabilities alongside novel risks.
"You can also distract these systems," Sundaresan warned regarding agent systems. "You can ask questions that are inappropriate, or elicit information that should not be revealed." He argues that the frequently cited 91% AI project failure rate stems from a lack of discipline. Companies often assume that simply contracting with a frontier model provider is sufficient, which is not the case. "You need to apply the same discipline you’ve always used before incorporating these technologies into your software products," he stressed.
The future direction that Sundaresan believes warrants greater attention is the development of agents that can communicate with each other, potentially in machine-native languages that are not directly interpretable by humans. "If errors occur in those derivative languages, the consequences could be exponential," he cautioned. "Many developments lie ahead. We can either succumb to fear and inaction, or we can proceed bravely yet systematically." This measured approach emphasizes the need for careful planning and robust governance in the face of rapidly advancing AI capabilities.
