Rajesh Padmakumaran, Vice President of AI Transformation and Services at Genpact, argues that the industry is at a critical juncture. With years of experience in customer application development and maintenance—specifically within Java and legacy modernization—Padmakumaran has observed a fundamental shift in how software is constructed. While Large Language Models (LLMs) like Anthropic’s Claude and OpenAI’s GPT-4 have revolutionized the ability to produce functional code snippets, they have also introduced a new layer of complexity: the risk of unverified intent. Without a rigorous framework to govern AI outputs, the "easy, sub-optimal approach" of today could become the catastrophic legacy system of tomorrow.
The Economic Context of Technical Debt
To understand the gravity of the situation, one must look at the broader economic impact of technical debt. According to a 2023 report by the Consortium for Information & Software Quality (CISQ), the cost of poor software quality in the United States alone has risen to approximately $2.41 trillion annually. Technical debt accounts for a significant portion of this figure, as engineers spend an estimated 33% to 42% of their time dealing with legacy issues rather than developing new features.
The introduction of Generative AI (GenAI) into this ecosystem acts as a double-edged sword. While GitHub’s own research on Copilot suggests that AI coding assistants can help developers complete tasks up to 55% faster, the volume of code being produced is increasing exponentially. If even a small fraction of this AI-generated code is architecturally flawed or lacks proper documentation, the sheer volume of output could overwhelm maintenance teams within a few years. This "AI-accelerated debt" is what Padmakumaran and other experts are currently working to mitigate through more disciplined engineering practices.
Case Study: Modernizing Legacy Infrastructure with Claude
Genpact’s recent initiatives provide a window into the practical application of AI in high-stakes environments, such as the financial sector. Padmakumaran’s team recently undertook two large-scale modernization projects: a complex database migration and a COBOL-to-.NET migration for a major bank. The bank’s system, originally written in 1991, represented a classic legacy challenge: millions of lines of code, outdated documentation, and a dwindling pool of subject matter experts who understood the original logic.
The team initially experimented with OpenAI’s models via GitHub Copilot but eventually transitioned to Anthropic’s Claude for specific modernization tasks. The primary driver for this shift was Claude’s large context window and its ability to analyze vast "stacks" of legacy code to identify underlying patterns. However, the project revealed a significant technical hurdle: the "hallucination wall."
During a proof of concept (PoC) involving 800 lines of COBOL, the LLM performed admirably, accurately interpreting the code’s function. However, when the scale was increased to 10,000 lines—a common size for enterprise modules—the model began to struggle. At this scale, LLMs frequently make assumptions about variable meanings or "hallucinate" logic that does not exist. Padmakumaran noted that the model’s reliability began to degrade significantly after the 5,000-line mark. This discovery led to a shift in methodology, emphasizing that AI cannot be a standalone solution for legacy migration; it requires human-led "grounding."
Overcoming the Limits of LLM Context and Logic
The solution implemented by Genpact involved a sophisticated pre-processing layer. Rather than feeding raw code into the LLM, subject matter experts (SMEs) were tasked with grouping the code into logical sections and explicitly defining the functionality of each segment. This structured approach ensures that the model is "grounded" in the actual processing requirements of the business logic.
This methodology utilizes a "multi-model" approach:
- The Primary Model (e.g., Claude): analyzes the code and generates the initial modernized script or documentation.
- The Evaluator Model: a second LLM judges and scores the accuracy of the primary model’s output against the original specs.
- The Governance Framework: a reusable set of prompts and pre-processing steps that ensures consistency across the entire dataset.
According to Padmakumaran, this framework handles approximately 20% to 30% of the heavy lifting in a migration project. By applying discipline to the input phase, the team was able to turn a probabilistic tool into a more deterministic one, reducing the likelihood of introducing new technical debt during the modernization process.
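The primary/evaluator loop described above can be sketched in a few lines of Python. This is a hedged illustration, not Genpact's implementation: the model calls are stubbed out (a real pipeline would call Claude and a second evaluator LLM over an API), and the section names, specs, and retry threshold are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class CodeSection:
    name: str     # logical grouping assigned by an SME
    source: str   # original legacy COBOL
    spec: str     # SME-written description of the functionality

def primary_model(section: CodeSection) -> str:
    """Stub for the primary LLM: translate legacy code given the SME spec."""
    # A real implementation would send section.spec and section.source to the
    # model; here we fake a translation that records the spec it implements.
    return f"// Implements: {section.spec}\n// (translated from {section.name})"

def evaluator_model(section: CodeSection, candidate: str) -> float:
    """Stub for the evaluator LLM: score the candidate against the spec."""
    # A real evaluator judges semantic fidelity; this stub only checks that
    # the stated intent survived into the generated artifact.
    return 1.0 if section.spec in candidate else 0.0

def migrate(sections, threshold=0.8, max_retries=2):
    """Run each SME-grounded section through the primary/evaluator loop."""
    results = {}
    for section in sections:
        for _ in range(max_retries + 1):
            candidate = primary_model(section)
            score = evaluator_model(section, candidate)
            if score >= threshold:
                break  # evaluator accepted this translation
        results[section.name] = (candidate, score)
    return results

sections = [
    CodeSection("interest-calc", "COMPUTE INT = BAL * RATE.", "monthly interest accrual"),
    CodeSection("stmt-print", "WRITE STMT-LINE.", "statement line formatting"),
]
report = migrate(sections)
```

The key design point is that the SME-written `spec` field, not the raw code, is what grounds both generation and evaluation, which is how a probabilistic tool is pushed toward deterministic behavior.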
The Case for Spec-Driven Development (SDD)
The most significant shift in Padmakumaran’s philosophy is the advocacy for Spec-Driven Development (SDD). In traditional agile environments, specifications are often informal or evolve alongside the code. In an AI-first world, this approach is dangerous. SDD treats specifications as "first-class engineering artifacts"—formal, machine-readable documents that define exactly what a system must do before a single line of code is generated.
The logic behind SDD is that "code is easy, but intent is hard." AI models are excellent at generating syntax, but they cannot guess the nuanced business intent of a developer. By forcing developers to define intent explicitly through deterministic specs, organizations can use the CI/CD (Continuous Integration/Continuous Deployment) pipeline as an enforcement mechanism. In this model, the pipeline does more than just check if the code compiles; it uses automated tests to verify that the AI-generated code honors the original specifications.
This approach addresses the "AI assistant sprawl" that many enterprises are currently experiencing. As every tool—from IDEs to project management software—integrates its own AI assistant, developers are faced with a fragmented landscape of suggestions. Without a centralized "source of truth" in the form of a specification, the code produced by these various tools can become a disjointed mess of different naming conventions and architectural styles.
Geopolitical and Supply Chain Considerations
The choice of AI models is also increasingly influenced by geopolitical factors. Recent discussions regarding the "Trump 2.0" administration’s stance on AI have flagged Claude and other models as possible points of scrutiny in national-security supply chains. While some analysts have raised concerns about the provenance and safety protocols of various LLMs, Padmakumaran reports that, thus far, enterprise clients have not pushed back against the use of Claude.
Instead, the focus remains on performance and context. Claude’s superior ability to handle long-form context has made it a favorite for engineering teams who need to "read" entire repositories at once. However, the Genpact framework is designed to be model-agnostic. Whether a client prefers OpenAI, Anthropic, or an open-source alternative like Llama, the underlying discipline of Spec-Driven Development remains the same.
Chronology of the AI Coding Evolution
The transition to the current state of AI-assisted engineering has been rapid:
- 2021: The launch of GitHub Copilot introduces the mass market to AI-powered autocomplete, primarily focused on "boilerplate" code.
- 2022: The release of ChatGPT demonstrates that LLMs can explain complex logic and debug code, leading to widespread developer adoption.
- 2023: Enterprises begin moving from individual developer use to organizational integration, raising concerns about security, IP, and technical debt.
- 2024: The emergence of "Agentic AI" and tools like Claude Code allows for more autonomous coding, necessitating the rise of frameworks like SDD to maintain human oversight.
Implications for the Future of Engineering
The broader implication of this shift is a fundamental change in the role of the software engineer. The value of a developer is moving away from the ability to write syntax and toward the ability to architect systems and define intent. If AI can mass-produce code, the human’s role is to ensure that the code is worth producing.
Furthermore, the "governance layer" for tech products must now be grounded in the same repositories where the code lives. By using LLMs to check the pipeline itself—much like a traditional governance mechanism—companies can create a self-correcting ecosystem. This involves standardizing naming conventions, enforcing architectural patterns, and ensuring that every variable is defined according to a master spec.
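A repository-grounded governance check of this kind can be sketched with Python's standard `ast` module. The "master spec" glossary and the generated snippet below are made up for illustration; a real pipeline would walk every file in the repository and apply the same rule.

```python
import ast

# Hypothetical master-spec glossary: the only variable names generated code may define.
MASTER_SPEC_TERMS = {"account_id", "net_balance", "fee_amount"}

def undefined_terms(source: str) -> set:
    """Return variable names assigned in `source` that the master spec does not define."""
    tree = ast.parse(source)
    assigned = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }
    return assigned - MASTER_SPEC_TERMS

# Sample AI-generated snippet: two sanctioned names and one stray invention.
generated = "account_id = 7\nfee_amount = 2.5\nmisc_tmp = 0\n"
violations = undefined_terms(generated)
```

Running checks like this inside the pipeline keeps naming conventions and the master spec enforced at the same place the code lives, which is the self-correcting loop described above.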
As Padmakumaran emphasizes, the risk of unchecked AI is not just a theoretical policy concern; it is a practical engineering threat. The "legwork" of defining standards and maintaining discipline cannot be outsourced to the machine. If developers fail to follow rigorous CI/CD validation and standardized conventions today, they are simply building a "big maintenance issue for the future."
In conclusion, while the speed of AI-generated code is an undeniable asset, it must be tempered by a commitment to Spec-Driven Development. By treating intent as the primary artifact and code as a secondary byproduct, engineering teams can harness the power of GenAI without falling into the trap of insurmountable technical debt. The responsibility remains with the human engineer to define what "good" looks like, ensuring that the systems built today remain viable for the decades to come.
