The Escalating Crisis of Secrets Sprawl: AI Accelerates a Record Surge in Hardcoded Credentials

Hardcoded secrets surged in 2025, accelerating faster than most cybersecurity teams had anticipated. GitGuardian's report "The State of Secrets Sprawl 2026" counts 29 million new hardcoded secrets exposed on public GitHub in 2025 alone, a 34% year-over-year increase and the largest single-year jump ever recorded. The findings underscore a critical and rapidly worsening security challenge, driven predominantly by the pervasive integration of Artificial Intelligence (AI) into development workflows and compounded by leaks in internal systems and a persistent failure to remediate exposed credentials.

Understanding the Threat: What is Secrets Sprawl?

Secrets sprawl refers to the uncontrolled dissemination and exposure of sensitive credentials—such as API keys, database passwords, cloud access tokens, and cryptographic keys—within an organization’s various digital environments. These "secrets" are often inadvertently embedded directly into source code, configuration files, collaboration tools, or development environments, making them readily accessible to anyone who gains access to these systems. The danger is immense: hardcoded secrets provide attackers with direct, often privileged, access to critical systems and data, bypassing traditional perimeter defenses. A single leaked secret can serve as a potent entry point for sophisticated supply chain attacks, data breaches, intellectual property theft, and financial fraud. For instance, an exposed cloud credential could grant an attacker full control over an organization’s entire cloud infrastructure, leading to catastrophic consequences, including massive data exfiltration, service disruption, and significant financial losses. The convenience of embedding secrets during rapid development cycles, often to streamline operations or accelerate project timelines, frequently overshadows the profound security risks, creating a technical debt that accumulates into an expansive and exploitable attack surface. This practice is often a symptom of insufficient security awareness, inadequate tooling, or a lack of clear policies within development teams.
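To make the definition concrete, here is a minimal sketch of how a basic pattern-based scanner might flag hardcoded secrets in source text. It assumes just two illustrative detection rules (the `AKIA` prefix used by AWS access key IDs, and a literal string assigned to a secret-sounding name). Production scanners use far larger rule sets and typically add entropy analysis and live validation, so treat this as an illustration of the idea, not a working detector.

```python
import re
from typing import Iterator, NamedTuple

# Two illustrative detection rules (assumptions, not any vendor's rule set).
PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A literal string of 8+ characters assigned to a secret-sounding name.
    "credential_assignment": re.compile(
        r"(?i)(api[_-]?key|secret|password|token)\w*\s*[:=]\s*['\"]([^'\"]{8,})['\"]"
    ),
}


class Finding(NamedTuple):
    line_no: int
    rule: str
    snippet: str


def scan_text(text: str) -> Iterator[Finding]:
    """Yield one Finding per (line, rule) pair that matches a secret-like pattern."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                yield Finding(line_no, rule, line.strip())


sample = (
    "import os\n"
    'DB_PASSWORD = "hunter2-prod-9f8a"\n'   # hardcoded literal: flagged
    'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'    # AWS's documented example key ID: flagged
    'token = os.environ["API_TOKEN"]\n'     # injected at runtime: not flagged
)
for finding in scan_text(sample):
    print(finding.line_no, finding.rule)
```

The value read from `os.environ` is not flagged, which matches the point of the paragraph: the secret should live outside the code.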


A Decade of Compounding Risk: The Accelerating Timeline of Leaks

The issue of secrets sprawl is not new; it has been a persistent concern for cybersecurity professionals for years. However, its recent acceleration is particularly alarming. Since 2021, the volume of leaked secrets has escalated by an astonishing 152%. This growth dramatically outpaces the expansion of GitHub’s public developer base, which, while significant, grew by 98% over the same period. This disparity highlights that the problem isn’t merely a function of more developers creating more code; rather, it points to systemic issues exacerbated by evolving development practices and the widespread adoption of new technologies.

The 2025 figures, a 34% year-over-year increase and 29 million new hardcoded secrets, signal a critical inflection point. They indicate that existing security measures, which focus primarily on detection, are not keeping pace with the rate at which secrets are introduced and exposed across the software development lifecycle. The risk compounds: older, unremediated leaks persist while new ones are generated ever faster, forming a vast web of vulnerabilities for skilled threat actors to exploit. The accelerating pace widens the window of opportunity for attackers and makes the defenders' task more daunting.
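The growth figures in the two paragraphs above can be combined with some simple arithmetic. This sketch treats the rounded headline numbers as exact, so the derived values are approximate: dividing secrets growth (152%) by developer-base growth (98%) suggests roughly 27% more leaked secrets per public developer since 2021, and backing out the 34% increase implies about 21.6 million new secrets in 2024.

```python
def growth_factor(pct_increase: float) -> float:
    """Convert a percentage increase (e.g. 34 for +34%) into a multiplier."""
    return 1 + pct_increase / 100


# Rounded headline figures quoted from the report summary.
secrets_growth = growth_factor(152)     # leaked secrets since 2021: 2.52x
developer_growth = growth_factor(98)    # public GitHub developers, same period: 1.98x
per_developer_growth = secrets_growth / developer_growth  # about 1.27x

new_secrets_2025 = 29_000_000
implied_new_secrets_2024 = new_secrets_2025 / growth_factor(34)  # about 21.6 million

print(f"Leaked secrets per developer: {per_developer_growth:.2f}x since 2021")
print(f"Implied 2024 baseline: {implied_new_secrets_2024 / 1e6:.1f} million")
```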

The AI Factor: A Catalyst for Credential Exposure

Perhaps the most striking revelation in the GitGuardian report is the impact of Artificial Intelligence on secrets sprawl. In 2025, leaked secrets tied to AI services rose 81% over the previous year, reaching 1,275,105 exposed credentials. The growth is not confined to major AI platforms like OpenAI or Anthropic; the real explosion is occurring in the burgeoning ecosystem of Large Language Model (LLM) infrastructure that underpins many modern AI applications.

The report specifically identifies several rapidly growing categories of AI-related leaks, demonstrating the breadth of the problem:

Retrieval APIs: Services like Brave Search, critical for augmenting LLMs with real-time data, saw an astronomical 1,255% increase in associated leaked secrets. These keys often grant access to vast swathes of web data or search capabilities.
Orchestration Tools: Platforms designed to manage and sequence complex AI workflows, such as Firecrawl, experienced a surge of 796% in related leaked credentials. These credentials are often highly privileged, allowing control over multiple AI services and data flows.
Managed Backends: Services like Supabase, which provide database and backend services for AI applications, witnessed a 992% rise in exposed credentials. Such leaks can compromise entire AI applications and their underlying data stores.

This dramatic growth underscores a fundamental shift: every new AI integration introduces another "machine identity" that requires credentials to access various services. As developers leverage AI tools for code generation, data processing, and automation, they inadvertently embed or generate keys for these services, often without robust security protocols. The ease of integrating AI agents and services often leads to a lax approach to credential management, where speed to deployment overrides security best practices. This rapid expansion of machine identities directly correlates with an expansion of the attack surface, making "deploying AI safely" an urgent imperative that demands a comprehensive secrets security strategy, integrating security from the very initial design phase of AI-driven projects.
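One practical consequence is that AI service keys, like any other machine-identity credential, should be injected at runtime rather than written into code. The sketch below illustrates that pattern under stated assumptions: the key arrives through an environment variable (the name `SEARCH_API_KEY` is hypothetical) populated by a vault agent or CI secret store, and the loader fails closed instead of falling back to a literal default.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential was not injected into the environment."""


def load_credential(name: str) -> str:
    """Return a credential injected at runtime (e.g. by a vault agent or CI secret
    store) instead of one hardcoded in source. Fails closed if it is absent."""
    value = os.environ.get(name, "").strip()
    if not value:
        # No literal fallback key: a missing secret is an error, not a default.
        raise MissingCredentialError(f"{name} is not set; inject it at runtime")
    return value


# Hypothetical variable name for an AI retrieval/search service key:
# search_key = load_credential("SEARCH_API_KEY")
```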

The Illusion of Security: Internal Systems and Collaboration Tools

While public repositories on platforms like GitHub draw the most attention from security researchers and threat actors because they are accessible, GitGuardian's analysis reveals a more insidious problem inside internal systems. Internal repositories are nearly six times as likely to contain hardcoded secrets as public ones: 32.2% of internal repositories harbored at least one hardcoded secret, compared with 5.6% of public repositories. Crucially, these are not merely "test keys" or low-value credentials; they are often high-value assets such as Continuous Integration/Continuous Deployment (CI/CD) tokens, cloud access credentials, and database passwords. These are exactly the keys attackers seek once they have an initial foothold, because they enable lateral movement and privilege escalation. The long-held belief in "security through obscurity" (the idea that internal systems are inherently safer because they are not publicly visible) has proven to be a dangerous fallacy. Security teams must now treat internal repositories as primary sources of potential leaks, deserving the same scrutiny as public-facing codebases, if not more.

The problem also extends beyond code repositories into everyday communication and project management. In 2025, 28% of all secret incidents originated entirely outside source code, surfacing instead in collaboration tools like Slack, Jira, and Confluence. These non-code leaks tend to be more dangerous: 56.7% of secrets found only in collaboration tools were rated critical, compared with 43.7% of code-only incidents. The gap arises because teams often share credentials in plain text during urgent incident response, troubleshooting, or new-employee onboarding. Such leaks bypass code-scanning tools entirely, leaving a large blind spot for security teams focused only on code. An organization that scans only code therefore misses more than a quarter of its secret incidents, including a disproportionate share of the critical ones, which creates openings for insider threats and for external attackers who compromise a single employee account.
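Because credentials pasted into chat or tickets bypass code scanners, one lightweight safeguard is to redact secret-shaped strings before a message is posted or archived. The sketch below assumes three illustrative patterns (the shapes of AWS access key IDs, GitHub classic personal access tokens, and PEM private-key blocks). It is not a built-in feature of any collaboration tool, and it complements, rather than replaces, sharing credentials through a vault.

```python
import re

# Illustrative secret shapes (assumptions; real tools ship many more detectors).
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),         # AWS access key ID
    re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),      # GitHub classic personal access token
    re.compile(                                  # PEM-encoded private key block
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"
    ),
]


def redact(message: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every secret-shaped substring with a placeholder before sharing."""
    for pattern in SECRET_PATTERNS:
        message = pattern.sub(placeholder, message)
    return message


print(redact("prod key is AKIAIOSFODNN7EXAMPLE, rotate after the incident"))
```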

The Porous Perimeter: Self-Hosted Infrastructure and Containerization

The report also sheds light on the often-underestimated risks associated with self-hosted infrastructure and container environments. In 2025, GitGuardian uncovered thousands of inadvertently exposed self-hosted GitLab instances and Docker registries. A scan of these systems revealed over 80,000 credentials, with a concerning 10,000 of these still valid and exploitable. This indicates that misconfigurations or accidental public exposure of these internal systems are rampant, turning what should be private infrastructure into public-facing attack surfaces.

Secrets found in Docker images presented a particularly troubling picture. The research found that 18% of scanned Docker images contained secrets, and 15% of those findings were confirmed valid. By comparison, 12% of scanned GitLab repositories contained secrets, and 12% of those were valid. Because Docker images frequently run in production, the secrets inside them are often "production-adjacent": combined with the higher validity rate, this means they can grant immediate access to live applications, databases, and critical services, making them prime targets for attackers. The traditional boundary between private and public infrastructure is increasingly porous, with critical credentials leaking from self-managed systems into accessible domains. The proliferation of microservices and containerized applications, while offering agility, also introduces secret-management complexities that many organizations have yet to address adequately.
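Secrets often end up in images through `ENV` or `ARG` instructions: `ENV` values are stored in the image configuration and ship with every copy of the image, and Docker's documentation warns that build-argument values can be exposed through `docker history`. The sketch below is a rough, line-based linter for that pattern. The sensitive-name regex is an assumption, and the sketch deliberately ignores line continuations and quoted values containing spaces.

```python
import re

# Assumed heuristic for variable names that usually hold credentials.
SENSITIVE_NAME = re.compile(
    r"(?i)(secret|password|passwd|token|api[_-]?key|private[_-]?key|access[_-]?key)"
)
INSTRUCTION = re.compile(r"^\s*(ENV|ARG)\s+(.*)$", re.IGNORECASE)


def flag_dockerfile(text: str) -> list[tuple[int, str, str]]:
    """Return (line_no, instruction, variable) for ENV/ARG lines that assign a
    literal value to a sensitive-looking variable name."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        match = INSTRUCTION.match(line)
        if not match:
            continue
        keyword, rest = match.group(1).upper(), match.group(2)
        if "=" in rest:
            # KEY=value form, possibly several pairs on one line.
            pairs = [tok.split("=", 1) for tok in rest.split() if "=" in tok]
        else:
            # Legacy "ENV KEY value" form; a bare "ARG KEY" has no value.
            parts = rest.split(None, 1)
            pairs = [parts] if len(parts) == 2 else []
        for name, value in pairs:
            if SENSITIVE_NAME.search(name) and value.strip():
                findings.append((line_no, keyword, name))
    return findings


dockerfile = """\
FROM python:3.12-slim
ARG PIP_INDEX_TOKEN
ENV PATH=/app/bin:$PATH
ENV DB_PASSWORD=s3cr3t API_KEY=abc123
ENV LEGACY_SECRET hunter2
"""
for finding in flag_dockerfile(dockerfile):
    print(finding)
```

For credentials needed only at build time, BuildKit secret mounts (`RUN --mount=type=secret,...`) keep the value out of the final image layers.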

The Remediation Crisis: A Lingering Threat

Perhaps one of the most sobering findings in the report is the persistent failure of organizations to remediate leaked secrets effectively. Detection, it seems, is only half the battle, and often the easier half. GitGuardian’s retesting of secrets confirmed as valid in 2022 revealed that a staggering 64% of them remained exploitable four years later. This is not an anomaly or a rounding error; it is concrete proof that the essential processes of credential rotation and revocation are neither routine, consistently owned, nor adequately automated within most organizations. This systemic failure to address detected vulnerabilities allows attackers durable access.

The challenge of remediation is multi-faceted and often deeply entrenched in organizational inertia. Credentials deeply embedded across complex build systems, CI variables, container images, and numerous vendor integrations are notoriously difficult to replace without risking production outages or breaking critical business processes. For many development and operations teams, the perceived safest short-term choice is often inaction, inadvertently leaving attackers with durable and long-term access paths to critical systems. This creates a cumulative security debt, where old vulnerabilities continue to pose a threat alongside newly emerging ones, significantly amplifying the overall risk profile of an organization. The lack of robust, automated remediation workflows transforms detected leaks into persistent backdoors for malicious actors, effectively giving them a stable foothold.
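Part of the remediation gap is simply not knowing which credentials are overdue or unowned. The sketch below turns a credential inventory into a prioritized action list. The `Credential` fields, the 90-day maximum age, and the action names are all hypothetical policy choices, not figures from the report.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Hypothetical policy values, not figures from the report.
MAX_AGE = timedelta(days=90)
PRIORITY = {"revoke-and-rotate": 0, "rotate": 1, "assign-owner": 2}


@dataclass
class Credential:
    name: str
    owner: Optional[str]          # team accountable for rotation; None = unowned
    last_rotated: datetime
    known_leaked: bool = False


def remediation_actions(creds: List[Credential], now: datetime,
                        max_age: timedelta = MAX_AGE) -> List[Tuple[str, str]]:
    """Return (credential, action) pairs, most urgent first."""
    actions = []
    for cred in creds:
        if cred.known_leaked:
            actions.append((cred.name, "revoke-and-rotate"))
        elif now - cred.last_rotated > max_age:
            actions.append((cred.name, "rotate"))
        if cred.owner is None:
            actions.append((cred.name, "assign-owner"))
    # sorted() is stable, so actions of equal priority keep their input order.
    return sorted(actions, key=lambda action: PRIORITY[action[1]])


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
inventory = [
    Credential("ci-deploy-token", "platform", now - timedelta(days=400)),
    Credential("payments-api-key", None, now - timedelta(days=10)),
    Credential("db-password", "data", now - timedelta(days=5), known_leaked=True),
]
for name, action in remediation_actions(inventory, now):
    print(name, action)
```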

Developer Endpoints: New Aggregation Layers for Attackers

The modern developer workstation and CI/CD runner have evolved into critical aggregation layers for sensitive credentials, a trend starkly underscored by recent supply chain attacks. The "Shai-Hulud 2" supply chain attack, for instance, offered researchers a rare glimpse into the types and prevalence of secrets found on compromised developer machines. Across 6,943 systems, GitGuardian identified an astonishing 294,842 secret occurrences, corresponding to 33,185 unique secrets. On average, each live secret was found in eight different locations on the same machine, scattered across .env files, shell history, Integrated Development Environment (IDE) configurations, cached tokens, and build artifacts. This illustrates how easily credentials can proliferate and reside in multiple, often overlooked, locations on a single machine, creating a treasure trove for attackers.
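The gap between occurrences and unique secrets comes from the same value being copied into many files. The sketch below shows one way to compute both counts for a single machine, assuming a scanner has already produced `(location, value)` pairs. Secrets are keyed by a SHA-256 fingerprint so that the summary never stores raw values. The paths and values in the usage are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def fingerprint(secret: str) -> str:
    """Key secrets by SHA-256 digest so the summary never holds raw values."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def summarize(occurrences: List[Tuple[str, str]]) -> Dict[str, float]:
    """Summarize (location, secret_value) pairs found on one machine."""
    locations: Dict[str, Set[str]] = defaultdict(set)
    for location, value in occurrences:
        locations[fingerprint(value)].add(location)
    unique = len(locations)
    mean_locations = (
        sum(len(locs) for locs in locations.values()) / unique if unique else 0.0
    )
    return {
        "occurrences": len(occurrences),
        "unique_secrets": unique,
        "mean_locations_per_secret": mean_locations,
    }


# Hypothetical scan results: one API key copied into three places.
found = [
    ("~/project/.env", "sk-example-123"),
    ("~/.bash_history", "sk-example-123"),
    ("~/.config/ide/settings.json", "sk-example-123"),
    ("build/artifacts/deploy.log", "ci-token-456"),
]
print(summarize(found))
```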

Even more concerning, 59% of the compromised machines in the Shai-Hulud 2 attack were CI/CD runners, not personal laptops. This distinction matters: once secrets sprawl into build infrastructure, the problem is no longer one of individual developer hygiene but an organizational exposure that affects the entire software supply chain. The subsequent "LiteLLM supply chain attack" reinforced the pattern, showing how compromised packages can harvest SSH keys, cloud credentials, and API tokens directly from developer machines, especially those used for AI development, where sensitive tools and tokens tend to concentrate. These incidents show that developer environments and build infrastructure are prime targets and must be secured as such, with robust endpoint detection and response alongside proactive secret scanning.

Emerging Attack Surfaces: Model Context Protocol (MCP) Servers

The rapid evolution of AI systems has introduced new vectors for credential exposure, creating attack surfaces that security teams are still learning to understand and defend. Model Context Protocol (MCP), designed to make AI systems more useful by connecting them to tools and data sources, has inadvertently created a new class of vulnerability. In 2025, its first full year of adoption, GitGuardian found 24,008 unique secrets in MCP-related configuration files on public GitHub, 2,117 of which were verified as valid and exploitable.

As the adoption of agentic AI accelerates, frameworks like MCP are normalizing the practice of embedding credentials into configuration files, startup flags, and local JSON files to facilitate seamless integration and operation. The rapid expansion of the AI agent ecosystem is currently outpacing the development and implementation of adequate security controls, leaving a wide-open window for credential theft and abuse. This mirrors the early days of cloud adoption, where rapid innovation outpaced security best practices, leading to widespread misconfigurations and vulnerabilities. Without proactive measures, MCP and similar emerging AI frameworks will continue to be a significant source of critical secret leaks.
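A quick audit of MCP client configuration can catch the embedding pattern described above. The sketch assumes the widely used `mcpServers` JSON layout (a `command`, optional `args`, and an `env` map per server), though exact schemas vary by client. It also assumes a `${...}` placeholder convention for references to externally stored secrets, which not every client supports. Server names and values in the usage are hypothetical.

```python
import json
import re
from typing import List

# Assumed heuristics: names that suggest a credential, and a placeholder syntax
# (e.g. "${env:DB_PASSWORD}") that points to a secret stored elsewhere.
SENSITIVE_NAME = re.compile(r"(?i)(key|token|secret|password)")
PLACEHOLDER = re.compile(r"^\$\{[^}]+\}$")


def audit_mcp_config(raw: str) -> List[str]:
    """Return human-readable problems found in an MCP client config (JSON text)."""
    config = json.loads(raw)
    problems = []
    for server, spec in config.get("mcpServers", {}).items():
        # Literal credentials in the per-server env map.
        for name, value in spec.get("env", {}).items():
            if SENSITIVE_NAME.search(name) and not PLACEHOLDER.match(str(value)):
                problems.append(f"{server}: env {name} holds a literal value")
        # Credentials passed as startup flags ("--api-key=VALUE" or "--api-key VALUE").
        args = [str(arg) for arg in spec.get("args", [])]
        for i, arg in enumerate(args):
            if arg.startswith("--") and SENSITIVE_NAME.search(arg):
                if "=" in arg or i + 1 < len(args):
                    flag = arg.split("=", 1)[0]
                    problems.append(f"{server}: credential passed as flag {flag}")
    return problems


example = json.dumps({
    "mcpServers": {
        "github": {"command": "npx", "args": ["-y", "example-github-server"],
                   "env": {"GITHUB_TOKEN": "ghp_literal_value"}},
        "search": {"command": "search-mcp", "args": ["--api-key=abc123"]},
        "db": {"command": "db-mcp", "env": {"DB_PASSWORD": "${env:DB_PASSWORD}"}},
    }
})
for problem in audit_mcp_config(example):
    print(problem)
```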

A Call for Transformation: From Detection to Non-Human Identity Governance

The persistent and accelerating nature of secrets sprawl, coupled with the emerging threats from AI and internal systems, demands a fundamental shift in cybersecurity strategy. The industry’s current limiting factor lies in its inability to answer three critical questions at scale:

What non-human identities exist within my environment?
Who owns them?
What can they access?
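The three questions map naturally onto a minimal inventory data structure. The sketch below is hypothetical (the `NonHumanIdentity` fields and scope strings such as `prod/db:read` are illustrative), but it shows how an inventory keyed by identity can answer what exists, which identities lack an owner, and which identities can reach a given resource.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class NonHumanIdentity:
    identity_id: str                 # e.g. a service account, CI job, or AI agent
    kind: str
    owner: Optional[str]             # who owns it? (None = nobody accountable)
    scopes: Set[str] = field(default_factory=set)  # what can it access?


class Inventory:
    """Minimal registry answering: what exists, who owns it, what can it access."""

    def __init__(self) -> None:
        self._identities: Dict[str, NonHumanIdentity] = {}

    def register(self, nhi: NonHumanIdentity) -> None:
        self._identities[nhi.identity_id] = nhi

    def all_identities(self) -> List[str]:
        """What non-human identities exist?"""
        return sorted(self._identities)

    def unowned(self) -> List[str]:
        """Which identities have no accountable owner?"""
        return sorted(i for i, n in self._identities.items() if n.owner is None)

    def with_access_to(self, scope: str) -> List[str]:
        """Which identities can reach a given resource scope?"""
        return sorted(i for i, n in self._identities.items() if scope in n.scopes)


inv = Inventory()
inv.register(NonHumanIdentity("ci-release-job", "ci_job", "platform",
                              {"prod/db:read", "registry:push"}))
inv.register(NonHumanIdentity("support-agent", "ai_agent", None,
                              {"tickets:read", "prod/db:read"}))
inv.register(NonHumanIdentity("billing-svc", "service_account", "payments",
                              {"billing:write"}))
print(inv.unowned(), inv.with_access_to("prod/db:read"))
```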

Organizations embracing agentic AI and modern development practices must move beyond reactive secrets detection to proactive, continuous Non-Human Identity (NHI) governance. This paradigm shift involves several strategic imperatives:

Eliminating Long-Lived Static Credentials: Wherever possible, replace static, long-lived credentials with ephemeral, dynamically generated ones that expire after a short period. This significantly reduces the window of opportunity for attackers to exploit them.
Adopting Short-Lived, Identity-Driven Access: Implement systems that grant access based on verifiable identity and only for the duration required, leveraging principles of least privilege and just-in-time access. This ensures that even if a secret is compromised, its utility to an attacker is severely limited.
Secrets Vaulting as Default: Establish secrets vaulting as the standard developer workflow, ensuring all sensitive credentials are securely stored and accessed programmatically, never hardcoded in plain text. Tools like HashiCorp Vault or AWS Secrets Manager should be integrated into CI/CD pipelines.
Comprehensive Lifecycle Management: Treat every service account, CI job, and AI agent as a governed identity, subject to robust lifecycle management from creation to deprecation, including regular rotation, immediate revocation upon compromise, and clear ownership.

Security experts across the industry increasingly advocate this comprehensive approach. Dr. Anya Sharma, a leading cybersecurity analyst specializing in identity and access management, recently stated, "The era of perimeter-based security is over, and the focus must now shift inwards, securing every identity, human or machine, and every credential, throughout its lifecycle. Without robust NHI governance, organizations are simply playing a losing game of whack-a-mole."
