Cybersecurity researchers have disclosed critical security vulnerabilities affecting Ollama, a widely adopted open-source framework for running large language models (LLMs) locally. The first, dubbed "Bleeding Llama" (CVE-2026-7482), is an out-of-bounds read flaw that could allow remote, unauthenticated attackers to leak an entire process’s memory. The second is a pair of unpatched flaws in Ollama’s Windows auto-update mechanism that can be chained to achieve persistent code execution. Together, these disclosures expose significant security risks for potentially hundreds of thousands of servers worldwide and demand urgent attention from users and organizations that rely on the platform for AI inference.
The ‘Bleeding Llama’ Vulnerability (CVE-2026-7482): A Critical Memory Leak
At the forefront of these revelations is CVE-2026-7482, an out-of-bounds read vulnerability assigned a critical CVSS score of 9.1. Discovered and codenamed "Bleeding Llama" by security firm Cyera, this flaw is estimated to impact over 300,000 Ollama servers worldwide. Its severity stems from the potential for a remote, unauthenticated attacker to extract highly sensitive data directly from the affected server’s process memory.
The core of the "Bleeding Llama" vulnerability lies within Ollama’s GGUF model loader, specifically in versions prior to 0.17.1. GGUF, or GPT-Generated Unified Format, is a specialized file format designed for storing and efficiently loading large language models for local execution. The vulnerability manifests when Ollama processes a malformed GGUF file through its /api/create endpoint. According to the CVE.org description, an attacker can craft a GGUF file where the declared tensor offset and size exceed the file’s actual length. During the quantization process, handled by functions like WriteTo() in fs/ggml/gguf.go and server/quantization.go, the server attempts to read past the allocated heap buffer, leading to an out-of-bounds read.
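A loader avoids this class of bug by validating each tensor's declared bounds against the real file length before touching the buffer. The sketch below is illustrative only (the type and function names are not Ollama's actual code); it shows the two checks that matter: the length comparison and the integer-overflow guard on the offset arithmetic.

```go
package main

import "fmt"

// tensorMeta models the per-tensor metadata a GGUF header declares.
// Field names are illustrative; they do not mirror Ollama's types.
type tensorMeta struct {
	Name   string
	Offset uint64 // start of tensor data, relative to the data section
	Size   uint64 // declared byte length of the tensor data
}

// validateTensor rejects metadata whose declared region falls outside
// the actual file. Guarding against uint64 wraparound (Offset+Size
// overflowing) is as important as the length comparison itself.
func validateTensor(t tensorMeta, fileLen uint64) error {
	end := t.Offset + t.Size
	if end < t.Offset { // uint64 overflow
		return fmt.Errorf("tensor %q: offset+size overflows uint64", t.Name)
	}
	if end > fileLen {
		return fmt.Errorf("tensor %q: declared region [%d, %d) exceeds file length %d",
			t.Name, t.Offset, end, fileLen)
	}
	return nil
}

func main() {
	const fileLen = 1024
	fmt.Println(validateTensor(tensorMeta{"blk.0.attn_q.weight", 0, 512}, fileLen)) // nil: fits in file
	fmt.Println(validateTensor(tensorMeta{"evil", 512, 1 << 40}, fileLen))          // error: far past EOF
}
```

Without both checks, a crafted header like the second example walks the reader straight past the end of the heap buffer.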
This technical misstep is exacerbated by Ollama’s utilization of the unsafe package in Go, the programming language in which Ollama is primarily developed. The unsafe package allows Go programs to perform operations that bypass the language’s inherent memory safety guarantees. While this can be used for performance optimizations in specific scenarios, its misuse or vulnerabilities within code employing it can lead to severe security flaws like "Bleeding Llama." In this context, the unsafe package facilitates direct memory manipulation without the typical bounds checking, making the out-of-bounds read possible when processing the maliciously crafted GGUF file.
The Peril of Data Exposure
The implications of successful exploitation are profound. A malicious actor could send a specially designed GGUF file to an exposed Ollama server, setting a tensor’s shape to an extraordinarily large number. This action triggers the heap out-of-bounds read during the model creation process. The data subsequently leaked from the Ollama process memory can encompass a wide array of confidential information. This includes, but is not limited to, environment variables, critical API keys that could grant access to other systems, sensitive system prompts used to configure LLMs, and even private conversation data from concurrent users interacting with the AI models.
Cyera security researcher Dor Attias elaborated on the gravity of this data exposure: "An attacker can learn basically anything about the organization from your AI inference – API keys, proprietary code, customer contracts, and much more." Attias further highlighted that the risk escalates significantly when Ollama is integrated with other tools, such as "Claude Code." In such scenarios, all tool outputs flow through the Ollama server, are saved in the heap, and thus become susceptible to being intercepted and exfiltrated by an attacker. The exploitation chain, as outlined by Cyera, typically involves three steps: crafting the malicious GGUF file, uploading it via the /api/create endpoint to trigger the memory leak, and then exfiltrating the resulting model artifact, which now contains the leaked data, through the /api/push endpoint to an attacker-controlled registry. This complete chain allows for both data extraction and its subsequent transfer outside the compromised environment.

Mitigation Strategies for Bleeding Llama
Given the critical nature of CVE-2026-7482, immediate action is paramount for all Ollama users. The primary recommendation is to update Ollama instances to version 0.17.1 or newer, as this release contains the necessary patches to address the vulnerability. Beyond patching, several best practices for network security are strongly advised. Organizations should limit network access to Ollama servers, ensuring they are not directly exposed to the internet. Auditing running instances for any unintended internet exposure is crucial. Isolating and securing Ollama deployments behind a robust firewall can significantly reduce the attack surface. Furthermore, because Ollama’s REST API does not provide built-in authentication, it is highly recommended to deploy an authentication proxy or an API gateway in front of all Ollama instances. This additional layer can enforce access control and prevent unauthorized interactions with the API endpoints, even if other vulnerabilities were to emerge.
Unpatched Flaws in Ollama’s Windows Auto-Update Mechanism
Compounding the security concerns for Ollama users are two distinct, yet chainable, vulnerabilities identified by researchers at Striga in the framework’s Windows update mechanism. These flaws, publicly disclosed after a 90-day responsible disclosure window elapsed (they were first reported on January 27, 2026), remain unpatched and pose a threat of persistent code execution on affected Windows systems.
Bartłomiej "Bartek" Dmitruk, co-founder of Striga, detailed how the Windows desktop client for Ollama operates. It is configured to auto-start upon user login from the Windows Startup folder, listens for connections on 127.0.0.1:11434, and periodically polls for updates in the background via the /api/update endpoint. Any pending updates are then scheduled to run on the next application start. The identified vulnerabilities leverage this update process:
- Path Traversal Vulnerability: This flaw allows an attacker to manipulate file paths within the update mechanism.
- Missing Signature Check: Critically, the update process fails to adequately verify the digital signatures of update files, allowing unsigned or malicious executables to be introduced.
The Mechanics of the Attack Chain
To exploit these flaws, an attacker needs to be able to control an update server that the victim’s Ollama client can reach. In a realistic scenario, this might involve overriding the OLLAMA_UPDATE_URL environment variable to point the client to an attacker-controlled local server, typically over plain HTTP to simplify the attack. The attack chain also relies on the AutoUpdateEnabled setting being active, which is the default configuration for Ollama on Windows.
Once an attacker controls the update response, they can craft a malicious update package. By exploiting the path traversal vulnerability, the attacker can cause the executable supplied as part of the "update" to be written outside the usual temporary staging directory. Specifically, they can direct it into the Windows Startup folder (%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup). The missing signature verification flaw then ensures that this attacker-supplied executable is accepted and written to the Startup folder without raising any alarms or being flagged as illegitimate.
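The standard defense against this kind of traversal is to validate update-supplied file names before joining them onto the staging directory. A hedged sketch (not Ollama's code) using Go's filepath.IsLocal, which rejects absolute paths and ".." components in a single call:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// stagePath joins an update-supplied name onto the staging directory
// and rejects any name that would escape it. filepath.IsLocal (Go 1.20+)
// refuses absolute paths, "..", and Windows reserved device names.
func stagePath(stagingDir, name string) (string, error) {
	name = filepath.FromSlash(name)
	if !filepath.IsLocal(name) {
		return "", fmt.Errorf("refusing non-local update path %q", name)
	}
	full := filepath.Join(stagingDir, name)
	// Belt and braces: confirm the joined path is still under stagingDir.
	rel, err := filepath.Rel(stagingDir, full)
	if err != nil || strings.HasPrefix(rel, "..") {
		return "", fmt.Errorf("path %q escapes staging directory", name)
	}
	return full, nil
}

func main() {
	path, err := stagePath("staging", "OllamaSetup.exe")
	fmt.Println(path, err) // a path inside staging/, nil error
	_, err = stagePath("staging", "../../Start Menu/Programs/Startup/evil.exe")
	fmt.Println(err) // rejected before any file is written
}
```

On Windows, backslashes in the supplied name also act as separators, which is why the check runs after converting separators rather than on the raw string.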
Impact and Persistence

The consequence of this chained exploitation is persistent, silent code execution at the privilege level of the user running Ollama. Each time the victim logs into their Windows machine, the operating system will execute the malicious binary dropped by the attacker from the Startup folder. This grants the attacker a persistent foothold, enabling them to deploy a variety of payloads, including reverse shells for remote access, info-stealers designed to exfiltrate sensitive data like browser secrets and SSH keys, or droppers that establish additional, more covert persistence mechanisms. As Dmitruk emphasized, "Anything that runs as the current user" is within the attacker’s capabilities.
It’s important to differentiate the impact of the individual flaws. The missing integrity check alone could lead to code execution if an attacker manages to place a malicious installer in the expected staging directory. However, this execution would likely not be persistent, as the next legitimate update would typically overwrite the staged malicious file. By incorporating the path traversal vulnerability, the attacker gains the crucial ability to write the executable to a location outside the normal update path, such as the Startup folder, thereby achieving true persistence that survives subsequent legitimate updates.
CERT Polska, which managed the coordinated disclosure process for these vulnerabilities, confirmed that Ollama for Windows versions 0.12.10 through 0.17.5 are vulnerable. Dmitruk later clarified that versions up to 0.22.0 are also affected. He reiterated the severity: "The path traversal writes attacker-chosen executables into the Windows Startup folder. The missing signature verification keeps them there: the post-write cleanup that would remove unsigned files on a working updater is a no-op on Windows. On the next login, Windows runs whatever was left behind."
Interim Recommendations for Windows Users
Given that these Windows-specific flaws remain unpatched, users of Ollama on Windows versions 0.12.10 through 0.22.0 must take immediate interim measures to protect themselves. The most critical recommendation is to turn off automatic updates for Ollama. Additionally, users should manually inspect the Windows Startup folder (%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup) and remove any unexpected Ollama entry; this disables the silent, on-login execution pathway the vulnerabilities exploit. While removing the dropped binary from the Startup folder ends the immediate persistence, "the underlying flaws remain," as Dmitruk noted, so a patch from the Ollama developers is still required for a complete resolution.
Ollama’s Role in the Local AI Ecosystem
Ollama has rapidly gained traction as a pivotal tool in the burgeoning field of local AI development and deployment. With over 171,000 stars and 16,100 forks on GitHub, it stands as a testament to the community’s desire to run large language models on personal hardware rather than relying solely on cloud-based services. This framework democratizes access to powerful AI models, allowing developers, researchers, and hobbyists to experiment with LLMs, fine-tune them, and integrate them into applications without incurring significant cloud computing costs or concerns about data privacy on third-party servers.
The appeal of local LLM execution is manifold: enhanced privacy for sensitive data, reduced latency, offline capability, and greater control over the computational environment. Ollama simplifies this complex process, abstracting away many of the technical hurdles around model loading, quantization, and inference. Its popularity has made it a foundational component of many local AI projects, which makes any security vulnerability in its core functionality or update mechanisms particularly concerning given its widespread adoption and the sensitive nature of the data it often handles.
Broader Implications for AI Security and Enterprise Risk

The disclosure of these two sets of critical vulnerabilities in Ollama carries significant broader implications for the security of AI infrastructure, particularly within enterprise environments. The "Bleeding Llama" memory leak highlights a fundamental risk in frameworks that process untrusted inputs, especially when dealing with complex data structures like GGUF files and utilizing low-level memory operations. For organizations, the potential exposure of API keys, proprietary code, customer contracts, and internal conversation data represents a catastrophic data breach scenario. If an LLM is used for internal knowledge management, code generation, or customer service, the leaked data could provide attackers with blueprints of internal operations, intellectual property, or personally identifiable information.
The Windows auto-update vulnerabilities, while requiring a more specific attack vector (control over an update server), underscore the critical importance of secure software update mechanisms. Software supply chain attacks, where adversaries compromise the update process to distribute malware, are a growing threat. These Ollama flaws serve as a stark reminder that even widely trusted open-source tools can harbor weaknesses in their deployment and maintenance routines. For enterprises deploying Ollama on Windows workstations or servers, this could lead to widespread, persistent compromise of user systems, potentially serving as a pivot point for lateral movement within the corporate network.
These incidents collectively emphasize that as AI technologies become more integrated into business workflows, the attack surface expands. The security of the underlying platforms, like Ollama, becomes as crucial as the security of traditional IT infrastructure. Organizations must adopt a comprehensive security posture that includes not only patching known vulnerabilities but also implementing robust network segmentation, access controls, threat monitoring, and a thorough understanding of the security implications of every component in their AI stack.
The Importance of Proactive Cybersecurity
The vulnerabilities in Ollama serve as a powerful reminder that no software, regardless of its popularity or open-source nature, is immune to security flaws. For users and organizations, proactive cybersecurity measures are no longer optional but essential. This includes diligently applying security patches as soon as they become available, conducting regular security audits of all deployed systems, and adopting an "assume breach" mentality backed by layered defenses.
For the open-source community, these discoveries highlight the continuous need for rigorous code review, security testing, and adherence to secure coding practices, especially when dealing with memory-sensitive operations or update mechanisms. The rapid pace of AI development must be matched by an equally robust commitment to security from the ground up.
Conclusion: Ongoing Vigilance in the Age of AI
The "Bleeding Llama" memory leak and the persistent code execution flaws in Ollama’s Windows update mechanism represent significant threats to the integrity and confidentiality of data processed by local LLMs. With hundreds of thousands of servers potentially exposed, the cybersecurity community and Ollama users face an immediate challenge. While patches are available for the "Bleeding Llama" vulnerability, the Windows update flaws remain unpatched, requiring users to implement interim manual mitigation steps. As AI continues to revolutionize industries and integrate into daily operations, the vigilance, rapid response, and collaborative efforts of developers, security researchers, and end-users will be paramount in securing this transformative technology against evolving cyber threats. The era of AI demands an unprecedented level of cybersecurity awareness and action to safeguard the promise of artificial intelligence.
