JetBrains Unveils Mellum2: A Powerful Open-Source Model for Agentic AI Infrastructure

JetBrains announced on Monday the open-sourcing of Mellum2, a significant advancement in the field of artificial intelligence designed to bolster the infrastructure layer of agentic AI systems. This 12-billion parameter coding model is engineered for critical tasks such as routing, retrieval pipelines, sub-agent coordination, and private on-premises deployments – areas where existing proprietary models like Anthropic’s Claude Code may not be deployed due to their closed-source nature or operational constraints. The release marks a strategic move by JetBrains to empower developers and enterprises with greater control and flexibility in building sophisticated AI-driven applications.

Mellum2 builds upon the foundation laid by its predecessor, Mellum. The original Mellum, a 4-billion parameter model, was initially introduced by JetBrains in late 2024 as a proprietary code completion tool exclusively for its Integrated Development Environments (IDEs). Its purpose was to enhance the developer experience by providing intelligent, context-aware code suggestions. However, recognizing the growing demand for open and adaptable AI tools, JetBrains made the decision to open-source Mellum in April 2025, making it available on platforms like Hugging Face. Mellum2, in contrast, has been open from its inception, signaling JetBrains’ commitment to fostering an open ecosystem for AI development from the outset.

The evolution from Mellum to Mellum2 reflects a significant expansion in scope and capability. While Mellum was primarily focused on a single, albeit crucial, task – code completion – Mellum2 is designed to address a broader spectrum of challenges that are now central to how engineering teams are implementing AI. This includes the intricate coordination between multiple AI models, the management of workloads for specialized sub-agents, the optimization of context compression within retrieval pipelines, and the ability to run inference on infrastructure that teams can fully control themselves. This shift underscores a move towards more complex and integrated AI architectures, where specialized models play pivotal roles in orchestrating distributed intelligence.

In a detailed blog post co-authored by Nikita Pavlichenko, Staff Research Engineer at JetBrains, and Anton Semenkin, Product Manager, the company described Mellum2 as a "focal model." This designation highlights its strategic positioning as a fast and specialized component, rather than attempting to rival frontier models in terms of sheer breadth of capabilities. The authors emphasized that while frontier models will continue to push the boundaries of AI research, the practical development of AI products necessitates "focal models" – efficient, specialized components adept at handling high-frequency tasks. This specialization ensures that Mellum2 excels within software engineering environments, maintaining a lean and rapid operational profile.

"Frontier models will continue to push the limits, but practical AI products also require focal models: fast, specialized components that handle high-frequency tasks efficiently," Pavlichenko and Semenkin articulated in their joint statement. "This specialization ensures the model excels in software engineering environments while remaining lean and fast." This philosophy directly addresses the need for AI tools that are not only powerful but also practical and efficient for everyday use in development workflows.

Further enhancing its utility, Mellum2 is released with two specialized, post-trained variants alongside the base model. The first is an "instruct" version, engineered to provide direct and concise answers to queries. The second is a "thinking" version, which crucially outputs an explicit reasoning trace before delivering its response. This "thinking" variant is particularly aimed at tackling more complex, multi-step tasks and agentic operations, where understanding the model’s thought process is as important as the final output. This layered approach allows users to select the optimal variant for their specific needs, from rapid information retrieval to detailed problem-solving.

Built for Speed and Scale: Architectural Innovations of Mellum2

Mellum2’s impressive performance is rooted in its adoption of a Mixture-of-Experts (MoE) architecture. This design allows the model to possess a total of 12 billion parameters while activating only approximately 2.5 billion parameters per token. Instead of processing each token through the entire network, the MoE architecture intelligently routes tokens through a selected subset of its 64 experts. This selective processing significantly accelerates inference speeds without compromising the model’s overall capacity to handle complex tasks. This architectural choice is a key enabler for its high-frequency operational capabilities.

In its comprehensive technical report, JetBrains presented benchmarking data for Mellum2, comparing its performance against Alibaba’s Qwen2.5-7B and Qwen3-8B models. The tests were conducted on a single H100 GPU, using input and output sizes that accurately reflect real-world production code completion workloads. In single-request scenarios, Mellum2 demonstrated a performance nearly identical to Qwen2.5-7B, achieving 192 tokens per second compared to Qwen2.5-7B’s 193 tokens per second. However, under concurrent load conditions – a more representative measure of actual production deployments – Mellum2 significantly outperformed its counterparts. It achieved a 21% speed advantage over Qwen2.5-7B and an impressive 79% lead over Qwen3-8B.

The cost-effectiveness of Mellum2’s architecture is another significant advantage. By activating only 2.5 billion parameters per token, the model’s inference behavior more closely resembles that of a 2.5 billion parameter dense model than a conventional 12 billion parameter dense model. This efficiency is particularly relevant for enterprise teams that route high volumes of requests through AI systems daily as an integral part of larger agentic architectures. Reduced computational overhead translates directly into lower operational costs and greater scalability.

On the critical task of function-level code generation, evaluated using the EvalPlus benchmark—a combination of HumanEval+ and MBPP+—Mellum2’s "thinking" variant achieved a score of 78.4%. This performance places it ahead of other models included in the comparison table, including Qwen3.5-9B (71.8%) and the code-specialized Seed-Coder-8B (73.8%). This demonstrates Mellum2’s strong aptitude for generating accurate and functional code, a core requirement for modern software development.

However, JetBrains’ own evaluation results indicate that when the scope of evaluation expands beyond software engineering tasks to encompass broader reasoning and knowledge assessments, such as GPQA Diamond and MMLU-Redux, the Qwen3.5-9B model retains an advantage. This nuanced performance profile is explicitly acknowledged by JetBrains in their technical report. The authors attribute this gap to a deliberate strategic choice in their training data mix, which prioritized code and developer documentation over broad encyclopedic coverage.

"The gap reflects a deliberate tradeoff in our training mix toward code and developer documentation rather than broad encyclopedic coverage," the report stated. This transparency highlights JetBrains’ focused approach to developing Mellum2 as a specialized tool for the software engineering domain, rather than a general-purpose AI model.

The Dependency Argument: Embracing Openness and Control

Perhaps the most compelling argument for Mellum2 lies in what it does not require, particularly concerning external dependencies. Proprietary coding assistants like Anthropic’s Claude Code and OpenAI’s Codex, while capable of running locally on a client machine, typically route inference through their respective company’s cloud APIs. This reliance on external services introduces potential latency, data privacy concerns, and a lack of granular control over the underlying infrastructure.

Companies like Cursor are also exploring proprietary coding model strategies, as seen with the introduction of their Composer 2.5 model. However, these capabilities often remain tied to their specific platforms. Furthermore, Cursor’s recent partnership with SpaceX’s xAI, while potentially powerful, places critical layers of the AI stack—infrastructure and future model development—outside the direct control of their customers.

Mellum2’s release under the Apache 2.0 license, with open weights, directly addresses these concerns. It provides enterprises with a clear option to own and operate their AI inference infrastructure internally. This self-hosting capability is crucial for organizations with stringent data privacy requirements, regulatory compliance needs, or those seeking to optimize performance by managing their own hardware. The adoption of Mellum2 at an enterprise scale will ultimately hinge on the appetite of these organizations for self-hosted AI infrastructure and their willingness to invest in managing it.

JetBrains appears to be making a calculated bet that deployment flexibility, comprehensive operational control, and true ownership of AI infrastructure will become increasingly vital considerations as AI becomes more deeply integrated into software engineering workflows. This is a reasonable and strategic stance in the evolving landscape of AI adoption. However, the ultimate success of this bet will depend on its widespread adoption and proven value at scale within the enterprise sector.

Mellum2 is now accessible on Hugging Face, with its base, instruct, and thinking checkpoints released under the permissive Apache 2.0 license. Accompanying these models is the full technical report, offering in-depth details on the architectural decisions and the intricate training pipeline that underpins Mellum2’s capabilities. This comprehensive release empowers the AI development community to explore, adapt, and build upon this advanced open-source model.

Built for Speed and Scale: Architectural Innovations of Mellum2

The Dependency Argument: Embracing Openness and Control

Leave a Reply Cancel reply