Nvidia Unveils Nemotron 3 Ultra: America's Smartest Open AI Model Faces Down Global Competition

Taipei, Taiwan – Nvidia CEO Jensen Huang took the Computex stage on Sunday in his trademark leather jacket to introduce Nemotron 3 Ultra, the company’s largest open artificial intelligence model to date. The release marks a significant moment for American AI development, positioning Nemotron 3 Ultra as the leading open-weight model developed in the United States. Its capabilities are impressive, but early assessments suggest it still trails the most advanced models from China, a sign of the intensifying global race in AI.

Nemotron 3 Ultra has roughly 550 billion total parameters, but its mixture-of-experts (MoE) design activates only about 55 billion of them for any given token. Parameters are the learned values that store what a model knows, and a higher count generally correlates with greater capability and breadth of knowledge. The MoE approach works like a specialized hospital: a patient with a specific ailment sees only the relevant specialists, not the entire staff. Because most of the network sits idle for each token, this design sharply reduces computational overhead and operating costs. Nvidia says the result outperforms comparable open-weight alternatives, reporting up to five times faster inference and 30% lower operating costs than models of similar caliber.
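
In code, the hospital analogy corresponds to a router that scores every expert for each token and runs only the top-k scorers. The pure-Python sketch below illustrates generic top-k gating; the expert count, k = 2, the router weights, and the toy experts are all hypothetical, since the article does not describe Nemotron 3 Ultra’s routing configuration. It also derives the active share of parameters from the article’s figures.

```python
import math
from typing import Callable, List, Sequence

# Figures from the article: about 550B total parameters, about 55B active.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9

Expert = Callable[[List[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    token: List[float],
    router_weights: List[List[float]],
    experts: List[Expert],
    k: int = 2,
) -> List[float]:
    """Send one token to its top-k experts and blend their outputs.

    router_weights[i] is a hypothetical scoring vector for expert i.
    Experts outside the top k are never called, which is where MoE saves compute.
    """
    # One router logit per expert: dot product of its weights with the token.
    logits = [sum(w * x for w, x in zip(ws, token)) for ws in router_weights]
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    for gate, i in zip(gates, top):
        y = experts[i](token)  # only the selected experts run
        out = [o + gate * v for o, v in zip(out, y)]
    return out


if __name__ == "__main__":
    print(f"Active share of parameters: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")  # 10%
    # Four toy experts; each simply scales its input by a different constant.
    experts: List[Expert] = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
    router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    print(moe_forward([0.5, 0.2], router, experts, k=2))
```

With the article’s numbers, roughly one parameter in ten does work on any given token, which is where MoE’s compute savings come from.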

To gauge the model’s performance, Nvidia worked with the independent evaluator Artificial Analysis on a pre-release assessment. The firm’s findings, published on the Artificial Analysis website, give Nemotron 3 Ultra a score of 48 on its proprietary Intelligence Index. This composite benchmark aggregates the results of ten evaluations covering reasoning, coding, general knowledge recall, and agentic capabilities; higher scores indicate greater intelligence.

This score makes Nemotron 3 Ultra the frontrunner among U.S.-developed open-weight models. Other leading American open-weight contenders include Google’s Gemma 4 31B, at 39; Nvidia’s own Nemotron 3 Super, at 36; and OpenAI’s gpt-oss-120b, at 33. Nemotron 3 Ultra’s nine-point lead over its nearest domestic rival underscores Nvidia’s focused investment in advancing open-source AI.

The jump from Nemotron 3 Super to Nemotron 3 Ultra is particularly noteworthy. Nemotron 3 Super, released in March 2026 with 120 billion parameters, was already regarded as a robust open model for autonomous agent tasks. Ultra’s 12-point gain on the index is a substantial advance by the standards of AI benchmarking, reflecting both its far larger scale and improvements in architecture and training.

The Nemotron Family: A Strategic Evolution in Open AI

Nvidia’s commitment to the open AI model ecosystem is a strategy that has been unfolding for several years. The first Nemotron-branded model arrived in November 2023, and the third generation, which includes the Ultra variant, was announced in December 2025. The family is designed to cover a spectrum of needs, from resource-constrained applications to highly demanding computational tasks.

The Nemotron family comprises three distinct tiers:

Nano: Optimized for lightweight applications where computational resources are limited.
Super: Designed for mid-range enterprise applications requiring a balance of performance and efficiency.
Ultra: Engineered for complex reasoning workloads demanding the highest levels of intelligence and capability.

All Nemotron 3 models share a common hybrid architecture that combines three components: Mamba-2 layers, a state-space alternative to Transformer attention; standard Transformer attention layers; and efficient mixture-of-experts routing.

The inclusion of Mamba-2 is particularly significant. It processes long sequences at a fraction of the cost of conventional attention, because its compute grows linearly with sequence length while attention’s grows quadratically. That efficiency is essential for models meant to handle very large context windows. Nemotron 3 Ultra supports a 1-million-token context window, which in principle lets an AI agent take in an entire large codebase or hundreds of research documents at once without losing contextual coherence. Such a capability could be transformative for complex analysis and knowledge synthesis.
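
The cost difference can be sketched by counting operations. Causal self-attention compares every token with every earlier token, so the number of scores grows with the square of the sequence length, while a state-space layer carries a fixed-size state forward one step per token, so its work grows linearly. The scalar recurrence below is a deliberately simplified stand-in, assuming a one-dimensional state and made-up coefficients; real Mamba-2 layers use input-dependent (selective), multi-dimensional parameters and hardware-efficient parallel algorithms, none of which is shown here.

```python
from typing import List


def causal_attention_pairs(n: int) -> int:
    """Query-key scores computed by causal self-attention over n tokens.

    Token t attends to tokens 0..t, giving n * (n + 1) / 2 scores in total.
    """
    return n * (n + 1) // 2


def linear_scan(xs: List[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Toy scalar state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One state update per token, so work grows linearly with len(xs).
    The fixed coefficients are illustrative; Mamba-2 makes them input-dependent.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # the fixed-size state summarizes all past tokens
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9,} tokens: attention scores = {causal_attention_pairs(n):.2e}, "
              f"scan steps = {n:.2e}")
    print(linear_scan([1.0, 0.0, 0.0]))
```

At a 1-million-token context, the attention count is about 5 x 10^11 pairwise scores per layer, against 10^6 recurrent steps, which is why hybrid designs lean on state-space layers for long inputs.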

The Ultra model also incorporates a technique known as multi-token prediction (MTP). Instead of generating output strictly one token at a time, the model predicts several upcoming tokens in a single step, which speeds up generation of text and other output and makes real-world applications more responsive.
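
The article does not say exactly how Nemotron 3 Ultra uses its extra predictions. One common pattern, assumed here, is self-speculative decoding: the multi-token guesses act as a draft, the main model checks them, and every draft token that matches what the model would have produced anyway is kept, so several tokens can be accepted per verification pass. In the toy sketch below, `target_next` and `draft_tokens` are hypothetical stand-ins for the model’s main and multi-token predictions, not Nvidia APIs.

```python
from typing import Callable, List

Token = str


def speculative_step(
    context: List[Token],
    target_next: Callable[[List[Token]], Token],
    draft_tokens: Callable[[List[Token], int], List[Token]],
    k: int = 3,
) -> List[Token]:
    """One draft-and-verify step with greedy acceptance.

    draft_tokens guesses k upcoming tokens at once (the multi-token guess);
    target_next is the main model's greedy choice for the next token.
    Drafts are accepted left to right until the first mismatch, which is
    replaced by the main model's own token, so the result is identical to
    ordinary one-token-at-a-time greedy decoding.
    """
    drafts = draft_tokens(context, k)
    accepted: List[Token] = []
    # A real system checks all drafts in one parallel forward pass;
    # this loop simulates those checks one by one.
    for d in drafts:
        expected = target_next(context + accepted)
        if d != expected:
            accepted.append(expected)  # fix the first wrong guess, then stop
            return accepted
        accepted.append(d)
    # Every draft matched, so the verification pass yields one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


if __name__ == "__main__":
    sentence = "the model predicts several tokens at once".split()

    def target_next(ctx: List[Token]) -> Token:
        # Toy "main model": always continues the fixed sentence.
        return sentence[len(ctx)] if len(ctx) < len(sentence) else "<eos>"

    def draft_tokens(ctx: List[Token], k: int) -> List[Token]:
        # Toy drafter: right about the first two tokens, wrong on the third.
        guess = sentence[len(ctx):len(ctx) + k]
        if len(guess) >= 3:
            guess[2] = "???"
        return guess

    print(speculative_step(["the"], target_next, draft_tokens, k=3))
    # ['model', 'predicts', 'several']: three tokens from one step
```

Because a rejected draft is simply replaced by the main model’s own choice, this scheme changes speed but not output, which fits the article’s framing of MTP as an acceleration technique.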

All three Nemotron 3 models were extensively post-trained with reinforcement learning across a variety of interactive environments. This training teaches them to plan and carry out multi-step tasks, moving beyond simple question answering toward more sophisticated problem solving. Nvidia described these techniques, tools, and data in a developer blog post, emphasizing the efficiency and accuracy they achieved.

The weights for Nemotron 3 Ultra are publicly available, and Nvidia is also releasing its training methodology. Running a 550-billion-parameter model typically requires data-center-class hardware, but the model is also accessible through Nvidia’s API and various cloud providers. This mirrors how users reach services such as GPT and Claude: they can tap the model’s capabilities without owning the hardware.
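
A back-of-the-envelope calculation shows why. The weights alone need the parameter count multiplied by the bytes stored per parameter. The precisions and the 80 GB per-accelerator figure below are illustrative assumptions, not details from Nvidia, and the estimate ignores activations and other runtime memory.

```python
import math

# Total parameter count reported in the article.
TOTAL_PARAMS = 550e9

# Bytes per parameter for common storage precisions (illustrative; the
# article does not say which precision the released weights use).
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "4-bit": 0.5}

# Assumed memory per accelerator, used only to count devices.
DEVICE_MEMORY_GB = 80


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def min_devices(memory_gb: float, device_gb: float) -> int:
    """Lower bound on accelerators needed just to hold the weights."""
    return math.ceil(memory_gb / device_gb)


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{fmt}: {gb:,.0f} GB -> at least "
              f"{min_devices(gb, DEVICE_MEMORY_GB)} x {DEVICE_MEMORY_GB} GB devices")
```

Even at 4-bit precision the weights alone occupy about 275 GB. In a standard deployment all of the roughly 550 billion parameters must stay in memory, not just the roughly 55 billion active ones, because the router may pick any expert for the next token.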

A Tale of Two Metrics: Speed vs. Intelligence

Nemotron 3 Ultra’s output speed is a clear differentiator. In pre-release testing on a DeepInfra endpoint, the model generated more than 300 output tokens per second. Leading Chinese models in a comparable intelligence bracket, such as DeepSeek V4 Pro and Kimi K2.6, typically run at 50 to 100 tokens per second through their commercial APIs. For real-world applications, particularly autonomous agents working through long, multi-step tasks, this advantage is more than incremental: the time saved on each step accumulates across the whole task, cutting delays and improving efficiency.
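
To see how the speed gap compounds, consider an agent run that generates a fixed number of output tokens across many steps. The token budget below is a hypothetical example; the rates are the article’s figures, and the calculation ignores prompt processing, network latency, and tool-execution time.

```python
# Hypothetical agent workload: e.g. 100 steps that each emit 2,000 tokens.
AGENT_OUTPUT_TOKENS = 100 * 2_000

# Output speeds reported in the article, in tokens per second.
RATES = {
    "Nemotron 3 Ultra (DeepInfra, pre-release)": 300.0,
    "Comparable Chinese model API, upper end": 100.0,
    "Comparable Chinese model API, lower end": 50.0,
}


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Pure decoding time; ignores prompt processing, network, and tool calls."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    for name, rate in RATES.items():
        minutes = generation_seconds(AGENT_OUTPUT_TOKENS, rate) / 60
        print(f"{name}: {minutes:.1f} min")
```

Under these assumptions the run takes about 11 minutes at 300 tokens per second versus roughly 33 to 67 minutes at 50 to 100 tokens per second, a three- to sixfold difference whose absolute size grows as tasks get longer.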

Raw speed alone, however, does not determine an AI model’s competitive standing. The intelligence contest is more nuanced, as a chart published by Artificial Analysis shows. On its vertical axis, which represents intelligence, Nemotron 3 Ultra sits at 48. That is a strong showing for a U.S.-developed open-weight model, but it falls short of China’s Kimi K2.6 from Moonshot AI, which scored 54. A six-point gap on the Intelligence Index represents a meaningful difference in capability. Kimi K2.6, released in April 2026, currently ranks fourth globally among all AI models, including proprietary ones, trailing only the leading proprietary models from Anthropic, Google, and OpenAI, which are tied at 57.
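
The scores quoted in this article can be placed side by side to check the gaps. The list below uses only Intelligence Index values reported here; the three proprietary leaders appear as one tied entry because the article does not name the individual models.

```python
from typing import List, Tuple

# (entry, Intelligence Index score, number of models the entry stands for).
# Scores are as reported in this article. The three proprietary leaders are a
# single tied entry because the article does not name the individual models.
ENTRIES: List[Tuple[str, int, int]] = [
    ("Proprietary leaders (Anthropic, Google, OpenAI)", 57, 3),
    ("Kimi K2.6", 54, 1),
    ("Nemotron 3 Ultra", 48, 1),
    ("Gemma 4 31B", 39, 1),
    ("Nemotron 3 Super", 36, 1),
    ("gpt-oss-120b", 33, 1),
]


def score(name: str) -> int:
    """Look up an entry's score by name."""
    return next(s for n, s, _ in ENTRIES if n == name)


def rank_among_listed(name: str) -> int:
    """Competition rank: 1 + number of listed models scoring strictly higher."""
    s = score(name)
    return 1 + sum(count for _, other, count in ENTRIES if other > s)


if __name__ == "__main__":
    print("Ultra over Super:", score("Nemotron 3 Ultra") - score("Nemotron 3 Super"))
    print("Ultra over Gemma 4 31B:", score("Nemotron 3 Ultra") - score("Gemma 4 31B"))
    print("Kimi K2.6 over Ultra:", score("Kimi K2.6") - score("Nemotron 3 Ultra"))
    print("Leaders over Kimi K2.6:",
          score("Proprietary leaders (Anthropic, Google, OpenAI)") - score("Kimi K2.6"))
    print("Kimi K2.6 rank among listed models:", rank_among_listed("Kimi K2.6"))
```

The 12-point and 6-point gaps match the article’s figures, and Kimi K2.6’s rank of 4 behind the three tied leaders is consistent with its reported fourth place. The rank function counts only the models listed here, so it says nothing about where Nemotron 3 Ultra falls in the full global ranking.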

The Geopolitical Landscape of Open AI

The gap in the open-weight AI sector is not new. For some time, Chinese research labs have released a large volume of highly capable open models, while major American AI companies such as OpenAI, Anthropic, and Google have largely kept their most advanced systems behind closed APIs. This divergence has contributed to a significant shift in global AI model usage.

As Decrypt reported in March 2026, Chinese open-source models’ share of global open-model usage rose from roughly 1.2% in late 2024 to around 30% by the end of 2025. Nvidia is among the most prominent American companies trying to reverse this trend: it has publicly committed to a five-year, $26 billion plan for developing open-weight AI.

Nemotron 3 Ultra is the most visible result of that investment so far. Nvidia has also announced that work is already underway on its successor, Nemotron 4, which is being co-developed through the Nemotron Coalition, a consortium established in March 2026. The coalition comprises eight leading AI labs, including Mistral AI and Perplexity, which aim to push the boundaries of open frontier models using Nvidia’s DGX Cloud infrastructure.

Nemotron 3 Ultra is scheduled for official release on June 4. The launch is expected to be a pivotal moment not only for Nvidia’s open-source AI ambitions but also for American AI development more broadly, as the United States tries to regain ground in a fast-moving global race. The stakes extend beyond technology to economic competitiveness, national security, and the future direction of AI research and deployment worldwide. Sustained investment from companies like Nvidia will be crucial to a robust, competitive open AI ecosystem in which the benefits of this transformative technology are broadly accessible and collaboratively developed.
