The safety benchmarks enterprise buyers rely on to evaluate AI models are measuring the wrong thing.

That is the central finding of recent Cisco research that paired single-turn and multi-turn evaluations across 15 closed frontier AI models from providers including OpenAI, Anthropic, Google, Amazon, and xAI. The results suggest that industry-standard safety evaluations may give a misleading picture of how well models hold up against sophisticated adversarial attacks.

The research revealed a stark contrast in how these models perform under different testing conditions. Every model tested showed a non-trivial failure rate under multi-turn attacks, with attack success rates ranging from 7.89% to 88.30% across the cohort. That spread shows that even the most advanced systems remain vulnerable to persistent, iterative probing.

Single-turn evaluations, which test a model with one isolated prompt, produced a narrower range of attacker success rates, from 2.19% to 64.91%. Those numbers may look more manageable, but the Cisco report argues that this methodology fails to capture how real-world attacks unfold.

“Multi-turn evaluation matters for one primary reason: it is where attackers operate,” the report states. “Real adversaries iterate, reframe refusals, decompose tasks across turns, adopt personas, and escalate gradually.” This directly challenges prevailing industry practice, which leans heavily on single-turn assessments because they are simpler and faster to run. Cisco argues that this reliance is fundamentally flawed because it ignores how adaptive and persistent malicious actors are.

Single-Turn Scores Fall Short in Predicting Multi-Turn Resilience

Perhaps the most consequential finding of the Cisco study is the weak correlation between single-turn performance and multi-turn resilience: a model's success in a one-off test is a poor predictor of how it withstands a sustained, conversational attack. The gaps were substantial. "Cross-regime deltas" (the difference between a model's single-turn and multi-turn attack success rates) reached as high as 55 percentage points in both the positive and negative directions.

Several prominent models illustrate the gap. Google's Gemini 3 Pro had a single-turn attack success rate (ASR) of 18.10%, but its ASR roughly quadrupled to 73.35% under iterative, multi-turn attacks. OpenAI's GPT-5.4 posted an impressively low single-turn ASR of 2.74%, yet its multi-turn ASR was about nine times higher, at 24.68%. xAI's Grok 4.1 Fast, in its non-reasoning configuration, reached a deeply concerning 88.30% multi-turn ASR from a single-turn baseline of 34.15%. These figures expose a critical gap: a model can look secure in one-off tests and still unravel in a more dynamic, human-like conversation.

Anthropic's Claude family emerged as a relatively strong performer in multi-turn conditions, with ASRs between 11.16% and 16.20% under iterative attack. These figures are still higher than the models' single-turn baselines (2.19% to 3.64%), but they are well below most of the cohort. This suggests that Anthropic's approach to safety design may include elements that better withstand prolonged adversarial engagement.

In a particularly counterintuitive finding, Amazon's Nova variants showed the opposite pattern: higher single-turn failure rates but lower multi-turn ASRs. Nova 2 Lite, for example, recorded a 34% single-turn ASR but the lowest multi-turn ASR in the entire cohort, at just 7.89%. This points to a form of "single-turn brittleness" that does not carry over to iterative attacks, a characteristic whose underlying safety mechanisms warrant further investigation.
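The cross-regime deltas and multipliers quoted in this section are simple arithmetic on the reported ASRs. The following Python sketch recomputes them from the figures cited in this article. The function names are illustrative and do not come from the Cisco report, and the delta is assumed to be multi-turn ASR minus single-turn ASR, so positive values mean a model is more vulnerable under multi-turn attack.

```python
# ASR figures (%) as cited in this article: (single-turn ASR, multi-turn ASR).
REPORTED_ASR = {
    "Gemini 3 Pro": (18.10, 73.35),
    "GPT-5.4": (2.74, 24.68),
    "Grok 4.1 Fast (non-reasoning)": (34.15, 88.30),
    "Nova 2 Lite": (34.00, 7.89),
}


def cross_regime_delta(single_turn: float, multi_turn: float) -> float:
    """Multi-turn ASR minus single-turn ASR, in percentage points.

    Assumed sign convention: positive means worse under multi-turn attack.
    """
    return round(multi_turn - single_turn, 2)


def asr_ratio(single_turn: float, multi_turn: float) -> float:
    """How many times larger the multi-turn ASR is than the single-turn ASR."""
    if single_turn <= 0:
        raise ValueError("single-turn ASR must be positive to form a ratio")
    return round(multi_turn / single_turn, 2)


if __name__ == "__main__":
    for model, (st, mt) in REPORTED_ASR.items():
        delta = cross_regime_delta(st, mt)
        ratio = asr_ratio(st, mt)
        print(f"{model:32s} delta={delta:+7.2f} pp  ratio={ratio:5.2f}x")
```

Run as a script, it prints one line per model. Gemini 3 Pro's +55.25-point delta matches the roughly 55-point upper bound the report describes, while Nova 2 Lite's -26.11 points is a delta in the opposite direction. The ratio is reported alongside the delta because a small absolute change on a low baseline, as with GPT-5.4, can still be a large relative jump.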

The Impact of Configuration on AI Safety

Beyond the inherent performance of the base models, the Cisco research showed that configuration settings can have a large effect on safety. Grok 4.1 Fast is a striking example: under otherwise identical test conditions, enabling its reasoning mode cut its multi-turn ASR from 88.30% to 43.47%. That is a swing of nearly 45 percentage points from a single configuration change.

Cisco emphasizes that this kind of configuration-driven variation is not typically captured by public benchmarks or model cards. The company calls on AI providers to disclose the safety-relevant effects of deployment-time settings alongside their reported capability benchmarks, which would give enterprise buyers a more complete view of a model's security posture in real-world deployments.

Identifying the Concentration of Failures

The research also examined which attack strategies proved most effective and how different models failed. Cisco broke the multi-turn outcomes down across five attack strategy families. Within each family, the gap between the most and least vulnerable models was substantial, ranging from 79 to 89 percentage points. This granularity matters because aggregate scores can mask weaknesses against specific attack vectors.

On the single-turn side, failures concentrated in a smaller subset of procedures. "Imposter AI" attacks were the most effective, with a weighted ASR of 37.50%, more than 14 percentage points above the tenth-ranked procedure. "Soft Paraphrase" and "System Prompts" were also significant failure points. By content category, single-turn defenses most often faltered on attacks targeting "Hate Speech," "Profanity," and requests for "Specialized Advice."

Recommendations for Enterprises Navigating AI Safety

Based on these findings, Cisco has formulated three practical recommendations for enterprises evaluating and deploying AI models:

Prioritize Multi-Turn Evaluation: Enterprises should shift their focus from single-turn benchmarks to rigorous multi-turn testing. This involves simulating conversational attacks that mimic real-world adversarial tactics, such as iterative prompting, persona adoption, and gradual escalation of harmful requests. This will provide a more accurate assessment of a model’s true resilience.
Demand Configuration Transparency: Buyers should press AI providers for detailed information regarding the impact of various configuration settings on safety. Understanding how features like reasoning modes, content filters, and system prompts influence a model’s vulnerability is essential for informed decision-making.
Conduct Domain-Specific Testing: Given the varied nature of failures across different attack strategies and content categories, enterprises should perform tailored testing relevant to their specific use cases and risk profiles. This includes evaluating models against potential threats unique to their industry or application.

The Cisco report carries a significant caveat: testing was conducted on base models without system prompts, content filters, or custom orchestration. Typical enterprise deployments add these controls, which could either improve or degrade a model's safety. The core finding, that single-turn scores are a poor guide to multi-turn resilience, still stands.

“Safety remains a continuous, regime-dependent property rather than a binary certification,” the report concludes. In other words, AI safety is not a one-time achievement: it depends on the context and manner of interaction. Even frontier models from leading providers require ongoing, realistic evaluation, and the findings make a strong case for safety assurance that reflects how adversaries actually operate.
