Arize AI and Google Cloud lay down standardized telemetry mandate to keep enterprise agents in check

Modern enterprise software is composable, which gives developers considerable architectural freedom. Teams package logic into components and containers, then move those deployments across workloads and multi-cloud environments. That agility is a boon for adaptability, but it poses a significant challenge for AI agent telemetry, a field often described as the "Wild West" because it lacks standardization.

Agentic functions now move as freely as traditional software components, and their capabilities extend far beyond simple task execution. Developers are equipping agents to interact with multiple system tools, call various AI models (including large language models and visual recognition systems), and refine user requests before delegating them to other domain-specific agents. This sophistication promises more adaptable systems, but it also creates a complex web of interactions that cannot be managed or overseen without robust, standardized telemetry.

The Critical Importance of Agent Telemetry

Within the broader domain of observability, detailed telemetry is indispensable for software engineers. It shows where agents are running, which permissions and connections they have been granted, and which actions they have taken. Richard Young, Technical Director of Partner Solutions Architecture at Arize, an AI agent engineering company, says the challenge with agent telemetry is not merely about identifying integration points. What matters most, he argues, is "portability," not just for the agents themselves but for the telemetry standards used to measure them.

Writing on Arize’s blog, Young argues that open standards let organizations stay flexible without giving up visibility. "When you use standards like OpenTelemetry and OpenInference, you keep optionality without losing visibility," he states. "Standardized agent telemetry lets you change frameworks, models, tools, or observability backends without rebuilding your instrumentation every time. The trace format stays consistent even as the stack changes." For Young, the core issue goes beyond point-to-point integrations: it is about establishing a unified telemetry model for agents.
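
The portability Young describes can be sketched in a few lines of Python. This is a stdlib-only illustration, not the OpenTelemetry SDK: AgentSpan, Exporter, InMemoryExporter, JsonLinesExporter, and record_model_call are hypothetical names, and the attribute keys are modelled on OpenInference-style naming and should be checked against the published conventions before use.

```python
import io
import json
from dataclasses import dataclass, field, asdict
from typing import Protocol


@dataclass
class AgentSpan:
    """One step of an agent run, in a backend-independent shape."""
    name: str
    attributes: dict = field(default_factory=dict)


class Exporter(Protocol):
    """Anything that can receive a finished span."""
    def export(self, span: AgentSpan) -> None: ...


class InMemoryExporter:
    """Keeps spans in a list, e.g. for tests."""
    def __init__(self) -> None:
        self.spans: list[AgentSpan] = []

    def export(self, span: AgentSpan) -> None:
        self.spans.append(span)


class JsonLinesExporter:
    """Writes one JSON object per span to any text stream."""
    def __init__(self, stream) -> None:
        self.stream = stream

    def export(self, span: AgentSpan) -> None:
        self.stream.write(json.dumps(asdict(span), sort_keys=True) + "\n")


def record_model_call(exporter: Exporter, model_name: str, prompt: str) -> AgentSpan:
    """Instrumentation written once; it does not know which backend it feeds."""
    span = AgentSpan(
        name="model_call",
        attributes={
            "openinference.span.kind": "LLM",  # illustrative key and value
            "llm.model_name": model_name,      # illustrative key
            "input.value": prompt,             # illustrative key
        },
    )
    exporter.export(span)
    return span


# Swap the model and the backend; the instrumentation call is unchanged.
memory = InMemoryExporter()
buffer = io.StringIO()
jsonl = JsonLinesExporter(buffer)

record_model_call(memory, "model-a", "Summarize the ticket")
record_model_call(jsonl, "model-b", "Summarize the ticket")

exported = json.loads(buffer.getvalue())
```

Both backends receive spans with the same name and the same attribute keys, which is the property that lets a team change one layer of the stack without touching the instrumentation.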

Synergistic Efforts: Google Cloud and Arize AI

Arize is now collaborating with Google Cloud, a move that follows the hyperscaler’s recent launch of the Gemini Enterprise Agent Platform. Arize’s AX enterprise agent development platform is designed to ingest traces (chronological records of software execution events) from the Gemini Agent service, and it aligns agent telemetry around OpenTelemetry and OpenInference. This standardization lets software engineering teams instrument agents once, analyze their behavior consistently, and avoid vendor lock-in for their observability data.

Ryan Mangan, CEO of EfficientEther, a cloud resource optimization company, underscores the fundamental principle that in any live production software deployment, visibility is non-negotiable. "You can’t operate what you can’t see, and that goes double for agents," Mangan tells The New Stack. He elaborates on the intricate nature of a single agent’s operation: "A single agent run can include request rewriting, retrieval, multiple tool and model calls, retries, and handoffs before producing a final answer. Without structured telemetry covering each of those steps, debugging becomes painstaking guesswork and evaluation becomes extremely difficult." Mangan echoes the sentiment that standards like OpenTelemetry and OpenInference are vital, providing developers with a consistent framework for understanding agent actions, irrespective of the underlying framework, model, or platform.
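
The kind of run Mangan describes can be pictured as a tree of spans, one per step. The sketch below uses only the Python standard library; Tracer, Span, and step are hypothetical names, the step names and attributes are made up for illustration, and a real deployment would use an OpenTelemetry-compatible tracer instead.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    span_id: str
    parent_id: Optional[str]
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class Tracer:
    """Collects spans and tracks the current parent with a simple stack."""

    def __init__(self) -> None:
        self.spans: list[Span] = []
        self._stack: list[Span] = []

    @contextmanager
    def step(self, name: str, **attributes):
        # The innermost open span, if any, becomes the parent of the new one.
        parent_id = self._stack[-1].span_id if self._stack else None
        span = Span(name, uuid.uuid4().hex[:16], parent_id, dict(attributes))
        span.start = time.perf_counter()
        self.spans.append(span)
        self._stack.append(span)
        try:
            yield span
        finally:
            span.end = time.perf_counter()
            self._stack.pop()


tracer = Tracer()
with tracer.step("agent_run", user_request="refund order 123") as root:
    with tracer.step("rewrite_request") as rewrite:
        rewrite.attributes["rewritten"] = "check refund eligibility for order 123"
    with tracer.step("retrieval", source="orders_db"):
        pass
    for attempt in (1, 2):  # a retry shows up as two sibling spans
        with tracer.step("tool_call", tool="refund_api", attempt=attempt):
            pass
    with tracer.step("model_call", model="model-a"):
        pass
    with tracer.step("handoff", target_agent="billing_agent"):
        pass

# Every step of the run is a child of the root span.
children = [s.name for s in tracer.spans if s.parent_id == root.span_id]
```

With each step recorded as its own span, a failed retry or a slow retrieval is visible as a distinct record rather than something to be inferred from the final answer.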

The Genesis and Evolution of OpenTelemetry

OpenTelemetry dates back to 2019, when two distinct initiatives merged: Google’s OpenCensus and the Cloud Native Computing Foundation’s (CNCF) OpenTracing project. The consolidation aimed to create a single, vendor-neutral standard for collecting telemetry data.

During a recent Google Cloud NEXT session, Jason Lopatecki, founder and CEO of Arize AI, and Rami Shalom, a Google Cloud product leader, discussed the need to monitor and improve enterprise AI agents. Arize’s Richard Young referenced that discussion and drew a parallel with observability’s own past: the industry has "gone through this transition once," he explained, when it grappled with competing tracing standards, proprietary SDKs, and fragmented instrumentation. The aim now is to avoid repeating those pitfalls with AI agents.

The Principle of "Instrument Once, Route Anywhere"

The emerging consensus in the industry points towards the critical need for teams to "instrument once, but route anywhere." Noam Levy, Founding Engineer and Field CTO at groundcover, a cloud-native observability platform company, acknowledges OpenTelemetry’s rapid ascent to an essential standard. However, he cautions that adopting the standard alone does not resolve the more complex challenges of how telemetry is actually collected, normalized, and trusted at scale.

"The next question isn’t just whether teams can afford to pay SaaS vendors to store and interpret that data—it’s whether that model holds up given the volume and privacy demands of agent-driven systems," Levy explains. He further elaborates that OpenTelemetry, by itself, does not unify agent observability. "Teams still have to reconcile fragmented telemetry across providers, i.e., OpenAI looks different from Anthropic, which forces them to build systems that constantly adapt to upstream changes."

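The reconciliation work Levy describes often comes down to small adapters that map each provider's payload onto one shared shape. The following is a minimal sketch under stated assumptions: TokenUsage and normalize_usage are hypothetical names, and the payloads are simplified stand-ins whose field names are assumed to follow OpenAI's Chat Completions style and Anthropic's Messages style. Real responses carry many more fields and can change upstream, which is exactly the maintenance burden Levy points to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Shared, provider-independent view of token consumption."""
    provider: str
    input_tokens: int
    output_tokens: int


def normalize_usage(provider: str, payload: dict) -> TokenUsage:
    """Map a provider-specific usage payload onto one shared shape."""
    usage = payload.get("usage", {})
    if provider == "openai":
        # Assumed Chat Completions-style field names.
        return TokenUsage(provider, usage["prompt_tokens"], usage["completion_tokens"])
    if provider == "anthropic":
        # Assumed Messages-style field names.
        return TokenUsage(provider, usage["input_tokens"], usage["output_tokens"])
    raise ValueError(f"no adapter registered for provider {provider!r}")


# Simplified, made-up payloads for illustration only.
openai_like = {"usage": {"prompt_tokens": 12, "completion_tokens": 30}}
anthropic_like = {"usage": {"input_tokens": 12, "output_tokens": 30}}

a = normalize_usage("openai", openai_like)
b = normalize_usage("anthropic", anthropic_like)
```

Downstream dashboards and evaluations can then read input_tokens and output_tokens without knowing which provider served the call, at the cost of keeping each adapter in step with its provider.
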
Levy suggests that emerging technologies like eBPF (extended Berkeley Packet Filter) offer a foundational shift. Because eBPF runs at the operating system level, teams can observe system behavior without relying on application-level instrumentation. Signals are captured from how software actually runs rather than from how it is instrumented, which could make telemetry collection both more comprehensive and less intrusive.

The Security Dimension: A CISO’s Perspective

David Girvin, an AI Security Researcher at Sumo Logic, a company specializing in log analytics and cloud security information and event management (SIEM), points out that while converging on OpenTelemetry is a significant step, the more formidable challenge lies in managing the telemetry data at scale. "A single agent run is a manageable transcript," Girvin states. "A thousand agents all running across production, handing off between each other, calling external tools, hitting retrieval systems and spawning sub-agents simultaneously? That becomes a data problem."

Girvin advocates for a broader democratization of tools and perspectives, emphasizing that OpenTelemetry agent conventions are currently being developed primarily by ML engineers for ML engineers. He raises a crucial point: "The CISO hasn’t shown up to that conversation yet." When Chief Information Security Officers (CISOs) become more involved, Girvin predicts that teams whose instrumentation is solely focused on observability may find their telemetry insufficient for board-level investigations and security audits. This highlights the need to incorporate security and compliance requirements into the design and implementation of agent telemetry from the outset.

Decoding Agent Behavior: The "How to Question an Agent" Imperative

The increasing autonomy granted to AI agents necessitates a clear understanding of their decision-making processes. Traces are now essential to illustrate the path an agent followed to reach its final output. Without this detailed record, software engineers will face significant hurdles in evaluating, debugging, and ultimately improving agent actions.

Effective agent traces must provide answers to a series of critical questions:

Request Transformation: How was the original user request transformed or reinterpreted by the agent?
Resource Utilization: Which specific models, tools, and data sources were employed in the agent’s execution?
Performance and Reliability Metrics: To what degree did latency, hallucination (generating inaccurate or fabricated information), policy failures, or suboptimal retrieval impact the outcome?
Areas for Improvement: Which steps in the agentic code execution are candidates for further analysis and optimization?
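
Given structured spans, these questions can be answered with ordinary code. The sketch below assumes a flat list of span dictionaries with hypothetical fields (name, kind, input, output, resource, status, duration_ms); summarize_trace is an illustrative helper, and the sample trace is made up. Hallucination and retrieval quality would need evaluation scores attached to spans, which this example leaves out.

```python
def summarize_trace(spans: list[dict]) -> dict:
    """Answer the four questions from a flat list of span dictionaries."""
    root = next(s for s in spans if s["kind"] == "agent")
    rewrite = next((s for s in spans if s["kind"] == "rewrite"), None)
    work = [s for s in spans if s["kind"] != "agent"]
    slowest = max(work, key=lambda s: s["duration_ms"])
    return {
        # Request transformation: original input versus the rewritten request.
        "request_transformation": {
            "original": root["input"],
            "rewritten": rewrite["output"] if rewrite else root["input"],
        },
        # Resource utilization: every model, tool, or data source touched.
        "resources_used": sorted({s["resource"] for s in work if s.get("resource")}),
        # Performance and reliability: failed steps and end-to-end latency.
        "failed_steps": [s["name"] for s in work if s.get("status") == "error"],
        "total_latency_ms": root["duration_ms"],
        # Areas for improvement: the slowest step is a first candidate.
        "improvement_candidate": slowest["name"],
    }


# Made-up sample trace for illustration only.
trace = [
    {"name": "agent_run", "kind": "agent", "input": "refund order 123", "duration_ms": 1900},
    {"name": "rewrite_request", "kind": "rewrite",
     "output": "check refund eligibility for order 123", "duration_ms": 120},
    {"name": "search_orders", "kind": "retrieval", "resource": "orders_db", "duration_ms": 900},
    {"name": "policy_check", "kind": "tool", "resource": "policy_api",
     "status": "error", "duration_ms": 80},
    {"name": "draft_answer", "kind": "model", "resource": "model-a", "duration_ms": 700},
]

summary = summarize_trace(trace)
```

The same function works for any agent whose spans follow the shared shape, which is the practical payoff of agreeing on a trace format before agreeing on a framework.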

Standardization: The Prerequisite for Advanced Agentification

The industry is coalescing around the need to standardize two things: the measures of agent behavior and the methods used to take those measurements. With both in place, teams get structured agent telemetry that carries enough semantic detail to support comprehensive evaluation and drive meaningful improvements in agentic capabilities. That standardization is what keeps agent behavior transparent, auditable, and controllable as agents become more sophisticated and more deeply integrated. Truly agentified systems will need a foundation of reliable, universally understood telemetry.