MagnaNet Network

Jaeger v2.18.0 Introduces ClickHouse Support, Revolutionizing Distributed Tracing at Scale

Edi Susilo Dewantoro, May 25, 2026

Jaeger v2.18.0 marks a significant milestone for the Cloud Native Computing Foundation (CNCF) graduated distributed tracing platform: ClickHouse is now an officially supported storage backend. Long requested by the Jaeger community, the backend targets the escalating demands of modern microservice architectures by using ClickHouse’s columnar architecture to ingest massive telemetry streams and run complex analytical queries efficiently.

For developers and operations teams grappling with the complexities of distributed systems, Jaeger has been an indispensable tool for monitoring, troubleshooting, and understanding request flows across myriad services. Its ability to pinpoint latency bottlenecks and identify root causes has been critical in reducing Mean Time To Repair (MTTR). The integration of ClickHouse, a column-oriented OLAP database renowned for its high-throughput ingest capabilities, aggressive data compression, and lightning-fast analytical query performance, is poised to elevate Jaeger’s effectiveness even further. This advancement directly tackles the core challenges of storing and querying vast quantities of semi-structured trace data, a growing concern as microservice adoption continues to accelerate.

The decision to integrate ClickHouse did not emerge overnight. Years of user feedback and a growing recognition of the limitations of traditional row-oriented databases for telemetry data spurred this integration. The original article delves into the technical intricacies of this transition, exploring why ClickHouse’s columnar storage is an ideal fit for trace data, detailing the rationale behind the new schema design, and providing practical guidance on how users can begin leveraging this powerful combination.

The Power of Columnar Storage for Telemetry

At its heart, the challenge of distributed tracing boils down to two fundamental problems: the efficient storage of immense volumes of semi-structured event data, and the rapid querying of this data across multiple dimensions such as service, operation, tags, duration, and time range. While established solutions like Cassandra and Elasticsearch have historically served the Jaeger community, they often introduce significant operational overhead. The indexing required by these systems can lead to increased latency and costs, and scaling them effectively for massive datasets can become a complex undertaking. Furthermore, decisions regarding data retention often force users into painful trade-offs between storage costs and the ability to retain valuable historical insights.

ClickHouse, designed from the ground up as a column-oriented OLAP database, directly addresses these pain points. Its architecture is exceptionally well-suited for scenarios characterized by high-throughput data ingestion, aggressive compression, and the need for rapid analytical queries. Trace data, by its very nature, exhibits a high degree of repetition. Identical service names, operation names, status codes, and tags frequently appear across numerous spans. A columnar storage layout excels in such environments, as it groups identical values together, making them highly amenable to efficient compression.
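The effect of grouping identical values can be sketched in a few lines of Python. This is only an illustration: the spans are made up, and run-length encoding stands in for whatever codecs ClickHouse actually applies. The point is that a sorted column of service names collapses into a handful of runs, while the same data interleaved row by row has no runs at all.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]

# Hypothetical spans: (service_name, operation, duration_ms).
spans = [
    ("auth-service", "login", 12),
    ("payment-gateway", "charge", 48),
    ("auth-service", "login", 15),
    ("auth-service", "refresh", 7),
    ("payment-gateway", "charge", 51),
    ("auth-service", "login", 11),
]

# Row layout: the fields of each span are stored next to each other.
row_layout = [field for span in spans for field in span]

# Column layout: rows are sorted, then one field is stored contiguously.
service_column = [service for service, _, _ in sorted(spans)]

row_runs = run_length_encode(row_layout)
column_runs = run_length_encode(service_column)

print(len(row_runs))   # one run per value: nothing to collapse
print(column_runs)     # two runs cover all six service names
```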

The implications of this architectural alignment are profound. For instance, the repetitive nature of service names like "auth-service" or "payment-gateway," which can appear hundreds of thousands of times, allows ClickHouse to achieve remarkable compression ratios. Benchmarks conducted by the Jaeger team demonstrated an impressive 8.6x compression ratio on the spans table when utilizing ClickHouse. This dramatic reduction in data footprint translates directly into lower storage costs and improved query performance, as less data needs to be read from disk.

Beyond storage efficiency, ClickHouse unlocks new analytical capabilities for trace data. Because aggregations over columnar data are efficient, Jaeger v2.18.0 incorporates native ClickHouse Service Performance Monitoring (SPM) methods that compute service-level latency, call rates, and error rates directly from the stored spans. Teams can therefore derive essential health and performance indicators for their microservices without a separate metrics pipeline, streamlining their observability stacks.
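To make the idea of span-derived metrics concrete, here is a minimal Python sketch that computes call rate, error rate, and a nearest-rank p95 latency per service from a list of spans. The Span class, the field names, and the choice of p95 are assumptions for illustration; in Jaeger the equivalent aggregation runs inside ClickHouse, and its exact queries are not reproduced here.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Span:
    # Hypothetical minimal span; real spans carry many more fields.
    service: str
    duration_ms: float
    is_error: bool

def service_metrics(spans, window_seconds):
    """Aggregate spans observed in one time window into per-service metrics."""
    by_service = defaultdict(list)
    for span in spans:
        by_service[span.service].append(span)

    metrics = {}
    for service, group in by_service.items():
        durations = sorted(s.duration_ms for s in group)
        # Nearest-rank percentile: the smallest value covering 95% of spans.
        rank = max(1, math.ceil(0.95 * len(durations)))
        metrics[service] = {
            "calls_per_second": len(group) / window_seconds,
            "error_rate": sum(s.is_error for s in group) / len(group),
            "p95_ms": durations[rank - 1],
        }
    return metrics

spans = [
    Span("auth-service", 10, False),
    Span("auth-service", 20, False),
    Span("auth-service", 30, True),
    Span("auth-service", 40, False),
    Span("payment-gateway", 100, False),
    Span("payment-gateway", 200, False),
]
metrics = service_metrics(spans, window_seconds=60)
print(metrics["auth-service"])
```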

Navigating the Schema Design for Optimal Performance

The integration of ClickHouse into Jaeger necessitated a meticulous approach to schema design. The primary objective was to optimize for Jaeger’s core query patterns, which include trace lookup by trace ID, service, and operation; filtering by attributes; time-range queries; and the aggregations that power the Service Performance Monitoring (SPM) feature. These diverse requirements do not always align perfectly, presenting a complex optimization challenge.

Building upon foundational work from a previous benchmark of ClickHouse schemas for Jaeger v1 by Ha Anh Vu, the Jaeger v2 development team revisited and refined several decisions. This evolution was partly driven by Jaeger v2’s adoption of the OpenTelemetry data model, which introduced new data structures and attribute types that required careful consideration within the ClickHouse schema.

A detailed Architecture Decision Record (ADR) documents the intricate design process. Key to understanding ClickHouse’s performance characteristics is the concept of the primary key. In ClickHouse, the primary key does not enforce uniqueness; it dictates the on-disk sort order and drives a sparse index that keeps one entry per granule of 8,192 rows (the default granule size). The choice of the primary key is therefore a high-leverage decision that significantly impacts query performance.
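A sparse index over sorted data can be sketched as follows. This is a simplified Python model, not ClickHouse code: it uses a single string key, a granule of 4 rows instead of 8,192, and hypothetical service names. The index stores only the first key of each granule, and a lookup returns the granules that might contain the key, which can include a false positive at the boundary.

```python
from bisect import bisect_left, bisect_right

GRANULE_ROWS = 4  # tiny for illustration; the article cites 8,192-row granules

def build_sparse_index(sorted_keys, granule_rows=GRANULE_ROWS):
    """Keep only the first key of every granule of the sorted column."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_rows)]

def candidate_granules(index, key):
    """Return the granule numbers that may contain `key`.

    A granule may hold `key` if it starts at or before `key`, so the granule
    just before the first matching index entry is included as well.
    """
    first = max(bisect_left(index, key) - 1, 0)
    last = bisect_right(index, key) - 1
    return list(range(first, last + 1))

# Twelve rows already sorted by service name, split into three granules.
service_column = ["auth"] * 5 + ["cart"] * 3 + ["pay"] * 4
index = build_sparse_index(service_column)

print(index)                              # first key of each granule
print(candidate_granules(index, "cart"))  # only one granule needs scanning
```

Only the candidate granules are read and scanned; everything else is skipped without touching the data, which is why the sort order matters so much for the queries that filter on the leading key columns.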

Two candidates were considered for the primary key: trace_id and the combination (service_name, name, start_time). Sorting by trace_id would make trace retrieval fastest but would severely degrade search queries that filter by service, operation, or time range. Conversely, sorting by (service_name, name, start_time) makes trace retrieval somewhat slower, a cost that can be offset with mechanisms such as secondary indexes and data skipping.

Empirical data from earlier benchmark runs clearly illustrated this asymmetry. A schema sorted by trace_id yielded trace retrieval times of approximately 27 milliseconds, but multi-filter search queries could take as long as 880 milliseconds. With the schema re-sorted by (service_name, name, start_time), trace retrieval latency rose to around 100 milliseconds, still well within interactive thresholds, while the latency of multi-filter search queries dropped to approximately 140 milliseconds. For typical operational query patterns, giving up roughly 73 milliseconds on trace lookups to save roughly 740 milliseconds on searches is a clear net win.
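One way to reason about this trade-off is a simple weighted average over the query mix. The model below is an illustrative assumption, not part of the Jaeger benchmark: it treats the reported latencies (27 ms and 880 ms versus 100 ms and 140 ms) as fixed per-query costs and asks what fraction of queries must be searches before the (service_name, name, start_time) sort order wins on average.

```python
# Latencies reported in the article, in milliseconds.
TRACE_ID_SORT = {"lookup_ms": 27, "search_ms": 880}
SERVICE_SORT = {"lookup_ms": 100, "search_ms": 140}

def mean_latency_ms(schema, search_fraction):
    """Average latency when `search_fraction` of queries are searches."""
    lookups = schema["lookup_ms"] * (1 - search_fraction)
    searches = schema["search_ms"] * search_fraction
    return lookups + searches

def break_even_search_fraction(a, b):
    """Search fraction p at which both schemas have the same mean latency.

    Solves a.lookup*(1-p) + a.search*p == b.lookup*(1-p) + b.search*p.
    """
    lookup_gap = b["lookup_ms"] - a["lookup_ms"]  # extra cost of b per lookup
    search_gap = a["search_ms"] - b["search_ms"]  # saving of b per search
    return lookup_gap / (lookup_gap + search_gap)

p = break_even_search_fraction(TRACE_ID_SORT, SERVICE_SORT)
print(f"break-even search share: {p:.1%}")
print(mean_latency_ms(TRACE_ID_SORT, 0.5), mean_latency_ms(SERVICE_SORT, 0.5))
```

Under this simplified model the break-even point is 73/813, about 9 percent: once more than roughly one query in eleven is a search, the service-first sort order has the lower average latency.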

Another critical aspect of the schema design involved the storage of typed attributes. In Jaeger v1, all tags were treated as strings. However, the v2 reader API supports a typed map, allowing attributes to be of types such as Boolean, Int64, Float64, String, or more complex types like Bytes, Slice, and Map. To facilitate querying across these diverse types, the storage layer cannot simply collapse everything into strings. The ClickHouse schema employs ClickHouse’s Nested column type for each primitive attribute type. This structure is applied at the resource, scope, span, event, and link levels, effectively creating mini-tables within each row. This design allows attribute filters to utilize the same query semantics as regular table queries.
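The typed layout can be pictured as parallel key and value arrays per primitive type. The Python sketch below is a model of that idea only: the names str_attributes and int_attributes appear in the article, while bool_attributes, double_attributes, and the key/value sub-arrays are assumptions, and complex types (Bytes, Slice, Map) are deliberately left out.

```python
COLUMN_NAMES = ("bool_attributes", "int_attributes",
                "double_attributes", "str_attributes")

def split_by_type(attributes):
    """Split one attribute map into per-type parallel key/value arrays."""
    columns = {name: {"key": [], "value": []} for name in COLUMN_NAMES}
    for key, value in attributes.items():
        if isinstance(value, bool):   # check bool first: bool subclasses int
            name = "bool_attributes"
        elif isinstance(value, int):
            name = "int_attributes"
        elif isinstance(value, float):
            name = "double_attributes"
        elif isinstance(value, str):
            name = "str_attributes"
        else:
            raise TypeError(f"complex attribute types are not covered: {key}")
        columns[name]["key"].append(key)
        columns[name]["value"].append(value)
    return columns

def matches(columns, name, key, value):
    """Typed equality filter: the integer 200 never matches the string '200'."""
    pairs = zip(columns[name]["key"], columns[name]["value"])
    return any(k == key and v == value for k, v in pairs)

columns = split_by_type({
    "http.status_code": 200,
    "http.method": "GET",
    "retry": False,
    "sample.rate": 0.25,
})
print(columns["int_attributes"])
print(matches(columns, "int_attributes", "http.status_code", 200))
```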

However, it is crucial to note that attribute-only searches can be inherently more expensive, as they cannot fully leverage the primary index. The table’s index is optimized for top-level structural fields like service, operation, and time. For optimal query performance and to avoid extensive column scans, users are advised to always combine attribute filters with these indexed fields to narrow down the data ClickHouse needs to process.

To further enhance query efficiency for specific use cases that do not align with the main spans table’s sort order, materialized views are employed. For example, the Jaeger UI requires rapid access to the complete list of known service names and operations, and trace searches frequently need efficient retrieval of trace time ranges. Instead of relying on costly table scans for these queries, materialized views automatically transform rows as they are inserted into a source table and populate optimized target tables with precomputed data. This approach accelerates queries for service names, operations, and trace ID timestamp ranges.
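The behavior of an insert-triggered materialized view can be modeled in plain Python. In this sketch, which uses a hypothetical SpanStore class and made-up field names, every insert into the source list also updates two small precomputed structures, so listing services or fetching a trace time range never scans the spans themselves.

```python
class SpanStore:
    """Toy model: a source table plus two targets kept current on insert."""

    def __init__(self):
        self.spans = []              # source table
        self.operations = set()      # target: distinct (service, operation)
        self.trace_time_ranges = {}  # target: trace_id -> (first_ms, last_ms)

    def insert(self, trace_id, service, operation, start_ms, duration_ms):
        self.spans.append((trace_id, service, operation, start_ms, duration_ms))
        # The "materialized view" work happens here, once per inserted row.
        self.operations.add((service, operation))
        end_ms = start_ms + duration_ms
        first, last = self.trace_time_ranges.get(trace_id, (start_ms, end_ms))
        self.trace_time_ranges[trace_id] = (min(first, start_ms),
                                            max(last, end_ms))

    def services(self):
        """Answer the service list from the small target, not the spans."""
        return sorted({service for service, _ in self.operations})

store = SpanStore()
store.insert("t1", "auth-service", "login", 1000, 30)
store.insert("t1", "payment-gateway", "charge", 1010, 50)
store.insert("t2", "auth-service", "refresh", 2000, 5)

print(store.services())
print(store.trace_time_ranges["t1"])
```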

A subtle but significant technical challenge addressed in the schema design relates to the interpretation of attribute lookups. When a query like http.status_code=200 is executed, the system must determine whether "200" is a string or an integer, and whether it belongs to a span-level or a resource-level attribute. The same logical key might be stored under different attribute types (str_attributes, int_attributes) and exist at any of the five data levels. To resolve this ambiguity, a dedicated attribute_metadata table is maintained. This table is populated by materialized views that analyze the spans table, allowing the reader to look up the filter key at query time and query only the columns corresponding to the observed types and levels.
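The lookup step can be sketched as a small planning function. Everything concrete here is hypothetical: the metadata contents, the level.type_attributes column naming, and the parsers are assumptions chosen to illustrate the idea that the reader consults observed (level, type) pairs for a key and only targets the columns where the raw filter value can actually be interpreted.

```python
def parse_bool(raw):
    """Accept only the literals 'true' and 'false'."""
    if raw not in ("true", "false"):
        raise ValueError(raw)
    return raw == "true"

PARSERS = {"bool": parse_bool, "int": int, "double": float, "str": str}

# Hypothetical contents: for each key, the (level, type) pairs seen so far.
attribute_metadata = {
    "http.status_code": {("span", "int"), ("span", "str")},
    "service.version": {("resource", "str")},
}

def plan_attribute_filter(metadata, key, raw_value):
    """Return (column, typed_value) pairs worth querying for this filter."""
    plan = []
    for level, type_name in sorted(metadata.get(key, ())):
        try:
            typed_value = PARSERS[type_name](raw_value)
        except ValueError:
            continue  # the raw value cannot be of this type; skip the column
        plan.append((f"{level}.{type_name}_attributes", typed_value))
    return plan

print(plan_attribute_filter(attribute_metadata, "http.status_code", "200"))
print(plan_attribute_filter(attribute_metadata, "http.status_code", "abc"))
print(plan_attribute_filter(attribute_metadata, "unknown.key", "x"))
```

A key that has never been observed yields an empty plan, so the reader can return no results without touching any attribute column.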

Benchmarking at Scale: Real-World Performance

To validate the capabilities of the new ClickHouse backend, the Jaeger team conducted rigorous benchmarks. These tests involved ingesting 10 million spans across 1 million traces on a single-node ClickHouse deployment. The benchmark meticulously measured ingestion throughput, compression ratios, trace retrieval latency, and filtered search latency.

The results were highly encouraging. The backend demonstrated the capacity to sustain an ingestion throughput exceeding 50,000 spans per second. As previously noted, an impressive 8.6x compression ratio was achieved on the spans table, reducing the data footprint from nearly 6 GiB to approximately 722 MiB on disk. Trace retrieval performance averaged around 100 milliseconds, while the majority of search queries completed in under 50 milliseconds. More complex filtered queries demonstrated an average latency of about 140 milliseconds.
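A few back-of-the-envelope figures follow from these numbers. The Python snippet below only rearranges the reported values (10 million spans, 1 million traces, 722 MiB on disk, an 8.6x ratio, 50,000 spans per second); because the published figures are rounded, the derived values are approximate and are not additional benchmark results.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

SPANS = 10_000_000
TRACES = 1_000_000
COMPRESSED_BYTES = 722 * MIB
COMPRESSION_RATIO = 8.6
MIN_SPANS_PER_SECOND = 50_000  # reported as a lower bound ("exceeding")

# Uncompressed size implied by the ratio and the on-disk size.
uncompressed_gib = COMPRESSED_BYTES * COMPRESSION_RATIO / GIB

# Average on-disk cost of one span after compression.
bytes_per_span = COMPRESSED_BYTES / SPANS

# Average trace size in the benchmark dataset.
spans_per_trace = SPANS / TRACES

# Upper bound on the time needed to ingest the whole dataset.
max_ingest_seconds = SPANS / MIN_SPANS_PER_SECOND

print(f"implied uncompressed size: {uncompressed_gib:.2f} GiB")
print(f"compressed bytes per span: {bytes_per_span:.1f}")
print(f"spans per trace: {spans_per_trace:.0f}")
print(f"ingest time at 50k spans/s: {max_ingest_seconds:.0f} s")
```

At roughly 76 bytes per stored span, the dataset of 10 spans per trace would take at most about 200 seconds to ingest at the reported minimum throughput.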

These figures, while specific to the benchmark environment and dataset, provide a strong indication of ClickHouse’s potential to handle the demanding requirements of large-scale telemetry data processing within Jaeger. The full methodology, configuration details, and query specifics are made available in the comprehensive benchmarking report, fostering transparency and enabling users to contextualize these results within their own operational environments.

Getting Started with ClickHouse in Jaeger

ClickHouse support is available in alpha as a storage backend, beginning with Jaeger v2.18.0. To use it, users need a running ClickHouse instance and a Jaeger v2 configuration that selects the ClickHouse backend. Setup and configuration instructions are detailed in the setup guide of the official Jaeger documentation.

The introduction of ClickHouse support represents a significant leap forward for Jaeger, addressing long-standing community requests and enhancing the platform’s ability to manage and analyze the ever-growing volume of telemetry data generated by modern distributed systems. The Jaeger maintainers express profound satisfaction with this achievement and encourage the community to engage, contribute, and report any issues through GitHub or the CNCF #jaeger Slack channel. This collaborative approach ensures that Jaeger continues to evolve and meet the dynamic needs of the cloud-native ecosystem.
