The problem manifested when a user inquired about returning a laptop purchased three weeks prior. The retrieval-augmented generation (RAG) agent, confidently retrieving a 30-day return policy document, advised the customer to proceed with the return. However, this advice was fundamentally incorrect. The retrieved document, while semantically similar due to matching keywords, was from 2023 and did not reflect the current 14-day return policy for electronics. This scenario underscores a crucial limitation of purely vector-based similarity searches: they lack an inherent understanding of temporal relevance or contextual scope.
This incident, initially perceived as a bug, was reclassified as an architectural problem. It has prompted a re-evaluation of what databases need to offer in the era of artificial intelligence, particularly concerning the retrieval of accurate and contextually relevant information.
The Unaddressed Gap in AI Retrieval
For the past two years, the AI development community has been largely preoccupied with mitigating AI hallucinations – instances where models generate fabricated information. RAG emerged as a primary solution, aiming to ground AI models in factual documents. While RAG has proven effective in reducing hallucinations, the underlying retrieval mechanisms have often been treated as a solved problem, a perception now being challenged.
The core issue lies in the distinction between semantic similarity and factual correctness. Vector searches excel at identifying documents that share similar meanings based on their embeddings. However, "close in meaning" does not equate to "correct for this specific context." A deprecated policy document, for example, can be semantically similar to a current one, leading to a fluent, confident, but ultimately wrong answer. Similarly, a document tailored for enterprise clients might be semantically relevant to a query from a free-tier user, or a confidential document belonging to one tenant could appear in response to a query from another.
This discrepancy is what is being termed the "retrieval accuracy gap." It represents the chasm between what vector similarity algorithms deem relevant and what an application actually requires for factual accuracy. This gap cannot be bridged simply by improving embedding models. The missing elements – such as timestamps, scope definitions, and access permissions – are structured data, residing in database columns rather than in the abstract vector space. Consequently, this problem is fundamentally a database challenge.
Defining Hybrid Search in Practice
Hybrid search, in this context, refers to a single database query that seamlessly integrates vector similarity with structured data filtering using SQL predicates. This is distinct from a multi-stage pipeline where an initial vector search yields a large set of potential candidates, which are then filtered by application code. A true hybrid search allows the database engine to holistically optimize the entire query.
The distinction is critical. When filtering occurs within application code, the system first performs a computationally intensive scan of the entire vector index and only afterwards applies the cheap structured constraints. This is an inefficient process. A database capable of understanding both vector and relational operations can leverage selectivity estimates to determine the optimal order of operations, deciding whether to filter data before or after the vector scan. This is a principle that has been fundamental to relational database query planning for decades, and its extension to vector indexes is now deemed essential.
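The contrast can be sketched directly in SQL. The table, column, and function names below (a documents table with an embedding column and a status flag) are illustrative assumptions, matching the schema introduced next:

```sql
-- Post-filtering (what app-level filtering effectively does): the vector
-- index is scanned first, so the query must over-fetch candidates that
-- application code will later discard.
SELECT id, content,
       VEC_COSINE_DISTANCE(embedding, @query_vec) AS distance
FROM documents
ORDER BY distance
LIMIT 500;   -- over-fetch; app code then drops non-'active' rows

-- Pre-filtering (hybrid search): the planner sees both the predicate and
-- the vector ordering, and can prune rows before the vector scan runs.
SELECT id, content,
       VEC_COSINE_DISTANCE(embedding, @query_vec) AS distance
FROM documents
WHERE status = 'active'
ORDER BY distance
LIMIT 5;
```

In the first form the over-fetch factor is a guess; in the second, the engine chooses the evaluation order itself based on how selective the predicate is.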
To illustrate this concept, consider a typical database schema designed to store documents, including their textual content, vector embeddings, and associated metadata:
CREATE TABLE documents (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
content TEXT,
embedding VECTOR(1536),
team_id BIGINT NOT NULL,
doc_type VARCHAR(50),
updated_at DATETIME NOT NULL,
status ENUM('active','deprecated','draft'),
INDEX idx_embedding USING HNSW (embedding),
INDEX idx_team_status (team_id, status)
);
This schema includes standard relational fields alongside a vector column, facilitating advanced search capabilities. The following query patterns demonstrate how hybrid search addresses the previously identified failure modes:
Pattern 1: Ensuring Recency Through Filtering
The problem of retrieving outdated documents can be effectively mitigated by incorporating a time-based constraint into the query:
SELECT id, content,
VEC_COSINE_DISTANCE(embedding, @query_vec) AS distance
FROM documents
WHERE status = 'active'
AND updated_at >= NOW() - INTERVAL 90 DAY
ORDER BY distance
LIMIT 5;
In this query, the WHERE clause prunes the candidate set of documents before the vector scan commences. In a large corpus, such as one with 10 million rows, this filtering can eliminate a substantial percentage of irrelevant data, typically between 60% and 80%. This dual benefit of increased speed and accuracy is a hallmark of effective hybrid search implementations.
Pattern 2: Enforcing Tenant Isolation with Joins
A more critical concern arises when retrieval failures could lead to security breaches, such as cross-tenant data leaks. Hybrid search addresses this by integrating relational joins with access control mechanisms:
SELECT d.id, d.content,
VEC_COSINE_DISTANCE(d.embedding, @query_vec) AS distance
FROM documents d
JOIN user_permissions p
ON p.team_id = d.team_id
WHERE p.user_id = @current_user
AND d.status = 'active'
ORDER BY distance
LIMIT 5;
This query uses a relational join against a user_permissions table. Consequently, regardless of how semantically similar a document might be to a user’s query, the user sees only content within their authorized scope. This enforcement by the database engine provides a robust security guarantee, unlike application-level filtering, which is prone to human error or oversight. Attempting to achieve similar security with standalone vector databases often involves complex metadata tagging, re-indexing upon permission changes, and intricate tag-based filtering that can become unmanageable.
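The user_permissions table joined above is not part of the schema shown earlier; a minimal sketch of what it might look like (the table name matches the query, but the exact columns are assumptions):

```sql
CREATE TABLE user_permissions (
  user_id BIGINT NOT NULL,
  team_id BIGINT NOT NULL,
  PRIMARY KEY (user_id, team_id)  -- one row per (user, team) grant
);
```

Under this design, revoking access is a single DELETE on this table, and the very next retrieval query reflects it; no embeddings need to be re-tagged or re-indexed.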
Pattern 3: Category Ranking and Aggregation for Deeper Insights
In some cases, the most accurate answer may not be a single document but rather an aggregate pattern across multiple documents. Hybrid search enables this by grouping and ranking results based on document types:
SELECT d.doc_type,
COUNT(*) AS match_count,
MIN(VEC_COSINE_DISTANCE(d.embedding, @query_vec)) AS best_dist,
GROUP_CONCAT(d.id ORDER BY
VEC_COSINE_DISTANCE(d.embedding, @query_vec)) AS doc_ids
FROM documents d
WHERE d.status = 'active'
AND VEC_COSINE_DISTANCE(d.embedding, @query_vec) < 0.3
GROUP BY d.doc_type
ORDER BY match_count DESC, best_dist ASC
LIMIT 3;
This query provides an LLM with nuanced guidance, such as indicating that the answer is more likely to be found in FAQ documents (7 matches) than in blog posts (2 matches). The LLM can then prioritize retrieving documents from the highest-ranked category. This capability, rooted in relational algebra and the use of GROUP BY, is beyond the capabilities of standalone vector databases and significantly enhances retrieval quality in corpora with overlapping document categories.
Performance Implications of Hybrid Search
Empirical analysis of hybrid search approaches against production workloads, involving a 10-million-row enterprise knowledge base spanning 18 months of content and diverse document types, has yielded promising results. Queries, when executed using hybrid search, exhibited latencies in the range of 15-30 milliseconds, only marginally higher than pure vector search and practically invisible to end-users. Crucially, the cross-tenant leak rate was found to be zero, a result of the inherent security guarantees provided by relational joins.
Interestingly, in many real-world scenarios, hybrid search can actually outperform pure vector search. This is because structured filters can dramatically reduce the search space for vector operations. When a significant portion of the corpus (e.g., 70%) is pruned before the vector scan even begins, the overall wall-clock time for query execution is reduced. These performance gains are contingent on the specific distribution of data within the corpus and the selectivity of the filters applied.
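Whether the planner actually pushes a structured filter ahead of the vector scan can be checked with a standard EXPLAIN; the exact plan output varies by engine and version, so this is an inspection sketch rather than a guaranteed plan shape:

```sql
EXPLAIN
SELECT id, content,
       VEC_COSINE_DISTANCE(embedding, @query_vec) AS distance
FROM documents
WHERE status = 'active'
  AND updated_at >= NOW() - INTERVAL 90 DAY
ORDER BY distance
LIMIT 5;
```

If the plan shows a full vector index scan with the predicates applied afterwards, stale statistics are a common culprit; refreshing them (e.g. with ANALYZE TABLE documents) gives the optimizer better selectivity estimates to work with.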
The "Vector Sidecar" Anti-Pattern
A prevalent architectural pattern that contributes to RAG quality issues in production environments is the "vector sidecar" approach. This pattern involves maintaining a primary operational database (e.g., MySQL, PostgreSQL) for application data and a separate, dedicated vector database for storing embeddings. This separation necessitates a synchronization pipeline to ensure data consistency across both systems. Every insert, update, or deletion operation must be propagated to both databases, leading to the management of two distinct schemas, connection pools, monitoring dashboards, and a fragile ETL job.
This "vector sidecar" anti-pattern introduces several compounding problems:
- Data Consistency Issues: Maintaining synchronicity between two independent databases is inherently challenging. Discrepancies can arise due to network failures, transaction inconsistencies, or delays in the synchronization process, leading to retrieval errors.
- Operational Complexity: Managing two distinct database systems doubles the operational overhead in terms of deployment, monitoring, scaling, and maintenance. This complexity can strain engineering resources and increase the potential for system failures.
- Performance Bottlenecks: The need for data synchronization and the potential for distributed transactions can introduce latency and performance bottlenecks, negating some of the perceived benefits of specialized vector databases.
The alternative to this complex architecture is to consolidate vector data and structured data within the same database. This approach offers a single point of access, a unified transaction boundary, and a consistent data model. The database’s query planner then handles the optimization of queries, dynamically deciding whether to prioritize filtering or vector scanning based on selectivity estimates. This integration is a key driver behind the development of native vector support in modern database systems.
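With both representations in one table, the write path collapses to a single transaction and there is no synchronization pipeline left to drift. A sketch, assuming the documents schema from earlier (the team_id value and @embedding variable, computed by the embedding model before the write, are illustrative placeholders):

```sql
BEGIN;
-- Insert the new policy document together with its embedding.
INSERT INTO documents (content, embedding, team_id, doc_type, updated_at, status)
VALUES ('Updated return policy: 14 days for electronics.',
        @embedding, 42, 'policy', NOW(), 'active');
-- Deprecate the older policy documents in the same transaction, so
-- retrieval never observes two conflicting 'active' policies.
UPDATE documents
SET status = 'deprecated'
WHERE team_id = 42 AND doc_type = 'policy' AND id <> LAST_INSERT_ID();
COMMIT;
```

Either both statements commit or neither does, which is exactly the unified transaction boundary the sidecar architecture gives up.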
The Importance of SQL Compatibility
A significant, often underappreciated, advantage of integrating vector capabilities into relational databases with SQL compatibility lies in reducing adoption friction. SQL is a universally understood language in application development, supported by virtually all Object-Relational Mappers (ORMs) and connection pooling libraries. When a database natively supports vector operations within the SQL framework, hybrid search queries are not perceived as exotic additions but as standard SQL statements. This allows development teams to leverage their existing SQL expertise without requiring them to learn new query languages, client libraries, or operational paradigms. The adoption barrier is significantly lowered, enabling teams to implement AI features more efficiently and effectively.
When Hybrid Search Isn’t Necessary
While hybrid search offers substantial benefits for many AI applications, it’s important to acknowledge scenarios where pure vector search might suffice. If the application’s primary requirement is finding documents that are semantically similar, and strict factual accuracy or contextual relevance is not paramount – for instance, in recommendation engines where a "good enough" suggestion is acceptable, or in creative content generation where serendipity is valued – then a dedicated vector database may be adequate.
However, the moment that correctness becomes a critical requirement, particularly in applications involving multiple tenants, time-sensitive data, or scenarios where incorrect retrieval could lead to erroneous actions without immediate human review, hybrid search becomes indispensable. For the vast majority of production RAG systems, these conditions are present from day one.
The Pivotal Role of the Middle Layer
The AI stack can be broadly categorized into three layers: the embedding model (responsible for encoding meaning), the generation model (responsible for synthesizing answers), and the database query layer (responsible for retrieving context). While significant investment has been directed towards optimizing the embedding and generation models, the retrieval query layer has often been treated as a commodity.
This middle layer, however, is where factual correctness is fundamentally determined. The embedding model translates a question into a vector, and the generation model transforms retrieved documents into an answer. But it is the retrieval query that dictates which documents are presented to the model. An inaccurate retrieval at this stage will inevitably lead to an incorrect answer, regardless of the sophistication of the embedding or generation models.
Hybrid search, by combining vector similarity with relational filters within a single database query, effectively bridges this retrieval accuracy gap. This approach is not overly complex; it is essentially standard SQL enhanced with a distance function. The primary prerequisite is a database system that natively supports both vector and relational operations without forcing a compromise.
The foundational principle behind the development of systems like TiDB has always been to adapt the database to the application, rather than the reverse. This has historically meant ensuring MySQL compatibility for ease of adoption, providing horizontal scalability to accommodate growth, and now, incorporating native vector support to facilitate the development of AI-powered features. The underlying philosophy remains consistent: the database should evolve to meet the changing needs of modern applications.
