PostgreSQL’s Storage Conundrum: Why Object Storage Isn’t a Universal Fit for Database Operations

Edi Susilo Dewantoro, April 19, 2026

The allure of cost-effective and virtually limitless storage solutions like Amazon S3 has led many to explore its use for a wide array of data management tasks. However, the fundamental differences between object storage and the high-performance storage required by transactional databases like PostgreSQL present a critical challenge. The temptation to consolidate all storage needs into a single, seemingly simple solution overlooks the distinct operational demands of a robust database system, particularly concerning latency-sensitive operations. While S3 excels at archiving and backup, attempting to shoehorn it into the real-time transaction processing path of PostgreSQL can lead to significant performance degradation and operational complexities.

The Core Challenge: Latency in Transactional Workloads

The primary difficulty in using PostgreSQL with object storage does not stem from the sheer volume of data, but from the critical moments when the database must pause and wait for I/O to complete. PostgreSQL’s commit path, crucial for ensuring durability, requires a synchronous write of the write-ahead log (WAL) to persistent storage. This flush, performed by the XLogFlush() function, blocks the database backend until the kernel confirms the write is durable.

On modern, high-performance enterprise NVMe drives equipped with power-loss protection, this durability confirmation can take tens of microseconds. This is a remarkably short duration, allowing for rapid transaction commits. However, when this process is attempted over networked storage, or more critically, over object storage like S3, the latency can extend into milliseconds or even longer. This substantial increase in latency directly impacts the performance of transactional workloads.
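As a back-of-envelope illustration (the latency figures below are assumed, round numbers, not measurements), the durable-flush latency puts a hard ceiling on how fast a single session can commit:

```python
def max_commits_per_second(fsync_latency_s: float) -> float:
    """Upper bound on serial commits for one session: each commit must
    wait for one durable WAL flush before the next can begin."""
    return 1.0 / fsync_latency_s

# Assumed round-number latencies, for illustration only.
local_nvme_ceiling = max_commits_per_second(50e-6)    # ~50 us durable flush
object_store_ceiling = max_commits_per_second(20e-3)  # ~20 ms durable round trip
```

At these assumed numbers, the same session drops from roughly 20,000 commits per second on local NVMe to about 50 against a remote object store.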

Commit Latency: A Direct Hit on User Experience

Commit latency is not merely an abstract benchmark; it has tangible consequences for application performance. For Online Transaction Processing (OLTP) systems, which are characterized by a high volume of small, frequent transactions, commit latency sets a hard ceiling on how quickly individual user sessions can complete their operations. While techniques like group commit can mitigate this by allowing multiple transactions to share a single WAL flush, many production applications do not operate in a state of consistently high concurrency. In scenarios with lower concurrency, storage latency directly translates into user-visible response times, leading to a degraded user experience.
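An idealized model makes the group-commit trade-off explicit (latency figures assumed for illustration): throughput scales with the number of backends sharing a flush, but each individual transaction still waits a full flush, so single-session latency does not improve.

```python
def group_commit_throughput(fsync_latency_s: float, concurrency: int) -> float:
    """Idealized commits per second when `concurrency` backends share each
    WAL flush: one flush per group, so throughput scales with group size.
    Each transaction still observes the full flush latency individually."""
    return concurrency / fsync_latency_s
```

With an assumed 5 ms flush, one session tops out near 200 commits/s; 64 concurrently committing sessions can reach about 12,800 commits/s, yet every one of them still waits 5 ms to hear its commit confirmed.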

Recent benchmarking efforts, such as those conducted on managed PostgreSQL services, have illuminated this performance disparity. As workloads exceed available memory, systems employing faster, local storage consistently outperform those relying on slower, remote storage solutions when it comes to database operations. This reinforces the notion that for PostgreSQL, especially when dealing with data that must be persisted reliably, the speed and predictability of the underlying storage are paramount.

The Promise of fsync: More Than Just a Write

The integrity of PostgreSQL’s data relies heavily on the fsync operation, which ensures that data is physically written to storage and not merely buffered. For PostgreSQL, fsync is not just a write operation; it is a promise of durability. Enterprise-grade SSDs with built-in power-loss protection are engineered to fulfill this promise more efficiently. They often utilize capacitor-backed caches that can sustain data even during sudden power outages, allowing the fsync operation to be acknowledged much faster. Consumer-grade SSDs, on the other hand, typically have less robust caching mechanisms, making them more susceptible to data loss during power failures and consequently exhibiting higher fsync latencies. This explains why two drives that appear equally fast on paper can perform vastly differently under commit-heavy workloads.
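The pattern itself is simple; what varies is how long the durability confirmation takes. A minimal Python sketch of the write-then-fsync discipline (illustrative only, not PostgreSQL’s actual C code):

```python
import os

def durable_append(path: str, payload: bytes) -> None:
    """Append `payload` and force it to stable storage: the same
    write-then-fsync pattern PostgreSQL relies on for WAL flushes.
    os.fsync() blocks until the kernel reports the data durable,
    not merely buffered in memory."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)
    finally:
        os.close(fd)
```

How long that os.fsync() call blocks is exactly where power-loss-protected enterprise drives, consumer SSDs, and remote storage diverge.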

Read Performance: The Agility of Small Pages

The performance challenges extend to read operations as well. PostgreSQL organizes its data and indexes into 8 KB pages. When a requested page is not found in the database’s buffer cache or the operating system’s page cache, the database backend must fetch it from disk. OLTP workloads are characterized by a constant stream of such reads: navigating indexes, retrieving row data, verifying visibility, and repeating the process.

High-performance storage solutions like NVMe are exceptionally adept at handling these numerous small, random reads with low latency. Object storage, by contrast, is designed for larger, higher-latency object requests. The fundamental mismatch lies not in raw bandwidth, but in the pattern of access. PostgreSQL’s need for many tiny, latency-sensitive reads clashes with the architecture of object storage, which is optimized for fewer, larger data transfers.
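To make the access-pattern mismatch concrete, here is a toy cost model (the latencies below are assumed, round numbers): a fully uncached B-tree lookup pays one random 8 KB page read per tree level, plus one more for the heap tuple itself.

```python
PAGE_SIZE = 8192  # PostgreSQL heap and index pages are 8 KB

def uncached_lookup_time(btree_depth: int, page_read_latency_s: float) -> float:
    """Worst-case time for one fully uncached index lookup: one random
    8 KB page read per B-tree level, plus one read for the heap tuple."""
    return (btree_depth + 1) * page_read_latency_s

# Assumed, illustrative latencies: ~100 us per random NVMe read versus
# ~30 ms per object-storage GET round trip.
nvme_lookup = uncached_lookup_time(4, 100e-6)
object_lookup = uncached_lookup_time(4, 30e-3)
```

Under these assumptions a depth-4 lookup costs about half a millisecond on NVMe but 150 milliseconds against an object store, even though the object store may offer far more aggregate bandwidth.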

Growing Pains: When the Working Set Exceeds Memory

As a database’s "working set" – the data actively being accessed – grows beyond the capacity of available memory (including shared_buffers and the OS page cache), the latency of the underlying storage becomes a dominant factor in performance. When hot queries begin to experience cache misses, the speed at which data can be retrieved from disk directly dictates the overall application responsiveness. This is where the limitations of object storage become acutely apparent for the active transactional layer of PostgreSQL.
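This effect can be sketched with the standard effective-latency blend (all numbers assumed for illustration):

```python
def effective_read_latency_ns(hit_ratio: float, cache_ns: float,
                              miss_ns: float) -> float:
    """Average page-read latency blending cache hits and misses: as the
    working set outgrows memory, hit_ratio falls and the miss term
    quickly dominates the average."""
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * miss_ns
```

At a 99% hit ratio with ~100 ns cache hits, a 100 µs NVMe miss yields an average near 1.1 µs, while a 20 ms object-store miss pushes the average to roughly 200 µs; the miss path alone drives a difference of nearly 200x.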

MVCC and I/O Amplification: PostgreSQL’s Design Choices

PostgreSQL’s Multi-Version Concurrency Control (MVCC) mechanism, while essential for high concurrency and crash safety, inherently amplifies I/O. Updates do not modify rows in place; instead, they create new tuple versions, which in turn require new index entries. After each checkpoint, the first modification to a page writes a full page image into the WAL stream, and hint bits can turn read operations into writes. Furthermore, the vacuum process must continually reclaim the space occupied by dead tuples and manage transaction ID age. These processes, fundamental to PostgreSQL’s operation, demand storage that can absorb a large volume of small, scattered I/O operations without compromising performance.
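A rough tally of this write amplification (a deliberate simplification: WAL volume, TOAST, and page splits are ignored; HOT refers to PostgreSQL’s heap-only-tuple optimization, which lets an update skip index maintenance when no indexed column changes and the new version fits on the same page):

```python
def pages_dirtied_by_update(index_count: int, hot_update: bool) -> int:
    """Rough count of pages one UPDATE can dirty under MVCC: a new heap
    tuple version is always written, and every index needs a new entry
    unless a HOT update lets index maintenance be skipped entirely.
    WAL records, and later vacuum work, come on top of this."""
    new_heap_tuple = 1
    index_entries = 0 if hot_update else index_count
    return new_heap_tuple + index_entries
```

On a table with three indexes, a non-HOT update dirties around four pages where a HOT update dirties one, and every one of those is a small, scattered write.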

Modern Architectures: Isolating Object Storage

Recognizing these performance bottlenecks, contemporary managed PostgreSQL services that leverage object storage do not place it directly in the critical transaction path. Instead, they employ a layered architecture. A fast layer – often a local cache, a write-ahead log buffer, or a page-serving layer – is positioned close to the database. Colder, less frequently accessed, or reconstructable data is then pushed to a more durable, remote object storage tier. This architectural pattern, observed across various implementations, is not about branding but about functional convergence: isolating the latency-sensitive commit path from the inherent latency of object storage. The focus remains on protecting the commit process from the delays associated with accessing data from remote object stores.
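The read side of such a layered design reduces, in miniature, to a cache-aside pattern. The sketch below is a generic illustration of that shape, not any vendor’s implementation; `fetch_remote` stands in for whatever high-latency object tier sits behind the fast layer.

```python
def read_page(page_id, local_cache: dict, fetch_remote):
    """Tiered read path: serve from the fast local layer when possible;
    only on a miss pay the object-store round trip, then populate the
    cache so repeat reads of a hot page stay on the fast path."""
    if page_id in local_cache:
        return local_cache[page_id]
    data = fetch_remote(page_id)
    local_cache[page_id] = data
    return data
```

The write side is the harder half, which is why these systems keep a WAL buffer or log service in the fast layer rather than committing through the object store.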

PostgreSQL 18 and the Drive for Faster Storage

The ongoing development of PostgreSQL further underscores the importance of high-performance storage. For instance, PostgreSQL 18 introduces asynchronous I/O support, signaling an increased expectation for concurrent storage access and the ability to drive faster storage devices more effectively. This development is geared towards enabling PostgreSQL to better leverage the capabilities of high-speed storage like NVMe, rather than attempting to make object storage mimic the performance characteristics of local SSDs. As PostgreSQL continues to evolve in its ability to issue parallel, low-latency I/O requests, the benefits derived from storage solutions that offer quick and predictable responses will become even more pronounced.
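The benefit of overlapping I/O can be shown with an idealized model (queueing effects and submission overhead are ignored; the figures are assumed):

```python
def total_read_time_s(n_pages: int, latency_s: float,
                      max_in_flight: int) -> float:
    """Idealized wall-clock time to fetch n independent pages: serial
    submission (max_in_flight=1) pays full latency per page, while
    asynchronous I/O overlaps up to max_in_flight reads at once."""
    batches = -(-n_pages // max_in_flight)  # ceiling division
    return batches * latency_s
```

Fetching 100 pages at an assumed 100 µs each takes 10 ms serially but only 0.4 ms with 32 reads in flight; note that the win comes from parallelism over an already low-latency device, not from hiding millisecond-scale object-store round trips.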

S3’s True Strengths: Archiving, Backups, and Analytics

It is crucial to reiterate that this discussion is not a critique of S3’s capabilities. Amazon S3 is an exceptionally capable service, perfectly suited for its intended use cases. These include WAL archiving, storing base backups, managing snapshots, long-term data retention, and serving as a data source for downstream analytical systems. S3 also plays a vital role in replication and migration workflows, particularly for initial data loads, backfills, and large-scale cutovers to new systems. The operational imperative is to maintain a clear separation between these archival and backup functions and the real-time transaction commit path.

Teams implementing Change Data Capture (CDC) pipelines or planning large-scale PostgreSQL migrations are typically addressing a different set of problems than those encountered during transaction commits. These operations often involve batch processing or asynchronous data movement, where the latency characteristics of object storage are far less critical.

Decoupling Analytics for Optimal Performance

Similarly, the separation of analytical workloads from transactional operations is a well-established best practice. While PostgreSQL is a powerhouse for OLTP, using the same instances to execute large scans and aggregations can lead to resource contention, impacting both transactional throughput and analytical query performance. This is precisely why significant engineering effort is dedicated to moving analytical workloads off the primary OLTP path. This is achieved through various means, including replication to separate analytical databases or the implementation of dedicated open-source stacks designed for analytics alongside PostgreSQL. The objective is not to diminish PostgreSQL’s capabilities but to allow it to excel at its core strengths by offloading tasks that do not align with its transactional design.

The Optimal Strategy: A Bifurcated Storage Approach

The optimal approach for managing PostgreSQL storage is not an "either/or" proposition between NVMe and S3. Instead, it lies in a strategic combination of both, with a clear and well-defined boundary between their respective roles. High-performance local or block storage should be allocated to handle the demanding operations of commits, cache misses, checkpoints, and vacuuming – essentially, the "hot path" of the database. Object storage, with its durability and cost-effectiveness, is ideally suited for archiving, backups, and managing historical data – the "cold path."

PostgreSQL achieves peak performance when its critical transactional operations are confined to storage solutions capable of responding within microseconds. The cold path, on the other hand, can leverage the higher latency of object storage for its primary purpose: ensuring long-term data durability and accessibility for archival and backup needs. Forcing these two distinct jobs – low-latency transactional processing and high-durability archiving – into the same storage layer inevitably leads to performance bottlenecks and compromises the integrity of the database’s critical functions. By understanding and respecting these distinct requirements, organizations can architect more robust, performant, and cost-effective PostgreSQL deployments.

