duckdb/duckdb

Storage

Active contributors: Mytherin, Tishj, Mark

Purpose

src/storage/ owns the on-disk database file, the buffer manager that brings blocks into memory, table data structures (row groups, segments), compression codecs, the write-ahead log, and the checkpoint protocol. A DuckDB database is one file; this directory is what makes that work.

Directory layout

src/storage/
├── storage_manager.cpp           Per-database init, attach/detach, version checks
├── single_file_block_manager.cpp Default block manager: maps logical blocks to file offsets
├── block.cpp                     Block primitive
├── block_allocator.cpp           Allocates fresh blocks
├── buffer_manager.cpp            Buffer manager interface
├── standard_buffer_manager.cpp   Default LRU+pinning buffer manager
├── partial_block_manager.cpp     Sub-block packing for small tables
├── arena_allocator.cpp           Arena allocator for pipelined writes
├── checkpoint_manager.cpp        Checkpoint orchestration
├── write_ahead_log.cpp           WAL writer
├── wal_replay.cpp                WAL replay on startup
├── temporary_file_manager.cpp    Disk spill manager
├── temporary_memory_manager.cpp  In-memory budget for spillable structures
├── data_table.cpp                Per-table append/scan/update/delete
├── local_storage.cpp             Per-transaction uncommitted changes
├── data_pointer.cpp              Persisted block-pointer metadata
├── optimistic_data_writer.cpp    Eager writes for large appends
├── storage_index.cpp             Index entry persistence
├── table_index_list.cpp          Per-table list of indexes
├── magic_bytes.cpp               File-header magic-byte detection
├── storage_lock.cpp              Reader/writer locks for storage state
├── storage_info.cpp              Storage version constants
├── version_map.json              Version compatibility map
├── open_file_storage_extension.cpp  Hook for storage extensions
├── index.cpp                     Index registration
├── buffer/                       Allocator and pinning helpers
├── checkpoint/                   Checkpoint state machines
├── compression/                  Per-codec compress/decompress
├── external_file_cache/          Cache for remote/external files
├── metadata/                     Catalog metadata blocks
├── serialization/                Generated serialize/deserialize for plans + storage
├── statistics/                   Per-segment / per-row-group statistics
└── table/                        Row group, column data, segment trees

Key abstractions

| Type | File | Role |
| --- | --- | --- |
| StorageManager | src/storage/storage_manager.cpp | Per-database top-level: opens the file, runs WAL replay, reads schemas, decides if a checkpoint is needed. |
| BlockManager | src/include/duckdb/storage/block_manager.hpp | Abstract: maps logical block IDs to bytes. Default implementation SingleFileBlockManager keeps everything in one file. |
| BufferManager | src/include/duckdb/storage/buffer_manager.hpp | Manages a memory budget for blocks. StandardBufferManager is the production implementation. |
| BlockHandle / BufferHandle | src/storage/buffer/ | RAII handles for pinning a block in memory; release happens when the handle is destroyed. |
| DataTable | src/storage/data_table.cpp | Per-table API used by INSERT/UPDATE/DELETE/SCAN. Holds row groups, indexes, statistics. |
| RowGroup | src/storage/table/row_group.cpp | A horizontal slice of a table (default 122,880 rows). |
| ColumnData | src/storage/table/column_data.cpp | Per-column storage inside a row group, made of ColumnSegments. |
| WriteAheadLog | src/storage/write_ahead_log.cpp | Streaming append-only log written by every committing transaction. |
| CheckpointManager | src/storage/checkpoint_manager.cpp | Periodically flushes dirty data into the main file and truncates the WAL. |
| LocalStorage | src/storage/local_storage.cpp | Per-transaction view of uncommitted appends/updates/deletes. |
| TemporaryFileManager | src/storage/temporary_file_manager.cpp | Spills blocks to disk when the buffer manager is over budget. |

How it works

graph TD
    SQL[INSERT/UPDATE/DELETE] -->|via PhysicalOperator| LS[LocalStorage]
    LS -->|on commit| DT[DataTable]
    DT -->|append| RG[RowGroup -> ColumnData -> ColumnSegment]
    RG -->|allocate blocks| BM[BlockManager]
    BM -->|read/write| BUF[BufferManager]
    BUF -->|pin/unpin| FILE[Single database file]
    DT -.->|log entry| WAL[WriteAheadLog]
    WAL -->|periodic| CK[CheckpointManager]
    CK -->|flush dirty + truncate WAL| FILE

Single-file storage

A DuckDB database file is divided into fixed-size blocks. The default block size is 256 KiB (262,144 bytes), and it is fixed when the database is created. The first few blocks contain metadata (the database header, schema metadata, free-block lists). Tables and indexes live in the remaining blocks.
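As a rough illustration of fixed-size block addressing, the sketch below maps a logical block ID to a byte offset in the file. The block size is the default stated above. HEADER_BYTES and the function name block_offset are assumptions made for this example; they are not taken from single_file_block_manager.cpp.

```python
BLOCK_SIZE = 256 * 1024   # default block size stated above, in bytes
HEADER_BYTES = 3 * 4096   # ASSUMPTION: bytes reserved before block 0 (illustrative)


def block_offset(block_id: int,
                 block_size: int = BLOCK_SIZE,
                 header_bytes: int = HEADER_BYTES) -> int:
    """Return the byte offset at which a logical block starts (toy model)."""
    if block_id < 0:
        raise ValueError("block_id must be non-negative")
    return header_bytes + block_id * block_size


first = block_offset(0)    # start of the first block
tenth = block_offset(10)   # start of block 10
```

Because every block has the same size, locating a block is a single multiplication and needs no per-block index.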

SingleFileBlockManager (single_file_block_manager.cpp) tracks:

  • The current header.
  • A free list of unused blocks.
  • A used list of allocated blocks per object.

Two header copies are written alternately so that a crash at any point leaves at least one valid header.
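The alternating-header idea can be sketched with a toy model. This is not DuckDB's on-disk format: the HeaderSlot fields, the CRC32 validity check, and every function name are assumptions chosen for illustration. The writer overwrites only the slot that is not currently active, so a torn write can damage at most the newer copy.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeaderSlot:
    iteration: int   # write counter; the larger value is the newer header
    payload: bytes   # stand-in for header contents (e.g. a metadata pointer)
    checksum: int    # ASSUMPTION: CRC32 used to detect a torn write


def _crc(iteration: int, payload: bytes) -> int:
    return zlib.crc32(iteration.to_bytes(8, "little") + payload)


def is_valid(slot: Optional[HeaderSlot]) -> bool:
    return slot is not None and slot.checksum == _crc(slot.iteration, slot.payload)


def active_slot(slots: List[Optional[HeaderSlot]]) -> Optional[int]:
    """Index of the valid slot with the highest iteration, or None."""
    valid = [i for i, s in enumerate(slots) if is_valid(s)]
    return max(valid, key=lambda i: slots[i].iteration) if valid else None


def write_header(slots: List[Optional[HeaderSlot]], payload: bytes) -> int:
    """Write a new header into the slot that is NOT currently active."""
    current = active_slot(slots)
    if current is None:
        target, iteration = 0, 1
    else:
        target, iteration = 1 - current, slots[current].iteration + 1
    slots[target] = HeaderSlot(iteration, payload, _crc(iteration, payload))
    return target


slots: List[Optional[HeaderSlot]] = [None, None]
write_header(slots, b"root=block 3")   # lands in slot 0, iteration 1
write_header(slots, b"root=block 7")   # lands in slot 1, iteration 2
write_header(slots, b"root=block 9")   # lands in slot 0, iteration 3
slots[0].payload = b"torn"             # simulate a crash in the middle of that write
survivor = active_slot(slots)          # falls back to slot 1
```

After the simulated crash, the reader still finds the header written in the previous round, which is the property the two-copy scheme is meant to provide.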

Row groups and segments

Each table is a sequence of RowGroups. A row group:

  • Has a fixed maximum row count (122,880).
  • Stores per-column statistics (min/max/distinct/null counts).
  • Contains one ColumnData per column, made of one or more ColumnSegments.
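To make the row-group properties above concrete, here is a small sketch that slices a single column into groups of at most 122,880 rows and computes min/max/null-count statistics per group. The function names are hypothetical, distinct counts are omitted for brevity, and the min/max overlap test is a generic zone-map check rather than DuckDB's actual pruning code.

```python
from typing import Dict, List, Optional

ROW_GROUP_SIZE = 122_880  # maximum rows per row group, as stated above


def split_into_row_groups(values: List[Optional[int]],
                          size: int = ROW_GROUP_SIZE) -> List[List[Optional[int]]]:
    """Slice a column into consecutive chunks of at most `size` rows."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def column_stats(values: List[Optional[int]]) -> Dict[str, Optional[int]]:
    """Min, max and null count for one column within one row group."""
    non_null = [v for v in values if v is not None]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "null_count": len(values) - len(non_null),
    }


def may_contain(stats: Dict[str, Optional[int]], lo: int, hi: int) -> bool:
    """True if the closed range [lo, hi] can overlap the group's [min, max]."""
    if stats["min"] is None:   # the group holds only NULLs
        return False
    return not (hi < stats["min"] or lo > stats["max"])


column: List[Optional[int]] = list(range(300_000))
column[5] = None
groups = split_into_row_groups(column)
stats = [column_stats(g) for g in groups]
# Row groups that a filter "value BETWEEN 130000 AND 140000" would have to read.
candidates = [i for i, s in enumerate(stats) if may_contain(s, 130_000, 140_000)]
```

With 300,000 rows the column splits into three groups, and only the second one can contain values in the requested range, so a scan could skip the other two.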

Segments are compressed using one of the codecs in src/storage/compression/: uncompressed, bitpacking, dictionary, chimp, patas, alp, fsst, rle. Compression is chosen per segment, via compression_config.cpp, based on a quick analysis pass.
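One way to picture an analysis pass is as a set of size estimators, one per codec, with the smallest estimate winning. The sketch below is a simplified stand-in under that assumption: the estimator formulas, the byte costs, and the function names are invented for illustration and do not reproduce DuckDB's CompressionFunction interface.

```python
from typing import Callable, Dict, List, Tuple


def estimate_uncompressed(values: List[int]) -> int:
    return 8 * len(values)                     # 8 bytes per 64-bit value


def estimate_rle(values: List[int]) -> int:
    runs, prev = 0, object()                   # sentinel never equals an int
    for v in values:
        if v != prev:
            runs, prev = runs + 1, v
    return runs * (8 + 4)                      # value plus a 32-bit run length


def estimate_bitpacking(values: List[int]) -> int:
    if not values:
        return 0
    width = (max(values) - min(values)).bit_length()   # bits per offset from the minimum
    return 8 + (width * len(values) + 7) // 8          # 8-byte base value + packed offsets


ESTIMATORS: Dict[str, Callable[[List[int]], int]] = {
    "uncompressed": estimate_uncompressed,
    "rle": estimate_rle,
    "bitpacking": estimate_bitpacking,
}


def choose_codec(values: List[int]) -> Tuple[str, int]:
    """Return (codec name, estimated bytes) for the smallest estimate."""
    sizes = {name: fn(values) for name, fn in ESTIMATORS.items()}
    best = min(sizes, key=sizes.get)
    return best, sizes[best]


long_runs = choose_codec([7] * 1000 + [9] * 1000)
narrow_range = choose_codec(list(range(1000)))
wide_range = choose_codec([0, 2**62, 1, 2**62 - 1])
```

Long runs favor RLE, a dense narrow range favors bitpacking, and a few widely spread values are cheapest left uncompressed, which is why the choice is made per segment rather than per table.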

Write-ahead log

Every committed write produces WAL records (write_ahead_log.cpp). On startup, wal_replay.cpp reads the WAL, replays the records into the in-memory state, and triggers a checkpoint if needed. The WAL is a separate file with a .wal suffix next to the database file.
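The append-then-replay cycle can be illustrated with a toy log. The record format here (one JSON object per line, prefixed with a CRC32) is an assumption made for readability and is unrelated to DuckDB's binary WAL format; wal_append and wal_replay are hypothetical names. Replay stops at the first record that fails its checksum, which models a crash during the last write.

```python
import json
import os
import tempfile
import zlib
from typing import Dict


def wal_append(path: str, record: Dict) -> None:
    """Append one checksummed record and force it to disk."""
    body = json.dumps(record, sort_keys=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{zlib.crc32(body.encode()):08x} {body}\n")
        f.flush()
        os.fsync(f.fileno())


def wal_replay(path: str) -> Dict:
    """Rebuild in-memory state by applying records in order."""
    state: Dict = {}
    if not os.path.exists(path):
        return state
    with open(path, encoding="utf-8") as f:
        for line in f:
            checksum, _, body = line.rstrip("\n").partition(" ")
            if not body or f"{zlib.crc32(body.encode()):08x}" != checksum:
                break                          # torn tail: ignore the rest of the log
            record = json.loads(body)
            if record["op"] == "put":
                state[record["key"]] = record["value"]
            elif record["op"] == "delete":
                state.pop(record["key"], None)
    return state


wal_path = os.path.join(tempfile.mkdtemp(), "demo.db.wal")
wal_append(wal_path, {"op": "put", "key": "a", "value": 1})
wal_append(wal_path, {"op": "put", "key": "b", "value": 2})
wal_append(wal_path, {"op": "delete", "key": "a"})
with open(wal_path, "a", encoding="utf-8") as f:
    f.write('00000000 {"op": "put", "key": "c"')   # simulate a crash mid-record
recovered = wal_replay(wal_path)
```

Only the three complete records are applied; the truncated fourth record is discarded, so the recovered state reflects exactly the writes that were fully logged.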

Checkpointing

CheckpointManager (checkpoint_manager.cpp) flushes all dirty in-memory data into the main file and truncates the WAL. It is triggered on database close, on user request (CHECKPOINT or PRAGMA force_checkpoint), or automatically when the WAL grows past a threshold (the checkpoint_threshold setting).
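The three triggers can be modeled with a few lines of toy code. ToyStore and all of its members are hypothetical, and the dictionary standing in for the main file bears no relation to DuckDB's block layout; the sketch only shows the control flow of "flush dirty data, then truncate the log".

```python
from typing import Dict, List


class ToyStore:
    """Toy model: a WAL that is truncated whenever a checkpoint runs."""

    def __init__(self, wal_threshold_bytes: int) -> None:
        self.wal_threshold_bytes = wal_threshold_bytes
        self.main_file: Dict[str, object] = {}   # stands in for the database file
        self.dirty: Dict[str, object] = {}       # committed but not yet checkpointed
        self.wal: List[bytes] = []               # encoded log records
        self.checkpoints = 0

    def wal_size(self) -> int:
        return sum(len(record) for record in self.wal)

    def commit(self, key: str, value: object) -> None:
        self.wal.append(f"{key}={value}".encode())   # log first
        self.dirty[key] = value
        if self.wal_size() > self.wal_threshold_bytes:
            self.checkpoint()                        # automatic trigger

    def checkpoint(self) -> None:
        """Explicit trigger: flush dirty data, then truncate the WAL."""
        self.main_file.update(self.dirty)
        self.dirty.clear()
        self.wal.clear()
        self.checkpoints += 1

    def close(self) -> None:
        if self.dirty:
            self.checkpoint()                        # trigger on close


store = ToyStore(wal_threshold_bytes=16)
store.commit("a", 1)                    # WAL is 3 bytes
store.commit("b", 22)                   # WAL is 7 bytes
store.commit("key3", "long-value")      # WAL reaches 22 bytes: automatic checkpoint
after_auto = (store.checkpoints, store.wal_size())
store.commit("c", 3)
store.close()                           # second checkpoint on close
```

Truncating the log only after the dirty data has reached the main file is what keeps the pair recoverable: at any moment, the main file plus the remaining WAL describes every committed change.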

Buffer management

StandardBufferManager keeps a memory budget (default: 80% of physical RAM). Pinning a block reads it from disk if it is not already in memory. If the budget would be exceeded, unpinned blocks are evicted, and evicted buffers that have no copy in the database file are spilled to a temporary file via TemporaryFileManager.
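A minimal sketch of pinning with LRU eviction follows. ToyBufferPool is a hypothetical class, its budget is counted in blocks rather than bytes, and it spills every victim to an in-memory "temp" dictionary, which is a deliberate simplification of what a real buffer manager does.

```python
from collections import OrderedDict
from typing import Dict


class ToyBufferPool:
    """Toy LRU pool: pinned blocks are never evicted; victims go to `temp`."""

    def __init__(self, capacity_blocks: int, disk: Dict[int, bytes]) -> None:
        self.capacity = capacity_blocks
        self.disk = disk                               # persistent copies
        self.temp: Dict[int, bytes] = {}               # stand-in for the spill file
        self.resident: "OrderedDict[int, bytes]" = OrderedDict()  # oldest first
        self.pins: Dict[int, int] = {}                 # block_id -> pin count

    def pin(self, block_id: int) -> bytes:
        if block_id in self.resident:
            self.resident.move_to_end(block_id)        # mark as most recently used
        else:
            self._make_room()
            if block_id in self.temp:
                data = self.temp.pop(block_id)         # reload a spilled block
            else:
                data = self.disk[block_id]             # read from "disk"
            self.resident[block_id] = data
        self.pins[block_id] = self.pins.get(block_id, 0) + 1
        return self.resident[block_id]

    def unpin(self, block_id: int) -> None:
        if self.pins.get(block_id, 0) == 0:
            raise ValueError(f"block {block_id} is not pinned")
        self.pins[block_id] -= 1

    def _make_room(self) -> None:
        while len(self.resident) >= self.capacity:
            victim = next((b for b in self.resident if self.pins.get(b, 0) == 0), None)
            if victim is None:
                raise MemoryError("all resident blocks are pinned")
            self.temp[victim] = self.resident.pop(victim)   # spill least recently used


disk = {i: f"block-{i}".encode() for i in range(4)}
pool = ToyBufferPool(capacity_blocks=2, disk=disk)
pool.pin(0)
pool.pin(1)
pool.unpin(0)
pool.pin(2)          # block 0 is unpinned and oldest, so it is spilled
pool.unpin(1)
data = pool.pin(0)   # block 1 is spilled, block 0 is reloaded from temp
```

The pin count is what protects a block that an operator is actively reading: eviction considers only blocks whose count has dropped back to zero, and it fails loudly if none exist.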

temporary_memory_manager.cpp handles in-memory budgeting for spillable structures (sort buffers, hash tables) so they cooperate with the buffer manager rather than fight it.

External file cache

src/storage/external_file_cache/ caches data read from external files (on S3, over HTTP, or on the local file system) in memory, within the buffer manager's budget. This is the integration point for httpfs-style extensions.

Integration points

  • Tables/scans in execution call into DataTable for reads (DataTable::Scan) and writes (DataTable::Append, DataTable::Update, DataTable::Delete).
  • Transactions (transaction) coordinate reads of versioned data through DuckTransaction and produce undo records that mirror what is written here.
  • Catalog (catalog) persists CatalogEntry objects through this layer in metadata blocks (storage/metadata/).
  • Compression configuration flows from src/function/compression_config.cpp and src/storage/compression/.

Entry points for modification

  • Adding a compression codec: implement CompressionFunction in src/storage/compression/<codec>/, register in compression_config.cpp, add tests in test/sql/compression/.
  • Storage format changes: bump the storage version (storage_info.cpp, version_map.json), add backward-compatibility tests in test/bwc/, run scripts/test_storage_compatibility.py.
  • Adjusting buffer memory: PRAGMA memory_limit = '4GB' and PRAGMA temp_directory = '/tmp/...'. Implementation: standard_buffer_manager.cpp, temporary_file_manager.cpp.
  • Implementing an alternative block manager (e.g., for an embedded environment): subclass BlockManager. The interface lives in src/include/duckdb/storage/block_manager.hpp and is intentionally narrow.
  • Adding row-group-level pruning hints: RowGroupPruner in optimizer consumes statistics produced here.

Key source files

| File | Purpose |
| --- | --- |
| src/storage/storage_manager.cpp | Database open/close lifecycle. |
| src/storage/single_file_block_manager.cpp | Single-file block layout. |
| src/storage/standard_buffer_manager.cpp | Block cache + spill coordination. |
| src/storage/data_table.cpp | Per-table read/write API. |
| src/storage/local_storage.cpp | Per-transaction local state. |
| src/storage/checkpoint_manager.cpp | Checkpoint orchestration. |
| src/storage/write_ahead_log.cpp | WAL writer. |
| src/storage/wal_replay.cpp | WAL replay. |
| src/storage/table/row_group.cpp | Row group structure. |
| src/storage/compression/*/*.cpp | Per-codec implementations. |

Continue to transaction for MVCC and durability semantics, or catalog for how schemas/tables are stored.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
