Open-Source Wikis

/

DuckDB

/

Features

/

Transactions and storage

duckdb/duckdb

Transactions and storage

What you get when you run BEGIN, COMMIT, ROLLBACK, or ATTACH in DuckDB. This page is the user-facing summary; the implementation lives in systems/transaction, systems/storage, and systems/catalog.

Single-file storage

A DuckDB database is one file. The on-disk layout:

  • A header at the start (with two alternating copies for crash safety).
  • A free-block list.
  • Catalog metadata blocks.
  • Per-table row group blocks.
  • Per-index blocks.

Default block size is 256 KB, set at database creation time. The block manager is src/storage/single_file_block_manager.cpp.

A WAL (write-ahead log) lives next to the database file with a .wal suffix. It is appended on every commit and truncated on checkpoint.

graph LR
    DB["my.db (single file)"] -.->|alternating headers| Header[Header A / Header B]
    DB -->|free list| Free[Free blocks]
    DB -->|catalog metadata| Cat[Catalog blocks]
    DB -->|table data| RG[Row groups -> column segments]
    WAL["my.db.wal"] -->|on commit| DB

ACID

DuckDB transactions are ACID under snapshot isolation. Concretely:

  • Atomic. A transaction's writes either all become visible (commit) or none do (rollback). The undo buffer in src/transaction/undo_buffer.cpp records every change.
  • Consistent. Constraints (NOT NULL, CHECK, FOREIGN KEY, PRIMARY KEY, UNIQUE) are validated on commit.
  • Isolated. Each transaction reads from a stable snapshot; concurrent reads never see uncommitted data; concurrent writes that conflict on the same row abort one of the transactions.
  • Durable. Commits write to the WAL before they are acknowledged. After a crash, WAL replay (src/storage/wal_replay.cpp) reapplies committed transactions.

Isolation level is snapshot isolation — DuckDB does not offer read-uncommitted, repeatable read, or serializable.

Transaction lifecycle

BEGIN;                               -- starts a snapshot
SELECT count(*) FROM big_table;      -- reads against snapshot
INSERT INTO big_table VALUES (...);  -- writes go to LocalStorage
COMMIT;                              -- assigns commit_id, writes WAL, installs versions

Internally:

Step What happens
BEGIN DuckTransactionManager issues a snapshot_id.
Reads Visible versions are those committed <= snapshot_id (or written by this transaction).
Writes Stored in LocalStorage (src/storage/local_storage.cpp) and the undo buffer.
COMMIT Assigns commit_id, validates constraints, writes WAL records (src/transaction/wal_write_state.cpp), installs versions.
ROLLBACK Walks the undo buffer in reverse, restoring previous state.

ABORT is a synonym for ROLLBACK. SAVEPOINT name; ... ROLLBACK TO name; lets you partially undo a transaction.

Auto-commit

Outside an explicit transaction, every statement runs in its own auto-commit transaction. Connection::Query is auto-commit by default; explicit BEGIN/COMMIT opts out.

Concurrency

  • Multiple readers can run in parallel against any number of snapshots. They see consistent views of the data.
  • Multiple writers are allowed; conflicts (two transactions updating the same row) are detected on commit and one aborts with TransactionContextException: Conflict on tuple ....
  • Schema changes participate in the same MVCC: CREATE TABLE in a transaction is invisible to other transactions until commit.

Multi-version tables

Each table is a sequence of row groups. Within a row group, columns store the most recent committed value, and a version chain hangs off rows that were updated or deleted. Cleanup (src/transaction/cleanup_state.cpp) prunes the chain when no live snapshot can see older versions anymore.

Checkpointing

When the WAL grows past a threshold (or on PRAGMA force_checkpoint, or on database close), CheckpointManager (src/storage/checkpoint_manager.cpp) flushes dirty blocks into the main file and truncates the WAL. Checkpointing is concurrent with reads — readers see a consistent view throughout.

Buffer manager

In-memory state is managed by StandardBufferManager (src/storage/standard_buffer_manager.cpp). A configurable budget (PRAGMA memory_limit) caps how many blocks can be pinned at once; victims are spilled to a temporary file via TemporaryFileManager (src/storage/temporary_file_manager.cpp).

Compression

Each ColumnSegment (a contiguous run of values for one column inside a row group) is independently compressed using one of:

  • uncompressed
  • bitpacking
  • dictionary
  • chimp / patas / alp (floating point)
  • fsst (string)
  • rle

Compression choice is per-segment, picked by analyses in src/function/compression_config.cpp. See systems/storage.

Multi-database (ATTACH)

ATTACH 'a.db' AS a;
ATTACH 'b.db' AS b;
SELECT * FROM a.users JOIN b.events USING (user_id);

Each attached database can be DuckDB-native or backed by a storage extension (e.g., ATTACH 'host=...' AS pg (TYPE postgres)). A MetaTransaction (systems/transaction) coordinates writes across attached databases, so a BEGIN; INSERT INTO a.t ...; INSERT INTO b.t ...; COMMIT; is atomic across databases (best-effort across heterogeneous backends).

Encryption at rest

Storage encryption is handled by the encryption helpers in src/common/crypto/ and src/common/encryption_*.cpp. Encrypted databases use AES-GCM with a key supplied at open time:

ATTACH 'encrypted.db' AS enc (ENCRYPTION_KEY 'my-secret');

Implementation details are in src/common/encryption_functions.cpp and src/common/encryption_state.cpp.

Storage compatibility

DuckDB tracks a versioned storage format (src/storage/storage_info.cpp, src/storage/version_map.json). Newer versions can always read databases written by older versions; older versions read newer formats best-effort. The matrix is exercised by .github/workflows/CrossVersion.yml and the fixtures in test/bwc/.

Where to look

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.

Transactions and storage – DuckDB wiki | Factory