duckdb/duckdb
Transactions and storage
What you get when you run BEGIN, COMMIT, ROLLBACK, or ATTACH in DuckDB. This page is the user-facing summary; the implementation lives in systems/transaction, systems/storage, and systems/catalog.
Single-file storage
A DuckDB database is one file. The on-disk layout:
- A header at the start (with two alternating copies for crash safety).
- A free-block list.
- Catalog metadata blocks.
- Per-table row group blocks.
- Per-index blocks.
Default block size is 256 KB, set at database creation time. The block manager is src/storage/single_file_block_manager.cpp.
A WAL (write-ahead log) lives next to the database file with a .wal suffix. It is appended on every commit and truncated on checkpoint.
graph LR
DB["my.db (single file)"] -.->|alternating headers| Header[Header A / Header B]
DB -->|free list| Free[Free blocks]
DB -->|catalog metadata| Cat[Catalog blocks]
DB -->|table data| RG[Row groups -> column segments]
WAL["my.db.wal"] -->|on commit| DBACID
DuckDB transactions are ACID under snapshot isolation. Concretely:
- Atomic. A transaction's writes either all become visible (commit) or none do (rollback). The undo buffer in
src/transaction/undo_buffer.cpprecords every change. - Consistent. Constraints (NOT NULL, CHECK, FOREIGN KEY, PRIMARY KEY, UNIQUE) are validated on commit.
- Isolated. Each transaction reads from a stable snapshot; concurrent reads never see uncommitted data; concurrent writes that conflict on the same row abort one of the transactions.
- Durable. Commits write to the WAL before they are acknowledged. After a crash, WAL replay (
src/storage/wal_replay.cpp) reapplies committed transactions.
Isolation level is snapshot isolation — DuckDB does not offer read-uncommitted, repeatable read, or serializable.
Transaction lifecycle
BEGIN; -- starts a snapshot
SELECT count(*) FROM big_table; -- reads against snapshot
INSERT INTO big_table VALUES (...); -- writes go to LocalStorage
COMMIT; -- assigns commit_id, writes WAL, installs versionsInternally:
| Step | What happens |
|---|---|
BEGIN |
DuckTransactionManager issues a snapshot_id. |
| Reads | Visible versions are those committed <= snapshot_id (or written by this transaction). |
| Writes | Stored in LocalStorage (src/storage/local_storage.cpp) and the undo buffer. |
COMMIT |
Assigns commit_id, validates constraints, writes WAL records (src/transaction/wal_write_state.cpp), installs versions. |
ROLLBACK |
Walks the undo buffer in reverse, restoring previous state. |
ABORT is a synonym for ROLLBACK. SAVEPOINT name; ... ROLLBACK TO name; lets you partially undo a transaction.
Auto-commit
Outside an explicit transaction, every statement runs in its own auto-commit transaction. Connection::Query is auto-commit by default; explicit BEGIN/COMMIT opts out.
Concurrency
- Multiple readers can run in parallel against any number of snapshots. They see consistent views of the data.
- Multiple writers are allowed; conflicts (two transactions updating the same row) are detected on commit and one aborts with
TransactionContextException: Conflict on tuple .... - Schema changes participate in the same MVCC:
CREATE TABLEin a transaction is invisible to other transactions until commit.
Multi-version tables
Each table is a sequence of row groups. Within a row group, columns store the most recent committed value, and a version chain hangs off rows that were updated or deleted. Cleanup (src/transaction/cleanup_state.cpp) prunes the chain when no live snapshot can see older versions anymore.
Checkpointing
When the WAL grows past a threshold (or on PRAGMA force_checkpoint, or on database close), CheckpointManager (src/storage/checkpoint_manager.cpp) flushes dirty blocks into the main file and truncates the WAL. Checkpointing is concurrent with reads — readers see a consistent view throughout.
Buffer manager
In-memory state is managed by StandardBufferManager (src/storage/standard_buffer_manager.cpp). A configurable budget (PRAGMA memory_limit) caps how many blocks can be pinned at once; victims are spilled to a temporary file via TemporaryFileManager (src/storage/temporary_file_manager.cpp).
Compression
Each ColumnSegment (a contiguous run of values for one column inside a row group) is independently compressed using one of:
uncompressedbitpackingdictionarychimp/patas/alp(floating point)fsst(string)rle
Compression choice is per-segment, picked by analyses in src/function/compression_config.cpp. See systems/storage.
Multi-database (ATTACH)
ATTACH 'a.db' AS a;
ATTACH 'b.db' AS b;
SELECT * FROM a.users JOIN b.events USING (user_id);Each attached database can be DuckDB-native or backed by a storage extension (e.g., ATTACH 'host=...' AS pg (TYPE postgres)). A MetaTransaction (systems/transaction) coordinates writes across attached databases, so a BEGIN; INSERT INTO a.t ...; INSERT INTO b.t ...; COMMIT; is atomic across databases (best-effort across heterogeneous backends).
Encryption at rest
Storage encryption is handled by the encryption helpers in src/common/crypto/ and src/common/encryption_*.cpp. Encrypted databases use AES-GCM with a key supplied at open time:
ATTACH 'encrypted.db' AS enc (ENCRYPTION_KEY 'my-secret');Implementation details are in src/common/encryption_functions.cpp and src/common/encryption_state.cpp.
Storage compatibility
DuckDB tracks a versioned storage format (src/storage/storage_info.cpp, src/storage/version_map.json). Newer versions can always read databases written by older versions; older versions read newer formats best-effort. The matrix is exercised by .github/workflows/CrossVersion.yml and the fixtures in test/bwc/.
Where to look
- For the on-disk layout: systems/storage.
- For the MVCC machinery: systems/transaction.
- For DDL (catalog) transactions: systems/catalog.
- For tuning: reference/configuration.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.