duckdb/duckdb
Data models
A reference page that pulls together the type system, on-disk layout, and key in-memory data structures.
Logical types
DuckDB's logical types are declared in src/include/duckdb/common/types.hpp:
| Category | Types |
|---|---|
| Boolean | BOOLEAN |
| Signed integers | TINYINT (8b), SMALLINT (16b), INTEGER (32b), BIGINT (64b), HUGEINT (128b) |
| Unsigned integers | UTINYINT, USMALLINT, UINTEGER, UBIGINT, UHUGEINT |
| Floating point | FLOAT (32b), DOUBLE (64b) |
| Decimal | DECIMAL(p, s) — backed by int16/int32/int64/int128 depending on precision |
| Date/time | DATE (32b days), TIME (64b µs), TIMESTAMP (64b µs), TIMESTAMP WITH TIME ZONE, TIMESTAMP_S/_MS/_NS, INTERVAL |
| String/blob | VARCHAR, BLOB, BIT (bitstring), BIGNUM (arbitrary-precision int), VARINT (variable-length int) |
| Identifiers | UUID (128b) |
| JSON | JSON (alias of VARCHAR with validation; provided by the JSON extension) |
| Nested | LIST<T>, ARRAY<T, N>, STRUCT<f1 T1, ...>, MAP<K, V>, UNION(name T, ...) |
| Enum | ENUM('a', 'b', ...) |
| Variant | VARIANT (self-describing dynamic type) |
LogicalType is the runtime representation. Each type has a LogicalTypeId enum value, an optional width, optional child types (for nested), and optional aliases.
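The four parts listed above (type id, optional width, optional child types, optional alias) can be pictured with a small model. This is only an illustrative sketch in Python: `TypeId`, `TypeSketch`, and `render` are hypothetical names, the `scale` field is added here so `DECIMAL(p, s)` can be shown, and none of it mirrors DuckDB's actual C++ class layout.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class TypeId(Enum):
    """Hypothetical stand-in for LogicalTypeId; only a few members are shown."""
    INTEGER = auto()
    VARCHAR = auto()
    DECIMAL = auto()
    LIST = auto()
    STRUCT = auto()


@dataclass(frozen=True)
class TypeSketch:
    """Illustrative model of a logical type, not DuckDB's LogicalType."""
    id: TypeId
    width: Optional[int] = None      # e.g. DECIMAL precision
    scale: Optional[int] = None      # assumption: kept next to the width
    children: Tuple[Tuple[str, "TypeSketch"], ...] = ()  # (name, type) pairs
    alias: Optional[str] = None

    def render(self) -> str:
        """Format the type using the notation from the table above."""
        if self.id is TypeId.DECIMAL:
            return f"DECIMAL({self.width}, {self.scale})"
        if self.id is TypeId.LIST:
            return f"LIST<{self.children[0][1].render()}>"
        if self.id is TypeId.STRUCT:
            fields = ", ".join(f"{n} {t.render()}" for n, t in self.children)
            return f"STRUCT<{fields}>"
        return self.id.name


# A nested type: a list of structs with an integer and a decimal field.
price = TypeSketch(TypeId.DECIMAL, width=18, scale=3)
row = TypeSketch(TypeId.STRUCT, children=(("a", TypeSketch(TypeId.INTEGER)), ("p", price)))
rows = TypeSketch(TypeId.LIST, children=(("child", row),))
```

Nested types are just types whose `children` tuple is non-empty, which is why a single recursive structure is enough to describe `LIST`, `STRUCT`, and scalar types alike.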
Physical types
For storage and SIMD, types are reduced to a small set of physical layouts. The mapping is in LogicalType::InternalType:
| Physical type | Width | Logical types |
|---|---|---|
| INT8 | 1B | BOOLEAN, TINYINT, UTINYINT |
| INT16 | 2B | SMALLINT, USMALLINT, DECIMAL(p<=4) |
| INT32 | 4B | INTEGER, UINTEGER, DATE, DECIMAL(p<=9) |
| INT64 | 8B | BIGINT, UBIGINT, TIMESTAMP, TIME, DECIMAL(p<=18) |
| INT128 | 16B | HUGEINT, UHUGEINT, UUID, DECIMAL(p<=38) |
| FLOAT | 4B | FLOAT |
| DOUBLE | 8B | DOUBLE |
| INTERVAL | 16B | INTERVAL |
| VARCHAR | 16B inline + heap | VARCHAR, BLOB, BIT, JSON |
| LIST | (offset, length) + child | LIST, ARRAY |
| STRUCT | per-field child vectors | STRUCT |
| MAP | list of struct(key, value) | MAP |
| UNION | tag + struct of options | UNION |
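The DECIMAL rows of the table amount to a lookup on precision. The sketch below restates those thresholds in Python and shows the usual fixed-point idea of storing a decimal as an integer scaled by 10^s. The function names are hypothetical, and the exactness check in `to_scaled_int` is a simplification chosen for this example rather than a description of how DuckDB handles excess digits.

```python
from decimal import Decimal
from typing import Tuple

# (max precision, physical type, width in bytes), taken from the table above.
_DECIMAL_LAYOUTS = ((4, "INT16", 2), (9, "INT32", 4), (18, "INT64", 8), (38, "INT128", 16))


def decimal_physical_type(precision: int) -> Tuple[str, int]:
    """Return the physical type and byte width backing DECIMAL(precision, s)."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    for max_precision, name, width in _DECIMAL_LAYOUTS:
        if precision <= max_precision:
            return name, width
    raise AssertionError("unreachable: precision already validated")


def to_scaled_int(text: str, scale: int) -> int:
    """Encode a decimal literal as an integer scaled by 10**scale."""
    scaled = Decimal(text).scaleb(scale)
    if scaled != scaled.to_integral_value():
        # Simplification: reject inputs with more fractional digits than scale.
        raise ValueError("value has more fractional digits than the scale allows")
    return int(scaled)
```

For example, `DECIMAL(9, 2)` fits in a 4-byte integer, and the literal `12.34` is stored as the integer 1234.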
In-memory layout
graph TD
Chunk[DataChunk N rows] -->|cols| V1[Vector col0]
Chunk --> V2[Vector col1]
Chunk --> V3[Vector col2]
V1 --> Buf[buffer + validity mask]
V1 --> Aux[auxiliary buffer for VARCHAR/LIST/STRUCT]
V1 --> Type[LogicalType]
V1 --> Enc[VectorType: FLAT / CONSTANT / DICT / SEQUENCE / FSST]

A Vector is the unit of columnar data. Encodings are documented on features/vectorized-execution.
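To make the encodings in the diagram concrete, the following sketch expands three of them into a flat list of values and applies a validity mask. It is a conceptual illustration only: the function names are hypothetical, the mask is modeled as a Python integer in which a set bit marks a valid (non-NULL) row, and FSST is omitted because it needs a real string compressor.

```python
from typing import Any, List, Optional, Sequence


def flatten_constant(value: Any, count: int) -> List[Any]:
    """CONSTANT: one stored value stands for every row."""
    return [value] * count


def flatten_dictionary(dictionary: Sequence[Any], selection: Sequence[int]) -> List[Any]:
    """DICT: each row is an index into a shared dictionary of values."""
    return [dictionary[index] for index in selection]


def flatten_sequence(start: int, increment: int, count: int) -> List[int]:
    """SEQUENCE: rows are start, start + increment, start + 2 * increment, ..."""
    return [start + row * increment for row in range(count)]


def apply_validity(values: Sequence[Any], validity_mask: int) -> List[Optional[Any]]:
    """Replace rows whose validity bit is 0 with None (NULL)."""
    return [value if (validity_mask >> row) & 1 else None
            for row, value in enumerate(values)]
```

Each encoding describes the same logical column as a FLAT vector would, just with less data stored, which is why operators that cannot handle a given encoding can fall back to flattening it first.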
On-disk layout
graph TD
DB["my.db"] --> H["Header A / Header B (alternating)"]
DB --> FB["Free block list"]
DB --> CAT["Catalog metadata blocks"]
DB --> TBL["Table blocks"]
TBL --> RG["RowGroup (122,880 rows)"]
RG --> CD["ColumnData (one per column)"]
CD --> SEG["ColumnSegment runs"]
SEG --> CB["Compressed bytes (uncompressed / bitpacking / dictionary / chimp / patas / alp / fsst / rle)"]
DB --> IDX["Index blocks (ART)"]
DB --> WAL["my.db.wal (sibling)"]

| Layer | Source |
|---|---|
| File | src/storage/single_file_block_manager.cpp |
| Block manager | src/storage/single_file_block_manager.cpp |
| Buffer manager | src/storage/standard_buffer_manager.cpp |
| Row group | src/storage/table/row_group.cpp |
| Column data | src/storage/table/column_data.cpp |
| Column segment | src/storage/table/column_segment.cpp |
| Compression | src/storage/compression/<codec>/ |
| Statistics | src/storage/statistics/ |
| WAL | src/storage/write_ahead_log.cpp |
| Checkpoint | src/storage/checkpoint_manager.cpp |
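Given the 122,880-row RowGroup size shown in the diagram, locating a row is plain integer arithmetic. The sketch below assumes every row group is full and that vectors hold 2,048 rows (so a row group spans 60 vectors). Both are assumptions made for this example, since real row groups can hold fewer rows, and `locate_row` and `row_group_count` are hypothetical helper names.

```python
from typing import Tuple

ROW_GROUP_SIZE = 122_880  # rows per row group, as in the diagram above
VECTOR_SIZE = 2_048       # assumption: default number of rows per vector


def locate_row(row_id: int) -> Tuple[int, int, int]:
    """Map a row number to (row group index, vector index, row within vector)."""
    if row_id < 0:
        raise ValueError("row_id must be non-negative")
    row_group, offset = divmod(row_id, ROW_GROUP_SIZE)
    vector, row_in_vector = divmod(offset, VECTOR_SIZE)
    return row_group, vector, row_in_vector


def row_group_count(total_rows: int) -> int:
    """Number of row groups needed to hold total_rows (ceiling division)."""
    return -(-total_rows // ROW_GROUP_SIZE)
```

A table with 250,000 rows therefore needs three row groups under these assumptions, with the last one only partly filled.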
Catalog model
Catalog (per database)
├── SchemaCatalogEntry
│ ├── TableCatalogEntry
│ ├── ViewCatalogEntry
│ ├── SequenceCatalogEntry
│ ├── IndexCatalogEntry
│ ├── TypeCatalogEntry
│ ├── ScalarFunctionCatalogEntry
│ ├── AggregateFunctionCatalogEntry
│ ├── TableFunctionCatalogEntry
│ ├── PragmaFunctionCatalogEntry
│ ├── MacroCatalogEntry
│ └── ...
└── DefaultEntries (lazy-load built-ins)

Each entry is versioned. See systems/catalog.
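One way to picture "each entry is versioned" is a chain of versions per name, where a reader sees the newest version committed at or before its snapshot. The sketch below is a deliberately simplified model: `EntryVersion`, `CatalogSetSketch`, and the integer timestamps are hypothetical, and it ignores uncommitted, transaction-local changes that a real catalog must also handle.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EntryVersion:
    """One version of a catalog entry; `older` links to the previous version."""
    definition: str
    commit_ts: int
    deleted: bool = False
    older: Optional["EntryVersion"] = None


class CatalogSetSketch:
    """Hypothetical name-to-version-chain map, newest version first."""

    def __init__(self) -> None:
        self._heads: Dict[str, EntryVersion] = {}

    def put(self, name: str, definition: str, commit_ts: int, deleted: bool = False) -> None:
        """Push a new version in front of the existing chain for `name`."""
        self._heads[name] = EntryVersion(definition, commit_ts, deleted, self._heads.get(name))

    def get(self, name: str, snapshot_ts: int) -> Optional[str]:
        """Return the definition visible at snapshot_ts, or None if absent."""
        version = self._heads.get(name)
        while version is not None and version.commit_ts > snapshot_ts:
            version = version.older  # too new for this snapshot, look further back
        if version is None or version.deleted:
            return None
        return version.definition


catalog = CatalogSetSketch()
catalog.put("t", "CREATE TABLE t(a INTEGER)", commit_ts=10)
catalog.put("t", "CREATE TABLE t(a INTEGER, b VARCHAR)", commit_ts=20)
catalog.put("t", "", commit_ts=30, deleted=True)  # models DROP TABLE t
```

In this model a reader with snapshot 15 still sees the one-column table even after the later ALTER and DROP have been recorded.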
Plan IR
SQLStatement (parser)
├── ParsedExpression
└── TableRef
↓ Binder
BoundStatement
├── Expression (typed)
└── LogicalOperator (logical plan)
↓ Optimizer
LogicalOperator (optimized)
↓ PhysicalPlanGenerator
PhysicalOperator (physical plan)
↓ MetaPipeline / Pipeline
Pipeline DAG

Implementations: src/parser/, src/planner/, src/optimizer/, src/execution/, src/parallel/.
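The last step, turning a physical plan into pipelines, can be illustrated for the simplest case of a linear operator chain. The sketch below uses the general rule from pipelined engines that a blocking operator ends one pipeline as its sink and starts the next as its source. The operator names and the `split_pipelines` helper are illustrative assumptions; plans with joins form a DAG rather than a list, which this example does not model.

```python
from typing import List, Sequence, Tuple


def split_pipelines(chain: Sequence[Tuple[str, bool]]) -> List[List[str]]:
    """Split a source-first operator chain into pipelines.

    Each item is (operator name, is_blocking). A blocking operator closes the
    current pipeline as its sink and opens the next pipeline as its source.
    """
    pipelines: List[List[str]] = []
    current: List[str] = []
    for name, is_blocking in chain:
        current.append(name)
        if is_blocking:
            pipelines.append(current)  # pipeline ends at this sink
            current = [name]           # the same operator feeds the next one
    pipelines.append(current)
    return pipelines


# Hypothetical plan for: SELECT k, count(*) FROM t WHERE x > 0 GROUP BY k
plan = [("SEQ_SCAN", False), ("FILTER", False), ("HASH_GROUP_BY", True), ("PROJECTION", False)]
pipelines = split_pipelines(plan)
```

Here the second pipeline cannot start until the first has finished building the aggregate, which is the kind of dependency a pipeline DAG records.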
Serialization
DuckDB serializes plans (for storage and EXPLAIN JSON), prepared statement caches, and storage metadata using generated serialization code:
- Source-of-truth: JSON files in src/include/duckdb/storage/serialization/.
- Generator: scripts/generate_serialization.py.
- Output: src/storage/serialization/.
- Binary serializer: src/common/serializer/binary_serializer.cpp, binary_deserializer.cpp.
- JSON serializer: src/common/serializer/format_serializer.cpp, plus the JSON extension.
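The idea of driving serialization from a JSON description can be sketched as follows: a spec lists each member with a numeric field id, and generic code writes and reads fields by walking that list. Everything here is an assumption made for illustration, including the spec shape, the class name `ColumnInfoSketch`, the two-byte field ids, the varint encoding, and the `0xFFFF` end marker. It is not the format of DuckDB's JSON files or of its binary serializer.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical spec, loosely in the spirit of a JSON class description.
SPEC = json.loads("""
{"class": "ColumnInfoSketch",
 "members": [{"id": 100, "name": "name", "type": "string"},
             {"id": 101, "name": "row_count", "type": "uint"}]}
""")
END_MARKER = 0xFFFF  # assumption: sentinel field id that closes an object


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set when more bytes follow."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def serialize(spec: Dict[str, Any], obj: Dict[str, Any]) -> bytes:
    """Write each member as field id + payload, then the end marker."""
    out = bytearray()
    for member in spec["members"]:
        out += member["id"].to_bytes(2, "little")
        value = obj[member["name"]]
        if member["type"] == "uint":
            out += encode_varint(value)
        else:  # "string": length-prefixed UTF-8
            raw = value.encode("utf-8")
            out += encode_varint(len(raw)) + raw
    out += END_MARKER.to_bytes(2, "little")
    return bytes(out)


def deserialize(spec: Dict[str, Any], data: bytes) -> Dict[str, Any]:
    """Read fields by id until the end marker is reached."""
    by_id = {member["id"]: member for member in spec["members"]}
    obj: Dict[str, Any] = {}
    pos = 0
    while True:
        field_id = int.from_bytes(data[pos:pos + 2], "little")
        pos += 2
        if field_id == END_MARKER:
            return obj
        member = by_id[field_id]
        if member["type"] == "uint":
            obj[member["name"]], pos = decode_varint(data, pos)
        else:
            length, pos = decode_varint(data, pos)
            obj[member["name"]] = data[pos:pos + length].decode("utf-8")
            pos += length
```

Keeping the field list in one declarative file means the writer and reader cannot drift apart, which is the main appeal of generating this kind of code instead of writing it by hand.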
Where to look
- Type system: systems/common.
- Storage: systems/storage.
- Catalog: systems/catalog.
- Vector encodings and operator interface: features/vectorized-execution.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.