duckdb/duckdb

Data models

A reference page that pulls together the type system, on-disk layout, and key in-memory data structures.

Logical types

DuckDB's logical types are declared in src/include/duckdb/common/types.hpp:

| Category | Types |
| --- | --- |
| Boolean | BOOLEAN |
| Signed integers | TINYINT (8b), SMALLINT (16b), INTEGER (32b), BIGINT (64b), HUGEINT (128b) |
| Unsigned integers | UTINYINT, USMALLINT, UINTEGER, UBIGINT, UHUGEINT |
| Floating point | FLOAT (32b), DOUBLE (64b) |
| Decimal | DECIMAL(p, s), backed by int16/int32/int64/int128 depending on precision |
| Date/time | DATE (32b days), TIME (64b µs), TIMESTAMP (64b µs), TIMESTAMP WITH TIME ZONE, TIMESTAMP_S/_MS/_NS, INTERVAL |
| String/blob | VARCHAR, BLOB, BIT (bitstring), BIGNUM (arbitrary-precision integer, called VARINT in earlier releases) |
| Identifiers | UUID (128b) |
| JSON | JSON (alias of VARCHAR with validation; provided by the JSON extension) |
| Nested | LIST<T>, ARRAY<T, N>, STRUCT<f1 T1, ...>, MAP<K, V>, UNION(name T, ...) |
| Enum | ENUM('a', 'b', ...) |
| Variant | VARIANT (self-describing dynamic type) |

LogicalType is the runtime representation of a type. Each instance carries a LogicalTypeId enum value and, where applicable, a width (for example, DECIMAL precision), child types (for nested types), and an alias.
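To make the shape of that description concrete, here is a minimal Python sketch of a type descriptor with an id, an optional width, child types, and an alias. The class name `LogicalTypeSketch`, the `scale` field, and the reduced enum are assumptions made for illustration; this is not DuckDB's actual C++ class layout.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class LogicalTypeId(Enum):
    # Illustrative subset only; the real enum has many more members.
    INTEGER = auto()
    VARCHAR = auto()
    DECIMAL = auto()
    LIST = auto()
    STRUCT = auto()


@dataclass(frozen=True)
class LogicalTypeSketch:
    """Hypothetical stand-in for LogicalType (not the real class)."""
    id: LogicalTypeId
    width: Optional[int] = None      # e.g. DECIMAL precision
    scale: Optional[int] = None      # assumption: DECIMAL scale stored alongside width
    children: Tuple = ()             # (name, LogicalTypeSketch) pairs for nested types
    alias: Optional[str] = None

    def __str__(self) -> str:
        if self.alias:
            return self.alias
        if self.id is LogicalTypeId.DECIMAL:
            return f"DECIMAL({self.width},{self.scale})"
        if self.id is LogicalTypeId.LIST:
            return f"{self.children[0][1]}[]"
        if self.id is LogicalTypeId.STRUCT:
            fields = ", ".join(f"{name} {child}" for name, child in self.children)
            return f"STRUCT({fields})"
        return self.id.name


# Build STRUCT(id INTEGER, prices DECIMAL(18,3)[]) from nested descriptors.
price = LogicalTypeSketch(LogicalTypeId.DECIMAL, width=18, scale=3)
prices = LogicalTypeSketch(LogicalTypeId.LIST, children=(("child", price),))
row = LogicalTypeSketch(
    LogicalTypeId.STRUCT,
    children=(("id", LogicalTypeSketch(LogicalTypeId.INTEGER)), ("prices", prices)),
)
print(row)
```

Nested types simply hold other descriptors as children, which is why a single recursive structure can describe every row in the table above.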

Physical types

For storage and SIMD, types are reduced to a small set of physical layouts. The mapping is in LogicalType::InternalType:

| Physical type | Width | Logical types |
| --- | --- | --- |
| BOOL, INT8, UINT8 | 1B | BOOLEAN, TINYINT, UTINYINT |
| INT16, UINT16 | 2B | SMALLINT, USMALLINT, DECIMAL(p<=4) |
| INT32, UINT32 | 4B | INTEGER, UINTEGER, DATE, DECIMAL(p<=9) |
| INT64, UINT64 | 8B | BIGINT, UBIGINT, TIMESTAMP, TIME, DECIMAL(p<=18) |
| INT128, UINT128 | 16B | HUGEINT, UHUGEINT, UUID, DECIMAL(p<=38) |
| FLOAT | 4B | FLOAT |
| DOUBLE | 8B | DOUBLE |
| INTERVAL | 16B | INTERVAL |
| VARCHAR | 16B inline + heap | VARCHAR, BLOB, BIT, JSON |
| LIST | (offset, length) + child | LIST, ARRAY |
| STRUCT | per-field child vectors | STRUCT |
| MAP | list of struct(key, value) | MAP |
| UNION | tag + struct of options | UNION |
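The DECIMAL thresholds in the table follow from simple arithmetic: a precision `p` needs to hold values up to 10^p - 1 in a signed integer. The sketch below picks the narrowest physical type for a precision and checks that each threshold is the largest precision that fits. The function name and the lower bound of 1 are assumptions for illustration; the real mapping lives in the C++ source.

```python
from typing import Tuple

# (max precision, physical type, width in bytes), taken from the table above.
DECIMAL_TIERS = ((4, "INT16", 2), (9, "INT32", 4), (18, "INT64", 8), (38, "INT128", 16))


def decimal_physical_type(precision: int) -> Tuple[str, int]:
    """Return (physical type name, byte width) for a DECIMAL precision."""
    if not 1 <= precision <= 38:  # assumption: valid precisions are 1..38
        raise ValueError(f"unsupported DECIMAL precision: {precision}")
    for max_precision, name, width in DECIMAL_TIERS:
        if precision <= max_precision:
            return name, width
    raise AssertionError("unreachable")


def fits_signed(precision: int, width_bytes: int) -> bool:
    """True if 10**precision - 1 fits in a signed integer of the given width."""
    return 10 ** precision - 1 <= 2 ** (8 * width_bytes - 1) - 1


for max_precision, name, width in DECIMAL_TIERS:
    # Every threshold fits in its type ...
    assert fits_signed(max_precision, width)

# ... and one more digit would overflow the three narrower types.
assert not fits_signed(5, 2) and not fits_signed(10, 4) and not fits_signed(19, 8)

print(decimal_physical_type(12))
```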

In-memory layout

graph TD
    Chunk[DataChunk N rows] -->|cols| V1[Vector col0]
    Chunk --> V2[Vector col1]
    Chunk --> V3[Vector col2]
    V1 --> Buf[buffer + validity mask]
    V1 --> Aux[auxiliary buffer for VARCHAR/LIST/STRUCT]
    V1 --> Type[LogicalType]
    V1 --> Enc[VectorType: FLAT / CONSTANT / DICT / SEQUENCE / FSST]

A Vector is the unit of columnar data. Encodings are documented on features/vectorized-execution.
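As a rough illustration of the encodings named in the diagram, the sketch below expands constant, dictionary, and sequence representations into a flat vector with a validity mask. All class and function names are hypothetical, and the validity mask is modelled as a list of booleans rather than a bitmask; FSST is omitted because it is a string compression scheme rather than a simple expansion.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FlatVector:
    data: List[Any]
    validity: List[bool]  # True = value present, False = NULL

    def to_pylist(self) -> List[Any]:
        return [v if ok else None for v, ok in zip(self.data, self.validity)]


def flatten_constant(value: Any, is_valid: bool, count: int) -> FlatVector:
    """CONSTANT: one value (or one NULL) logically repeated `count` times."""
    return FlatVector([value] * count, [is_valid] * count)


def flatten_dictionary(dictionary: FlatVector, selection: List[int]) -> FlatVector:
    """DICT: a selection of indices into a smaller vector of distinct values."""
    return FlatVector(
        [dictionary.data[i] for i in selection],
        [dictionary.validity[i] for i in selection],
    )


def flatten_sequence(start: int, increment: int, count: int) -> FlatVector:
    """SEQUENCE: start, start + increment, ... stored as two numbers."""
    return FlatVector([start + i * increment for i in range(count)], [True] * count)


words = FlatVector(["a", "b", None], [True, True, False])
print(flatten_constant(7, True, 3).to_pylist())
print(flatten_dictionary(words, [1, 0, 2, 1]).to_pylist())
print(flatten_sequence(10, 2, 4).to_pylist())
```

The point of the non-flat encodings is that operators can often work on the compact form (one constant, one small dictionary) and only expand when they must.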

On-disk layout

graph TD
    DB["my.db"] --> H["Header A / Header B (alternating)"]
    DB --> FB["Free block list"]
    DB --> CAT["Catalog metadata blocks"]
    DB --> TBL["Table blocks"]
    TBL --> RG["RowGroup (122,880 rows)"]
    RG --> CD["ColumnData (one per column)"]
    CD --> SEG["ColumnSegment runs"]
    SEG --> CB["Compressed bytes (uncompressed / bitpacking / dictionary / chimp / patas / alp / fsst / rle)"]
    DB --> IDX["Index blocks (ART)"]
    DB --> WAL["my.db.wal (sibling)"]

| Layer | Source |
| --- | --- |
| File | src/storage/single_file_block_manager.cpp |
| Block manager | src/storage/single_file_block_manager.cpp |
| Buffer manager | src/storage/standard_buffer_manager.cpp |
| Row group | src/storage/table/row_group.cpp |
| Column data | src/storage/table/column_data.cpp |
| Column segment | src/storage/table/column_segment.cpp |
| Compression | src/storage/compression/<codec>/ |
| Statistics | src/storage/statistics/ |
| WAL | src/storage/write_ahead_log.cpp |
| Checkpoint | src/storage/checkpoint_manager.cpp |
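Because row groups hold a fixed number of rows, locating a row is plain integer division. The sketch below maps a table-wide row number to its row group and, within that, to a vector. The 122,880 figure comes from the diagram; the 2,048-row vector size is an assumption (DuckDB's default) that this page does not state, and the function name is hypothetical.

```python
ROW_GROUP_SIZE = 122_880  # rows per RowGroup, from the on-disk layout diagram
VECTOR_SIZE = 2_048       # assumption: default vector size, not stated on this page


def locate_row(row_id: int) -> dict:
    """Map a zero-based row number to row-group and vector coordinates."""
    if row_id < 0:
        raise ValueError("row_id must be non-negative")
    row_group, offset_in_row_group = divmod(row_id, ROW_GROUP_SIZE)
    vector, offset_in_vector = divmod(offset_in_row_group, VECTOR_SIZE)
    return {
        "row_group": row_group,
        "offset_in_row_group": offset_in_row_group,
        "vector_in_row_group": vector,
        "offset_in_vector": offset_in_vector,
    }


print(ROW_GROUP_SIZE // VECTOR_SIZE)  # vectors per full row group under these assumptions
print(locate_row(300_000))
```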

Catalog model

Catalog (per database)
├── SchemaCatalogEntry
│   ├── TableCatalogEntry
│   ├── ViewCatalogEntry
│   ├── SequenceCatalogEntry
│   ├── IndexCatalogEntry
│   ├── TypeCatalogEntry
│   ├── ScalarFunctionCatalogEntry
│   ├── AggregateFunctionCatalogEntry
│   ├── TableFunctionCatalogEntry
│   ├── PragmaFunctionCatalogEntry
│   ├── MacroCatalogEntry
│   └── ...
└── DefaultEntries (lazy-load built-ins)

Each entry is versioned. See systems/catalog.
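One common way to version catalog entries is to keep a chain of versions per name and let each reader pick the newest version committed at or before its snapshot. The sketch below illustrates that idea only. This page does not describe the visibility rule, so the timestamps, class names, and lookup logic are all assumptions; see systems/catalog for the actual mechanism.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EntryVersion:
    commit_ts: int
    definition: Optional[str]            # None marks a dropped entry
    older: Optional["EntryVersion"] = None


class VersionedCatalogSketch:
    """Hypothetical name -> version-chain map (newest version first)."""

    def __init__(self) -> None:
        self._heads: Dict[str, EntryVersion] = {}

    def put(self, name: str, definition: Optional[str], commit_ts: int) -> None:
        # Assumes callers commit in increasing timestamp order.
        self._heads[name] = EntryVersion(commit_ts, definition, self._heads.get(name))

    def get(self, name: str, snapshot_ts: int) -> Optional[str]:
        version = self._heads.get(name)
        # Walk back until we reach a version visible to this snapshot.
        while version is not None and version.commit_ts > snapshot_ts:
            version = version.older
        return None if version is None else version.definition


catalog = VersionedCatalogSketch()
catalog.put("t", "CREATE TABLE t(a INTEGER)", commit_ts=10)
catalog.put("t", "CREATE TABLE t(a INTEGER, b VARCHAR)", commit_ts=20)
catalog.put("t", None, commit_ts=30)  # DROP TABLE t

for ts in (5, 15, 25, 35):
    print(ts, catalog.get("t", ts))
```

A reader with snapshot 15 still sees the one-column table even after later transactions altered and dropped it, which is the property that versioning is meant to provide.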

Plan IR

SQLStatement (parser)
  ├── ParsedExpression
  └── TableRef
        ↓ Binder
BoundStatement
  ├── Expression  (typed)
  └── LogicalOperator (logical plan)
        ↓ Optimizer
LogicalOperator (optimized)
        ↓ PhysicalPlanGenerator
PhysicalOperator (physical plan)
        ↓ MetaPipeline / Pipeline
Pipeline DAG

Implementations: src/parser/, src/planner/, src/optimizer/, src/execution/, src/parallel/.

Serialization

DuckDB serializes plans (for storage and EXPLAIN JSON), prepared-statement caches, and storage metadata using generated serialization code:

  • Source-of-truth: JSON files in src/include/duckdb/storage/serialization/.
  • Generator: scripts/generate_serialization.py.
  • Output: src/storage/serialization/.
  • Binary serializer: src/common/serializer/binary_serializer.cpp, binary_deserializer.cpp.
  • JSON serializer: src/common/serializer/format_serializer.cpp, plus the JSON extension.
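The general pattern behind the list above (a declarative description in JSON, a generator script, and emitted serializer code) can be sketched in a few lines. The schema shape, the class name `PointSketch`, the field ids, and the `write_property` callback are all hypothetical; the real generator reads DuckDB's own JSON files and emits C++, not Python.

```python
import json

# Hypothetical description of one serializable class with stable field ids.
SCHEMA = json.loads("""
{
  "class": "PointSketch",
  "members": [
    {"id": 100, "name": "x", "type": "int"},
    {"id": 101, "name": "y", "type": "int"}
  ]
}
""")


def generate_serializer(schema: dict) -> str:
    """Emit source code for a serialize function, one line per member."""
    lines = [f"def serialize_{schema['class']}(obj, write_property):"]
    for member in schema["members"]:
        name = member["name"]
        lines.append(f"    write_property({member['id']}, {name!r}, obj[{name!r}])")
    return "\n".join(lines) + "\n"


source = generate_serializer(SCHEMA)
print(source)

# Load the generated code and run it against a recording "serializer".
namespace = {}
exec(source, namespace)
written = []
namespace["serialize_PointSketch"](
    {"x": 1, "y": 2},
    lambda field_id, name, value: written.append((field_id, name, value)),
)
print(written)
```

Keeping field ids in the description rather than in hand-written code means the binary and JSON serializers can share one generated traversal and differ only in how `write_property` is implemented.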

Where to look

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
