duckdb/duckdb

Lore

The story of how DuckDB grew from a research prototype at CWI into a 340k-line analytical engine maintained by a global community.

Eras

The CWI prototype (Jul–Dec 2018)

DuckDB began as a research project, led by Mark Raasveldt and Hannes Mühleisen, in the Database Architectures Group at CWI Amsterdam. The first commit, on Jul 13, 2018, is titled "Working parser + initial draft of interface". Within two weeks the codebase already had:

  • A PostgreSQL-derived parser (82f1559651, Jul 16: "partial source tree transformation of SELECT statements from Postgres representation to a custom internal C++ representation, works on TPC-H Q1.")
  • A simple catalog and binder
  • A logical plan, then a physical plan generator
  • Initial vectorized execution

The project's core design choices (in-process embedding, vectorized execution, single-file storage, MVCC) were already visible in the first month. The PostgreSQL parser was eventually replaced by a parser built from a hand-written PEG grammar (in src/parser/peg/), but the AST shape inherited from Postgres still echoes in the SQLStatement subclasses.

Early public release (2019, v0.1.x)

v0.1.0 was tagged on Jun 27, 2019, just under a year after the first commit. The README pitched DuckDB as "an in-process SQL OLAP database management system", and the release shipped a CLI plus Python and R bindings. The C API was already present but minimal.

Storage stabilization and ecosystem (2020-2022, v0.2 → v0.5)

  • v0.2.0 (Jul 23, 2020). Single-file storage format took its current shape, with row groups, compressed segments, and a checkpointing protocol.
  • v0.3.0 (Oct 6, 2021). JSON and Parquet became first-class extensions. The HTTP/object-store integration matured into the httpfs out-of-tree extension.
  • v0.5.0 (Sep 4, 2022). The optimizer pipeline became more sophisticated (cost-based join ordering, statistics propagation). The storage format was versioned and the storage_compatibility test suite was added.

During this era, DuckDB Labs was founded (2021) to support commercial users while the engine remained MIT-licensed and CWI-rooted.

Parallel execution and the modern shape (2023, v0.7 → v0.9)

  • v0.7.0 (Feb 13, 2023). The parallel executor (src/parallel/executor.cpp) had become the centerpiece of query execution: meta-pipelines, per-pipeline events, work-stealing tasks. The hash join and the aggregate hash table both gained partitioned implementations (radix_partitioned_hashtable.cpp, ~42 KB; aggregate_hashtable.cpp, ~38 KB).
  • v0.8.0 and v0.9.0. The optimizer fanned out across many small files (build_probe_side_optimizer.cpp, late_materialization.cpp, cse_optimizer.cpp, remove_unused_columns.cpp). Window functions were rewritten and Arrow integration deepened.
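
The partitioned hash tables mentioned for v0.7.0 rest on a simple idea: route each row to a partition using a few bits of its hash, so that each partition can be aggregated independently of the others. The following is a minimal Python sketch of that idea, not DuckDB's C++ implementation. The function names, the use of zlib.crc32 as the hash, and the choice of 4 radix bits are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

RADIX_BITS = 4  # assumed for illustration; 2**4 = 16 partitions


def partition_of(key: str, radix_bits: int = RADIX_BITS) -> int:
    """Pick a partition from the low bits of a deterministic hash."""
    h = zlib.crc32(key.encode("utf-8"))
    return h & ((1 << radix_bits) - 1)


def partitioned_sum(rows, radix_bits: int = RADIX_BITS):
    """Compute SUM(value) GROUP BY key in two phases."""
    # Phase 1: scatter rows into partitions by the radix of their hash.
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[partition_of(key, radix_bits)].append((key, value))

    # Phase 2: aggregate each partition on its own. Equal keys always hash
    # to the same partition, so the per-partition results never overlap and
    # could be computed by separate threads.
    result = {}
    for part in partitions.values():
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        result.update(local)
    return result


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
totals = partitioned_sum(rows)
```

Because the partitions are disjoint by construction, merging the per-partition results needs no conflict resolution, which is what makes this layout attractive for a parallel executor.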

v1.0 and beyond (2024-2026)

  • v1.0.0 (May 29, 2024). The first stable release. Storage compatibility became a hard guarantee. Many out-of-tree extensions (Iceberg, Delta, AWS, Postgres scanner, Spatial) reached general availability around this time.
  • v1.1 (Sep 2024), v1.2 (Feb 2025), v1.3 (May 2025), v1.4 (Sep 2025), v1.5 (Mar 2026). Minor releases followed every three to six months, each focused on optimizer improvements, additional clients, encryption, and broader file-format support.

Longest-standing features

These have been continuously present and exercised since the first releases:

  • The catalog model (src/catalog/). The basic shape (Catalog → CatalogSet → CatalogEntry subclasses, with versioning) is recognizable from the first commits.
  • Vector and DataChunk (src/common/types/vector.cpp, data_chunk.cpp). The vectorized data model has been the engine's heart since 2018, with refinements (e.g., dictionary vectors, list/struct nesting) but no fundamental change.
  • The PostgreSQL-style AST (src/parser/statement/, src/parser/expression/). Even after the parser itself was rewritten in PEG, the AST structure stayed.
  • The amalgamation build (scripts/amalgamation.py). Building DuckDB as one big .cpp for embedding has been supported since the early days.
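
To make the vectorized data model concrete, here is a toy Python sketch of chunk-at-a-time processing: a table is scanned in fixed-size column batches, and each operator handles a whole batch per call rather than a single row. The Chunk class, the scan and project_add functions, and the tiny batch size are hypothetical stand-ins chosen for readability; they are not DuckDB's Vector or DataChunk classes.

```python
from dataclasses import dataclass
from typing import List

VECTOR_SIZE = 4  # deliberately tiny for the example; an assumption


@dataclass
class Chunk:
    """A batch of rows stored column-wise (hypothetical stand-in)."""
    columns: List[list]

    @property
    def size(self) -> int:
        return len(self.columns[0]) if self.columns else 0


def scan(table: List[list], vector_size: int = VECTOR_SIZE):
    """Yield a column-oriented table one chunk at a time."""
    n = len(table[0])
    for start in range(0, n, vector_size):
        yield Chunk([col[start:start + vector_size] for col in table])


def project_add(chunks, left: int, right: int):
    """Operator that adds two columns, one whole vector per call."""
    for chunk in chunks:
        a, b = chunk.columns[left], chunk.columns[right]
        yield Chunk([[x + y for x, y in zip(a, b)]])


table = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]]
out = [c.columns[0] for c in project_add(scan(table), 0, 1)]
```

With six rows and a batch size of four, the scan produces one full chunk and one partial chunk, and the operator is invoked twice instead of six times. Amortizing per-call overhead across a batch is the main appeal of this style.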

Major rewrites

  • Parser → PEG (2024 onward). The original parser was forked from PostgreSQL's. It was incrementally replaced by a PEG-based parser in src/parser/peg/ so that DuckDB could own its grammar. The new parser is generated from *.gram files via scripts/build_grammar.sh and feeds the same Postgres-derived AST.
  • Window functions. The window operator and supporting frame logic were rewritten multiple times over 2022-2024 to support segment trees, range frames, and exclusion clauses (src/function/window/).
  • Aggregate hash table (src/execution/aggregate_hashtable.cpp and radix_partitioned_hashtable.cpp). The hash aggregate moved to a partitioned, NUMA-aware implementation as parallelism grew.
  • Compression codecs. New codecs were added incrementally — bitpacking, dictionary, chimp, patas, alp (src/storage/compression/) — and the compression-selection logic was rebuilt to evaluate analyses per-segment.
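
Per-segment compression selection can be pictured as follows: every candidate codec analyzes the segment and reports an estimated size, and the smallest estimate wins. The Python sketch below illustrates only that selection loop. The size formulas are simplified assumptions, run-length encoding is used as an easy-to-follow stand-in alongside bitpacking, and none of the names correspond to functions in src/storage/compression/.

```python
from typing import Callable, Dict, List, Tuple


def uncompressed_size(values: List[int]) -> int:
    """Baseline: 8 bytes per value."""
    return 8 * len(values)


def rle_size(values: List[int]) -> int:
    """Run-length encoding: assume 12 bytes (value + run length) per run."""
    runs = sum(1 for i, v in enumerate(values) if i == 0 or v != values[i - 1])
    return 12 * runs


def bitpacking_size(values: List[int]) -> int:
    """Bitpacking of non-negative ints at the width of the largest value."""
    width = max(max(values).bit_length(), 1) if values else 0
    return (width * len(values) + 7) // 8


# Hypothetical registry of analyzers, one per candidate codec.
ANALYZERS: Dict[str, Callable[[List[int]], int]] = {
    "uncompressed": uncompressed_size,
    "rle": rle_size,
    "bitpacking": bitpacking_size,
}


def choose_codec(values: List[int]) -> Tuple[str, int]:
    """Run every analyzer on one segment and keep the smallest estimate."""
    estimates = {name: fn(values) for name, fn in ANALYZERS.items()}
    best = min(estimates, key=estimates.get)
    return best, estimates[best]


segments = [
    [7] * 1000,         # one long run of a single value
    list(range(1000)),  # many distinct but small values
]
choices = [choose_codec(seg) for seg in segments]
```

The two segments end up with different codecs even though they could belong to the same column, which is the point of deciding per segment rather than per column.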

Deprecated features

DuckDB tends to evolve in place rather than deprecate features. Visible removals include:

  • The original PostgreSQL-fork parser (now replaced by PEG; see above).
  • The historical interpreted execution path. All execution today goes through vectorized operators.
  • Older serialization formats. The code in src/include/duckdb/storage/serialization/ is regenerated; backward compatibility for older storage versions is enforced by the tests in test/bwc/.

Growth trajectory

Figure: "Approximate cumulative releases since 2019" (line chart; y-axis: tagged minor releases, 1 to 30).

Data points: v0.1 (2019): 1; v0.2 (2020): 2; v0.3 (2021): 3; v0.5 (2022): 5; v0.7 (2023): 7; v0.9 (2023): 9; v1.0 (2024): 12; v1.2 (2025): 16; v1.4 (2025): 20; v1.5 (2026): 24.

The contributor base has grown from ~10 contributors in 2019 to ~726 unique authors. Contribution velocity remains heavily concentrated on the founding team and DuckDB Labs engineers (Mark Raasveldt, who commits as Mytherin, Hannes Mühleisen, Tishj, Laurens Kuiper, Pedro Holanda, Sam Ansmink, Tom Ebergen, Richard Wesley), but external contributions are routine, especially in extensions and SQL function coverage.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.