duckdb/duckdb
Fun facts
A few things that are not strictly required to work on the codebase but are interesting to know.
The first commit is from 2018-07-13
Mark Raasveldt's first commit, ba75d81601, is titled "Working parser + initial draft of interface". Only days later, on Jul 16, the project was already running TPC-H Q1 end-to-end. The "Q1 first" pattern (implement just enough to run a real benchmark) has been visible throughout the project's history.
The duck mascot
The project's name and logo come from Hannes Mühleisen's pet duck, Wilbur. The repo's logo/ directory ships the SVG mascot, and shell error messages (tools/shell/) sometimes refer to "the duck". The Discord server is named "DuckDB", and the contact address for code-of-conduct issues is quack@duckdb.org.
"Friendly SQL"
Many syntax conveniences that other engines lack are first-class in DuckDB: EXCLUDE in SELECT *, list comprehensions, GROUP BY ALL, trailing commas, and FROM-first syntax. The README points to a curated list at https://duckdb.org/docs/current/sql/dialect/friendly_sql.html. These extensions are wired through src/parser/peg/.
The single-file database is one file (mostly)
A .duckdb database is genuinely one file. When the buffer manager is over budget, temporary spill blocks are written next to it in a sibling .tmp directory (see src/storage/temporary_file_manager.cpp). WAL contents are in a sibling .wal file (src/storage/write_ahead_log.cpp).
DuckDB has its own parser, but the AST still smells like Postgres
The parser was originally forked from PostgreSQL. Even after the rewrite to a PEG grammar (in src/parser/peg/), the AST classes (SelectStatement, RangeVar, JoinExpr, …) keep their Postgres-shaped structure. This is intentional: it kept downstream stages stable across the parser swap.
No LLM-generated PRs
CONTRIBUTING.md includes an explicit policy: "Please do not submit pull requests generated by AI (LLMs). Reviewing such PRs puts a considerable burden on the maintainers." This is unusual among open-source projects in 2026 and worth knowing before opening a PR.
The amalgamation build
scripts/amalgamation.py concatenates the entire engine into one duckdb.cpp plus one header. This powers the "drop-in single C++ file" embedding option that DuckDB has supported since the early days, and it is the basis for the Wasm build and the JS builds derived from it.
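To make the idea of an amalgamation concrete, here is a minimal sketch of the general technique: walk a set of source files, splice each locally included header in place the first time it is seen, and drop later repeats. It is not the logic of scripts/amalgamation.py. The file names, the contents of SOURCES, and the amalgamate helper are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical in-memory source tree; names and contents are made up.
SOURCES = {
    "common.hpp": '#pragma once\nstruct Value { int v; };\n',
    "parser.hpp": '#pragma once\n#include "common.hpp"\nValue Parse();\n',
    "parser.cpp": '#include "parser.hpp"\nValue Parse() { return Value{42}; }\n',
    "main.cpp": '#include "parser.hpp"\n#include "common.hpp"\nint Run() { return Parse().v; }\n',
}

# Matches quoted (local) includes only; <system> includes are left untouched.
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def amalgamate(files, roots):
    """Concatenate roots into one string, inlining each local header once."""
    emitted = set()
    out = []

    def emit(name):
        if name in emitted:
            return  # include-once behaviour replaces #pragma once
        emitted.add(name)
        out.append(f"// ---- begin {name} ----")
        for line in files[name].splitlines():
            match = INCLUDE_RE.match(line)
            if match and match.group(1) in files:
                emit(match.group(1))  # splice the header in place
            elif line.strip() == "#pragma once":
                continue  # not needed inside a single translation unit
            else:
                out.append(line)
        out.append(f"// ---- end {name} ----")

    for root in roots:
        emit(root)
    return "\n".join(out) + "\n"


amalgam = amalgamate(SOURCES, ["parser.cpp", "main.cpp"])
print(amalgam)
```

The emitted set is what keeps each header from appearing twice, so definitions land in the output before their first use and only once.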
The biggest single file is generated
src/common/enum_util.cpp is roughly 292 KB. It is fully generated by scripts/generate_enum_util.py from JSON specs and contains a giant switch statement per enum for EnumUtil::FromString / ToString. The code is checked in so that the build does not require Python, but editing it by hand is a mistake — make generate-files will overwrite it.
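The sketch below illustrates the kind of generation described above: a small script reads an enum spec and emits a C++ function with one switch case per member. It is an illustration only. The spec layout (name and values keys), the JoinType example, and the generate_to_string helper are assumptions, not the format or code used by scripts/generate_enum_util.py.

```python
import json

# Hypothetical spec format; the real JSON schema is not shown on this page.
SPEC = json.loads("""
[
  {"name": "JoinType", "values": ["INNER", "LEFT", "RIGHT"]}
]
""")


def generate_to_string(enum):
    """Emit a C++ function with one switch case per enum member."""
    name = enum["name"]
    lines = [
        f"const char *ToString({name} value) {{",
        "\tswitch (value) {",
    ]
    for member in enum["values"]:
        lines.append(f"\tcase {name}::{member}:")
        lines.append(f'\t\treturn "{member}";')
    lines.append("\tdefault:")
    lines.append('\t\tthrow std::runtime_error("unknown enum value");')
    lines.append("\t}")
    lines.append("}")
    return "\n".join(lines)


generated = "\n".join(generate_to_string(e) for e in SPEC)
print(generated)
```

Because the output is a pure function of the spec, regenerating always reproduces the same file, which is why hand edits to generated code do not survive.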
"It's just a vector"
The Vector class (src/include/duckdb/common/types/vector.hpp) is overloaded with meanings: it can be flat, constant, dictionary-encoded, or a sequence (start + increment). A surprising amount of execution code is Vector::Flatten-then-process; the dictionary and constant fast paths optimize the common cases.
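The four representations and the flatten-then-process pattern can be sketched in a few lines. This is a conceptual model only: DuckDB's Vector is a single C++ class, whereas the separate Python classes here (FlatVector, ConstantVector, DictionaryVector, SequenceVector) and the flatten and add_one helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FlatVector:
    values: List[Any]  # one physical value per row


@dataclass
class ConstantVector:
    value: Any  # a single value shared by every row
    count: int


@dataclass
class DictionaryVector:
    dictionary: List[Any]  # distinct values
    selection: List[int]   # per-row index into the dictionary


@dataclass
class SequenceVector:
    start: int
    increment: int
    count: int


def flatten(vec):
    """Materialize any representation into a FlatVector."""
    if isinstance(vec, FlatVector):
        return vec
    if isinstance(vec, ConstantVector):
        return FlatVector([vec.value] * vec.count)
    if isinstance(vec, DictionaryVector):
        return FlatVector([vec.dictionary[i] for i in vec.selection])
    if isinstance(vec, SequenceVector):
        return FlatVector([vec.start + k * vec.increment for k in range(vec.count)])
    raise TypeError(f"unsupported vector type: {type(vec).__name__}")


def add_one(vec):
    """Add 1 to every row, with fast paths that avoid flattening."""
    if isinstance(vec, ConstantVector):
        # Compute once, keep the constant representation.
        return ConstantVector(vec.value + 1, vec.count)
    if isinstance(vec, DictionaryVector):
        # Compute once per distinct value and reuse the selection.
        return DictionaryVector([v + 1 for v in vec.dictionary], vec.selection)
    # Generic fallback: flatten, then process row by row.
    return FlatVector([v + 1 for v in flatten(vec).values])


print(flatten(SequenceVector(10, 5, 3)).values)
print(add_one(DictionaryVector([1, 100], [0, 1, 1, 0])))
```

In this model the constant and dictionary branches do work proportional to the number of distinct values, while the fallback does work proportional to the row count.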
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.