duckdb/duckdb
DuckDB
DuckDB is an in-process analytical (OLAP) database. It is written in C++17 with a vectorized push-based execution engine, MVCC transactions, and a single-file storage format. The codebase is a single repository that ships the engine, a CLI shell, a C API, and a set of in-tree extensions.
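To make "vectorized push-based execution" concrete, here is a minimal, self-contained sketch of the idea. It is a toy illustration under stated assumptions, not DuckDB's implementation: the names `Chunk`, `Operator`, `EvenFilter`, `SumSink`, `SumOfEvens`, and `kChunkSize` are all hypothetical and do not come from the repository. The source slices the input into small batches and pushes each batch through a filter into a sink, so every operator works on a whole batch per call rather than on a single row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy illustration only: every name here is hypothetical, not a DuckDB class.
constexpr std::size_t kChunkSize = 4; // deliberately tiny so the batching is easy to follow

// A batch of values that is processed together (a single column, for simplicity).
using Chunk = std::vector<int64_t>;

// Push-based operator: the caller pushes a batch in, the operator forwards results downstream.
class Operator {
public:
	virtual ~Operator() = default;
	virtual void Push(const Chunk &input) = 0;
};

// Sink that accumulates the sum of every value it receives.
class SumSink : public Operator {
public:
	void Push(const Chunk &input) override {
		for (auto value : input) {
			total += value;
		}
	}
	int64_t total = 0;
};

// Filter that keeps even values and pushes the surviving batch to the next operator.
class EvenFilter : public Operator {
public:
	explicit EvenFilter(Operator &next_p) : next(next_p) {
	}
	void Push(const Chunk &input) override {
		Chunk output;
		output.reserve(input.size());
		for (auto value : input) { // one tight loop over the whole batch
			if (value % 2 == 0) {
				output.push_back(value);
			}
		}
		if (!output.empty()) {
			next.Push(output);
		}
	}

private:
	Operator &next;
};

// Source: slices the input into batches and drives the pipeline by pushing them.
int64_t SumOfEvens(const std::vector<int64_t> &data) {
	SumSink sink;
	EvenFilter filter(sink);
	for (std::size_t offset = 0; offset < data.size(); offset += kChunkSize) {
		const std::size_t end = std::min(offset + kChunkSize, data.size());
		Chunk chunk(data.begin() + static_cast<std::ptrdiff_t>(offset),
		            data.begin() + static_cast<std::ptrdiff_t>(end));
		assert(chunk.size() <= kChunkSize);
		filter.Push(chunk);
	}
	return sink.total;
}
```

Calling `SumOfEvens` on the values 1 through 10 pushes three batches (4, 4, and 2 values) through the filter. The point of the design is that per-call overhead, such as the virtual dispatch on `Push`, is paid once per batch instead of once per row.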
What this codebase contains
- A C++17 database engine in src/ (~340k lines, ~1,400 .cpp files) covering SQL parsing, planning, optimization, vectorized execution, MVCC transactions, storage, and a catalog.
- In-tree extensions in extension/ for Parquet, JSON, ICU, TPC-H/TPC-DS data generators, the autocomplete engine, the core_functions library, and an optional jemalloc allocator.
- The DuckDB shell (CLI) in tools/shell/, plus packaging stubs for Julia and Swift clients in tools/juliapkg/ and tools/swift/.
- A test harness with the sqllogictest runner and ~4,800 .test/.test_slow files in test/sql/, plus C++ API tests in test/api/.
- Benchmark suites (TPC-H, TPC-DS, micro-benchmarks) in benchmark/.
- Build orchestration via Makefile + CMakeLists.txt, with code generation scripts in scripts/ (e.g., generate_c_api.py, generate_serialization.py).
Quick map
graph LR
subgraph Clients
CLI[CLI shell]
CAPI[C API]
Julia[julia / swift]
end
subgraph Engine
Parser[parser]
Planner[planner]
Opt[optimizer]
Exec[execution]
Par[parallel]
Func[function]
Cat[catalog]
Txn[transaction]
Stor[storage]
Common[common]
end
subgraph Extensions
Parq[parquet]
JSON[json]
ICU[icu]
TPC[tpch / tpcds]
Core[core_functions]
end
Clients --> Engine
Parser --> Planner --> Opt --> Exec
Exec --> Par
Engine --> Stor
Engine --> Cat
Engine --> Txn
Engine --> Common
Engine --> Func
Extensions --> Engine
Where to start
- New to the codebase: read getting-started, then architecture for the query lifecycle.
- Looking for a subsystem: see systems (parser, planner, optimizer, execution, storage, transaction, catalog, function, common, main).
- Looking for an in-tree extension (parquet, json, icu, ...): see extensions.
- Working on the SQL surface: see features/sql-frontend.
- Glossary of project-specific terms: glossary.
Repository conventions
- C++17, tabs for indentation, 120-column lines.
- Naming: snake_case .cpp filenames, PascalCase types and functions, snake_case variables and members.
- Smart pointers only (unique_ptr, optional_ptr, reference); no raw new/delete.
- Tests are written in sqllogictest wherever possible.
- The project has an explicit policy that pull requests authored by LLMs are not accepted; see CONTRIBUTING.md.
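The naming and ownership conventions above can be sketched in a few lines of C++17. This is an illustrative example only: `ColumnInfo` and `MakeColumnInfo` are hypothetical names that do not come from the repository, and the sketch uses the standard library's `std::unique_ptr` and `std::make_unique` to stay self-contained, which is an assumption and may differ from the pointer types the project itself uses.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical example type; not taken from the DuckDB source tree.
// PascalCase for the type name, tabs for indentation.
class ColumnInfo {
public:
	ColumnInfo(std::string name_p, bool nullable_p) : name(std::move(name_p)), nullable(nullable_p) {
		assert(!name.empty());
	}

	// PascalCase for functions.
	const std::string &GetName() const {
		return name;
	}
	bool IsNullable() const {
		return nullable;
	}

private:
	// snake_case for members.
	std::string name;
	bool nullable;
};

// Ownership is expressed in the return type; there is no raw new/delete.
// std::make_unique stands in for whatever factory helper the project prefers (assumption).
std::unique_ptr<ColumnInfo> MakeColumnInfo(const std::string &column_name) {
	// snake_case for local variables.
	auto column_info = std::make_unique<ColumnInfo>(column_name, true);
	return column_info;
}
```

A caller would write `auto column = MakeColumnInfo("id");` and let the `unique_ptr` release the object when it goes out of scope.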
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.