duckdb/duckdb

DuckDB

DuckDB is an in-process analytical (OLAP) database. It is written in C++17 with a vectorized push-based execution engine, MVCC transactions, and a single-file storage format. The codebase is a single repository that ships the engine, a CLI shell, a C API, and a set of in-tree extensions.
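To make "vectorized push-based execution" concrete, here is a minimal toy sketch. It is not DuckDB code: the names Chunk, Operator, EvenFilter, SumSink, ScanAndPush, and SumOfEvens are all hypothetical and chosen only for illustration. The idea it shows is that a source slices its data into batches (the vectorized part) and hands each batch to the next operator (the push-based part), rather than the sink pulling rows one at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: these types are hypothetical and are not DuckDB classes.
// A "chunk" is a batch of values that is processed together (vectorized).
using Chunk = std::vector<int64_t>;

// In a push-based model the producer drives execution by calling Push()
// on the next operator; operators never pull from their children.
class Operator {
public:
	virtual ~Operator() = default;
	virtual void Push(const Chunk &input) = 0;
};

// Sink at the end of the pipeline: accumulates a running sum.
class SumSink : public Operator {
public:
	void Push(const Chunk &input) override {
		for (auto value : input) {
			total += value;
		}
	}
	int64_t total = 0;
};

// Filter that keeps even values and pushes the surviving batch downstream.
class EvenFilter : public Operator {
public:
	explicit EvenFilter(Operator &next_p) : next(next_p) {
	}
	void Push(const Chunk &input) override {
		Chunk output;
		output.reserve(input.size());
		for (auto value : input) {
			if (value % 2 == 0) {
				output.push_back(value);
			}
		}
		if (!output.empty()) {
			next.Push(output);
		}
	}

private:
	Operator &next;
};

// Source: slices the table into fixed-size chunks and pushes each one.
inline void ScanAndPush(const std::vector<int64_t> &table, std::size_t chunk_size, Operator &next) {
	if (chunk_size == 0) {
		return; // nothing sensible to do with empty batches
	}
	for (std::size_t offset = 0; offset < table.size(); offset += chunk_size) {
		auto end = std::min(offset + chunk_size, table.size());
		next.Push(Chunk(table.begin() + static_cast<std::ptrdiff_t>(offset),
		                table.begin() + static_cast<std::ptrdiff_t>(end)));
	}
}

// Wires source -> filter -> sink and runs the pipeline to completion.
inline int64_t SumOfEvens(const std::vector<int64_t> &table, std::size_t chunk_size) {
	SumSink sink;
	EvenFilter filter(sink);
	ScanAndPush(table, chunk_size, filter);
	return sink.total;
}
```

Calling SumOfEvens with a six-element table and a chunk size of 4 pushes two batches through the filter into the sink. A real engine adds typed column vectors, selection vectors, and parallel pipelines, none of which this sketch attempts.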

What this codebase contains

  • A C++17 database engine in src/ (~340k lines, ~1,400 .cpp files) covering SQL parsing, planning, optimization, vectorized execution, MVCC transactions, storage, and a catalog.
  • In-tree extensions in extension/ for Parquet, JSON, ICU, TPC-H/TPC-DS data generators, the autocomplete engine, the core_functions library, and an optional jemalloc allocator.
  • The DuckDB shell (CLI) in tools/shell/, plus packaging stubs for Julia and Swift clients in tools/juliapkg/ and tools/swift/.
  • A test harness with the sqllogictest runner and ~4,800 .test/.test_slow files in test/sql/, plus C++ API tests in test/api/.
  • Benchmark suites (TPC-H, TPC-DS, micro-benchmarks) in benchmark/.
  • Build orchestration via Makefile + CMakeLists.txt, with code generation scripts in scripts/ (e.g., generate_c_api.py, generate_serialization.py).

Quick map

graph LR
    subgraph Clients
        CLI[CLI shell]
        CAPI[C API]
        Julia[julia / swift]
    end
    subgraph Engine
        Parser[parser]
        Planner[planner]
        Opt[optimizer]
        Exec[execution]
        Par[parallel]
        Func[function]
        Cat[catalog]
        Txn[transaction]
        Stor[storage]
        Common[common]
    end
    subgraph Extensions
        Parq[parquet]
        JSON[json]
        ICU[icu]
        TPC[tpch / tpcds]
        Core[core_functions]
    end
    Clients --> Engine
    Parser --> Planner --> Opt --> Exec
    Exec --> Par
    Engine --> Stor
    Engine --> Cat
    Engine --> Txn
    Engine --> Common
    Engine --> Func
    Extensions --> Engine

Where to start

  • New to the codebase: read getting-started, then architecture for the query lifecycle.
  • Looking for a subsystem: see systems (parser, planner, optimizer, execution, storage, transaction, catalog, function, common, main).
  • Looking for an in-tree extension (parquet, json, icu, ...): see extensions.
  • Working on the SQL surface: see features/sql-frontend.
  • Glossary of project-specific terms: glossary.

Repository conventions

  • C++17, tabs for indentation, 120-column lines.
  • Naming: snake_case.cpp filenames, PascalCase types and functions, snake_case variables and members.
  • Ownership through smart pointers (unique_ptr); non-owning access through optional_ptr or references; no raw new/delete.
  • Tests are written in sqllogictest wherever possible.
  • The project has an explicit policy that pull requests authored by LLMs are not accepted; see CONTRIBUTING.md.
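
The naming and ownership conventions above can be sketched in a few lines. This is an illustrative example only: ColumnInfo, MakeColumnInfo, and TotalRowCount are hypothetical names that do not come from the repository, and the sketch uses std::unique_ptr from the standard library. optional_ptr is left out because it is not a standard-library type.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical example type; not part of DuckDB.
// Types use PascalCase.
class ColumnInfo {
public:
	ColumnInfo(std::string name_p, int64_t row_count_p) : name(std::move(name_p)), row_count(row_count_p) {
	}

	// Functions use PascalCase.
	const std::string &GetName() const {
		return name;
	}
	int64_t GetRowCount() const {
		return row_count;
	}

private:
	// Members and variables use snake_case.
	std::string name;
	int64_t row_count;
};

// Ownership is expressed with unique_ptr instead of raw new/delete.
inline std::unique_ptr<ColumnInfo> MakeColumnInfo(const std::string &name, int64_t row_count) {
	return std::make_unique<ColumnInfo>(name, row_count);
}

// A reference parameter signals a non-owning argument that cannot be null.
inline int64_t TotalRowCount(const std::vector<std::unique_ptr<ColumnInfo>> &columns) {
	int64_t total_rows = 0;
	for (auto &column : columns) {
		total_rows += column->GetRowCount();
	}
	return total_rows;
}
```

Following the filename rule, a file holding this class would be named something like column_info.cpp. The block is indented with tabs and stays within 120 columns.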

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.