duckdb/duckdb

Glossary

Project-specific terms used throughout the codebase.

Core types

  • DataChunk (src/include/duckdb/common/types/data_chunk.hpp): A set of Vector columns sharing a single cardinality (row count). The unit of data flow between operators.
  • Vector (src/include/duckdb/common/types/vector.hpp): A typed columnar buffer up to STANDARD_VECTOR_SIZE long, with a validity mask and an optional encoding (flat, constant, dictionary, sequence).
  • Value (src/include/duckdb/common/types/value.hpp): A heap-allocated single value used at API boundaries; not used inside hot execution paths.
  • LogicalType: Type metadata (id + width + child types) attached to a Vector or column.
  • idx_t: The project's preferred unsigned 64-bit integer for offsets, indices, and counts. Always prefer it over size_t.
  • STANDARD_VECTOR_SIZE: The max number of rows in a vector. Default 2048; configurable at compile time.
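
The relationship between Vector, DataChunk, and STANDARD_VECTOR_SIZE can be illustrated with a small toy model. This is a sketch only, not DuckDB's C++ implementation: ToyVector, ToyDataChunk, and to_chunks are hypothetical names, and only the flat encoding is modelled.

```python
from dataclasses import dataclass

STANDARD_VECTOR_SIZE = 2048  # default stated in the glossary


@dataclass
class ToyVector:
    """Hypothetical flat vector: values plus a validity mask (True = not NULL)."""
    values: list
    validity: list

    def __post_init__(self):
        if len(self.values) != len(self.validity):
            raise ValueError("values and validity must have the same length")
        if len(self.values) > STANDARD_VECTOR_SIZE:
            raise ValueError("vector exceeds STANDARD_VECTOR_SIZE")


@dataclass
class ToyDataChunk:
    """Hypothetical chunk: several column vectors sharing one cardinality."""
    columns: list

    def __post_init__(self):
        sizes = {len(col.values) for col in self.columns}
        if len(sizes) > 1:
            raise ValueError("all columns must share one cardinality")

    @property
    def cardinality(self):
        return len(self.columns[0].values) if self.columns else 0


def to_chunks(table_columns):
    """Split equally long column lists (None = NULL) into chunks of at most
    STANDARD_VECTOR_SIZE rows each."""
    row_count = len(table_columns[0]) if table_columns else 0
    chunks = []
    for start in range(0, row_count, STANDARD_VECTOR_SIZE):
        end = min(start + STANDARD_VECTOR_SIZE, row_count)
        vectors = [
            ToyVector(
                values=col[start:end],
                validity=[v is not None for v in col[start:end]],
            )
            for col in table_columns
        ]
        chunks.append(ToyDataChunk(vectors))
    return chunks


# 5000 rows, two columns; every odd row of the second column is NULL.
chunks = to_chunks([list(range(5000)), [None if i % 2 else "x" for i in range(5000)]])
chunk_sizes = [c.cardinality for c in chunks]
```

With 5000 input rows, the sketch yields two full chunks and one partial chunk, which is the shape operators see when data flows through a plan in vector-sized batches.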

SQL frontend

  • SQLStatement: Root of an unbound parse tree; subclasses include SelectStatement, InsertStatement, etc. (src/parser/statement/).
  • ParsedExpression: Unbound expression node from the parser (src/parser/expression/).
  • Expression: A bound expression with type information, used after binding (src/planner/expression/).
  • TableRef: Source of rows in a FROM clause (base table, subquery, join, table function). Lives in src/parser/tableref/.
  • Binder (src/planner/binder.cpp): Resolves names against the catalog and produces bound statements.
  • BoundStatement: Output of binding: a LogicalOperator plus result column metadata.
  • LogicalOperator (src/planner/operator/): Operator nodes in the logical plan (LogicalProjection, LogicalFilter, LogicalJoin, etc.).
  • PhysicalOperator (src/execution/operator/): Operator nodes in the physical plan, with state classes for parallel execution.
  • ColumnBinding (src/include/duckdb/planner/column_binding.hpp): A (table_index, column_index) pair that uniquely identifies a column across the plan.
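
To make the Binder and ColumnBinding entries concrete, here is a minimal sketch of name resolution that turns unbound column references into (table_index, column_index) pairs. Everything in it is illustrative: the CATALOG dictionary, the bind_columns helper, and the choice to use FROM-clause position as table_index are assumptions, not how DuckDB's Binder assigns indices.

```python
from typing import NamedTuple


class ColumnBinding(NamedTuple):
    table_index: int
    column_index: int


# Hypothetical catalog: table name -> ordered column names.
CATALOG = {
    "orders": ["id", "customer_id", "total"],
    "customers": ["id", "name"],
}


def bind_columns(from_tables, column_refs):
    """Resolve (qualifier, column) references against the tables in FROM.

    from_tables: table names in FROM order; the position doubles as
                 table_index in this sketch.
    column_refs: (table name or None, column name) pairs, standing in for
                 unbound column references from a parser.
    """
    bindings = []
    for qualifier, column in column_refs:
        matches = [
            ColumnBinding(table_index, CATALOG[table].index(column))
            for table_index, table in enumerate(from_tables)
            if (qualifier is None or qualifier == table) and column in CATALOG[table]
        ]
        if not matches:
            raise LookupError(f"column {column!r} not found")
        if len(matches) > 1:
            raise LookupError(f"column reference {column!r} is ambiguous")
        bindings.append(matches[0])
    return bindings


bound = bind_columns(
    ["orders", "customers"],
    [("orders", "id"), (None, "name"), (None, "total")],
)
```

An unqualified reference to id would raise here because both tables define it, which mirrors why a binding needs both halves of the pair to identify a column uniquely.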

Execution

  • Pipeline: A linear chain of physical operators from a source through optional intermediate operators to a sink. Built by MetaPipeline in src/parallel/meta_pipeline.cpp.
  • Source / sink: Endpoints of a pipeline. Sources produce chunks (e.g., PhysicalTableScan); sinks consume them and may block (e.g., PhysicalHashAggregate, PhysicalHashJoin build side).
  • Executor (src/parallel/executor.cpp): Owns the pipeline DAG for a query and coordinates completion.
  • TaskScheduler (src/parallel/task_scheduler.cpp): Fixed-size worker pool that runs scheduled tasks.
  • ClientContext (src/main/client_context.cpp): Per-connection state holding the current transaction, prepared statements, profiler, and config overrides.
  • DatabaseInstance (src/main/database.cpp): Process-level state: buffer manager, catalog, transaction manager, configured extensions, file system.
  • Connection (src/main/connection.cpp): User-facing handle that wraps a ClientContext.
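
The source, operator, and sink roles in a pipeline can be sketched as a single-threaded loop. This is a simplified illustration under stated assumptions: table_scan, filter_op, projection_op, SumSink, and run_pipeline are hypothetical names, chunks are plain Python lists, and there is no scheduling or parallelism of the kind Executor and TaskScheduler provide.

```python
def table_scan(rows, chunk_size=2048):
    """Source: yield chunks (lists of values) of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def filter_op(predicate):
    """Intermediate operator: keep matching rows, one chunk at a time."""
    def run(chunk):
        return [row for row in chunk if predicate(row)]
    return run


def projection_op(func):
    """Intermediate operator: transform each row, one chunk at a time."""
    def run(chunk):
        return [func(row) for row in chunk]
    return run


class SumSink:
    """Blocking sink: it must consume every chunk before it has a result."""

    def __init__(self):
        self.total = 0
        self.chunks_seen = 0

    def sink(self, chunk):
        self.total += sum(chunk)
        self.chunks_seen += 1

    def finalize(self):
        return self.total


def run_pipeline(source, operators, sink):
    """Push each source chunk through the operators and into the sink."""
    for chunk in source:
        for op in operators:
            chunk = op(chunk)
        sink.sink(chunk)
    return sink.finalize()


sink = SumSink()
result = run_pipeline(
    table_scan(list(range(10_000))),
    [filter_op(lambda x: x % 2 == 0), projection_op(lambda x: x * 2)],
    sink,
)
```

The filter and projection are streaming operators that never need more than one chunk, while the sink only produces its answer in finalize, which is the sense in which sinks "may block".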

Storage and transactions

  • Block manager: Maps logical blocks to file offsets in the database file. The default implementation is SingleFileBlockManager (src/storage/single_file_block_manager.cpp).
  • BufferManager: In-memory cache of blocks with a configurable budget; spills to a temp file when over budget. See src/storage/standard_buffer_manager.cpp.
  • Row group: A horizontal slice of a table (default 122,880 rows). Each row group stores per-column statistics and compressed segments.
  • Segment: A compressed run of values for one column inside a row group. Compression methods live in src/storage/compression/.
  • WAL: Write-ahead log written by src/storage/write_ahead_log.cpp and replayed on startup by wal_replay.cpp.
  • Checkpoint: Process that flushes dirty data into the main file and truncates the WAL (src/storage/checkpoint_manager.cpp).
  • MVCC: Multi-version concurrency control. Each transaction reads from a snapshot; updates produce new versions in local_storage and old versions in undo_buffer. See src/transaction/.
  • Snapshot ID / commit ID: Monotonic 64-bit numbers issued by DuckTransactionManager to order transactions.

Catalog

  • Catalog (src/catalog/catalog.cpp): Per-database registry of schemas, tables, functions, views, sequences, and types.
  • CatalogEntry: Base class for catalog objects (SchemaCatalogEntry, TableCatalogEntry, ScalarFunctionCatalogEntry, etc.). All entries are versioned by transaction.
  • CatalogSet (src/catalog/catalog_set.cpp): Versioned hash map that stores CatalogEntry chains and serializes them through MVCC.
  • AttachedDatabase (src/main/attached_database.cpp): A database attached to the current DatabaseInstance, either the in-process DuckDB or an external engine via the storage extension interface.

Functions

  • Scalar function: One-row-in / one-row-out function (e.g., abs, length). Defined in src/function/scalar/ and the core_functions extension.
  • Aggregate function: Many-rows-in / one-row-out function (e.g., sum, min). Defined in src/function/aggregate/ and core_functions/aggregate/.
  • Window function: Aggregate or special function evaluated over a window frame. See src/function/window/.
  • Table function: Function returning a table-shaped result (e.g., read_csv, parquet_scan). Defined in src/function/table/ and most file-format extensions.
  • Pragma function: Settings and meta-queries invoked via PRAGMA. Lives in src/function/pragma/.
  • FunctionBinder (src/function/function_binder.cpp): Resolves a function name + argument types to an overload via implicit cast costs.

Build / tooling

  • In-tree extension: An extension whose source lives in extension/. Linked statically by default.
  • Out-of-tree extension: An extension whose source lives in a separate repo. Pulled in via patches in .github/patches/ and configuration in .github/config/out_of_tree_extensions.cmake.
  • Amalgamation: Single-file build artefact produced by scripts/amalgamation.py for embedding in third-party builds.
  • Sqllogictest: The line-oriented SQL test format used by .test and .test_slow files in test/sql/. Runner: test/unittest.
  • Generated files: Files produced by scripts/generate_*.py; regenerate them with make generate-files. Examples: src/common/enum_util.cpp, src/include/duckdb.h.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
