duckdb/duckdb
Glossary
Project-specific terms used throughout the codebase.
Core types
- DataChunk (src/include/duckdb/common/types/data_chunk.hpp): A set of Vector columns sharing a single cardinality. The unit of data flow between operators.
- Vector (src/include/duckdb/common/types/vector.hpp): A typed columnar buffer holding up to STANDARD_VECTOR_SIZE values, with a validity mask and an optional encoding (flat, constant, dictionary, sequence).
- Value (src/include/duckdb/common/types/value.hpp): A heap-allocated single value used at API boundaries; not used inside hot execution paths.
- LogicalType: Type metadata (id + width + child types) attached to a Vector or column.
- idx_t: The project's preferred unsigned 64-bit integer for offsets, indices, and counts. Always prefer it over size_t.
- STANDARD_VECTOR_SIZE: The maximum number of rows in a vector. Default 2048; configurable at compile time.
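To make the relationship between the chunk, its columns, the validity mask, and the shared cardinality concrete, here is a minimal sketch. ToyVector, ToyChunk, and SumValid are hypothetical stand-ins written for this glossary; they are not DuckDB's real classes, and they only model a flat int32 column.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-ins only; these are NOT DuckDB's real classes.
using idx_t = uint64_t;                       // mirrors the glossary's idx_t
constexpr idx_t STANDARD_VECTOR_SIZE = 2048;  // default stated in the glossary

// A flat column of int32 values plus one validity flag per row.
struct ToyVector {
    std::vector<int32_t> data;
    std::vector<bool> valid;  // false marks a NULL

    void Append(int32_t v, bool is_valid) {
        assert(data.size() < STANDARD_VECTOR_SIZE);
        data.push_back(v);
        valid.push_back(is_valid);
    }
};

// A set of columns that all share one row count (the cardinality).
struct ToyChunk {
    std::vector<ToyVector> columns;
    idx_t cardinality = 0;

    // Append one row: `row` holds one value per column, `nulls` marks NULLs.
    void AppendRow(const std::vector<int32_t> &row, const std::vector<bool> &nulls) {
        assert(row.size() == columns.size() && nulls.size() == columns.size());
        for (idx_t c = 0; c < columns.size(); c++) {
            columns[c].Append(row[c], !nulls[c]);
        }
        cardinality++;
    }
};

// Sum the non-NULL values of one column, walking exactly `cardinality` rows.
int64_t SumValid(const ToyChunk &chunk, idx_t col) {
    int64_t sum = 0;
    for (idx_t i = 0; i < chunk.cardinality; i++) {
        if (chunk.columns[col].valid[i]) {
            sum += chunk.columns[col].data[i];
        }
    }
    return sum;
}
```

An operator in this model would receive a ToyChunk, loop from 0 to cardinality over each column, and skip rows whose validity flag is false. The real types additionally support the non-flat encodings listed above, which this sketch does not attempt.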
SQL frontend
- SQLStatement: Root of an unbound parse tree; subclasses include SelectStatement, InsertStatement, etc. (src/parser/statement/).
- ParsedExpression: Unbound expression node from the parser (src/parser/expression/).
- Expression: A bound expression with type information, used after binding (src/planner/expression/).
- TableRef: Source of rows in a FROM clause (base table, subquery, join, table function). Lives in src/parser/tableref/.
- Binder (src/planner/binder.cpp): Resolves names against the catalog and produces bound statements.
- BoundStatement: Output of binding: a LogicalOperator plus result column metadata.
- LogicalOperator (src/planner/operator/): Operator nodes in the logical plan (LogicalProjection, LogicalFilter, LogicalJoin, etc.).
- PhysicalOperator (src/execution/operator/): Operator nodes in the physical plan, with state classes for parallel execution.
- ColumnBinding (src/include/duckdb/planner/column_binding.hpp): A (table_index, column_index) pair that uniquely identifies a column across the plan.
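The idea of a (table_index, column_index) pair acting as a plan-wide column identifier can be sketched as below. ToyColumnBinding, BindingNames, Register, and NameOf are hypothetical names invented for illustration; the real struct lives in the header listed above and may differ in its members and helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>

using idx_t = uint64_t;

// Toy version of a (table_index, column_index) pair; not DuckDB's actual struct.
struct ToyColumnBinding {
    idx_t table_index;
    idx_t column_index;

    bool operator==(const ToyColumnBinding &o) const {
        return table_index == o.table_index && column_index == o.column_index;
    }
    // Ordering by table first, then column, so the pair can key a std::map.
    bool operator<(const ToyColumnBinding &o) const {
        return std::tie(table_index, column_index) < std::tie(o.table_index, o.column_index);
    }
};

// Hypothetical lookup table from a binding to a display name.
using BindingNames = std::map<ToyColumnBinding, std::string>;

// Record a name for a binding; a binding may be registered only once.
void Register(BindingNames &names, ToyColumnBinding b, std::string name) {
    bool inserted = names.emplace(b, std::move(name)).second;
    assert(inserted && "each binding must identify exactly one column");
    (void)inserted;
}

// Resolve a binding to its name, or a placeholder when it is unknown.
std::string NameOf(const BindingNames &names, ToyColumnBinding b) {
    auto it = names.find(b);
    return it == names.end() ? std::string("<unbound>") : it->second;
}
```

Because both indices take part in equality and ordering, two columns at the same position in different tables never collide, which is the property the glossary entry describes.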
Execution
- Pipeline: A linear chain of physical operators from a source through optional intermediate operators to a sink. Built by MetaPipeline in src/parallel/meta_pipeline.cpp.
- Source / sink: Endpoints of a pipeline. Sources produce chunks (e.g., PhysicalTableScan); sinks consume them and may block (e.g., PhysicalHashAggregate, PhysicalHashJoin build side).
- Executor (src/parallel/executor.cpp): Owns the pipeline DAG for a query and coordinates completion.
- TaskScheduler (src/parallel/task_scheduler.cpp): Fixed-size worker pool that runs scheduled tasks.
- ClientContext (src/main/client_context.cpp): Per-connection state holding the current transaction, prepared statements, profiler, and config overrides.
- DatabaseInstance (src/main/database.cpp): Process-level state: buffer manager, catalog, transaction manager, configured extensions, file system.
- Connection (src/main/connection.cpp): User-facing handle that wraps a ClientContext.
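The source, intermediate operator, and sink roles described above can be modelled in a few lines. This is a single-threaded toy under stated assumptions: ToyPipeline, Chunk, and SumEvens are hypothetical names, and the callback signatures are not DuckDB's operator interface.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Chunk = std::vector<int64_t>;  // toy stand-in for a chunk of rows

// Hypothetical pipeline pieces; names and signatures are illustrative only.
struct ToyPipeline {
    std::function<bool(Chunk &)> source;                         // fills a chunk; false when exhausted
    std::vector<std::function<Chunk(const Chunk &)>> operators;  // streaming steps
    std::function<void(const Chunk &)> sink;                     // consumes the final chunks

    // Pull chunks from the source, push each through every operator, hand it to the sink.
    void Run() {
        assert(source && sink);
        Chunk chunk;
        while (source(chunk)) {
            for (auto &op : operators) {
                chunk = op(chunk);
            }
            sink(chunk);
        }
    }
};

// Scan 1..n in chunks of `chunk_size`, keep the even values, and sum them in the sink.
int64_t SumEvens(int64_t n, int64_t chunk_size) {
    int64_t next = 1;
    int64_t total = 0;
    ToyPipeline p;
    p.source = [&](Chunk &out) {
        out.clear();
        while (next <= n && static_cast<int64_t>(out.size()) < chunk_size) {
            out.push_back(next++);
        }
        return !out.empty();
    };
    p.operators.push_back([](const Chunk &in) {
        Chunk out;
        for (auto v : in) {
            if (v % 2 == 0) {
                out.push_back(v);
            }
        }
        return out;
    });
    p.sink = [&](const Chunk &in) {
        for (auto v : in) {
            total += v;
        }
    };
    p.Run();
    return total;
}
```

The sink here is the blocking kind: the sum is only meaningful once the source is exhausted, which is the same reason an aggregate or a hash join build side ends a pipeline. Parallel execution and scheduling are deliberately left out of the sketch.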
Storage and transactions
- Block manager: Maps logical blocks to file offsets in the database file. The default implementation is SingleFileBlockManager (src/storage/single_file_block_manager.cpp).
- BufferManager: In-memory cache of blocks with a configurable budget; spills to a temp file when over budget. See src/storage/standard_buffer_manager.cpp.
- Row group: A horizontal slice of a table (default 122,880 rows). Each row group stores per-column statistics and compressed segments.
- Segment: A compressed run of values for one column inside a row group. Compression methods live in src/storage/compression/.
- WAL: Write-ahead log written by src/storage/write_ahead_log.cpp and replayed on startup by wal_replay.cpp.
- Checkpoint: Process that flushes dirty data into the main file and truncates the WAL (src/storage/checkpoint_manager.cpp).
- MVCC: Multi-version concurrency control. Each transaction reads from a snapshot; updates produce new versions in local_storage and old versions in undo_buffer. See src/transaction/.
- Snapshot ID / commit ID: Monotonic 64-bit numbers issued by DuckTransactionManager to order transactions.
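The two default sizes quoted in this glossary (122,880 rows per row group and 2048 rows per vector) imply that a row group holds exactly 60 full vectors. The sketch below works through that arithmetic; RowLocation, Locate, and the constant names are hypothetical helpers for illustration, and both sizes are assumed to be at their defaults.

```cpp
#include <cassert>
#include <cstdint>

using idx_t = uint64_t;
constexpr idx_t STANDARD_VECTOR_SIZE = 2048;  // default from the glossary
constexpr idx_t ROW_GROUP_SIZE = 122880;      // default from the glossary
constexpr idx_t VECTORS_PER_ROW_GROUP = ROW_GROUP_SIZE / STANDARD_VECTOR_SIZE;

static_assert(ROW_GROUP_SIZE % STANDARD_VECTOR_SIZE == 0, "a row group holds whole vectors");
static_assert(VECTORS_PER_ROW_GROUP == 60, "122880 / 2048");

// Where a table row lands, expressed in row-group and vector coordinates.
struct RowLocation {
    idx_t row_group;         // which horizontal slice of the table
    idx_t vector_in_group;   // which vector-sized run inside that slice
    idx_t offset_in_vector;  // position inside that run
};

// Map a zero-based row number to its location by plain division and remainder.
RowLocation Locate(idx_t row) {
    idx_t in_group = row % ROW_GROUP_SIZE;
    return {row / ROW_GROUP_SIZE, in_group / STANDARD_VECTOR_SIZE, in_group % STANDARD_VECTOR_SIZE};
}
```

For example, row 122,879 is the last row of row group 0 (vector 59, offset 2047), and row 122,880 is the first row of row group 1. How segments and compression subdivide a column inside a row group is not modelled here.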
Catalog
- Catalog (src/catalog/catalog.cpp): Per-database registry of schemas, tables, functions, views, sequences, and types.
- CatalogEntry: Base class for catalog objects (SchemaCatalogEntry, TableCatalogEntry, ScalarFunctionCatalogEntry, etc.). All entries are versioned by transaction.
- CatalogSet (src/catalog/catalog_set.cpp): Versioned hash map that stores CatalogEntry chains and serializes them through MVCC.
- AttachedDatabase (src/main/attached_database.cpp): A database attached to the current DatabaseInstance, either the in-process DuckDB or an external engine via the storage extension interface.
Functions
- Scalar function: One-row-in / one-row-out function (e.g., abs, length). Defined in src/function/scalar/ and the core_functions extension.
- Aggregate function: Many-rows-in / one-row-out function (e.g., sum, min). Defined in src/function/aggregate/ and core_functions/aggregate/.
- Window function: Aggregate or special function evaluated over a window frame. See src/function/window/.
- Table function: Function returning a table-shaped result (e.g., read_csv, parquet_scan). Defined in src/function/table/ and most file-format extensions.
- Pragma function: Settings and meta-queries invoked via PRAGMA. Lives in src/function/pragma/.
- FunctionBinder (src/function/function_binder.cpp): Resolves a function name + argument types to an overload via implicit cast costs.
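Overload resolution by implicit cast cost, as described for FunctionBinder, can be illustrated with a toy cost table. Everything below is an assumption made for the example: the ToyType enum, the cost values in CastCost, and the tie-breaking rule (first cheapest candidate wins) are invented and do not reflect DuckDB's actual cast rules or ambiguity handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ToyType { INTEGER, BIGINT, DOUBLE, VARCHAR };

// Made-up cost table: 0 = exact match, higher = less preferred, -1 = no implicit cast.
int CastCost(ToyType from, ToyType to) {
    if (from == to) return 0;
    if (from == ToyType::INTEGER && to == ToyType::BIGINT) return 1;
    if (from == ToyType::INTEGER && to == ToyType::DOUBLE) return 2;
    if (from == ToyType::BIGINT && to == ToyType::DOUBLE) return 2;
    return -1;
}

struct ToyOverload {
    std::string name;
    std::vector<ToyType> params;
};

// Return the index of the cheapest applicable overload, or -1 if none applies.
int PickOverload(const std::vector<ToyOverload> &overloads, const std::vector<ToyType> &args) {
    int best = -1;
    int best_cost = 0;
    for (std::size_t i = 0; i < overloads.size(); i++) {
        const auto &params = overloads[i].params;
        if (params.size() != args.size()) {
            continue;  // wrong arity
        }
        int total = 0;
        bool applicable = true;
        for (std::size_t a = 0; a < args.size(); a++) {
            int cost = CastCost(args[a], params[a]);
            if (cost < 0) {
                applicable = false;  // some argument cannot be cast implicitly
                break;
            }
            total += cost;
        }
        if (applicable && (best < 0 || total < best_cost)) {
            best = static_cast<int>(i);
            best_cost = total;
        }
    }
    return best;
}
```

With overloads taking BIGINT and DOUBLE, an INTEGER argument selects the BIGINT version in this model because its summed cast cost is lower, while a VARCHAR argument matches nothing and yields -1.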
Build / tooling
- In-tree extension: An extension whose source lives in extension/. Linked statically by default.
- Out-of-tree extension: An extension whose source lives in a separate repo. Pulled in via patches in .github/patches/ and configuration in .github/config/out_of_tree_extensions.cmake.
- Amalgamation: Single-file build artefact produced by scripts/amalgamation.py for embedding in third-party builds.
- Sqllogictest: The line-oriented SQL test format used by .test and .test_slow files in test/sql/. Runner: test/unittest.
- Generated files: Files produced by scripts/generate_*.py. Running make generate-files regenerates them. Examples: src/common/enum_util.cpp, src/include/duckdb.h.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.