duckdb/duckdb
Vectorized execution
The single most important architectural decision in DuckDB is that everything is a chunk of vectors. This page traces what that means in practice, from the data structures up through the operator interface, the expression executor, and parallel pipelines.
The unit of work: DataChunk
A DataChunk (src/include/duckdb/common/types/data_chunk.hpp) is a set of Vectors, one per column, that share a single cardinality (number of rows). Operators pass chunks to each other; the engine never allocates individual rows in hot paths.
Default cardinality cap: STANDARD_VECTOR_SIZE, currently 2048. This is small enough to fit comfortably in L2 cache and large enough to amortize per-call dispatch cost.
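The chunk contract can be modeled in a few lines. The sketch below is a simplified, hypothetical model and not DuckDB's DataChunk API: MiniChunk, GetNextChunk, ChunkSizes and kVectorSize are illustrative names, and columns are plain std::vector<int64_t>. A source hands out at most kVectorSize rows per call, so 5,000 rows arrive as chunks of 2048, 2048 and 904 rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constant mirroring STANDARD_VECTOR_SIZE (2048 by default).
constexpr std::size_t kVectorSize = 2048;

// Hypothetical stand-in for DataChunk: one std::vector per column, all
// sharing the same logical row count `size`.
struct MiniChunk {
    std::vector<std::vector<int64_t>> columns;
    std::size_t size = 0;

    void Reset() {
        for (auto &col : columns) col.clear();  // keeps capacity for reuse
        size = 0;
    }
};

// Hypothetical source: copies the next slice of a column-major `table`
// into `chunk` (at most kVectorSize rows). Returns false when exhausted.
bool GetNextChunk(const std::vector<std::vector<int64_t>> &table,
                  std::size_t &offset, MiniChunk &chunk) {
    chunk.Reset();
    std::size_t total = table.empty() ? 0 : table[0].size();
    if (offset >= total) return false;
    std::size_t n = std::min(kVectorSize, total - offset);
    chunk.columns.resize(table.size());
    for (std::size_t c = 0; c < table.size(); c++) {
        assert(table[c].size() == total);  // every column has the same cardinality
        auto first = table[c].begin() + static_cast<std::ptrdiff_t>(offset);
        chunk.columns[c].assign(first, first + static_cast<std::ptrdiff_t>(n));
    }
    chunk.size = n;
    offset += n;
    return true;
}

// Drives the source to exhaustion and records each chunk's cardinality.
std::vector<std::size_t> ChunkSizes(const std::vector<std::vector<int64_t>> &table) {
    std::vector<std::size_t> sizes;
    std::size_t offset = 0;
    MiniChunk chunk;  // one chunk reused across iterations
    while (GetNextChunk(table, offset, chunk)) sizes.push_back(chunk.size);
    return sizes;
}
```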
```mermaid
graph LR
Source[Source op] -->|DataChunk N rows| Filter[Filter]
Filter -->|DataChunk M <= N rows| Project[Projection]
Project -->|DataChunk M rows| Sink[Sink op]
```

The columnar buffer: Vector
A Vector (src/include/duckdb/common/types/vector.hpp) carries:
- A LogicalType (id + width + child types).
- A buffer of values of the appropriate physical width.
- A ValidityMask of NULL bits.
- An optional auxiliary buffer (for variable-length types like strings, lists, structs).
- A VectorType indicating the encoding.
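The NULL bitmap is worth seeing in isolation. This is a minimal sketch, assuming the convention DuckDB's ValidityMask follows: one bit per row packed into 64-bit words, a set bit meaning "valid", and no buffer allocated until the first NULL appears. MiniValidity and its methods are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical validity mask: bit (row % 64) of word (row / 64) is 1 when
// the row is valid (non-NULL). An empty `words_` means "all rows valid",
// so vectors without NULLs carry no bitmap at all.
class MiniValidity {
public:
    explicit MiniValidity(std::size_t capacity) : capacity_(capacity) {}

    bool AllValid() const { return words_.empty(); }

    bool RowIsValid(std::size_t row) const {
        assert(row < capacity_);
        if (words_.empty()) return true;
        return (words_[row / 64] >> (row % 64)) & 1ULL;
    }

    void SetInvalid(std::size_t row) {
        assert(row < capacity_);
        if (words_.empty()) {
            // Lazily materialize the bitmap with every bit set (all valid).
            words_.assign((capacity_ + 63) / 64, ~uint64_t{0});
        }
        words_[row / 64] &= ~(uint64_t{1} << (row % 64));
    }

    // Counts non-NULL rows among the first `count` rows.
    std::size_t CountValid(std::size_t count) const {
        std::size_t valid = 0;
        for (std::size_t i = 0; i < count; i++) valid += RowIsValid(i) ? 1 : 0;
        return valid;
    }

private:
    std::size_t capacity_;
    std::vector<uint64_t> words_;
};
```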
Encodings
| Encoding | When it is used |
|---|---|
| FLAT_VECTOR | One value per slot; the default. |
| CONSTANT_VECTOR | All N rows have the same value (e.g., a literal in a projection). One stored value, replicated logically. |
| DICTIONARY_VECTOR | A selection (index) buffer plus a child vector holding the dictionary values. Skips repeated work when many rows share values. |
| SEQUENCE_VECTOR | Two scalars represent the whole vector as start + i * step. Used for things like range(). |
| FSST_VECTOR | Compressed strings sharing a symbol table. |
Most executor code paths can short-circuit on CONSTANT_VECTOR and avoid touching N rows entirely. Vector::Flatten upgrades any encoding to FLAT_VECTOR when an operator cannot handle the original encoding.
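To see how encodings, the constant short-circuit, and flattening fit together, here is a toy model restricted to int64 values. MiniVector, Encoding, ValueAt and Sum are hypothetical names, and FSST is omitted; Flatten only mimics the effect of Vector::Flatten (materializing every row as FLAT), not its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, int64-only model of four encodings from the table above.
enum class Encoding { Flat, Constant, Dictionary, Sequence };

struct MiniVector {
    Encoding encoding = Encoding::Flat;
    std::vector<int64_t> data;      // Flat: one per row; Constant: one value;
                                    // Dictionary: the child (dictionary) values
    std::vector<std::size_t> sel;   // Dictionary only: row -> index into data
    int64_t start = 0, step = 0;    // Sequence only: value(i) = start + i * step

    int64_t ValueAt(std::size_t i) const {
        switch (encoding) {
        case Encoding::Flat: return data[i];
        case Encoding::Constant: return data[0];
        case Encoding::Dictionary: return data[sel[i]];
        case Encoding::Sequence: return start + static_cast<int64_t>(i) * step;
        }
        return 0;  // unreachable
    }

    // Analogue of Vector::Flatten: materialize `count` rows as FLAT.
    void Flatten(std::size_t count) {
        if (encoding == Encoding::Flat) return;
        std::vector<int64_t> flat(count);
        for (std::size_t i = 0; i < count; i++) flat[i] = ValueAt(i);
        data = std::move(flat);
        sel.clear();
        encoding = Encoding::Flat;
    }
};

// A kernel with a constant fast path: O(1) for CONSTANT input instead of O(count).
int64_t Sum(const MiniVector &v, std::size_t count) {
    if (v.encoding == Encoding::Constant) {
        return v.data[0] * static_cast<int64_t>(count);
    }
    int64_t total = 0;
    for (std::size_t i = 0; i < count; i++) total += v.ValueAt(i);
    return total;
}
```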
UnifiedVectorFormat
When you must read across encodings without flattening, UnifiedVectorFormat (src/include/duckdb/common/types/vector.hpp) gives you:
- data: a pointer to the values.
- sel: a SelectionVector mapping each row to its slot in data (non-trivial for dictionary-encoded vectors).
- validity: the ValidityMask.
This is what BinaryExecutor and friends use under the hood.
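The payoff of the (data, sel, validity) triple is that one loop can read every encoding. The sketch below is a hypothetical model, not DuckDB's UnifiedVectorFormat struct: MiniUnifiedFormat, IdentitySel and SumNonNull are illustrative names, and validity is a plain bool array. Row i lives at data[sel[i]], and validity is checked at that same physical slot; in this model a flat vector uses the identity selection and a constant uses an all-zero selection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical unified view: row i lives at data[sel[i]], and validity is
// indexed by that same physical slot.
struct MiniUnifiedFormat {
    const int64_t *data;
    const std::size_t *sel;  // row -> physical slot
    const bool *validity;    // per physical slot; true = not NULL
};

// Identity selection 0, 1, ..., count - 1, used to view a flat vector.
std::vector<std::size_t> IdentitySel(std::size_t count) {
    std::vector<std::size_t> sel(count);
    std::iota(sel.begin(), sel.end(), std::size_t{0});
    return sel;
}

// One loop serves flat, constant (all-zero sel) and dictionary inputs.
int64_t SumNonNull(const MiniUnifiedFormat &fmt, std::size_t count) {
    int64_t total = 0;
    for (std::size_t i = 0; i < count; i++) {
        std::size_t idx = fmt.sel[i];
        if (fmt.validity[idx]) total += fmt.data[idx];
    }
    return total;
}
```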
The operator interface
Every PhysicalOperator (src/include/duckdb/execution/physical_operator.hpp) advertises one or more roles:
- Source. Produces chunks via GetData. Has LocalSourceState and GlobalSourceState.
- Operator. Transforms chunks via Execute.
- Sink. Consumes chunks via Sink; has a Combine and a Finalize step.
A pipeline is a chain that starts at a source, flows through zero or more intermediate operators, and ends at a sink.
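```cpp
// Role methods as overridden by PhysicalOperator subclasses (declared const:
// one operator instance is shared by all threads; mutable data lives in the state objects).
SourceResultType GetData(ExecutionContext &context, DataChunk &chunk,
                         OperatorSourceInput &input) const override;
OperatorResultType Execute(ExecutionContext &context, DataChunk &input,
                           DataChunk &chunk, GlobalOperatorState &gstate,
                           OperatorState &state) const override;
SinkResultType Sink(ExecutionContext &context, DataChunk &chunk,
                    OperatorSinkInput &input) const override;
```

An operator returns OperatorResultType::HAVE_MORE_OUTPUT when the current input chunk can still produce more output, and the pipeline must call it again with the same input (e.g., a hash join probe whose matches for one input chunk do not fit in a single output chunk). A filter never needs this: its output (M ≤ N rows) always fits in one chunk.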
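The re-invocation protocol is easiest to see with an operator whose output can outgrow its input. The sketch below is hypothetical and does not use DuckDB types: ResultType mirrors only the first two values of OperatorResultType, RepeatOperator emits each input row `repeat` times (a stand-in for a join probe with many matches), and the output capacity is deliberately tiny so one input chunk spills across several output chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of two OperatorResultType values.
enum class ResultType { NeedMoreInput, HaveMoreOutput };

// Hypothetical operator whose output can exceed one chunk: it emits every
// input row `repeat` times. `position` is operator state kept between calls.
struct RepeatOperator {
    std::size_t repeat;
    std::size_t capacity;      // output chunk capacity (tiny, for the demo)
    std::size_t position = 0;  // next output slot (row * repeat + copy) to emit

    ResultType Execute(const std::vector<int64_t> &input, std::vector<int64_t> &out) {
        assert(capacity > 0);
        out.clear();
        std::size_t total = input.size() * repeat;
        while (position < total && out.size() < capacity) {
            out.push_back(input[position / repeat]);
            position++;
        }
        if (position < total) return ResultType::HaveMoreOutput;  // call again, same input
        position = 0;                                             // input fully consumed
        return ResultType::NeedMoreInput;
    }
};

// Driver: re-invokes Execute with the same input while it reports
// HaveMoreOutput, recording the size of every output chunk.
std::vector<std::size_t> RunOnChunk(RepeatOperator &op, const std::vector<int64_t> &input) {
    std::vector<std::size_t> sizes;
    std::vector<int64_t> out;
    ResultType r;
    do {
        r = op.Execute(input, out);
        sizes.push_back(out.size());
    } while (r == ResultType::HaveMoreOutput);
    return sizes;
}
```

With repeat = 3 and capacity = 4, a three-row input yields nine rows, delivered as output chunks of 4, 4 and 1.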
The expression executor
ExpressionExecutor (src/execution/expression_executor.cpp) evaluates a vector of bound Expressions over a DataChunk and produces an output DataChunk. Each expression has per-thread scratch state; intermediate vectors are reused across chunks.
Per-class dispatch lives in src/execution/expression_executor/:
- BoundFunctionExpression → execute_function.cpp
- BoundCastExpression → execute_cast.cpp
- BoundComparisonExpression → execute_comparison.cpp
- BoundConjunctionExpression → execute_conjunction.cpp
- BoundCaseExpression → execute_case.cpp
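A toy version of per-class dispatch with reused intermediates might look like this. It is a hypothetical sketch, not DuckDB's ExpressionExecutor: MiniExpr, ColumnRef, Constant, Add and Evaluate are illustrative names, each subclass plays the role of one execute_*.cpp target, and Add keeps its children's scratch vectors as state so later chunks reuse the same buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Column = std::vector<int64_t>;

// Hypothetical bound expression node; Execute writes `count` results into
// `out`, which the caller has already sized.
struct MiniExpr {
    virtual ~MiniExpr() = default;
    virtual void Execute(const std::vector<Column> &chunk, std::size_t count, Column &out) = 0;
};

struct ColumnRef : MiniExpr {
    std::size_t index;
    explicit ColumnRef(std::size_t i) : index(i) {}
    void Execute(const std::vector<Column> &chunk, std::size_t count, Column &out) override {
        for (std::size_t i = 0; i < count; i++) out[i] = chunk[index][i];
    }
};

// Materializes the literal per row for simplicity; a vectorized engine
// would emit a CONSTANT vector instead.
struct Constant : MiniExpr {
    int64_t value;
    explicit Constant(int64_t v) : value(v) {}
    void Execute(const std::vector<Column> &, std::size_t count, Column &out) override {
        for (std::size_t i = 0; i < count; i++) out[i] = value;
    }
};

// Children write into scratch vectors kept as expression state; resize()
// reuses their capacity, so once a full-size chunk has been seen no new
// allocation happens.
struct Add : MiniExpr {
    std::unique_ptr<MiniExpr> left, right;
    Column left_scratch, right_scratch;
    Add(std::unique_ptr<MiniExpr> l, std::unique_ptr<MiniExpr> r)
        : left(std::move(l)), right(std::move(r)) {}
    void Execute(const std::vector<Column> &chunk, std::size_t count, Column &out) override {
        left_scratch.resize(count);
        right_scratch.resize(count);
        left->Execute(chunk, count, left_scratch);
        right->Execute(chunk, count, right_scratch);
        for (std::size_t i = 0; i < count; i++) out[i] = left_scratch[i] + right_scratch[i];
    }
};

// Evaluates `expr` over the first `count` rows of `chunk`.
Column Evaluate(MiniExpr &expr, const std::vector<Column> &chunk, std::size_t count) {
    Column out(count);
    expr.Execute(chunk, count, out);
    return out;
}
```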
Templated executors
For scalar function authors, src/include/duckdb/common/vector_operations/ provides:
- UnaryExecutor::Execute<TA, TR>(in, out, count, kernel): one input → one output.
- BinaryExecutor::Execute<TA, TB, TR>(left, right, out, count, kernel): two inputs → one output.
- TernaryExecutor and GenericExecutor: three or more inputs.
These templates handle:
- Constant/dictionary fast paths.
- Validity propagation (NULL in any input → NULL output, unless your kernel says otherwise).
- Per-row dispatch in flat mode.
- Selection vectors for dictionary inputs.
Most of the scalar functions in extension/core_functions/scalar/ use these templates rather than handwriting their own loops.
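The bookkeeping those templates take off a kernel author's hands can be sketched in miniature. This is a hypothetical illustration, not BinaryExecutor's actual code: MiniVec models only FLAT and CONSTANT inputs with a bool-per-slot validity vector, and MiniBinaryExecute shows the constant fast path, NULL propagation, and the flat per-row loop around a user-supplied kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical typed vector: CONSTANT (one slot) or FLAT (count slots).
template <class T>
struct MiniVec {
    bool is_constant = false;
    std::vector<T> data;
    std::vector<bool> valid;  // same length as data; false = NULL
};

// Sketch of what a BinaryExecutor-style template does around `fun`.
template <class TA, class TB, class TR, class FUNC>
void MiniBinaryExecute(const MiniVec<TA> &left, const MiniVec<TB> &right,
                       MiniVec<TR> &result, std::size_t count, FUNC fun) {
    if (left.is_constant && right.is_constant) {
        // Both inputs constant: call the kernel once; the result stays constant.
        bool ok = left.valid[0] && right.valid[0];
        result.is_constant = true;
        result.valid = {ok};
        result.data = {ok ? fun(left.data[0], right.data[0]) : TR()};
        return;
    }
    result.is_constant = false;
    result.data.assign(count, TR());
    result.valid.assign(count, false);
    for (std::size_t i = 0; i < count; i++) {
        std::size_t li = left.is_constant ? 0 : i;  // constant side: always slot 0
        std::size_t ri = right.is_constant ? 0 : i;
        if (left.valid[li] && right.valid[ri]) {    // NULL in -> NULL out
            result.data[i] = fun(left.data[li], right.data[ri]);
            result.valid[i] = true;
        }
    }
}
```

A kernel author then writes only the lambda, e.g. `[](int32_t x, int32_t y) { return int64_t(x) * y; }`, and never touches encodings or NULLs.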
Aggregate execution
Aggregates plug into a five-method interface:
- state_size(): bytes per group.
- initialize(state): set group state to identity.
- update(state, chunk): fold a chunk of inputs into the state.
- combine(left, right): merge two states.
- finalize(state, output): produce result vector(s).
The hash aggregate (src/execution/operator/aggregate/physical_hash_aggregate.cpp) and the radix-partitioned hash table it builds on (src/execution/radix_partitioned_hashtable.cpp) call into these methods as chunks arrive.
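As a worked instance of the five callbacks, here is AVG over int64 values. It is a hypothetical sketch: AvgState and AvgAggregate are illustrative names, the signatures are simplified (DuckDB's real AggregateFunction callbacks operate on vectors of state pointers), combine is written as (source, target), and the NULL result for empty input is modeled with std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-group state for AVG(int64).
struct AvgState {
    int64_t sum;
    uint64_t count;
};

struct AvgAggregate {
    static std::size_t StateSize() { return sizeof(AvgState); }

    static void Initialize(AvgState &state) { state = {0, 0}; }  // identity

    // Fold one chunk of inputs into the group state.
    static void Update(AvgState &state, const std::vector<int64_t> &chunk) {
        for (int64_t v : chunk) {
            state.sum += v;
            state.count++;
        }
    }

    // Merge a partial state (e.g. from another thread) into `target`.
    static void Combine(const AvgState &source, AvgState &target) {
        target.sum += source.sum;
        target.count += source.count;
    }

    // Empty input yields NULL, modeled here as std::nullopt.
    static std::optional<double> Finalize(const AvgState &state) {
        if (state.count == 0) return std::nullopt;
        return static_cast<double>(state.sum) / static_cast<double>(state.count);
    }
};
```

Keeping sum and count separate (rather than a running average) is what makes combine exact: two partial states merge by addition, regardless of how the input was split.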
Parallelism
Pipelines may be parallelized by partitioning the source. Each parallel worker has its own LocalSourceState and LocalSinkState. After the source is exhausted, sinks merge their per-thread state via Combine.
```mermaid
sequenceDiagram
participant W1 as Worker 1 (LocalSinkState)
participant W2 as Worker 2 (LocalSinkState)
participant Sink as Sink (GlobalSinkState)
W1->>W1: Sink chunks into local state
W2->>W2: Sink chunks into local state
W1->>Sink: Combine(local) -> global
W2->>Sink: Combine(local) -> global
Sink->>Sink: Finalize() -> ready for downstream pipeline
```

Pipelines that depend on a sink's Finalize event do not start until that finalize runs (see systems/parallel).
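The Sink, Combine, Finalize sequence in the diagram can be modeled with a SUM/COUNT sink. This sketch is hypothetical (LocalSinkState, GlobalSinkState, Sink, Combine, Finalize and RunPipeline are illustrative free functions and structs, not DuckDB's classes), and for determinism RunPipeline drives the workers one after another. In a real engine they run concurrently, which is why Combine takes a lock while Sink touches only thread-local state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical per-worker state: touched by one thread only.
struct LocalSinkState {
    int64_t sum = 0;
    std::size_t rows = 0;
};

// Hypothetical shared state: every worker combines into it.
struct GlobalSinkState {
    std::mutex lock;
    int64_t sum = 0;
    std::size_t rows = 0;
    bool finalized = false;
};

// Sink: fold one chunk into thread-local state, no synchronization needed.
void Sink(LocalSinkState &local, const std::vector<int64_t> &chunk) {
    for (int64_t v : chunk) local.sum += v;
    local.rows += chunk.size();
}

// Combine: merge one worker's local state into the global state.
void Combine(GlobalSinkState &global, const LocalSinkState &local) {
    std::lock_guard<std::mutex> guard(global.lock);
    global.sum += local.sum;
    global.rows += local.rows;
}

// Finalize: runs once, after every worker has combined.
void Finalize(GlobalSinkState &global) { global.finalized = true; }

// Source partitions go round-robin to `workers` local states; each local
// state is then combined, and the sink is finalized last.
void RunPipeline(GlobalSinkState &global,
                 const std::vector<std::vector<int64_t>> &partitions,
                 std::size_t workers) {
    assert(workers > 0);
    std::vector<LocalSinkState> locals(workers);
    for (std::size_t p = 0; p < partitions.size(); p++) Sink(locals[p % workers], partitions[p]);
    for (const auto &local : locals) Combine(global, local);
    Finalize(global);
}
```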
Why this works
The vectorized model gets several wins at once:
- Cache locality. Each operator processes one column at a time within a chunk; data stays hot in L1/L2.
- SIMD opportunities. Inner loops over int32_t[] or double[] can be auto-vectorized by modern compilers.
- Encoding fast paths. Constant/dictionary inputs short-circuit without touching every row.
- Predictable allocation. A pipeline reuses the same DataChunk across iterations; the main new allocations are for spillable structures (hash tables, sort runs).
- Pipeline parallelism. Chunks flow through operators back-to-back without intermediate materialization.
Where to look
- The vector primitive: systems/common.
- Operators: systems/execution.
- Pipelines and the scheduler: systems/parallel.
- Function machinery: systems/function.
- Patterns when you write your own operator: how-to-contribute/patterns-and-conventions.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.