duckdb/duckdb
Optimizer
Active contributors: Laurens Kuiper, Tmonster, Tom Ebergen
Purpose
src/optimizer/ rewrites a bound logical plan to make it cheaper to execute, without changing its result. It is invoked once per query, on the LogicalOperator tree from the planner, and produces an optimized LogicalOperator tree for the physical plan generator.
The optimizer is rule-based with cost-based join ordering. Rules run in a fixed sequence. Each rule may rewrite the plan in place; statistics propagate alongside.
Directory layout
src/optimizer/
├── optimizer.cpp Orchestrator: applies passes in a fixed order
├── filter_combiner.cpp Combine and simplify filter conjunctions
├── filter_pushdown.cpp Push filters into scans/joins
├── filter_pullup.cpp Pull filters above joins for combination
├── expression_rewriter.cpp Rewrite rules for expressions
├── expression_heuristics.cpp Order filter conjuncts by selectivity heuristics
├── statistics_propagator.cpp Compute per-column stats through operators
├── cse_optimizer.cpp Common subexpression elimination
├── common_subplan_optimizer.cpp Common subplan extraction
├── deliminator.cpp Remove duplicate-elimination operators
├── unnest_rewriter.cpp Rewrite UNNEST patterns
├── topn_optimizer.cpp ORDER BY ... LIMIT N -> TopN
├── topn_window_elimination.cpp Window-function -> TopN where possible
├── late_materialization.cpp Defer column materialization past joins
├── projection_pullup.cpp Pull projections above joins
├── compressed_materialization.cpp Encode strings/decimals as ints in temp results
├── remove_unused_columns.cpp Drop columns that no operator references
├── remove_duplicate_groups.cpp Drop redundant GROUP BY columns
├── empty_result_pullup.cpp Short-circuit empty intermediate results
├── join_elimination.cpp Drop joins that don't change the result
├── join_filter_pushdown_optimizer.cpp Push pre-built filters across joins
├── build_probe_side_optimizer.cpp Choose hash-join build/probe sides
├── partitioned_execution.cpp Choose partitioned execution strategies
├── outer_join_simplification.cpp Convert outer joins to inner where possible
├── row_group_pruner.cpp Eliminate row groups by min/max stats
├── row_number_rewriter.cpp Rewrite row_number() over windowed scans
├── window_self_join.cpp Self-join optimization for window-style queries
├── cte_inlining.cpp Inline non-recursive CTEs when profitable
├── cte_filter_pusher.cpp Push filters into CTEs
├── limit_pushdown.cpp Push LIMIT past projections
├── sampling_pushdown.cpp Push TABLESAMPLE down
├── regex_range_filter.cpp Convert REGEXP to range filters where possible
├── in_clause_rewriter.cpp Rewrite IN-list patterns
├── aggregate_function_rewriter.cpp Rewrite aggregates (e.g., AVG -> SUM/COUNT)
├── common_aggregate_optimizer.cpp Combine aggregates that share state
├── column_binding_replacer.cpp Bulk-rename ColumnBindings
├── column_lifetime_analyzer.cpp Track column liveness for elimination
├── join_order/ Cost-based join enumeration (DPhyp + greedy fallback)
├── pushdown/ Per-operator filter pushdown helpers
├── pullup/ Per-operator filter/projection pullup helpers
├── rule/ Reusable rewrite rules for ExpressionRewriter
├── compressed_materialization/ Helpers for compressed_materialization.cpp
├── matcher/ Pattern matchers for rules
└── statistics/ Per-operator statistics propagationKey abstractions
| Type | File | Role |
|---|---|---|
Optimizer |
src/optimizer/optimizer.cpp |
Top-level driver. Applies passes in a fixed sequence on a LogicalOperator. |
ExpressionRewriter |
src/optimizer/expression_rewriter.cpp |
Generic engine for rewriting expressions using Rule matchers in rule/. |
Rule |
src/include/duckdb/optimizer/rule.hpp |
Base for declarative rewrite rules with a matcher and an apply function. |
FilterPushdown |
src/optimizer/filter_pushdown.cpp |
Pushes filters past joins, projections, and into scans. |
JoinOrderOptimizer |
src/optimizer/join_order/join_order_optimizer.cpp |
Cost-based join ordering using DPhyp; falls back to greedy + heuristics for very wide queries. |
StatisticsPropagator |
src/optimizer/statistics_propagator.cpp |
Computes per-column min/max/distinct/null counts through operators. |
Deliminator |
src/optimizer/deliminator.cpp |
Removes LogicalDelimGet/LogicalDelimJoin produced by subquery decorrelation. |
How it works
graph TD
Plan[Logical plan from binder] --> Rewrites[Expression / filter rewrites]
Rewrites --> Pushdown[Filter / projection pushdown]
Pushdown --> JoinOrder[Join order optimizer]
JoinOrder --> Stats[Statistics propagation]
Stats --> CSE[CSE / compressed materialization]
CSE --> Cleanup[Remove unused columns + dead branches]
Cleanup --> Optimized[Optimized logical plan]Optimizer::Optimize (in optimizer.cpp) is the orchestrator. It applies each pass in a fixed order. The order matters — for example, RemoveUnusedColumns runs late because earlier passes may add helper columns that get dropped at the end.
A typical SELECT goes through:
ExpressionRewriter— constant folding, NULL propagation, conjunction normalization (rules underrule/).FilterPullupthenFilterPushdown— combine and push filters as far down as possible.JoinOrderOptimizer— pick a left-deep tree using statistics. For very wide queries it falls back to a greedy heuristic.StatisticsPropagator— recompute per-column stats now that joins are reordered.Deliminator,CSEOptimizer,CommonSubplanOptimizer— eliminate duplicate work.RemoveUnusedColumns— drop columns no longer referenced.- Local rewrites:
topn_optimizer,topn_window_elimination,late_materialization,compressed_materialization,partitioned_execution, etc.
Adding a rewrite rule
The simplest rewrites use ExpressionRewriter rules:
- Add a class in
src/optimizer/rule/that subclassesRule. - Override
Match(a pattern overExpression) andApply(build a replacement). - Register the rule in
ExpressionRewriter::ExpressionRewriter(src/optimizer/expression_rewriter.cpp).
For plan-level rewrites, add a new pass in src/optimizer/, hook it into Optimizer::Optimize, and add EXPLAIN-based tests in test/sql/optimizer/.
Statistics
StatisticsPropagator derives a BaseStatistics for each column at each operator. These statistics drive:
- Predicate pruning — a filter can be removed entirely if statistics prove it always true or false.
- Empty result pullup — if an intermediate is provably empty, pull the empty up.
- Join order costing — selectivity estimates feed
JoinOrderOptimizer. - Storage scan pruning —
RowGroupPrunerconsults per-row-group statistics fromsrc/storage/.
Stats live in src/optimizer/statistics/operator/ (per-operator propagation) and src/optimizer/statistics/expression/ (per-expression).
Integration points
- Input:
LogicalOperatorfrom planner. - Output: Optimized
LogicalOperatorconsumed byPhysicalPlanGeneratorin execution. - Catalog: Some passes consult catalog statistics (e.g., NDV, primary keys) via
Catalog::GetStatistics. - Tests:
test/sql/optimizer/, withEXPLAINmatchers checking plan shape.
Entry points for modification
- Adding a new rewrite: see
rule/for expression rules, or add a top-level pass like the existingtopn_optimizer.cppand register it inOptimizer::Optimize. - Tuning the join order optimizer: see
join_order/cost_model.cpp,join_order/join_node.cpp,join_order/plan_enumerator.cpp, andjoin_order/relation_manager.cpp. - Improving statistics: per-operator propagation lives in
statistics/operator/. Stats from storage are produced insrc/storage/statistics/.
Key source files
| File | Purpose |
|---|---|
src/optimizer/optimizer.cpp |
The optimizer pipeline. |
src/optimizer/filter_combiner.cpp |
Merge and simplify filter expressions. |
src/optimizer/filter_pushdown.cpp |
Push filters into scans and joins. |
src/optimizer/expression_rewriter.cpp |
Apply expression-level rules. |
src/optimizer/join_order/join_order_optimizer.cpp |
Cost-based join ordering. |
src/optimizer/statistics_propagator.cpp |
Stats propagation through the plan. |
src/optimizer/remove_unused_columns.cpp |
Column elimination. |
src/optimizer/topn_window_elimination.cpp |
ROW_NUMBER() OVER ... LIMIT N rewrites. |
src/optimizer/late_materialization.cpp |
Defer column materialization past joins. |
Continue to execution for the physical plan generator.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.