Open-Source Wikis

/

DuckDB

/

Systems

/

Optimizer

duckdb/duckdb

Optimizer

Active contributors: Laurens Kuiper, Tmonster, Tom Ebergen

Purpose

src/optimizer/ rewrites a bound logical plan to make it cheaper to execute, without changing its result. It is invoked once per query, on the LogicalOperator tree from the planner, and produces an optimized LogicalOperator tree for the physical plan generator.

The optimizer is rule-based with cost-based join ordering. Rules run in a fixed sequence. Each rule may rewrite the plan in place; statistics propagate alongside.

Directory layout

src/optimizer/
├── optimizer.cpp                Orchestrator: applies passes in a fixed order
├── filter_combiner.cpp          Combine and simplify filter conjunctions
├── filter_pushdown.cpp          Push filters into scans/joins
├── filter_pullup.cpp            Pull filters above joins for combination
├── expression_rewriter.cpp      Rewrite rules for expressions
├── expression_heuristics.cpp    Order filter conjuncts by selectivity heuristics
├── statistics_propagator.cpp    Compute per-column stats through operators
├── cse_optimizer.cpp            Common subexpression elimination
├── common_subplan_optimizer.cpp Common subplan extraction
├── deliminator.cpp              Remove duplicate-elimination operators
├── unnest_rewriter.cpp          Rewrite UNNEST patterns
├── topn_optimizer.cpp           ORDER BY ... LIMIT N -> TopN
├── topn_window_elimination.cpp  Window-function -> TopN where possible
├── late_materialization.cpp     Defer column materialization past joins
├── projection_pullup.cpp        Pull projections above joins
├── compressed_materialization.cpp Encode strings/decimals as ints in temp results
├── remove_unused_columns.cpp    Drop columns that no operator references
├── remove_duplicate_groups.cpp  Drop redundant GROUP BY columns
├── empty_result_pullup.cpp      Short-circuit empty intermediate results
├── join_elimination.cpp         Drop joins that don't change the result
├── join_filter_pushdown_optimizer.cpp Push pre-built filters across joins
├── build_probe_side_optimizer.cpp Choose hash-join build/probe sides
├── partitioned_execution.cpp    Choose partitioned execution strategies
├── outer_join_simplification.cpp Convert outer joins to inner where possible
├── row_group_pruner.cpp         Eliminate row groups by min/max stats
├── row_number_rewriter.cpp      Rewrite row_number() over windowed scans
├── window_self_join.cpp         Self-join optimization for window-style queries
├── cte_inlining.cpp             Inline non-recursive CTEs when profitable
├── cte_filter_pusher.cpp        Push filters into CTEs
├── limit_pushdown.cpp           Push LIMIT past projections
├── sampling_pushdown.cpp        Push TABLESAMPLE down
├── regex_range_filter.cpp       Convert REGEXP to range filters where possible
├── in_clause_rewriter.cpp       Rewrite IN-list patterns
├── aggregate_function_rewriter.cpp Rewrite aggregates (e.g., AVG -> SUM/COUNT)
├── common_aggregate_optimizer.cpp Combine aggregates that share state
├── column_binding_replacer.cpp  Bulk-rename ColumnBindings
├── column_lifetime_analyzer.cpp Track column liveness for elimination
├── join_order/                  Cost-based join enumeration (DPhyp + greedy fallback)
├── pushdown/                    Per-operator filter pushdown helpers
├── pullup/                      Per-operator filter/projection pullup helpers
├── rule/                        Reusable rewrite rules for ExpressionRewriter
├── compressed_materialization/  Helpers for compressed_materialization.cpp
├── matcher/                     Pattern matchers for rules
└── statistics/                  Per-operator statistics propagation

Key abstractions

Type File Role
Optimizer src/optimizer/optimizer.cpp Top-level driver. Applies passes in a fixed sequence on a LogicalOperator.
ExpressionRewriter src/optimizer/expression_rewriter.cpp Generic engine for rewriting expressions using Rule matchers in rule/.
Rule src/include/duckdb/optimizer/rule.hpp Base for declarative rewrite rules with a matcher and an apply function.
FilterPushdown src/optimizer/filter_pushdown.cpp Pushes filters past joins, projections, and into scans.
JoinOrderOptimizer src/optimizer/join_order/join_order_optimizer.cpp Cost-based join ordering using DPhyp; falls back to greedy + heuristics for very wide queries.
StatisticsPropagator src/optimizer/statistics_propagator.cpp Computes per-column min/max/distinct/null counts through operators.
Deliminator src/optimizer/deliminator.cpp Removes LogicalDelimGet/LogicalDelimJoin produced by subquery decorrelation.

How it works

graph TD
    Plan[Logical plan from binder] --> Rewrites[Expression / filter rewrites]
    Rewrites --> Pushdown[Filter / projection pushdown]
    Pushdown --> JoinOrder[Join order optimizer]
    JoinOrder --> Stats[Statistics propagation]
    Stats --> CSE[CSE / compressed materialization]
    CSE --> Cleanup[Remove unused columns + dead branches]
    Cleanup --> Optimized[Optimized logical plan]

Optimizer::Optimize (in optimizer.cpp) is the orchestrator. It applies each pass in a fixed order. The order matters — for example, RemoveUnusedColumns runs late because earlier passes may add helper columns that get dropped at the end.

A typical SELECT goes through:

  1. ExpressionRewriter — constant folding, NULL propagation, conjunction normalization (rules under rule/).
  2. FilterPullup then FilterPushdown — combine and push filters as far down as possible.
  3. JoinOrderOptimizer — pick a left-deep tree using statistics. For very wide queries it falls back to a greedy heuristic.
  4. StatisticsPropagator — recompute per-column stats now that joins are reordered.
  5. Deliminator, CSEOptimizer, CommonSubplanOptimizer — eliminate duplicate work.
  6. RemoveUnusedColumns — drop columns no longer referenced.
  7. Local rewrites: topn_optimizer, topn_window_elimination, late_materialization, compressed_materialization, partitioned_execution, etc.

Adding a rewrite rule

The simplest rewrites use ExpressionRewriter rules:

  1. Add a class in src/optimizer/rule/ that subclasses Rule.
  2. Override Match (a pattern over Expression) and Apply (build a replacement).
  3. Register the rule in ExpressionRewriter::ExpressionRewriter (src/optimizer/expression_rewriter.cpp).

For plan-level rewrites, add a new pass in src/optimizer/, hook it into Optimizer::Optimize, and add EXPLAIN-based tests in test/sql/optimizer/.

Statistics

StatisticsPropagator derives a BaseStatistics for each column at each operator. These statistics drive:

  • Predicate pruning — a filter can be removed entirely if statistics prove it always true or false.
  • Empty result pullup — if an intermediate is provably empty, pull the empty up.
  • Join order costing — selectivity estimates feed JoinOrderOptimizer.
  • Storage scan pruningRowGroupPruner consults per-row-group statistics from src/storage/.

Stats live in src/optimizer/statistics/operator/ (per-operator propagation) and src/optimizer/statistics/expression/ (per-expression).

Integration points

  • Input: LogicalOperator from planner.
  • Output: Optimized LogicalOperator consumed by PhysicalPlanGenerator in execution.
  • Catalog: Some passes consult catalog statistics (e.g., NDV, primary keys) via Catalog::GetStatistics.
  • Tests: test/sql/optimizer/, with EXPLAIN matchers checking plan shape.

Entry points for modification

  • Adding a new rewrite: see rule/ for expression rules, or add a top-level pass like the existing topn_optimizer.cpp and register it in Optimizer::Optimize.
  • Tuning the join order optimizer: see join_order/cost_model.cpp, join_order/join_node.cpp, join_order/plan_enumerator.cpp, and join_order/relation_manager.cpp.
  • Improving statistics: per-operator propagation lives in statistics/operator/. Stats from storage are produced in src/storage/statistics/.

Key source files

File Purpose
src/optimizer/optimizer.cpp The optimizer pipeline.
src/optimizer/filter_combiner.cpp Merge and simplify filter expressions.
src/optimizer/filter_pushdown.cpp Push filters into scans and joins.
src/optimizer/expression_rewriter.cpp Apply expression-level rules.
src/optimizer/join_order/join_order_optimizer.cpp Cost-based join ordering.
src/optimizer/statistics_propagator.cpp Stats propagation through the plan.
src/optimizer/remove_unused_columns.cpp Column elimination.
src/optimizer/topn_window_elimination.cpp ROW_NUMBER() OVER ... LIMIT N rewrites.
src/optimizer/late_materialization.cpp Defer column materialization past joins.

Continue to execution for the physical plan generator.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.

Optimizer – DuckDB wiki | Factory