
duckdb/duckdb

Parser

Active contributors: Mytherin, Tishj, dtenwolde

Purpose

src/parser/ converts a SQL string into an unbound SQLStatement tree (AST). The syntax is defined by a hand-written PEG grammar in src/parser/peg/, and scripts/build_grammar.sh generates the parser from it. The AST shape is inherited from PostgreSQL: although the parser was rewritten, downstream stages still expect the same node types.

Directory layout

src/parser/
├── parser.cpp                  Public entry point: Parser::ParseQuery
├── parsed_expression.cpp       Base for unbound expression nodes
├── parsed_expression_iterator.cpp  Visitor for expression trees
├── tableref.cpp                Base for FROM-clause node types
├── query_node.cpp              Base for SELECT/SET/CTE query nodes
├── result_modifier.cpp         ORDER BY / LIMIT / OFFSET
├── column_definition.cpp       CREATE TABLE column metadata
├── column_list.cpp             Ordered set of columns with utilities
├── expression/                 Concrete ParsedExpression subclasses
├── statement/                  Concrete SQLStatement subclasses
├── tableref/                   Concrete TableRef subclasses
├── query_node/                 SELECT, SET ops, CTE, recursive CTE
├── parsed_data/                Bind-time payloads (CreateTableInfo, ...)
├── constraints/                CHECK, NOT NULL, FOREIGN KEY parse nodes
└── peg/                        PEG grammar (.gram) + transformers

The PEG grammar is the source of truth for syntax. Each grammar rule has a corresponding transformer in src/parser/peg/transformer/ that converts the parse tree into AST nodes.
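The split between a grammar rule and its transformer can be illustrated with a self-contained C++ sketch. Everything in it is hypothetical: the PEG rule in the comment is not taken from DuckDB's .gram files, and Cursor, MatchLimitClause, TransformLimitClause, LimitMatch and LimitModifier are illustrative stand-ins rather than DuckDB's PEG engine or transformer API. The matcher tries alternatives in order and backtracks on failure, as PEG ordered choice does. A separate transformer then maps the matched shape to an AST node.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical rule, written in PEG notation for illustration only:
//   LimitClause <- 'LIMIT' LimitValue
//   LimitValue  <- 'ALL' / Number     (ordered choice: 'ALL' is tried first)
//   Number      <- [0-9]+

// Parse-tree shape produced by the matcher: which alternative matched.
struct LimitMatch {
    bool is_all;
    std::string digits; // empty when is_all is true
};

// AST node produced by the transformer (simplified stand-in for a LIMIT modifier).
struct LimitModifier {
    std::optional<std::int64_t> limit; // std::nullopt means LIMIT ALL
};

// Minimal input cursor with keyword and number matching.
class Cursor {
public:
    explicit Cursor(std::string_view input) : input_(input) {}

    std::size_t Position() const { return pos_; }
    void Reset(std::size_t pos) {
        assert(pos <= input_.size());
        pos_ = pos;
    }

    void SkipSpaces() {
        while (pos_ < input_.size() && std::isspace(static_cast<unsigned char>(input_[pos_]))) ++pos_;
    }

    // Case-insensitive keyword (kw must be uppercase); requires a word boundary after it.
    bool Keyword(std::string_view kw) {
        SkipSpaces();
        if (input_.size() - pos_ < kw.size()) return false;
        for (std::size_t i = 0; i < kw.size(); ++i) {
            if (std::toupper(static_cast<unsigned char>(input_[pos_ + i])) != kw[i]) return false;
        }
        std::size_t end = pos_ + kw.size();
        if (end < input_.size() && std::isalnum(static_cast<unsigned char>(input_[end]))) return false;
        pos_ = end;
        return true;
    }

    std::optional<std::string> Number() {
        SkipSpaces();
        std::size_t start = pos_;
        while (pos_ < input_.size() && std::isdigit(static_cast<unsigned char>(input_[pos_]))) ++pos_;
        if (pos_ == start) return std::nullopt;
        return std::string(input_.substr(start, pos_ - start));
    }

    bool AtEnd() {
        SkipSpaces();
        return pos_ == input_.size();
    }

private:
    std::string_view input_;
    std::size_t pos_ = 0;
};

// Matcher for LimitClause: returns the parse-tree shape, or nullopt after backtracking.
std::optional<LimitMatch> MatchLimitClause(Cursor &c) {
    std::size_t start = c.Position();
    if (c.Keyword("LIMIT")) {
        if (c.Keyword("ALL")) return LimitMatch{true, ""};              // first alternative
        if (auto digits = c.Number()) return LimitMatch{false, *digits}; // second alternative
    }
    c.Reset(start); // the rule failed as a whole: restore the input position
    return std::nullopt;
}

// Transformer: converts the matched shape into an AST node.
LimitModifier TransformLimitClause(const LimitMatch &match) {
    LimitModifier result;
    if (!match.is_all) result.limit = std::stoll(match.digits); // throws on overflow
    return result;
}

// Runs matcher and transformer, requiring the whole input to be consumed.
std::optional<LimitModifier> ParseLimit(std::string_view sql) {
    Cursor c(sql);
    auto match = MatchLimitClause(c);
    if (!match || !c.AtEnd()) return std::nullopt;
    return TransformLimitClause(*match);
}
```

For example, ParseLimit("limit all") yields a modifier with no limit, while ParseLimit("LIMIT 5 6") fails because input remains after the rule matched. In this sketch the matcher knows nothing about AST classes, and the transformer never looks at raw text.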

Key abstractions

Type | File | Role
Parser | src/parser/parser.cpp | Public entry point. Parser::ParseQuery(query) parses the string and stores the resulting SQLStatement objects in Parser::statements.
SQLStatement | src/include/duckdb/parser/sql_statement.hpp | Root of every parse tree. Subclasses include SelectStatement, InsertStatement, CreateStatement, PragmaStatement, etc.
ParsedExpression | src/include/duckdb/parser/parsed_expression.hpp | Unbound expression. Subclasses: ColumnRefExpression, FunctionExpression, ConstantExpression, OperatorExpression, CaseExpression, SubqueryExpression, WindowExpression.
TableRef | src/include/duckdb/parser/tableref.hpp | Source of rows. Subclasses: BaseTableRef, JoinRef, SubqueryRef, TableFunctionRef, EmptyTableRef.
QueryNode | src/include/duckdb/parser/query_node.hpp | Body of a SELECT or set operation. Subclasses: SelectNode, SetOperationNode, RecursiveCTENode, CTENode.
ResultModifier | src/include/duckdb/parser/result_modifier.hpp | ORDER BY / LIMIT / OFFSET / DISTINCT modifiers.
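The following C++ sketch shows how these node families nest for SELECT a, 42 FROM t. The class names mirror the table above, but the members, constructors and ToString methods are simplified assumptions, not DuckDB's actual declarations. Column and table names stay plain strings: nothing is resolved against the catalog at this stage.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins: the names follow the table above, the members are assumptions.
struct ParsedExpression {
    virtual ~ParsedExpression() = default;
    virtual std::string ToString() const = 0;
};

struct ColumnRefExpression : ParsedExpression {
    explicit ColumnRefExpression(std::string name) : column_name(std::move(name)) {}
    std::string ToString() const override { return column_name; }
    std::string column_name; // unresolved: just the name as written
};

struct ConstantExpression : ParsedExpression {
    explicit ConstantExpression(long long v) : value(v) {}
    std::string ToString() const override { return std::to_string(value); }
    long long value;
};

struct TableRef {
    virtual ~TableRef() = default;
    virtual std::string ToString() const = 0;
};

struct BaseTableRef : TableRef {
    explicit BaseTableRef(std::string name) : table_name(std::move(name)) {}
    std::string ToString() const override { return table_name; }
    std::string table_name; // not looked up in the catalog yet
};

struct QueryNode {
    virtual ~QueryNode() = default;
    virtual std::string ToString() const = 0;
};

struct SelectNode : QueryNode {
    std::vector<std::unique_ptr<ParsedExpression>> select_list;
    std::unique_ptr<TableRef> from_table;
    std::string ToString() const override {
        std::string out = "SELECT ";
        for (std::size_t i = 0; i < select_list.size(); ++i) {
            if (i > 0) out += ", ";
            out += select_list[i]->ToString();
        }
        if (from_table) out += " FROM " + from_table->ToString();
        return out;
    }
};

struct SQLStatement {
    virtual ~SQLStatement() = default;
    virtual std::string ToString() const = 0;
};

struct SelectStatement : SQLStatement {
    std::unique_ptr<QueryNode> node; // body of the SELECT
    std::string ToString() const override {
        assert(node != nullptr);
        return node->ToString();
    }
};

// Builds the unbound tree for: SELECT a, 42 FROM t
std::unique_ptr<SQLStatement> BuildExample() {
    auto select = std::make_unique<SelectNode>();
    select->select_list.push_back(std::make_unique<ColumnRefExpression>("a"));
    select->select_list.push_back(std::make_unique<ConstantExpression>(42));
    select->from_table = std::make_unique<BaseTableRef>("t");
    auto statement = std::make_unique<SelectStatement>();
    statement->node = std::move(select);
    return statement;
}
```

Ownership flows strictly downward through unique_ptr, so the whole tree is released when the statement is destroyed.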

How it works

```mermaid
graph LR
    SQL[SQL string] --> PEG[PEG parser]
    PEG --> ParseTree[Concrete parse tree]
    ParseTree --> Transformer[Transformer]
    Transformer --> AST[SQLStatement / ParsedExpression / TableRef / QueryNode]
```

Parser::ParseQuery is the public entry point. It:

  1. Hands the SQL string to the PEG parser (compiled from .gram files via tao::pegtl in third_party/pegtl/).
  2. Walks the resulting parse tree through the transformers in src/parser/peg/transformer/.
  3. Stores the resulting statements in Parser::statements, a vector<unique_ptr<SQLStatement>> (a query string can contain multiple statements separated by ;).
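A query string such as SELECT 1; SELECT ';' produces one statement per separator. The toy C++ class below (ToyParser and RawStatement are hypothetical names) illustrates only that result shape and why a naive split on ; is wrong: separators inside string literals and quoted identifiers must be ignored. It is a sketch, not DuckDB's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy statement that only keeps the raw text of one statement.
struct RawStatement {
    std::string text;
};

// Toy parser: splits on ';' but ignores separators inside '...' string literals
// and "..." quoted identifiers. Results go into `statements`, one per statement.
class ToyParser {
public:
    std::vector<std::unique_ptr<RawStatement>> statements;

    void ParseQuery(const std::string &query) {
        statements.clear();
        std::string current;
        char quote = '\0'; // the quote character we are inside of, or '\0'
        for (char ch : query) {
            current += ch;
            if (quote != '\0') {
                // A doubled quote ('' or "") closes and immediately reopens,
                // so toggling on every matching quote character stays correct.
                if (ch == quote) quote = '\0';
            } else if (ch == '\'' || ch == '"') {
                quote = ch;
            } else if (ch == ';') {
                current.pop_back(); // drop the separator itself
                Flush(current);
            }
        }
        Flush(current); // trailing statement without a final ';'
    }

private:
    // Trims surrounding whitespace and stores non-empty statements (so ";;" adds nothing).
    void Flush(std::string &current) {
        const char *ws = " \t\r\n";
        std::size_t begin = current.find_first_not_of(ws);
        if (begin != std::string::npos) {
            std::size_t end = current.find_last_not_of(ws);
            auto stmt = std::make_unique<RawStatement>();
            stmt->text = current.substr(begin, end - begin + 1);
            statements.push_back(std::move(stmt));
        }
        current.clear();
    }
};
```

In this toy, an unterminated quote simply swallows the rest of the input; a real parser would report a syntax error instead.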

The parser is stateless: it depends only on the grammar and the input string. Catalog lookups happen later, in the binder.

Adding new syntax

The README at src/parser/peg/README.md explains the workflow:

  1. Add a rule to the appropriate .gram file.
  2. Add a transformer in src/parser/peg/transformer/ that converts the rule's parse-tree shape into AST nodes.
  3. Run make generate-files to regenerate the parser sources (or scripts/build_grammar.sh).
  4. Add tests in test/sql/parser/ to cover the new syntax.

Integration points

  • The parser does not depend on the catalog, the binder, or the executor; it uses only src/include/duckdb/parser/ and src/include/duckdb/common/.
  • Its output is consumed by the planner. Binder::Bind(SQLStatement&), which produces a BoundStatement, is the bridge.
  • Tests live in test/sql/parser/ for syntax tests and across test/sql/ for end-to-end coverage of language features.

Entry points for modification

  • Adding a SQL feature: edit src/parser/peg/*.gram, add a transformer in src/parser/peg/transformer/, add an AST class in src/parser/expression/, statement/, or tableref/, and update src/parser/parsed_expression_iterator.cpp so visitors traverse the new shape.
  • Adding a new statement type: subclass SQLStatement, register it in src/parser/statement/, and provide a binder that produces a BoundStatement in src/planner/binder/statement/.
  • Improving error messages: see src/parser/query_error_context.cpp. Errors carry source positions used for IDE highlighting.
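New expression classes must be added to parsed_expression_iterator.cpp because generic visitors rely on one central function to find an expression's children. The simplified C++ sketch below illustrates that pattern; Expr, ExpressionClass, Function, Case, EnumerateChildren and CollectColumns are hypothetical names chosen for illustration. A class missing from the switch would silently hide its children from every visitor built on the enumerator.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified expression classes.
enum class ExpressionClass { COLUMN_REF, CONSTANT, FUNCTION, CASE };

struct Expr {
    explicit Expr(ExpressionClass cls) : expression_class(cls) {}
    virtual ~Expr() = default;
    ExpressionClass expression_class;
};

struct ColumnRef : Expr {
    explicit ColumnRef(std::string name) : Expr(ExpressionClass::COLUMN_REF), column_name(std::move(name)) {}
    std::string column_name;
};

struct Constant : Expr {
    explicit Constant(long long v) : Expr(ExpressionClass::CONSTANT), value(v) {}
    long long value;
};

struct Function : Expr {
    explicit Function(std::string name) : Expr(ExpressionClass::FUNCTION), function_name(std::move(name)) {}
    std::string function_name;
    std::vector<std::unique_ptr<Expr>> children;
};

// CASE WHEN when_expr THEN then_expr ELSE else_expr END (one WHEN branch, for brevity).
struct Case : Expr {
    Case(std::unique_ptr<Expr> w, std::unique_ptr<Expr> t, std::unique_ptr<Expr> e)
        : Expr(ExpressionClass::CASE), when_expr(std::move(w)), then_expr(std::move(t)), else_expr(std::move(e)) {}
    std::unique_ptr<Expr> when_expr, then_expr, else_expr;
};

// The single place that knows each class's children.
void EnumerateChildren(Expr &expr, const std::function<void(Expr &)> &callback) {
    switch (expr.expression_class) {
    case ExpressionClass::COLUMN_REF:
    case ExpressionClass::CONSTANT:
        break; // leaf nodes
    case ExpressionClass::FUNCTION:
        for (auto &child : static_cast<Function &>(expr).children) callback(*child);
        break;
    case ExpressionClass::CASE: {
        auto &case_expr = static_cast<Case &>(expr);
        callback(*case_expr.when_expr);
        callback(*case_expr.then_expr);
        callback(*case_expr.else_expr);
        break;
    }
    }
}

// A generic visitor built on the enumerator: collects column names in pre-order.
void CollectColumns(Expr &expr, std::vector<std::string> &out) {
    if (expr.expression_class == ExpressionClass::COLUMN_REF) {
        out.push_back(static_cast<ColumnRef &>(expr).column_name);
    }
    EnumerateChildren(expr, [&out](Expr &child) { CollectColumns(child, out); });
}
```

With CASE WHEN x THEN f(y, 1) ELSE z END as input, CollectColumns gathers x, y and z in that order; the visitor itself never mentions Function or Case.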

Key source files

File | Purpose
src/parser/parser.cpp | Public API.
src/parser/peg/README.md | Grammar authoring guide.
src/parser/parsed_expression_iterator.cpp | Visitor for unbound expression trees.
src/parser/keyword_helper.cpp | Reserved keyword handling and identifier quoting.
src/parser/qualified_name.cpp | Schema-qualified name parsing.
src/parser/query_error_context.cpp | Errors with source positions.
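keyword_helper.cpp and qualified_name.cpp deal with identifier quoting and dotted names. The C++ sketch below illustrates both ideas with hypothetical functions (IsReservedKeyword, QuoteIdentifierIfNeeded, SplitQualifiedName) and a deliberately tiny keyword list. It is not DuckDB's implementation, and its rule for when to quote is a conservative assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, deliberately tiny reserved-word list (the real list is much longer).
bool IsReservedKeyword(const std::string &word) {
    static const std::set<std::string> kReserved = {"select", "from", "where", "table", "order"};
    std::string lower;
    for (char ch : word) lower += static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return kReserved.count(lower) > 0;
}

// Conservative rule (an assumption): names made of lowercase letters, digits and
// underscores, not starting with a digit, stay unquoted unless reserved; all others
// are quoted. Embedded double quotes are escaped by doubling them.
std::string QuoteIdentifierIfNeeded(const std::string &name) {
    bool plain = !name.empty() && !std::isdigit(static_cast<unsigned char>(name[0]));
    for (char ch : name) {
        unsigned char u = static_cast<unsigned char>(ch);
        if (!((u >= 'a' && u <= 'z') || std::isdigit(u) || ch == '_')) plain = false;
    }
    if (plain && !IsReservedKeyword(name)) return name;
    std::string out = "\"";
    for (char ch : name) {
        if (ch == '"') out += '"'; // " becomes ""
        out += ch;
    }
    return out + "\"";
}

// Splits a dotted name such as main."my.table" into {"main", "my.table"},
// treating dots inside double quotes as part of the name and "" as a literal quote.
std::vector<std::string> SplitQualifiedName(const std::string &input) {
    std::vector<std::string> parts;
    std::string current;
    bool in_quotes = false;
    for (std::size_t i = 0; i < input.size(); ++i) {
        char ch = input[i];
        if (in_quotes) {
            if (ch != '"') {
                current += ch;
            } else if (i + 1 < input.size() && input[i + 1] == '"') {
                current += '"'; // escaped quote
                ++i;
            } else {
                in_quotes = false; // closing quote
            }
        } else if (ch == '"') {
            in_quotes = true;
        } else if (ch == '.') {
            parts.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    parts.push_back(current);
    return parts;
}
```

The two helpers are inverses for a single name part: quoting a name and then splitting the result recovers the original text, including embedded quotes.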

Continue to the planner page for what happens to the AST next.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
