duckdb/duckdb
Parser
Active contributors: Mytherin, Tishj, dtenwolde
Purpose
src/parser/ converts a SQL string into an unbound SQLStatement tree (AST). DuckDB uses a hand-written PEG grammar in src/parser/peg/, generated into a parser via scripts/build_grammar.sh. The AST shape is inherited from PostgreSQL — even though the parser was rewritten, downstream stages still expect the same node types.
Directory layout
```
src/parser/
├── parser.cpp                      Public entry point: Parser::ParseQuery
├── parsed_expression.cpp           Base for unbound expression nodes
├── parsed_expression_iterator.cpp  Visitor for expression trees
├── tableref.cpp                    Base for FROM-clause node types
├── query_node.cpp                  Base for SELECT/SET/CTE query nodes
├── result_modifier.cpp             ORDER BY / LIMIT / OFFSET
├── column_definition.cpp           CREATE TABLE column metadata
├── column_list.cpp                 Ordered set of columns with utilities
├── expression/                     Concrete ParsedExpression subclasses
├── statement/                      Concrete SQLStatement subclasses
├── tableref/                       Concrete TableRef subclasses
├── query_node/                     SELECT, SET ops, CTE, recursive CTE
├── parsed_data/                    Bind-time payloads (CreateTableInfo, ...)
├── constraints/                    CHECK, NOT NULL, FOREIGN KEY parse nodes
└── peg/                            PEG grammar (.gram) + transformers
```

The PEG grammar is the source of truth for syntax. Each grammar rule has a corresponding transformer in src/parser/peg/transformer/ that converts the parse tree into AST nodes.
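To make the rule-to-transformer pairing concrete, here is a minimal sketch of the idea. Everything in it is hypothetical: `ParseNode`, `ToyExpression`, the rule names, and `TransformExpression` are invented for illustration and are not DuckDB's actual types or grammar rules. It assumes only that the PEG matcher yields a generic tree labelled with rule names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical generic parse-tree node, as a PEG matcher might produce it.
struct ParseNode {
    std::string rule;                 // name of the grammar rule that matched
    std::string text;                 // matched text (used by leaf nodes)
    std::vector<ParseNode> children;  // sub-matches, in source order
};

// Hypothetical AST node, far smaller than a real unbound expression class.
struct ToyExpression {
    enum class Kind { CONSTANT, COLUMN_REF, FUNCTION };
    Kind kind = Kind::CONSTANT;
    std::string name;  // literal text, column name, or function name
    std::vector<std::unique_ptr<ToyExpression>> args;
};

// One branch per grammar rule: each branch knows the parse-tree shape of its
// rule and turns it into the matching AST node.
std::unique_ptr<ToyExpression> TransformExpression(const ParseNode &node) {
    auto result = std::make_unique<ToyExpression>();
    if (node.rule == "NumberLiteral") {
        result->kind = ToyExpression::Kind::CONSTANT;
        result->name = node.text;
    } else if (node.rule == "Identifier") {
        result->kind = ToyExpression::Kind::COLUMN_REF;
        result->name = node.text;
    } else if (node.rule == "FunctionCall") {
        // Convention of this sketch: child 0 is the function name,
        // the remaining children are the arguments.
        assert(!node.children.empty());
        result->kind = ToyExpression::Kind::FUNCTION;
        result->name = node.children[0].text;
        for (std::size_t i = 1; i < node.children.size(); i++) {
            result->args.push_back(TransformExpression(node.children[i]));
        }
    } else {
        throw std::runtime_error("no transformer for rule " + node.rule);
    }
    return result;
}
```

The final `throw` mirrors the point made above: a grammar rule without a transformer cannot produce an AST node, so the two have to be added together.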
Key abstractions
| Type | File | Role |
|---|---|---|
| Parser | src/parser/parser.cpp | Public entry point. Parser::ParseQuery(query) returns a list of SQLStatement. |
| SQLStatement | src/include/duckdb/parser/sql_statement.hpp | Root of every parse tree. Subclasses include SelectStatement, InsertStatement, CreateStatement, PragmaStatement, etc. |
| ParsedExpression | src/include/duckdb/parser/parsed_expression.hpp | Unbound expression. Subclasses: ColumnRefExpression, FunctionExpression, ConstantExpression, OperatorExpression, CaseExpression, SubqueryExpression, WindowExpression. |
| TableRef | src/include/duckdb/parser/tableref.hpp | Source of rows. Subclasses: BaseTableRef, JoinRef, SubqueryRef, TableFunctionRef, EmptyTableRef. |
| QueryNode | src/include/duckdb/parser/query_node.hpp | Body of a SELECT or set operation. Subclasses: SelectNode, SetOperationNode, RecursiveCTENode, CTENode. |
| ResultModifier | src/include/duckdb/parser/result_modifier.hpp | ORDER BY / LIMIT / OFFSET / DISTINCT modifiers. |
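The way these types nest can be sketched with a few toy classes. This is an illustration only: the `Toy*` classes below are hypothetical stand-ins that mimic the roles in the table (statement owns a query node, which owns expressions and a table reference). They are not the real DuckDB declarations, which carry many more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for ParsedExpression: an unbound expression.
struct ToyExpression {
    virtual ~ToyExpression() = default;
    virtual std::string ToString() const = 0;
};

// Toy stand-in for ColumnRefExpression: a column name, not yet resolved.
struct ToyColumnRef : ToyExpression {
    explicit ToyColumnRef(std::string name) : column_name(std::move(name)) {}
    std::string ToString() const override { return column_name; }
    std::string column_name;
};

// Toy stand-ins for TableRef and BaseTableRef: a source of rows.
struct ToyTableRef {
    virtual ~ToyTableRef() = default;
    virtual std::string ToString() const = 0;
};
struct ToyBaseTableRef : ToyTableRef {
    explicit ToyBaseTableRef(std::string name) : table_name(std::move(name)) {}
    std::string ToString() const override { return table_name; }
    std::string table_name;
};

// Toy stand-in for SelectNode: the body of a SELECT.
struct ToySelectNode {
    std::vector<std::unique_ptr<ToyExpression>> select_list;
    std::unique_ptr<ToyTableRef> from_table;

    std::string ToString() const {
        assert(from_table && "this sketch requires a FROM clause");
        std::string sql = "SELECT ";
        for (std::size_t i = 0; i < select_list.size(); i++) {
            if (i > 0) {
                sql += ", ";
            }
            sql += select_list[i]->ToString();
        }
        return sql + " FROM " + from_table->ToString();
    }
};

// Toy stand-in for SelectStatement: the root that owns the query node.
struct ToySelectStatement {
    std::unique_ptr<ToySelectNode> node;
};

// Builds the tree for: SELECT a, b FROM t
ToySelectStatement BuildExampleStatement() {
    ToySelectStatement stmt;
    stmt.node = std::make_unique<ToySelectNode>();
    stmt.node->select_list.push_back(std::make_unique<ToyColumnRef>("a"));
    stmt.node->select_list.push_back(std::make_unique<ToyColumnRef>("b"));
    stmt.node->from_table = std::make_unique<ToyBaseTableRef>("t");
    return stmt;
}
```

Nothing in this tree refers to a catalog entry: `a`, `b`, and `t` are plain strings, which is what "unbound" means in the table above.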
How it works
```mermaid
graph LR
    SQL[SQL string] --> PEG[PEG parser]
    PEG --> ParseTree[Concrete parse tree]
    ParseTree --> Transformer[Transformer]
    Transformer --> AST[SQLStatement / ParsedExpression / TableRef / QueryNode]
```

Parser::ParseQuery is the public entry point. It:
- Hands the SQL string to the PEG parser (compiled from .gram files via tao::pegtl in third_party/pegtl/).
- Walks the resulting parse tree through the transformers in src/parser/peg/transformer/.
- Returns a vector of SQLStatement (a query string can contain multiple statements separated by ;).
The parser is stateless — it only depends on the grammar and the input string. Catalog lookups happen later, in the binder.
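As a rough illustration of "one input string, several statements, no external state", the sketch below splits a query string on `;` while ignoring semicolons inside single-quoted literals. It is a hypothetical stand-in, not how DuckDB does it: the real parser handles statement boundaries in the grammar, and `SplitStatements` and `AppendIfNotBlank` are names invented here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Appends `text` with surrounding whitespace removed, skipping blank input.
static void AppendIfNotBlank(std::vector<std::string> &out, const std::string &text) {
    const auto first = text.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) {
        return;  // nothing but whitespace between two semicolons
    }
    const auto last = text.find_last_not_of(" \t\r\n");
    assert(last >= first);
    out.push_back(text.substr(first, last - first + 1));
}

// A pure function of its input: no catalog, no session, no globals.
// Semicolons inside '...' literals do not end a statement. A doubled
// quote ('') toggles the flag twice, so escaped quotes are handled too.
std::vector<std::string> SplitStatements(const std::string &sql) {
    std::vector<std::string> statements;
    std::string current;
    bool in_string = false;
    for (char c : sql) {
        if (c == '\'') {
            in_string = !in_string;
        }
        if (c == ';' && !in_string) {
            AppendIfNotBlank(statements, current);
            current.clear();
        } else {
            current += c;
        }
    }
    AppendIfNotBlank(statements, current);  // trailing statement without ';'
    return statements;
}
```

Because the function depends only on its argument, calling it twice with the same string gives the same result, which is the property the paragraph above attributes to the parser as a whole.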
Adding new syntax
The README at src/parser/peg/README.md explains the workflow:
- Add a rule to the appropriate .gram file.
- Add a transformer in src/parser/peg/transformer/ that converts the rule's parse-tree shape into AST nodes.
- Run make generate-files to regenerate the parser sources (or scripts/build_grammar.sh).
- Add tests in test/sql/parser/ to cover the new syntax.
Integration points
- The parser does not depend on the catalog, the binder, or the executor. It only uses src/include/duckdb/parser/ and src/include/duckdb/common/.
- Its output is consumed by the planner. Binder::Bind(SQLStatement&) is the bridge.
- Tests live in test/sql/parser/ for syntax tests and across test/sql/ for end-to-end coverage of language features.
Entry points for modification
- Adding a SQL feature: edit src/parser/peg/*.gram, add a transformer in src/parser/peg/transformer/, add an AST class in src/parser/expression/, statement/, or tableref/, and update src/parser/parsed_expression_iterator.cpp so visitors traverse the new shape.
- Adding a new statement type: subclass SQLStatement, register it in src/parser/statement/, and provide a binder that produces a BoundStatement in src/planner/binder/statement/.
- Improving error messages: see src/parser/query_error_context.cpp. Errors carry source positions used for IDE highlighting.
Key source files
| File | Purpose |
|---|---|
| src/parser/parser.cpp | Public API. |
| src/parser/peg/README.md | Grammar authoring guide. |
| src/parser/parsed_expression_iterator.cpp | Visitor for unbound expression trees. |
| src/parser/keyword_helper.cpp | Reserved keyword handling and identifier quoting. |
| src/parser/qualified_name.cpp | Schema-qualified name parsing. |
| src/parser/query_error_context.cpp | Errors with source positions. |
Continue to planner for what happens to the AST next.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.