duckdb/duckdb
Function
Active contributors: Tishj, Mark, Mytherin
Purpose
src/function/ is the function registry and overload-resolution machinery. It defines the interfaces for scalar, aggregate, table, window, and pragma functions, ships some core built-ins (the bulk of bundled functions live in the core_functions extension), and provides cast rules and type promotion.
Directory layout
src/function/
├── function.cpp Base Function class
├── function_binder.cpp Overload resolution by implicit-cast cost
├── function_set.cpp A set of overloads for one name
├── function_list.cpp The full registry of built-ins
├── register_function_list.cpp Registers built-ins at startup
├── built_in_functions.cpp Top-level registration entry
├── scalar_function.cpp Scalar function plumbing
├── aggregate_function.cpp Aggregate function plumbing
├── table_function.cpp Table function plumbing
├── pragma_function.cpp Pragma function plumbing
├── window_function.cpp Window function plumbing
├── macro_function.cpp SQL macros
├── scalar_macro_function.cpp Scalar macro expansion
├── table_macro_function.cpp Table macro expansion
├── compression_config.cpp Choose a compression codec per segment
├── copy_function.cpp COPY TO/FROM function plumbing
├── copy_blob.cpp Generic blob copy helper
├── encoding_function.cpp Encoding function plumbing (CSV reader, etc.)
├── cast_rules.cpp Implicit-cast cost matrix
├── udf_function.cpp User-defined functions via the C++ API
├── scalar/ Built-in scalar functions
├── aggregate/ Built-in aggregate functions
├── table/ Built-in table functions
├── window/ Built-in window functions
├── pragma/ Built-in pragma functions
├── cast/ Cast functions
└── variant/ VARIANT type support
Key abstractions
| Type | File | Role |
|---|---|---|
| Function | src/include/duckdb/function/function.hpp | Base for any callable. |
| ScalarFunction, AggregateFunction, TableFunction, WindowFunction, PragmaFunction | src/include/duckdb/function/... | Concrete callable categories with their lifecycle hooks. |
| FunctionSet | src/function/function_set.cpp | A set of overloads for one name. |
| FunctionBinder | src/function/function_binder.cpp | Resolves name + argument types to a specific overload by computing implicit-cast cost. |
| BuiltinFunctions | src/function/built_in_functions.cpp | The registration entry point used at database startup. |
| MacroFunction | src/function/macro_function.cpp | A SQL-level macro (CREATE MACRO). Subclasses for scalar and table macros. |
| BoundCastInfo, CastFunction | src/function/cast/ | Per-source-target cast functions. Drive CAST(x AS T) and implicit promotion. |
How it works
graph TD
Reg[BuiltinFunctions::RegisterAll] -->|insert FunctionSets| Cat[Catalog]
Bind[Binder: function reference] -->|FunctionBinder::BindFunction| Resolve[Find FunctionSet]
Resolve -->|cost-based overload pick| Pick[Specific Function]
Pick -->|return BoundFunctionExpression| Plan[Logical plan]
Plan -->|PhysicalPlanGenerator| Exec[ExpressionExecutor / aggregate / table operator]
Categories of functions
| Category | Cardinality | Example | Where to add |
|---|---|---|---|
| Scalar | 1 row in → 1 row out | length, upper, + | src/function/scalar/<area>/, extension/core_functions/scalar/<area>/ |
| Aggregate | N rows in → 1 row out | sum, min, count, string_agg | src/function/aggregate/, extension/core_functions/aggregate/<area>/ |
| Table | invocation → table | read_csv, parquet_scan, range, generate_series | src/function/table/, extensions |
| Window | over a frame → 1 row out per input | row_number, lag, lead, nth_value | src/function/window/ |
| Pragma | configuration / metadata | pragma table_info('t'), pragma threads = 4 | src/function/pragma/ |
| Macro | syntactic sugar | CREATE MACRO add(a, b) AS a + b | src/function/macro_function.cpp |
Overload resolution
FunctionBinder enumerates the FunctionSet for a name and assigns each overload an implicit-cast cost by walking argument types:
- Exact match: 0
- Implicit cast (INT→BIGINT): small positive cost
- Lossy cast (DOUBLE→INT): high cost
- No implicit cast available: the overload is rejected
The lowest-cost overload wins. Ties are broken by argument-type specificity. The cost rules are in cast_rules.cpp. If no overload matches, the binder throws a BinderException listing candidates.
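The selection step can be illustrated with a small self-contained model. This is a sketch, not DuckDB's code: the cost values, the `IMPLICIT_CAST_COST` table, and the `bind_function` helper are hypothetical stand-ins for what cast_rules.cpp and FunctionBinder do, and it keeps the first overload seen on a tie instead of applying a specificity rule.

```python
from typing import Optional

# Hypothetical implicit-cast costs. The real rules live in cast_rules.cpp
# and use different values.
IMPLICIT_CAST_COST = {
    ("INTEGER", "BIGINT"): 1,
    ("INTEGER", "DOUBLE"): 5,
    ("BIGINT", "DOUBLE"): 5,
}


def cast_cost(source: str, target: str) -> Optional[int]:
    """Cost of passing `source` where `target` is expected, or None if no implicit cast exists."""
    if source == target:
        return 0  # exact match
    return IMPLICIT_CAST_COST.get((source, target))


def bind_function(overloads, arg_types):
    """Return (parameter types, total cost) of the cheapest applicable overload."""
    best, best_cost = None, None
    for params in overloads:
        if len(params) != len(arg_types):
            continue
        costs = [cast_cost(arg, param) for arg, param in zip(arg_types, params)]
        if any(cost is None for cost in costs):
            continue  # some argument has no implicit cast: reject this overload
        total = sum(costs)
        if best_cost is None or total < best_cost:
            best, best_cost = params, total
    if best is None:
        raise LookupError(f"no overload matches {arg_types}; candidates: {overloads}")
    return best, best_cost


# A toy overload set for "+", playing the role of a FunctionSet.
ADD_OVERLOADS = [
    ("INTEGER", "INTEGER"),
    ("BIGINT", "BIGINT"),
    ("DOUBLE", "DOUBLE"),
]

chosen, cost = bind_function(ADD_OVERLOADS, ("INTEGER", "BIGINT"))
```

With these assumed costs, calling `+` with (INTEGER, BIGINT) rejects the (INTEGER, INTEGER) overload because BIGINT has no implicit cast to INTEGER in the table, and prefers (BIGINT, BIGINT) at cost 1 over (DOUBLE, DOUBLE) at cost 10.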
Aggregate framework
Aggregate functions implement five hooks:
- state_size: bytes needed for the per-group state.
- initialize: initialize a state to the identity.
- update: fold a chunk of values into the state.
- combine: merge two states (used for parallel aggregation).
- finalize: produce the result value(s) from a state.
PhysicalHashAggregate and PhysicalUngroupedAggregate consume these hooks. The framework supports distinct aggregates and ordered-set aggregates.
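To make the hook lifecycle concrete, here is a minimal sketch of an average aggregate in plain Python. The class and function names (`AvgState`, `AvgAggregate`, `parallel_avg`) are hypothetical and the signatures are simplified; DuckDB's real hooks operate on raw state pointers and vectors in C++.

```python
from dataclasses import dataclass


@dataclass
class AvgState:
    total: float = 0.0
    count: int = 0


class AvgAggregate:
    def state_size(self) -> int:
        # Notional size: one 8-byte sum plus one 8-byte count.
        return 16

    def initialize(self) -> AvgState:
        # The identity state: nothing has been seen yet.
        return AvgState()

    def update(self, state: AvgState, chunk) -> None:
        # Fold a chunk of values into the state, skipping NULLs (None).
        for value in chunk:
            if value is not None:
                state.total += value
                state.count += 1

    def combine(self, source: AvgState, target: AvgState) -> None:
        # Merge a thread-local state into another state.
        target.total += source.total
        target.count += source.count

    def finalize(self, state: AvgState):
        # Produce the result; an empty group yields NULL (None).
        return state.total / state.count if state.count else None


def parallel_avg(chunks_per_thread):
    """Simulate parallel aggregation: one local state per 'thread', then combine."""
    agg = AvgAggregate()
    merged = agg.initialize()
    for chunks in chunks_per_thread:
        local = agg.initialize()
        for chunk in chunks:
            agg.update(local, chunk)
        agg.combine(local, merged)
    return agg.finalize(merged)


result = parallel_avg([[[1, 2], [3]], [[4, None]]])
```

The point of `combine` is that each worker can aggregate independently and only the small per-group states need to be merged at the end.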
Table functions
A table function declares its argument schema and provides:
- bind: given the call's literals, decide the output schema.
- init_global / init_local: set up parallel scan state.
- function: produce chunks.
- Optionally: cardinality, pushdown_complex_filter, projection_pushdown.
Most file-format extensions are table functions (read_csv, read_parquet, read_json, arrow_scan, …). See extensions.
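The callback sequence can be sketched with a toy range-like table function. Everything here is illustrative: `BindData`, `GlobalState`, `CHUNK_SIZE`, and the `scan` driver are assumptions made for the example, and the real callbacks are C++ functions that fill a DataChunk rather than return Python lists.

```python
import threading
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # illustrative only; not DuckDB's vector size


@dataclass
class BindData:
    start: int
    stop: int
    column_names: list
    column_types: list


def bind(start: int, stop: int) -> BindData:
    # Decide the output schema from the call's literal arguments.
    return BindData(start, stop, ["range"], ["BIGINT"])


@dataclass
class GlobalState:
    next_row: int
    lock: threading.Lock = field(default_factory=threading.Lock)


def init_global(bind_data: BindData) -> GlobalState:
    # State shared by all scanning threads.
    return GlobalState(next_row=bind_data.start)


def init_local(bind_data: BindData, global_state: GlobalState) -> dict:
    # Per-thread state.
    return {"rows_produced": 0}


def function(bind_data: BindData, global_state: GlobalState, local_state: dict) -> list:
    # Claim the next slice under the lock, then produce it as one chunk.
    with global_state.lock:
        begin = global_state.next_row
        end = min(begin + CHUNK_SIZE, bind_data.stop)
        global_state.next_row = end
    chunk = list(range(begin, end))
    local_state["rows_produced"] += len(chunk)
    return chunk  # an empty chunk signals that the scan is finished


def scan(start: int, stop: int) -> list:
    """Single-threaded driver that calls the hooks in order."""
    bind_data = bind(start, stop)
    global_state = init_global(bind_data)
    local_state = init_local(bind_data, global_state)
    chunks = []
    while True:
        chunk = function(bind_data, global_state, local_state)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


chunks = scan(0, 10)
```

Splitting state into a shared global part and a per-thread local part is what lets several threads call `function` concurrently while each claims a disjoint slice of the input.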
Cast rules
cast_rules.cpp defines the implicit-cast cost between every pair of LogicalTypes. The actual cast bodies are in cast/:
- Numeric casts in cast/numeric_cast.cpp.
- Date/time casts in cast/time_casts.cpp, time_cast.cpp, default_casts.cpp.
- String casts in cast/string_cast.cpp.
- List/struct/map casts in cast/nested_casts.cpp.
Custom types registered by extensions can plug in their own casts via LogicalType::SetAlias and the BoundCastInfo interface.
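The idea of one cast body per source/target pair can be modelled as a lookup table. This is only a sketch: `CAST_FUNCTIONS`, `register_cast`, `cast`, and the POINT type are hypothetical and do not correspond to DuckDB's actual registration API.

```python
# Maps (source type, target type) to the function that performs the cast.
CAST_FUNCTIONS = {}


def register_cast(source: str, target: str, fn) -> None:
    CAST_FUNCTIONS[(source, target)] = fn


def cast(value, source: str, target: str):
    """Model of CAST(value AS target) for a value of type `source`."""
    if source == target:
        return value
    try:
        fn = CAST_FUNCTIONS[(source, target)]
    except KeyError:
        raise TypeError(f"no cast from {source} to {target}") from None
    return fn(value)


# Built-in style casts.
register_cast("INTEGER", "VARCHAR", str)
register_cast("VARCHAR", "INTEGER", lambda s: int(s.strip()))

# A custom type (hypothetical POINT) plugging in its own cast.
register_cast("VARCHAR", "POINT", lambda s: tuple(float(p) for p in s.split(",")))

point = cast("1,2", "VARCHAR", "POINT")
```

Keeping the cost matrix (which casts may happen implicitly) separate from the cast bodies (how a cast is performed) mirrors the split between cast_rules.cpp and cast/ described above.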
Registration
Built-ins register at database startup via BuiltinFunctions::RegisterAll (built_in_functions.cpp). Extensions add functions via ExtensionLoader::RegisterFunction (or through the registration helpers in core_functions).
Integration points
- Catalog: Every function lives in the catalog as a ScalarFunctionCatalogEntry, AggregateFunctionCatalogEntry, etc. See catalog.
- Planner: FunctionBinder is invoked by ExpressionBinder when binding function calls (see planner).
- Execution: Scalar function evaluation goes through ExpressionExecutor. Aggregate and table functions plug into their dedicated physical operators in execution.
- Storage: Compression codec selection lives in compression_config.cpp and operates over the CompressionFunction interface in src/storage/compression/.
Entry points for modification
- Adding a scalar function in the engine: see existing examples in src/function/scalar/. Typically: implement the kernel using UnaryExecutor / BinaryExecutor and register via BuiltinFunctions::AddFunction.
- Adding many functions in a domain: prefer adding to extension/core_functions/<area>/ to keep the engine binary small.
- Adding a table function: subclass TableFunction, provide bind / init / function, and register.
- Adding casts for a custom type: implement BoundCastInfo and register via Catalog::AddCast.
- Tuning overload resolution: see cast_rules.cpp and function_binder.cpp.
Key source files
| File | Purpose |
|---|---|
| src/function/function_binder.cpp | Overload resolution. |
| src/function/built_in_functions.cpp | Registration entry. |
| src/function/cast_rules.cpp | Implicit-cast cost matrix. |
| src/function/scalar_function.cpp | Scalar interface. |
| src/function/aggregate_function.cpp | Aggregate interface. |
| src/function/table_function.cpp | Table function interface. |
| src/function/window_function.cpp | Window function interface. |
| src/function/macro_function.cpp | SQL macros. |
For the bundled function library, see extensions/core-functions. For the SQL frontend that calls into these functions, see features/sql-frontend.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.