duckdb/duckdb

Function

Active contributors: Tishj, Mark, Mytherin

Purpose

src/function/ is the function registry and overload-resolution machinery. It defines the interfaces for scalar, aggregate, table, window, and pragma functions, ships some core built-ins (the bulk of bundled functions live in the core_functions extension), and provides cast rules and type promotion.

Directory layout

src/function/
├── function.cpp                     Base Function class
├── function_binder.cpp              Overload resolution by implicit-cast cost
├── function_set.cpp                 A set of overloads for one name
├── function_list.cpp                The full registry of built-ins
├── register_function_list.cpp       Registers built-ins at startup
├── built_in_functions.cpp           Top-level registration entry
├── scalar_function.cpp              Scalar function plumbing
├── aggregate_function.cpp           Aggregate function plumbing
├── table_function.cpp               Table function plumbing
├── pragma_function.cpp              Pragma function plumbing
├── window_function.cpp              Window function plumbing
├── macro_function.cpp               SQL macros
├── scalar_macro_function.cpp        Scalar macro expansion
├── table_macro_function.cpp         Table macro expansion
├── compression_config.cpp           Choose a compression codec per segment
├── copy_function.cpp                COPY TO/FROM function plumbing
├── copy_blob.cpp                    Generic blob copy helper
├── encoding_function.cpp            Encoding function plumbing (CSV reader, etc.)
├── cast_rules.cpp                   Implicit-cast cost matrix
├── udf_function.cpp                 User-defined functions via the C++ API
├── scalar/                          Built-in scalar functions
├── aggregate/                       Built-in aggregate functions
├── table/                           Built-in table functions
├── window/                          Built-in window functions
├── pragma/                          Built-in pragma functions
├── cast/                            Cast functions
└── variant/                         VARIANT type support

Key abstractions

| Type | File | Role |
| --- | --- | --- |
| Function | src/include/duckdb/function/function.hpp | Base for any callable. |
| ScalarFunction, AggregateFunction, TableFunction, WindowFunction, PragmaFunction | src/include/duckdb/function/... | Concrete callable categories with their lifecycle hooks. |
| FunctionSet | src/function/function_set.cpp | A set of overloads for one name. |
| FunctionBinder | src/function/function_binder.cpp | Resolves name + argument types to a specific overload by computing implicit-cast cost. |
| BuiltinFunctions | src/function/built_in_functions.cpp | The registration entry point used at database startup. |
| MacroFunction | src/function/macro_function.cpp | A SQL-level macro (CREATE MACRO). Subclasses for scalar and table macros. |
| BoundCastInfo, CastFunction | src/function/cast/ | Per-source-target cast functions. Drive CAST(x AS T) and implicit promotion. |

How it works

graph TD
    Reg[BuiltinFunctions::RegisterAll] -->|insert FunctionSets| Cat[Catalog]
    Bind[Binder: function reference] -->|FunctionBinder::BindFunction| Resolve[Find FunctionSet]
    Resolve -->|cost-based overload pick| Pick[Specific Function]
    Pick -->|return BoundFunctionExpression| Plan[Logical plan]
    Plan -->|PhysicalPlanGenerator| Exec[ExpressionExecutor / aggregate / table operator]

Categories of functions

| Category | Cardinality | Example | Where to add |
| --- | --- | --- | --- |
| Scalar | 1 row in → 1 row out | length, upper, + | src/function/scalar/<area>/, extension/core_functions/scalar/<area>/ |
| Aggregate | N rows in → 1 row out | sum, min, count, string_agg | src/function/aggregate/, extension/core_functions/aggregate/<area>/ |
| Table | invocation → table | read_csv, parquet_scan, range, generate_series | src/function/table/, extensions |
| Window | over a frame → 1 row out per input | row_number, lag, lead, nth_value | src/function/window/ |
| Pragma | configuration / metadata | pragma table_info('t'), pragma threads = 4 | src/function/pragma/ |
| Macro | syntactic sugar | CREATE MACRO add(a, b) AS a + b | src/function/macro_function.cpp |

Overload resolution

FunctionBinder enumerates the FunctionSet for a name and assigns each overload an implicit-cast cost by walking argument types:

  • Exact match: 0
  • Implicit cast (INT → BIGINT): small positive cost
  • Lossy cast (DOUBLE → INT): high cost
  • No cast: rejected

The lowest-cost overload wins. Ties are broken by argument-type specificity. The cost rules are in cast_rules.cpp. If no overload matches, the binder throws a BinderException listing candidates.
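The cost-based pick described above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not DuckDB's FunctionBinder: ToyType, ToyCastCost, ToyOverload, and PickOverload are hypothetical names, the cost values are invented for illustration, and ties are resolved by keeping the first candidate rather than by type specificity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy type system for illustration only; these are not DuckDB LogicalTypes.
enum class ToyType { INT, BIGINT, DOUBLE, VARCHAR };

// Hypothetical cost table: 0 for an exact match, a small cost for widening,
// a high cost for a lossy cast, and -1 when no implicit cast is modeled.
int64_t ToyCastCost(ToyType from, ToyType to) {
    if (from == to) return 0;
    if (from == ToyType::INT && to == ToyType::BIGINT) return 1;
    if (from == ToyType::INT && to == ToyType::DOUBLE) return 2;
    if (from == ToyType::BIGINT && to == ToyType::DOUBLE) return 2;
    if (from == ToyType::DOUBLE && to == ToyType::INT) return 100;
    return -1;
}

struct ToyOverload {
    std::string label;
    std::vector<ToyType> params;
};

// Returns the index of the cheapest applicable overload, or -1 if none applies.
// Simplification: on equal cost, the first candidate seen is kept.
int PickOverload(const std::vector<ToyOverload> &overloads, const std::vector<ToyType> &args) {
    int best_index = -1;
    int64_t best_cost = 0;
    for (size_t i = 0; i < overloads.size(); i++) {
        const ToyOverload &candidate = overloads[i];
        if (candidate.params.size() != args.size()) continue;
        int64_t total = 0;
        bool applicable = true;
        for (size_t a = 0; a < args.size(); a++) {
            int64_t cost = ToyCastCost(args[a], candidate.params[a]);
            if (cost < 0) { applicable = false; break; }  // no cast: reject overload
            total += cost;
        }
        if (applicable && (best_index < 0 || total < best_cost)) {
            best_index = static_cast<int>(i);
            best_cost = total;
        }
    }
    return best_index;
}
```

With overloads f(BIGINT) and f(DOUBLE), an INT argument selects f(BIGINT) in this model because widening to BIGINT is cheaper than converting to DOUBLE. A return value of -1 stands in for the point where the real binder would raise a BinderException.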

Aggregate framework

Aggregate functions implement five hooks:

  • state_size — bytes needed for the per-group state.
  • initialize — initialize a state to the identity.
  • update — fold a chunk of values into the state.
  • combine — merge two states (used for parallel aggregation).
  • finalize — produce the result value(s) from a state.

PhysicalHashAggregate and PhysicalUngroupedAggregate consume these hooks. The framework supports distinct aggregates and ordered-set aggregates.
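To make the role of each hook concrete, here is a minimal sketch of an average aggregate driven through a hook table. It assumes a simplified interface: ToyAggregate, AvgState, MakeAvg, and ParallelAvg are hypothetical names, and the function-pointer signatures are not those of DuckDB's AggregateFunction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical hook table mirroring the hooks listed above.
struct ToyAggregate {
    size_t (*state_size)();
    void (*initialize)(void *state);
    void (*update)(void *state, const int64_t *values, size_t count);
    void (*combine)(const void *source, void *target);
    double (*finalize)(const void *state);
};

struct AvgState {
    int64_t sum;
    int64_t count;
};

size_t AvgStateSize() { return sizeof(AvgState); }

// Construct the identity state in caller-provided memory.
void AvgInitialize(void *state) { new (state) AvgState{0, 0}; }

// Fold a chunk of values into the state.
void AvgUpdate(void *state, const int64_t *values, size_t count) {
    AvgState *s = static_cast<AvgState *>(state);
    for (size_t i = 0; i < count; i++) {
        s->sum += values[i];
    }
    s->count += static_cast<int64_t>(count);
}

// Merge a partial state into the target state.
void AvgCombine(const void *source, void *target) {
    const AvgState *src = static_cast<const AvgState *>(source);
    AvgState *tgt = static_cast<AvgState *>(target);
    tgt->sum += src->sum;
    tgt->count += src->count;
}

// Produce the result; this toy returns 0.0 for an empty group instead of NULL.
double AvgFinalize(const void *state) {
    const AvgState *s = static_cast<const AvgState *>(state);
    return s->count == 0 ? 0.0 : static_cast<double>(s->sum) / static_cast<double>(s->count);
}

ToyAggregate MakeAvg() {
    return ToyAggregate{AvgStateSize, AvgInitialize, AvgUpdate, AvgCombine, AvgFinalize};
}

// Simulates two workers that each aggregate one partition, then merge.
double ParallelAvg(const ToyAggregate &agg, const std::vector<int64_t> &left,
                   const std::vector<int64_t> &right) {
    std::vector<unsigned char> a(agg.state_size()), b(agg.state_size());
    agg.initialize(a.data());
    agg.initialize(b.data());
    agg.update(a.data(), left.data(), left.size());
    agg.update(b.data(), right.data(), right.size());
    agg.combine(b.data(), a.data());
    return agg.finalize(a.data());
}
```

The driver never looks inside the state: it only allocates state_size bytes and passes opaque pointers to the hooks, which is the property that lets one operator serve many aggregates.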

Table functions

A table function declares its argument schema and provides:

  • bind — given the call's literals, decide the output schema.
  • init_global / init_local — set up parallel scan state.
  • function — produce chunks.
  • Optionally: cardinality, pushdown_complex_filter, projection_pushdown.

Most file-format extensions are table functions (read_csv, read_parquet, read_json, arrow_scan, …). See extensions.
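The bind / init / function sequence can be sketched with a toy range-style generator. Everything here is a hypothetical model of the lifecycle (ToyBindData, ToyGlobalState, ToyRangeBind, ToyRangeInitGlobal, ToyRangeFunction, ToyScanChunkSizes), not the TableFunction interface itself, and it runs single-threaded, so there is no local state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Result of bind: the output schema plus the parameters captured from the call.
struct ToyBindData {
    int64_t start;
    int64_t end;  // exclusive
    std::vector<std::string> column_names;
};

// Scan state shared by all workers in this toy model.
struct ToyGlobalState {
    int64_t next;
};

// bind: inspect the call's literal arguments and decide the output schema.
ToyBindData ToyRangeBind(int64_t start, int64_t end) {
    ToyBindData bind;
    bind.start = start;
    bind.end = end;
    bind.column_names.push_back("range");
    return bind;
}

// init_global: set up the scan position before any chunk is produced.
ToyGlobalState ToyRangeInitGlobal(const ToyBindData &bind) {
    return ToyGlobalState{bind.start};
}

// function: fill `out` with up to chunk_size rows; an empty chunk means the scan is done.
void ToyRangeFunction(const ToyBindData &bind, ToyGlobalState &state, size_t chunk_size,
                      std::vector<int64_t> &out) {
    out.clear();
    while (out.size() < chunk_size && state.next < bind.end) {
        out.push_back(state.next++);
    }
}

// Drives the lifecycle and records the size of each non-empty chunk.
std::vector<size_t> ToyScanChunkSizes(int64_t start, int64_t end, size_t chunk_size) {
    ToyBindData bind = ToyRangeBind(start, end);
    ToyGlobalState state = ToyRangeInitGlobal(bind);
    std::vector<size_t> sizes;
    std::vector<int64_t> chunk;
    for (;;) {
        ToyRangeFunction(bind, state, chunk_size, chunk);
        if (chunk.empty()) break;
        sizes.push_back(chunk.size());
    }
    return sizes;
}
```

Scanning the range 0 to 5 with a chunk size of 2 yields chunks of 2, 2, and 1 rows, followed by the empty chunk that ends the scan.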

Cast rules

cast_rules.cpp defines the implicit-cast cost between every pair of LogicalTypes. The actual cast bodies are in cast/:

  • Numeric casts in cast/numeric_cast.cpp.
  • Date/time casts in cast/time_casts.cpp, time_cast.cpp, default_casts.cpp.
  • String casts in cast/string_cast.cpp.
  • List/struct/map casts in cast/nested_casts.cpp.

Custom types registered by extensions can plug in their own casts via LogicalType::SetAlias and the BoundCastInfo interface.
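The idea of per-source-target cast functions can be pictured as a lookup keyed on the type pair. The sketch below is a loose analogy only: ToyCastRegistry, ToyValue, and MakeDefaultCasts are hypothetical names, and it does not reproduce BoundCastInfo or any DuckDB registration call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Toy type tags and a toy value; not DuckDB's LogicalType or Value.
enum class ToyType { INT, VARCHAR };

struct ToyValue {
    ToyType type;
    int64_t number;    // used when type == INT
    std::string text;  // used when type == VARCHAR
};

typedef std::function<ToyValue(const ToyValue &)> ToyCastFn;

// Hypothetical registry: one cast body per (source, target) pair.
class ToyCastRegistry {
public:
    void Register(ToyType source, ToyType target, ToyCastFn fn) {
        casts_[std::make_pair(source, target)] = std::move(fn);
    }

    ToyValue Cast(const ToyValue &input, ToyType target) const {
        if (input.type == target) return input;  // identity cast
        auto it = casts_.find(std::make_pair(input.type, target));
        if (it == casts_.end()) {
            throw std::runtime_error("no cast registered for this type pair");
        }
        return it->second(input);
    }

private:
    std::map<std::pair<ToyType, ToyType>, ToyCastFn> casts_;
};

ToyCastRegistry MakeDefaultCasts() {
    ToyCastRegistry registry;
    registry.Register(ToyType::INT, ToyType::VARCHAR, [](const ToyValue &v) {
        return ToyValue{ToyType::VARCHAR, 0, std::to_string(v.number)};
    });
    registry.Register(ToyType::VARCHAR, ToyType::INT, [](const ToyValue &v) {
        return ToyValue{ToyType::INT, static_cast<int64_t>(std::stoll(v.text)), ""};
    });
    return registry;
}
```

In this model a custom type would plug in by calling Register for each pair it supports; a missing pair surfaces as an error at cast time.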

Registration

Built-ins register at database startup via BuiltinFunctions::RegisterAll (built_in_functions.cpp). Extensions add functions via ExtensionLoader::AddFunction (or through core_functions's registration helpers).

Integration points

  • Catalog: Every function lives in the catalog as a ScalarFunctionCatalogEntry, AggregateFunctionCatalogEntry, etc. See catalog.
  • Planner: FunctionBinder is invoked by ExpressionBinder when binding function calls (see planner).
  • Execution: Scalar function evaluation goes through ExpressionExecutor. Aggregate and table functions plug into their dedicated physical operators in execution.
  • Storage: Compression codec selection lives in compression_config.cpp and operates over the CompressionFunction interface in src/storage/compression/.

Entry points for modification

  • Adding a scalar function in the engine: see existing examples in src/function/scalar/. Typically: implement the kernel using UnaryExecutor/BinaryExecutor and register via BuiltinFunctions::AddFunction.
  • Adding many functions in a domain: prefer adding to extension/core_functions/<area>/ to keep the engine binary small.
  • Adding a table function: subclass TableFunction, provide bind/init/function, and register.
  • Adding casts for a custom type: implement BoundCastInfo and register via Catalog::AddCast.
  • Tuning overload resolution: see cast_rules.cpp and function_binder.cpp.
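For the scalar-function entry point, the general shape of "write a per-value kernel and let an executor loop over the vector" might look like the following sketch. ToyVector, ToyUnaryExecute, ToyAbsOperator, and ToyAbs are hypothetical stand-ins; DuckDB's UnaryExecutor and Vector have different signatures and handle more cases (constant and dictionary vectors, for example).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy columnar vector: values plus a validity flag per row (false means NULL).
template <class T>
struct ToyVector {
    std::vector<T> data;
    std::vector<bool> valid;
};

// Hypothetical executor: applies `op` to every valid row and propagates NULLs.
template <class IN, class OUT, class OP>
ToyVector<OUT> ToyUnaryExecute(const ToyVector<IN> &input, OP op) {
    ToyVector<OUT> result;
    result.data.resize(input.data.size());
    result.valid = input.valid;
    for (size_t i = 0; i < input.data.size(); i++) {
        if (input.valid[i]) {
            result.data[i] = op(input.data[i]);
        }
    }
    return result;
}

// The kernel only describes what happens to a single value.
struct ToyAbsOperator {
    int64_t operator()(int64_t value) const { return value < 0 ? -value : value; }
};

// The "scalar function" is the kernel bound to the executor.
ToyVector<int64_t> ToyAbs(const ToyVector<int64_t> &input) {
    return ToyUnaryExecute<int64_t, int64_t>(input, ToyAbsOperator());
}
```

Keeping NULL handling and iteration in the executor means each new function only has to supply its per-value operator.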

Key source files

| File | Purpose |
| --- | --- |
| src/function/function_binder.cpp | Overload resolution. |
| src/function/built_in_functions.cpp | Registration entry. |
| src/function/cast_rules.cpp | Implicit-cast cost matrix. |
| src/function/scalar_function.cpp | Scalar interface. |
| src/function/aggregate_function.cpp | Aggregate interface. |
| src/function/table_function.cpp | Table function interface. |
| src/function/window_function.cpp | Window function interface. |
| src/function/macro_function.cpp | SQL macros. |

For the bundled function library, see extensions/core-functions. For the SQL frontend that calls into these functions, see features/sql-frontend.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
