duckdb/duckdb
core_functions extension
Purpose
extension/core_functions/ is the bundled SQL function library. It is in-tree so it is part of every default DuckDB build, but it is structured as an extension so the engine binary can be slimmed down by excluding it. It contains most of the everyday SQL functions: aggregates, list/string/date/math/blob/struct/map helpers, and the lambda-function support code.
The DuckDB engine in src/function/ only contains the function infrastructure plus a small set of "wired-into-the-language" functions (casts, comparison operators, arithmetic). Almost everything else lives here.
Directory layout
extension/core_functions/
├── core_functions_extension.cpp Registration entry point
├── function_list.cpp The big registration table
├── lambda_functions.cpp Shared support for list_transform / list_filter / list_reduce
├── scalar/ Scalar functions per domain
│ ├── string/ upper, lower, like, regexp_*, levenshtein, similarity, ...
│ ├── list/ list_filter, list_transform, list_reduce, list_sort, list_aggregate, ...
│ ├── map/ map, map_keys, map_values, map_extract, ...
│ ├── struct/ struct_pack, struct_extract, struct_insert, ...
│ ├── date/ strftime, age, century, isoyear, ...
│ ├── operator/ bitwise + arithmetic + comparison helpers
│ ├── generic/ typeof, error, alias, current_setting, ...
│ ├── system/ version, current_user, hash, can_cast_implicitly, ...
│ ├── geometry/ ST_AsText, ST_GeomFromText (WKT/WKB only - real spatial in spatial extension)
│ ├── sequence/ nextval, currval
│ ├── variant/ variant_extract, variant_typeof, ...
│ └── compressed_materialization/ helpers for the optimizer
├── aggregate/ Aggregate functions per domain (sum, avg, mode, percentile, regr_*, holistic, ...)
└── include/ Public headers
function_list.cpp is one big table that maps each function name to its implementation. To find where a function lives, search this file first.
What it provides
The number of registered functions is in the hundreds. Highlights:
| Domain | Examples |
|---|---|
| Scalar — string | upper, lower, like, regexp_matches, regexp_extract, levenshtein, jaccard, string_split, printf, format, repeat, split_part |
| Scalar — list | list_value, list_extract, list_concat, list_transform, list_filter, list_reduce, list_sort, list_aggregate, list_distinct, unnest |
| Scalar — map | map, map_keys, map_values, map_extract, map_concat |
| Scalar — struct | struct_pack, struct_extract, struct_insert, row |
| Scalar — date | strftime, strptime, age, date_part, date_trunc (basic; ICU overrides for tz-aware), to_timestamp |
| Scalar — math | pi, sin, cos, tan, pow, sqrt, log, exp, random, setseed, xor |
| Scalar — system | version, current_setting, current_database, current_schema, hash |
| Aggregate — basic | sum, avg, min, max, count, count_star, count_if, string_agg, array_agg, histogram |
| Aggregate — statistical | stddev_pop, stddev_samp, var_pop, var_samp, corr, covar_pop, covar_samp, regr_* |
| Aggregate — holistic | mode, quantile_cont, quantile_disc, median, mad |
| Aggregate — distinct | count(DISTINCT ...), approx_distinct, approx_count_distinct |
| Aggregate — bit | bit_and, bit_or, bit_xor, bitstring_agg |
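For the full list, run SELECT * FROM duckdb_functions(). Note that duckdb_functions() is itself a table function implemented in the engine under src/function/table/system/, not under extension/core_functions/scalar/system/.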
How registration works
graph LR
Load[ExtensionLoader::Load core_functions] --> Reg[CoreFunctionsExtension::Load]
Reg -->|RegisterScalarFunctions| FL[function_list.cpp]
Reg -->|RegisterAggregateFunctions| FL
FL --> Cat[Catalog inserts ScalarFunctionCatalogEntry / AggregateFunctionCatalogEntry]
Cat --> Bind[FunctionBinder finds them at bind time]
Each scalar/aggregate function provides a *FunctionSet factory in its .cpp file. function_list.cpp is a top-level table that lists every name and points at its factory. At extension load time, the registrar iterates this list and inserts the result into the catalog.
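The registration flow described above can be pictured with a small standalone sketch. It does not use DuckDB's real types: ToyFunction, ToyFunctionSet, ToyTableEntry, ToyCatalog and RegisterAll are hypothetical stand-ins invented for illustration, and the signatures in the strings are illustrative only. The point is the shape: one static table of name-to-factory entries, and a loop that calls each factory and stores the result under the function name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT DuckDB's real classes.
struct ToyFunction {
    std::string signature; // one overload, described as text
};
using ToyFunctionSet = std::vector<ToyFunction>;
using ToyFactory = ToyFunctionSet (*)();

// Each function's .cpp file would provide a factory like these.
ToyFunctionSet GetUpperFunctions() {
    return {ToyFunction{"upper(VARCHAR) -> VARCHAR"}};
}
ToyFunctionSet GetSumFunctions() {
    return {ToyFunction{"sum(BIGINT) -> HUGEINT"},
            ToyFunction{"sum(DOUBLE) -> DOUBLE"}};
}

// The role function_list.cpp plays: one table of name -> factory.
struct ToyTableEntry {
    const char *name;
    ToyFactory factory;
};
static const ToyTableEntry TOY_FUNCTION_LIST[] = {
    {"sum", GetSumFunctions},
    {"upper", GetUpperFunctions},
};

// A toy catalog: function name -> set of overloads.
using ToyCatalog = std::map<std::string, ToyFunctionSet>;

// The registrar: walk the table, call each factory, insert the result.
ToyCatalog RegisterAll() {
    ToyCatalog catalog;
    for (const auto &entry : TOY_FUNCTION_LIST) {
        bool inserted = catalog.emplace(entry.name, entry.factory()).second;
        assert(inserted && "duplicate name in the toy function list");
        (void)inserted; // silence unused-variable warnings in release builds
    }
    return catalog;
}
```

Calling RegisterAll() yields a catalog with one entry per table row; looking up "sum" returns both overloads. Keeping the table separate from the factories is what makes "search function_list.cpp first" a workable way to locate an implementation.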
Lambda functions
lambda_functions.cpp provides the shared support code for the list_transform / list_filter / list_reduce / array_apply family. It defines how lambda parameters bind, how captured columns flow into the inner expression, and how to vectorize the inner evaluation.
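To make "captured columns" and "vectorized inner evaluation" concrete, here is a minimal standalone sketch of a list_transform-style operation. It assumes a simplified list layout (a flat child array plus one offset/length pair per row) and is not the code in lambda_functions.cpp: ToyListSlot, ToyListColumn and ToyListTransform are hypothetical names, and the lambda is restricted to integers with exactly one captured column.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical simplified list column: all elements stored back to back,
// with one (offset, length) slot per row pointing into the child array.
struct ToyListSlot {
    std::size_t offset;
    std::size_t length;
};
struct ToyListColumn {
    std::vector<ToyListSlot> slots; // one per row
    std::vector<int> child;         // flattened elements of every row
};

// Applies lambda(element, captured_value_of_that_row) to every element.
// `captured` models an outer column referenced inside the lambda body.
ToyListColumn ToyListTransform(const ToyListColumn &input,
                               const std::vector<int> &captured,
                               const std::function<int(int, int)> &lambda) {
    assert(captured.size() == input.slots.size());
    ToyListColumn result;
    result.slots = input.slots;               // list shapes are unchanged
    result.child.resize(input.child.size());  // one output per input element
    for (std::size_t row = 0; row < input.slots.size(); row++) {
        const ToyListSlot &slot = input.slots[row];
        for (std::size_t i = 0; i < slot.length; i++) {
            std::size_t idx = slot.offset + i;
            // The captured value is per row, so it is repeated for each
            // element that belongs to that row's list.
            result.child[idx] = lambda(input.child[idx], captured[row]);
        }
    }
    return result;
}
```

With rows [1, 2] and [3] and a captured column holding 10 and 100, the lambda (x, y) -> x + y produces the child array 11, 12, 103 while the offsets stay as they were. The transform never needs to materialize one small list per row; it works on the flattened elements and reuses the input's list shapes.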
Integration points
- Function registry: Plugs into BuiltinFunctions via the standard extension load path.
- Optimizer: compressed_materialization/ hosts helper functions that the optimizer introduces during plan rewrites to encode strings/decimals as ints in temp results.
- Catalog: All registered functions become catalog entries at startup.
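As an illustration of what "encoding strings as ints" can mean, the sketch below packs a short string into a 64-bit integer and unpacks it again. This is an assumed scheme chosen for clarity, not the layout used by the helpers in compressed_materialization/: ToyCompressString, ToyDecompressString and the 7-byte limit are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed limit: 7 payload bytes plus 1 length byte fit in a uint64_t.
constexpr std::size_t TOY_MAX_LEN = 7;

// Packs the string bytes into the upper 7 bytes (first character in the
// most significant byte, zero padded) and the length into the lowest byte.
std::uint64_t ToyCompressString(const std::string &s) {
    assert(s.size() <= TOY_MAX_LEN);
    std::uint64_t packed = 0;
    for (std::size_t i = 0; i < TOY_MAX_LEN; i++) {
        std::uint64_t byte =
            i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
        packed = (packed << 8) | byte;
    }
    return (packed << 8) | static_cast<std::uint64_t>(s.size());
}

// Reverses ToyCompressString using the stored length.
std::string ToyDecompressString(std::uint64_t packed) {
    std::size_t len = static_cast<std::size_t>(packed & 0xFF);
    std::string s(len, '\0');
    for (std::size_t i = 0; i < len; i++) {
        // Byte i sits (TOY_MAX_LEN - i) bytes above the length byte.
        s[i] = static_cast<char>((packed >> (8 * (TOY_MAX_LEN - i))) & 0xFF);
    }
    return s;
}
```

Because the first character lands in the most significant byte and the length breaks ties, comparing two packed values as unsigned integers gives the same result as comparing the original strings bytewise. That property is what lets an operator such as a sort work on the fixed-width integers in this toy scheme and decompress only at the end.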
Entry points for modification
- Adding a function: drop a new .cpp in the appropriate scalar/<area>/ or aggregate/<area>/ directory, declare a *FunctionSet GetFunctions() factory, and add it to function_list.cpp.
- Adding a new domain: create a scalar/<area>/ directory with a CMakeLists.txt, add it to extension/core_functions/CMakeLists.txt.
- Tests: test/sql/function/<area>/ matches the source layout.
Key source files
| File | Purpose |
|---|---|
| extension/core_functions/core_functions_extension.cpp | Registration entry. |
| extension/core_functions/function_list.cpp | Master function table. |
| extension/core_functions/lambda_functions.cpp | Lambda support. |
| extension/core_functions/scalar/list/list_transform.cpp | Example of a list lambda function. |
| extension/core_functions/aggregate/general/sum.cpp | Example aggregate. |
For the engine-level function infrastructure these plug into, see systems/function.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.