duckdb/duckdb

Main (embedding surface)

Active contributors: Mark, Tishj, Mytherin

Purpose

src/main/ is the embedding layer. It defines DatabaseInstance, Connection, and ClientContext; orchestrates query execution; manages prepared statements and result sets; loads and configures extensions; and exposes the C API (src/main/capi/), the stable interface that many language bindings build on.

Directory layout

src/main/
├── database.cpp                  DatabaseInstance: per-database shared state
├── database_manager.cpp          Multi-database (ATTACH) management
├── attached_database.cpp         An attached database (DuckDB or extension)
├── database_path_and_type.cpp    Resolve a path to a storage backend
├── database_file_path_manager.cpp  Track open files
├── db_instance_cache.cpp         Re-use DatabaseInstance across opens
├── connection.cpp                Connection facade over ClientContext
├── connection_manager.cpp        Tracks live connections
├── client_context.cpp            Per-connection: txn, profiler, settings, exec
├── client_context_file_opener.cpp  File-open routing
├── client_context_wrapper.cpp    Wrapper used by API helpers
├── client_data.cpp               Per-connection extra data
├── client_config.cpp             Per-connection config overrides
├── client_verify.cpp             Optimizer/serializer verification harness
├── config.cpp                    Engine-wide configuration
├── valid_checker.cpp             Tracks fatal-error state on a connection
├── error_manager.cpp             Customizable user-facing error formatting
├── extension.cpp                 Extension loader entry
├── extension_manager.cpp         Configured extension list
├── extension_callback_manager.cpp  Hook callbacks for extensions
├── extension_install_info.cpp    Extension install metadata
├── extension/                    Per-extension descriptors
├── prepared_statement.cpp        PreparedStatement public API
├── prepared_statement_data.cpp   Internal prepared-statement state
├── pending_query_result.cpp      Handle to a query that is still executing
├── stream_query_result.cpp       Streaming result iterator
├── materialized_query_result.cpp  Fully materialized result
├── query_result.cpp              Common base
├── query_profiler.cpp            Query timing/operator profiling
├── profiling_info.cpp            Profiling info struct
├── profiling_utils.cpp           Profiling rendering
├── relation.cpp                  Relational API
├── relation/                     Concrete Relation subclasses
├── result_set_manager.cpp        Pool of active result sets
├── chunk_scan_state.cpp          Streaming chunk consumption state
├── buffered_data/                Buffered streaming results
├── http/                         HTTP utility (request signing, etc.)
├── secret/                       Persistent secret manager (e.g., S3 creds)
├── settings/                     Per-setting structs
├── user_settings.cpp             User-visible settings glue
├── appender.cpp                  Bulk append API
└── capi/                         The C API

Key abstractions

Type | File | Role
--- | --- | ---
DatabaseInstance | src/main/database.cpp | Shared state for one opened database, used by all of its connections (db_instance_cache.cpp lets repeated opens of the same file reuse it). Owns the buffer manager, scheduler, file system, DatabaseManager, and extension state.
DatabaseManager | src/main/database_manager.cpp | Tracks attached databases. Resolves cross-database name references.
AttachedDatabase | src/main/attached_database.cpp | A database attached to a DatabaseInstance, either DuckDB-native or via a StorageExtension. Carries its own catalog, storage, and transaction manager.
Connection | src/main/connection.cpp | User-facing handle. Wraps a ClientContext.
ClientContext | src/main/client_context.cpp | Per-connection state: current transaction, profiler, prepared statements, config overrides, file opener, errors. The main entry point of query execution (roughly 58 KB of source).
DBConfig | src/main/config.cpp | Engine-wide configuration: memory limit, threads, allowed extensions, replacement scans, callbacks.
PreparedStatement, PreparedStatementData | src/main/prepared_statement*.cpp | A statement planned once and re-executed with different parameter bindings.
QueryResult | src/main/query_result.cpp | Result base class for MaterializedQueryResult and StreamQueryResult. PendingQueryResult, a handle to a query that is still executing, derives from the shared BaseQueryResult base instead.
Appender | src/main/appender.cpp | High-throughput row-by-row bulk insert API.
Relation | src/main/relation.cpp | Builder API for queries (used by Python/R DataFrame-style builders).
ExtensionManager | src/main/extension_manager.cpp | Tracks loaded extensions and links their registrations into the catalog.
SecretManager | src/main/secret/ | Named credential store, temporary or persistent (used by httpfs, S3, etc.).
ErrorManager | src/main/error_manager.cpp | Customizable error message formatting.
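The practical difference between a MaterializedQueryResult and a StreamQueryResult can be sketched as a toy model in C. Everything below is hypothetical (stream_cursor_t, stream_fetch, and materialize_chunk_count are not DuckDB names); the only borrowed detail is the 2048-row chunk size, which matches DuckDB's default STANDARD_VECTOR_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Matches DuckDB's default STANDARD_VECTOR_SIZE (rows per DataChunk). */
#define CHUNK_CAPACITY 2048

/* Hypothetical streaming cursor: hands out one chunk per fetch, the way a
   streamed result yields chunks on demand. Not a DuckDB type. */
typedef struct {
    size_t total_rows;   /* rows the query will eventually produce */
    size_t emitted_rows; /* rows handed out so far */
} stream_cursor_t;

/* Returns the row count of the next chunk, or 0 once the stream is drained. */
size_t stream_fetch(stream_cursor_t *cur) {
    assert(cur->emitted_rows <= cur->total_rows);
    size_t remaining = cur->total_rows - cur->emitted_rows;
    size_t n = remaining < CHUNK_CAPACITY ? remaining : CHUNK_CAPACITY;
    cur->emitted_rows += n;
    return n;
}

/* A "materialized" result drains the whole stream before returning;
   this model only counts the chunks it would have to keep. */
size_t materialize_chunk_count(size_t total_rows) {
    stream_cursor_t cur = { total_rows, 0 };
    size_t chunks = 0;
    while (stream_fetch(&cur) > 0) {
        chunks++;
    }
    return chunks;
}
```

In this model a 5000-row result arrives as chunks of 2048, 2048, and 904 rows. A streaming consumer can stop after the first chunk, while a materialized result has produced all three before the caller sees a row.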

How a query runs through main

sequenceDiagram
    participant App as App / CLI / binding
    participant Conn as Connection
    participant CC as ClientContext
    participant Engine as Parser/Planner/Optimizer/Exec
    participant Result as QueryResult
    App->>Conn: SendQuery("SELECT ...")
    Conn->>CC: ExecuteInternal
    CC->>Engine: Parser::ParseQuery
    Engine->>CC: SQLStatement
    CC->>Engine: Planner::CreatePlan
    CC->>Engine: Optimizer::Optimize
    CC->>Engine: PhysicalPlanGenerator
    CC->>Engine: Executor::Execute
    Engine->>Result: chunks
    Result->>App: Materialized or streamed

ClientContext::ExecuteInternal is the orchestrator. It:

  1. Begins a transaction if one is not active (auto-commit mode).
  2. Parses and binds the statement.
  3. Optimizes the logical plan and lowers it to a physical plan.
  4. Constructs an Executor and either materializes results or returns a StreamQueryResult for incremental consumption.
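  5. If it opened the transaction itself (auto-commit), commits on success and rolls back on error; inside an explicit transaction (BEGIN), it leaves the transaction open for the user's COMMIT or ROLLBACK.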
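The five steps can be modelled in a few lines of C to show how auto-commit interacts with an explicit transaction. This is a hypothetical sketch, not DuckDB code: session_t, stage_t, and execute_internal here are illustrative names, and the fail_at parameter simply simulates an error in one stage.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified connection state; not DuckDB's types. */
typedef struct {
    bool txn_active; /* a transaction is open (explicit BEGIN or auto-commit) */
    int commits;     /* commits performed so far */
    int rollbacks;   /* rollbacks performed so far */
} session_t;

typedef enum { STAGE_PARSE_BIND, STAGE_OPTIMIZE, STAGE_EXECUTE, STAGE_COUNT } stage_t;

/* Runs the stages in order; fail_at names the stage that errors (-1 = none).
   Returns true when every stage succeeded. */
bool execute_internal(session_t *s, int fail_at) {
    assert(s != 0);
    bool auto_commit = !s->txn_active;
    if (auto_commit) {
        s->txn_active = true;                 /* step 1: implicit BEGIN */
    }
    bool ok = true;
    for (int stage = 0; stage < STAGE_COUNT && ok; stage++) {
        ok = (stage != fail_at);              /* steps 2-4: stop at first error */
    }
    if (auto_commit) {                        /* step 5: only for implicit txns */
        if (ok) {
            s->commits++;
        } else {
            s->rollbacks++;
        }
        s->txn_active = false;
    }
    /* Inside an explicit transaction the user decides when to finish it. */
    return ok;
}
```

A failing optimizer stage in auto-commit mode therefore costs one rollback and leaves no transaction open, while the same query inside BEGIN leaves the transaction for the user to close.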

ClientContext also owns the connection's profiler, current logger, file opener, secret manager view, and per-connection setting overrides.

C API

src/main/capi/ provides a stable C ABI that many language bindings use (the Python client, by contrast, binds the C++ API directly). Key files:

File | Purpose
--- | ---
duckdb-c.cpp | duckdb_open, duckdb_connect, duckdb_query.
prepared-c.cpp | Prepared statements.
result-c.cpp, data_chunk-c.cpp, value-c.cpp | Result iteration in chunked or row form.
arrow-c.cpp | Arrow C Data Interface integration.
appender-c.cpp | C bindings for the appender.
table_function-c.cpp, scalar_function-c.cpp, aggregate_function-c.cpp | Register UDFs from C.
replacement_scan-c.cpp | Register replacement scans from C (map unrecognized table names to table-function calls).
logging-c.cpp | Structured logging from C.
config-c.cpp, config_options-c.cpp | Configure a database before opening.
helper-c.cpp | Result/value/blob helpers.
file_system-c.cpp | File system extensions in C.

The C API header, src/include/duckdb.h, is generated by scripts/generate_c_api.py from JSON specifications in src/include/duckdb/main/capi/ and is checked in for direct embedding. The function bodies are written by hand in the matching *-c.cpp files.

Configuration

DBConfig (src/main/config.cpp) is a large struct with:

  • Memory and spilling (memory_limit, temp_directory, temp_file_compression).
  • Threading (threads, enable_thread_pool).
  • Storage (access_mode, default_block_size, wal_autocheckpoint, force_compression).
  • Behavior (autoload_known_extensions, allow_unsigned_extensions, errors_as_json).
  • Hooks (replacement scans, settings callbacks, optimizer extensions).

User-visible settings flow through src/main/settings/ and src/main/user_settings.cpp. The full list is auto-generated from src/common/settings.json via scripts/generate_settings.py.

Extensions

extension.cpp and extension_manager.cpp load shared-library extensions and statically-linked extensions. Extension descriptors in extension/ register functions, types, and replacement scans. See extensions.
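The division of labour between an extension's init entry point and the extension manager can be pictured with a toy registry. Every name here (registry_t, load_extension, demo_extension_init) is hypothetical and none of it is DuckDB's ExtensionLoader API; the sketch only shows an init function registering names into a catalog-like table, with the manager recording which extensions are loaded so a repeated load is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Hypothetical stand-in for the catalog plus the list of loaded extensions. */
typedef struct {
    const char *functions[MAX_ENTRIES]; /* names registered in the "catalog" */
    int n_functions;
    const char *loaded[MAX_ENTRIES];    /* extensions already loaded */
    int n_loaded;
} registry_t;

/* Shape of a hypothetical *_extension_init entry point. */
typedef void (*ext_init_fn)(registry_t *reg);

void register_function(registry_t *reg, const char *name) {
    assert(reg->n_functions < MAX_ENTRIES);
    reg->functions[reg->n_functions++] = name;
}

bool has_function(const registry_t *reg, const char *name) {
    for (int i = 0; i < reg->n_functions; i++) {
        if (strcmp(reg->functions[i], name) == 0) return true;
    }
    return false;
}

/* Runs init once per extension name; a repeated load is a no-op. */
bool load_extension(registry_t *reg, const char *ext, ext_init_fn init) {
    for (int i = 0; i < reg->n_loaded; i++) {
        if (strcmp(reg->loaded[i], ext) == 0) return false;
    }
    init(reg);
    assert(reg->n_loaded < MAX_ENTRIES);
    reg->loaded[reg->n_loaded++] = ext;
    return true;
}

/* Hypothetical "demo" extension: its init registers two functions. */
void demo_extension_init(registry_t *reg) {
    register_function(reg, "demo_scan");
    register_function(reg, "demo_version");
}
```

Keeping the loaded list in the manager rather than in each extension is what lets autoloading and explicit LOAD coexist without registering the same functions twice.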

Secrets

src/main/secret/ lets users register named credentials (e.g., CREATE SECRET my_s3 (TYPE S3, KEY_ID '...', SECRET '...')). Extensions such as httpfs consult the secret manager to resolve credentials transparently.
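When several secrets of the same type exist, DuckDB's documentation describes choosing by SCOPE: each secret may carry a path prefix (e.g., CREATE SECRET ... (TYPE S3, ..., SCOPE 's3://team-bucket')), and the secret with the longest prefix matching the requested path wins. The C below is a hypothetical model of that rule (secret_t and lookup_secret are illustrative names, not SecretManager code).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical secret entry; not DuckDB's real SecretEntry. */
typedef struct {
    const char *name;
    const char *type;  /* e.g. "S3" */
    const char *scope; /* path prefix the secret applies to */
} secret_t;

/* Returns the secret of the requested type whose scope is the longest
   prefix of path, or NULL when no secret matches. */
const secret_t *lookup_secret(const secret_t *secrets, size_t n,
                              const char *type, const char *path) {
    assert(secrets != NULL || n == 0);
    const secret_t *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(secrets[i].scope);
        if (strcmp(secrets[i].type, type) != 0) continue;        /* wrong type */
        if (strncmp(path, secrets[i].scope, len) != 0) continue; /* no prefix */
        if (best == NULL || len > best_len) {
            best = &secrets[i];
            best_len = len;
        }
    }
    return best;
}
```

With a broad "s3://" secret and a narrower "s3://team-bucket" secret, reads from the team bucket pick up the narrower credentials and every other S3 path falls back to the broad one.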

Replacement scans

A replacement scan converts an unrecognized table name into a table-function call. For example, SELECT * FROM 'data.parquet' is rewritten to parquet_scan('data.parquet') by the parquet extension's replacement scan. Replacement-scan registration is in client_config.cpp and extension_callback_manager.cpp.
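The mechanism can be sketched as an ordered list of callbacks that the binder tries once a table name is not found in the catalog. This is a hypothetical model: the callback signature, bind_table_name, and the two example callbacks are illustrative, not DuckDB's ReplacementScan API, and quoting of file names is not escaped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: writes a table-function call for table_name into
   out and returns true, or returns false so the next callback can try. */
typedef bool (*replacement_scan_fn)(const char *table_name, char *out, size_t out_len);

static bool ends_with(const char *s, const char *suffix) {
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

static bool parquet_replacement(const char *name, char *out, size_t out_len) {
    if (!ends_with(name, ".parquet")) return false;
    snprintf(out, out_len, "parquet_scan('%s')", name);
    return true;
}

static bool csv_replacement(const char *name, char *out, size_t out_len) {
    if (!ends_with(name, ".csv")) return false;
    snprintf(out, out_len, "read_csv_auto('%s')", name);
    return true;
}

/* Called only after a normal catalog lookup failed: tries each registered
   callback in order and stops at the first one that rewrites the name. */
bool bind_table_name(const replacement_scan_fn *scans, size_t n,
                     const char *table_name, char *out, size_t out_len) {
    assert(out_len > 0);
    for (size_t i = 0; i < n; i++) {
        if (scans[i](table_name, out, out_len)) return true;
    }
    return false; /* no rewrite: the binder reports the table as missing */
}
```

Because callbacks run only after the catalog lookup fails, a real table named data.parquet would still take precedence over the file.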

Integration points

  • Embedding: Everything in main/ is what an embedder calls. The CLI shell (tools/shell/), the C API, and the language bindings all go through Connection/ClientContext.
  • Engine: ClientContext calls into parser, planner, optimizer, execution, parallel.
  • Catalog & transaction: ClientContext keeps the active MetaTransaction and resolves catalog lookups through the DatabaseInstance's DatabaseManager and its attached databases.

Entry points for modification

  • Adding a new public C function: edit the JSON spec in src/include/duckdb/main/capi/, regenerate with make generate-files, implement the body in the matching *-c.cpp.
  • Adding a setting: add an entry to src/common/settings.json and define the setter/getter in src/main/settings/ (regenerate with make generate-files).
  • Hooking an extension: implement an Extension subclass, expose a *_extension_init C entry point, register your functions/types via ExtensionLoader.
  • Adding a replacement scan: see client_config.cpp and the parquet/json extension registration.

Key source files

File | Purpose
--- | ---
src/main/database.cpp | DatabaseInstance.
src/main/client_context.cpp | Query orchestration.
src/main/connection.cpp | User-facing handle.
src/main/config.cpp | Engine configuration.
src/main/extension.cpp | Extension loader.
src/main/capi/duckdb-c.cpp | C API entry points.
src/main/appender.cpp | Bulk append API.
src/main/relation.cpp | Relational builder API.

For the CLI built on top of this layer, see tools/cli-shell. For the in-tree extensions, see extensions.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
