duckdb/duckdb
Main (embedding surface)
Active contributors: Mark, Tishj, Mytherin
Purpose
src/main/ is the embedding layer. It defines DatabaseInstance, Connection, and ClientContext; orchestrates query execution; manages prepared statements and result sets; loads and configures extensions; and exposes the C API (src/main/capi/) that many language bindings build on top of.
Directory layout
```
src/main/
├── database.cpp                    DatabaseInstance: process-level state
├── database_manager.cpp            Multi-database (ATTACH) management
├── attached_database.cpp           An attached database (DuckDB or extension)
├── database_path_and_type.cpp      Resolve a path to a storage backend
├── database_file_path_manager.cpp  Track open files
├── db_instance_cache.cpp           Re-use DatabaseInstance across opens
├── connection.cpp                  Connection facade over ClientContext
├── connection_manager.cpp          Tracks live connections
├── client_context.cpp              Per-connection: txn, profiler, settings, exec
├── client_context_file_opener.cpp  File-open routing
├── client_context_wrapper.cpp      Wrapper used by API helpers
├── client_data.cpp                 Per-connection extra data
├── client_config.cpp               Per-connection config overrides
├── client_verify.cpp               Optimizer/serializer verification harness
├── config.cpp                      Engine-wide configuration
├── valid_checker.cpp               Tracks fatal-error state on a connection
├── error_manager.cpp               Customizable user-facing error formatting
├── extension.cpp                   Extension loader entry
├── extension_manager.cpp           Configured extension list
├── extension_callback_manager.cpp  Hook callbacks for extensions
├── extension_install_info.cpp      Extension install metadata
├── extension/                      Per-extension descriptors
├── prepared_statement.cpp          PreparedStatement public API
├── prepared_statement_data.cpp     Internal prepared-statement state
├── pending_query_result.cpp        Pending result handle (streamed)
├── stream_query_result.cpp         Streaming result iterator
├── materialized_query_result.cpp   Fully materialized result
├── query_result.cpp                Common base
├── query_profiler.cpp              Query timing/operator profiling
├── profiling_info.cpp              Profiling info struct
├── profiling_utils.cpp             Profiling rendering
├── relation.cpp                    Relational API
├── relation/                       Concrete Relation subclasses
├── result_set_manager.cpp          Pool of active result sets
├── chunk_scan_state.cpp            Streaming chunk consumption state
├── buffered_data/                  Buffered streaming results
├── http/                           HTTP utility (request signing, etc.)
├── secret/                         Persistent secret manager (e.g., S3 creds)
├── settings/                       Per-setting structs
├── user_settings.cpp               User-visible settings glue
├── appender.cpp                    Bulk append API
└── capi/                           The C API
```
Key abstractions
| Type | File | Role |
|---|---|---|
| DatabaseInstance | src/main/database.cpp | Process-level singleton per database file. Owns the buffer manager, file system, task scheduler, and configured extensions. |
| DatabaseManager | src/main/database_manager.cpp | Tracks attached databases. Resolves cross-database name references. |
| AttachedDatabase | src/main/attached_database.cpp | A database attached to a DatabaseInstance, either DuckDB-native or via a StorageExtension. Owns that database's catalog, storage, and transaction manager. |
| Connection | src/main/connection.cpp | User-facing handle. Wraps a ClientContext. |
| ClientContext | src/main/client_context.cpp | Per-connection state: current transaction, profiler, prepared statements, config overrides, file opener, errors. The main entry point of query execution (the file is ~58 KB). |
| DBConfig | src/main/config.cpp | Engine-wide configuration: memory limit, threads, allowed extensions, replacement scans, callbacks. |
| PreparedStatement, PreparedStatementData | src/main/prepared_statement*.cpp | Reusable execution plans with parameter bindings. |
| QueryResult | src/main/query_result.cpp | Result base class. Subclasses: MaterializedQueryResult, StreamQueryResult, PendingQueryResult. |
| Appender | src/main/appender.cpp | High-throughput row-by-row bulk insert API. |
| Relation | src/main/relation.cpp | Builder API for queries (used by Python/R DataFrame-style builders). |
| ExtensionManager | src/main/extension_manager.cpp | Tracks loaded extensions and links their registrations into the catalog. |
| SecretManager | src/main/secret/ | Persistent named credential store (used by httpfs, S3, etc.). |
| ErrorManager | src/main/error_manager.cpp | Customizable error message formatting. |
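The ownership in the table can be made concrete with a minimal C++ model. As far as we know, DuckDB's ClientContext holds a shared_ptr to its DatabaseInstance, and Connection holds a shared_ptr to its ClientContext. The instance cache (db_instance_cache.cpp) tracks instances by path through weak references. The Model* classes below are hypothetical stand-ins, not DuckDB types.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for DatabaseInstance: shared, process-level state.
struct ModelDatabaseInstance {
    explicit ModelDatabaseInstance(std::string p) : path(std::move(p)) {}
    std::string path;  // real state (buffer manager, scheduler, ...) omitted
};

// Hypothetical stand-in for ClientContext: per-connection state that keeps
// the shared instance alive.
struct ModelClientContext {
    explicit ModelClientContext(std::shared_ptr<ModelDatabaseInstance> d) : db(std::move(d)) {}
    std::shared_ptr<ModelDatabaseInstance> db;
};

// Hypothetical stand-in for Connection: a thin facade over its own context.
struct ModelConnection {
    explicit ModelConnection(std::shared_ptr<ModelDatabaseInstance> db)
        : context(std::make_shared<ModelClientContext>(std::move(db))) {}
    std::shared_ptr<ModelClientContext> context;
};

// Path-keyed cache holding weak references, so it never keeps a database
// open by itself.
class ModelInstanceCache {
public:
    std::shared_ptr<ModelDatabaseInstance> GetOrCreate(const std::string &path) {
        auto it = instances_.find(path);
        if (it != instances_.end()) {
            if (auto live = it->second.lock()) {
                return live;  // still open somewhere: reuse it
            }
        }
        auto fresh = std::make_shared<ModelDatabaseInstance>(path);
        instances_[path] = fresh;  // remember it without owning it
        return fresh;
    }

private:
    std::map<std::string, std::weak_ptr<ModelDatabaseInstance>> instances_;
};
```

Because the cache holds weak_ptr, it reuses an instance while any connection keeps it alive but never holds a database open on its own.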
How a query runs through main
```mermaid
sequenceDiagram
    participant App as App / CLI / binding
    participant Conn as Connection
    participant CC as ClientContext
    participant Engine as Parser/Planner/Optimizer/Exec
    participant Result as QueryResult
    App->>Conn: SendQuery("SELECT ...")
    Conn->>CC: ExecuteInternal
    CC->>Engine: Parser::ParseQuery
    Engine->>CC: SQLStatement
    CC->>Engine: Planner::CreatePlan
    CC->>Engine: Optimizer::Optimize
    CC->>Engine: PhysicalPlanGenerator
    CC->>Engine: Executor::Execute
    Engine->>Result: chunks
    Result->>App: Materialized or streamed
```
ClientContext::ExecuteInternal is the orchestrator. It:
- Begins a transaction if one is not active (auto-commit mode).
- Parses and binds the statement.
- Optimizes the logical plan and lowers it to a physical plan.
- Constructs an Executor and either materializes the results or returns a StreamQueryResult for incremental consumption.
- In auto-commit mode, commits on success or rolls back on failure; inside an explicit transaction, leaves the commit to the user.
ClientContext also owns the connection's profiler, current logger, file opener, secret manager view, and per-connection setting overrides.
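The transaction handling in ClientContext::ExecuteInternal can be sketched as follows. This is a hypothetical C++ model in which ModelTransactionContext and ExecuteStatement are invented names. It assumes one statement per call. It also assumes that an error inside an explicit transaction leaves that transaction aborted until the user rolls back, which is how DuckDB behaves from the SQL side as we understand it.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-connection transaction state (not DuckDB's classes).
struct ModelTransactionContext {
    bool active = false;           // a transaction is currently open
    bool aborted = false;          // an explicit transaction hit an error
    std::vector<std::string> log;  // begin/commit/rollback events, for inspection

    void Begin() { active = true; log.push_back("begin"); }
    void Commit() { active = false; log.push_back("commit"); }
    void Rollback() { active = false; aborted = false; log.push_back("rollback"); }
};

// Runs one statement: open an implicit transaction if none is active, run the
// parse/plan/optimize/execute pipeline (passed in as `run`), then commit or
// roll back. Returns true if the statement succeeded.
bool ExecuteStatement(ModelTransactionContext &txn, const std::function<void()> &run) {
    if (txn.aborted) {
        return false;  // nothing runs until the user issues ROLLBACK
    }
    const bool implicit = !txn.active;  // auto-commit mode for this statement
    if (implicit) {
        txn.Begin();
    }
    try {
        run();
    } catch (const std::exception &) {
        if (implicit) {
            txn.Rollback();      // undo just this statement
        } else {
            txn.aborted = true;  // keep the explicit transaction open but unusable
        }
        return false;
    }
    if (implicit) {
        txn.Commit();
    }
    return true;
}
```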
C API
src/main/capi/ provides a stable C ABI used by many language bindings. Key files:
| File | Purpose |
|---|---|
| duckdb-c.cpp | Core lifecycle and query calls: duckdb_open, duckdb_connect, duckdb_query. |
| prepared-c.cpp | Prepared statements. |
| result-c.cpp, data_chunk-c.cpp, value-c.cpp | Result iteration in chunked or row form. |
| arrow-c.cpp | Arrow C Data Interface integration. |
| appender-c.cpp | C bindings for the appender. |
| table_function-c.cpp, scalar_function-c.cpp, aggregate_function-c.cpp | Register UDFs from C. |
| replacement_scan-c.cpp | Register replacement scans from C (map unrecognized table names to table-function calls). |
| logging-c.cpp | Structured logging from C. |
| config-c.cpp, config_options-c.cpp | Configure a database before opening. |
| helper-c.cpp | Result/value/blob helpers. |
| file_system-c.cpp | File system extensions in C. |
The C API header is generated by scripts/generate_c_api.py from JSON specifications in src/include/duckdb/main/capi/; the function bodies are hand-written in the matching *-c.cpp files. The generated header, src/include/duckdb.h, is checked in for direct embedding.
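The C API's stable-ABI style can be illustrated with a small sketch. The style has three parts: opaque handles, status return codes, and destructors that take a pointer to the handle. The sketch_* names are hypothetical. They only mirror the shape of calls such as duckdb_open(path, &db), which returns a duckdb_state, and duckdb_close(&db).

```cpp
#include <cassert>
#include <string>

// Hypothetical C-compatible surface (sketch_* names are invented) showing the
// opaque-handle pattern a stable C ABI typically uses.
extern "C" {
typedef enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = 1 } sketch_state;
typedef struct sketch_db_s *sketch_db;  // C callers only ever see this pointer

sketch_state sketch_open(const char *path, sketch_db *out_db);
const char *sketch_path(sketch_db db);
void sketch_close(sketch_db *db);
}

// The C++ object behind the handle; its layout never leaks into the ABI.
struct sketch_db_s {
    std::string path;
};

sketch_state sketch_open(const char *path, sketch_db *out_db) {
    if (!out_db) {
        return SKETCH_ERROR;
    }
    *out_db = nullptr;
    if (!path) {
        return SKETCH_ERROR;
    }
    try {
        *out_db = new sketch_db_s{path};
    } catch (...) {
        return SKETCH_ERROR;  // never let a C++ exception cross the C boundary
    }
    return SKETCH_SUCCESS;
}

const char *sketch_path(sketch_db db) {
    return db ? db->path.c_str() : nullptr;
}

// Takes a pointer to the handle so it can also null out the caller's copy.
void sketch_close(sketch_db *db) {
    if (db && *db) {
        delete *db;
        *db = nullptr;
    }
}
```

Keeping the struct definition out of the public header is what lets the C++ side change its internals without breaking callers compiled against an older header.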
Configuration
DBConfig (src/main/config.cpp) is a large struct with:
- Memory limits (memory_limit, temp_directory, temp_file_compression).
- Threading (threads, enable_thread_pool).
- Storage (access_mode, default_block_size, wal_autocheckpoint, force_compression).
- Behavior (autoload_known_extensions, allow_unsigned_extensions, errors_as_json).
- Hooks (replacement scans, settings callbacks, optimizer extensions).
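Options like these are typically set by name and string value before the database opens (config-c.cpp covers that path in the C API). The sketch below shows one way to validate such name/value pairs in C++. SketchConfig, ParseMemory, and SetOption are hypothetical names. The unit rules (KB/MB/GB as powers of 1000, KiB/MiB/GiB as powers of 1024) are an assumption of the sketch, not a statement about DuckDB's parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical slice of an engine-wide config; field names echo the settings
// listed above, but types and defaults are assumptions of this sketch.
struct SketchConfig {
    uint64_t memory_limit = 0;  // bytes; 0 means "unset" here
    uint64_t threads = 1;
    std::string access_mode = "automatic";
};

// Parses sizes such as "512MB" or "2 GiB" (case-insensitive). Assumes decimal
// KB/MB/GB and binary KiB/MiB/GiB; a bare number is taken as bytes.
uint64_t ParseMemory(const std::string &text) {
    size_t consumed = 0;
    const uint64_t amount = std::stoull(text, &consumed);  // throws if no digits
    std::string unit;
    for (size_t i = consumed; i < text.size(); i++) {
        const unsigned char ch = static_cast<unsigned char>(text[i]);
        if (!std::isspace(ch)) {
            unit += static_cast<char>(std::tolower(ch));
        }
    }
    static const std::map<std::string, uint64_t> kMultipliers = {
        {"", 1},             {"b", 1},
        {"kb", 1000ULL},     {"mb", 1000ULL * 1000}, {"gb", 1000ULL * 1000 * 1000},
        {"kib", 1ULL << 10}, {"mib", 1ULL << 20},    {"gib", 1ULL << 30}};
    const auto it = kMultipliers.find(unit);
    if (it == kMultipliers.end()) {
        throw std::invalid_argument("unknown memory unit: " + unit);
    }
    return amount * it->second;
}

using Setter = std::function<void(SketchConfig &, const std::string &)>;

// Looks an option up by name and applies it, rejecting unknown names and
// invalid values with std::invalid_argument.
void SetOption(SketchConfig &config, const std::string &name, const std::string &value) {
    static const std::map<std::string, Setter> kSetters = {
        {"memory_limit", [](SketchConfig &c, const std::string &v) { c.memory_limit = ParseMemory(v); }},
        {"threads",
         [](SketchConfig &c, const std::string &v) {
             const uint64_t n = std::stoull(v);
             if (n == 0) {
                 throw std::invalid_argument("threads must be at least 1");
             }
             c.threads = n;
         }},
        {"access_mode",
         [](SketchConfig &c, const std::string &v) {
             if (v != "automatic" && v != "read_only" && v != "read_write") {
                 throw std::invalid_argument("invalid access_mode: " + v);
             }
             c.access_mode = v;
         }},
    };
    const auto it = kSetters.find(name);
    if (it == kSetters.end()) {
        throw std::invalid_argument("unrecognized option: " + name);
    }
    it->second(config, value);
}
```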
User-visible settings flow through src/main/settings/ and src/main/user_settings.cpp. The full list is auto-generated from src/common/settings.json via scripts/generate_settings.py.
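Per-connection overrides (client_config.cpp) sit on top of the engine-wide values. Below is a minimal sketch of that resolution order. It assumes a session value always wins over the global one. The types and the setting name example_setting are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical engine-wide values shared by every connection.
struct SketchGlobalSettings {
    std::map<std::string, std::string> values;
};

// Hypothetical per-connection view: local overrides first, then global values.
struct SketchSessionSettings {
    const SketchGlobalSettings *global = nullptr;
    std::map<std::string, std::string> overrides;

    std::optional<std::string> Get(const std::string &name) const {
        if (auto it = overrides.find(name); it != overrides.end()) {
            return it->second;  // session-level value wins
        }
        if (global) {
            if (auto it = global->values.find(name); it != global->values.end()) {
                return it->second;  // fall back to the engine-wide value
            }
        }
        return std::nullopt;  // caller falls back to the built-in default
    }

    // Drops the session override so the global value shows through again.
    void Reset(const std::string &name) { overrides.erase(name); }
};
```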
Extensions
extension.cpp and extension_manager.cpp load shared-library extensions and statically-linked extensions. Extension descriptors in extension/ register functions, types, and replacement scans. See extensions.
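The load path can be modeled in a few pieces. An extension object registers its functions through a loader into a shared catalog, and a manager remembers what is already loaded. Every name below (SketchExtension, SketchLoader, SketchExtensionManager, double_it) is hypothetical, and this is not DuckDB's ExtensionLoader interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical model of extension loading; none of these are DuckDB types.
using ScalarFn = std::function<long long(long long)>;

struct SketchCatalog {
    std::map<std::string, ScalarFn> functions;  // function name -> implementation
};

// Narrow interface an extension uses to register what it provides.
class SketchLoader {
public:
    explicit SketchLoader(SketchCatalog &catalog) : catalog_(catalog) {}
    void RegisterFunction(const std::string &name, ScalarFn fn) {
        if (!catalog_.functions.emplace(name, std::move(fn)).second) {
            throw std::runtime_error("function already exists: " + name);
        }
    }

private:
    SketchCatalog &catalog_;
};

struct SketchExtension {
    virtual ~SketchExtension() = default;
    virtual std::string Name() const = 0;
    virtual void Load(SketchLoader &loader) = 0;
};

// Remembers loaded extensions so a second load is a no-op in this sketch.
class SketchExtensionManager {
public:
    explicit SketchExtensionManager(SketchCatalog &catalog) : catalog_(catalog) {}
    bool Load(SketchExtension &extension) {
        if (loaded_.count(extension.Name()) > 0) {
            return false;
        }
        SketchLoader loader(catalog_);
        extension.Load(loader);  // if this throws, the extension is not marked loaded
        loaded_.insert(extension.Name());
        return true;
    }

private:
    SketchCatalog &catalog_;
    std::set<std::string> loaded_;
};

// Example extension contributing one scalar function.
struct DoublerExtension : SketchExtension {
    std::string Name() const override { return "doubler"; }
    void Load(SketchLoader &loader) override {
        loader.RegisterFunction("double_it", [](long long x) { return 2 * x; });
    }
};
```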
Secrets
src/main/secret/ lets users register named credentials (e.g., CREATE SECRET my_s3 (TYPE S3, KEY_ID '...', SECRET '...')). Extensions such as httpfs consult the secret manager to resolve credentials transparently.
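Credential resolution amounts to picking, among secrets of the right type, the one whose scope matches the path being opened. As we understand DuckDB's documented behavior, the longest matching scope wins. The sketch below assumes plain string-prefix matching, and SketchSecret and LookupSecret are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical secret record; field names are illustrative.
struct SketchSecret {
    std::string name;
    std::string type;   // e.g. "s3"
    std::string scope;  // path prefix this secret applies to, e.g. "s3://bucket/"
};

// Returns the secret of the requested type whose scope is the longest prefix
// of `path`, or nothing if no scope matches.
std::optional<SketchSecret> LookupSecret(const std::vector<SketchSecret> &secrets,
                                         const std::string &path, const std::string &type) {
    std::optional<SketchSecret> best;
    for (const auto &secret : secrets) {
        const bool matches = secret.type == type &&
                             path.compare(0, secret.scope.size(), secret.scope) == 0;
        if (matches && (!best || secret.scope.size() > best->scope.size())) {
            best = secret;  // a more specific scope beats a broader one
        }
    }
    return best;
}
```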
Replacement scans
A replacement scan converts an unrecognized table name into a table-function call. For example, SELECT * FROM 'data.parquet' is rewritten to parquet_scan('data.parquet') by the parquet extension's replacement scan. Replacement-scan registration is in client_config.cpp and extension_callback_manager.cpp.
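A replacement scan is a callback chain: each hook sees the unresolved name and may return a replacement. For readability, the sketch returns SQL text, whereas DuckDB's hooks, as far as we know, build a table reference directly. ReplacementScan, ResolveTable, and ParquetReplacement are hypothetical names, and the sketch does no quote escaping.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical hook: given an unresolved table name, return a table-function
// call (as SQL text here) or decline with nullopt.
using ReplacementScan = std::function<std::optional<std::string>(const std::string &)>;

static bool EndsWith(const std::string &s, const std::string &suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Tries each registered hook in order; the first one that answers wins.
std::optional<std::string> ResolveTable(const std::vector<ReplacementScan> &scans,
                                        const std::string &table_name) {
    for (const auto &scan : scans) {
        if (auto call = scan(table_name)) {
            return call;
        }
    }
    return std::nullopt;  // the binder would then report that the table is missing
}

// Example hook in the spirit of the parquet extension's rewrite.
std::optional<std::string> ParquetReplacement(const std::string &table_name) {
    if (EndsWith(table_name, ".parquet")) {
        return "parquet_scan('" + table_name + "')";
    }
    return std::nullopt;
}
```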
Integration points
- Embedding: Everything in main/ is what an embedder calls. The CLI shell (tools/shell/), the C API, and the language bindings all go through Connection/ClientContext.
- Engine: ClientContext calls into the parser, planner, optimizer, execution, and parallel components.
- Catalog & transaction: ClientContext keeps the active MetaTransaction and resolves catalog lookups through the databases attached to its DatabaseInstance.
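As we understand it, MetaTransaction starts a transaction on an attached database only when a statement first touches that database. Below is a hypothetical C++ model of that lazy bookkeeping. The Sketch* names are invented, and real commit semantics and write restrictions across databases are not modeled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-database transaction handle.
struct SketchTransaction {
    std::string database;
    bool committed = false;
};

// One connection-level transaction that opens per-database transactions only
// for databases a query actually touches.
class SketchMetaTransaction {
public:
    SketchTransaction &GetTransaction(const std::string &database) {
        auto it = transactions_.find(database);
        if (it == transactions_.end()) {
            it = transactions_.emplace(database, SketchTransaction{database}).first;
            start_order_.push_back(database);  // remember first-touch order
        }
        return it->second;
    }

    // Marks every started database transaction committed, in first-touch order,
    // and returns that order.
    std::vector<std::string> Commit() {
        for (const auto &db : start_order_) {
            transactions_.at(db).committed = true;
        }
        return start_order_;
    }

    size_t ActiveCount() const { return transactions_.size(); }

private:
    std::map<std::string, SketchTransaction> transactions_;
    std::vector<std::string> start_order_;
};
```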
Entry points for modification
- Adding a new public C function: edit the JSON spec in src/include/duckdb/main/capi/, regenerate with make generate-files, and implement the body in the matching *-c.cpp.
- Adding a setting: add an entry to src/common/settings.json and define the setter/getter in src/main/settings/ (regenerate with make generate-files).
- Hooking an extension: implement an Extension subclass, expose a *_extension_init C entry point, and register your functions/types via ExtensionLoader.
- Adding a replacement scan: see client_config.cpp and the parquet/json extension registration.
Key source files
| File | Purpose |
|---|---|
| src/main/database.cpp | DatabaseInstance. |
| src/main/client_context.cpp | Query orchestration. |
| src/main/connection.cpp | User-facing handle. |
| src/main/config.cpp | Engine configuration. |
| src/main/extension.cpp | Extension loader. |
| src/main/capi/duckdb-c.cpp | C API entry. |
| src/main/appender.cpp | Bulk append API. |
| src/main/relation.cpp | Relational builder API. |
For the CLI built on top of this layer, see tools/cli-shell. For the in-tree extensions, see extensions.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.