duckdb/duckdb
JSON extension
Active contributors: Mytherin, Laurens Kuiper, Tishj
Purpose
extension/json/ adds a JSON logical type, JSON-path operators, the read_json / read_json_auto table functions, the to_json / from_json functions, and COPY ... TO '...' (FORMAT JSON). It builds on top of the bundled yyjson parser in third_party/yyjson/.
Directory layout
extension/json/
├── json_extension.cpp Registration entry
├── json_common.cpp Shared parsing / formatting helpers
├── json_enums.cpp JSON-related enum tables
├── json_functions.cpp Top-level function registrar
├── json_reader.cpp read_json / read_json_auto / read_ndjson
├── json_scan.cpp Scan-state plumbing
├── json_multi_file_info.cpp Multi-file scan support
├── json_serializer.cpp Serialize plans to JSON (used by EXPLAIN)
├── json_deserializer.cpp Deserialize plans from JSON
├── serialize_json.cpp Bind-data serialization
├── json_functions/ Per-function implementations
└── include/ Public headers
json_functions/ contains one file per built-in JSON function, e.g. json_extract.cpp, json_array_length.cpp, json_keys.cpp, json_type.cpp, json_each.cpp, json_create.cpp, from_json.cpp, to_json.cpp.
What it provides
| Capability | Functions |
|---|---|
| Parsing & access | json, json_extract, json_extract_string, json_value, json_keys, json_type, json_array_length, json_structure |
| Construction | json_object, json_array, json_quote, json_merge_patch, to_json, array_to_json, row_to_json |
| Path | ->, ->> operators (analogous to PostgreSQL's JSON operators) |
| Reading | read_json('file', auto_detect=true), read_json_objects, read_ndjson |
| Writing | COPY ... TO 'out.ndjson' (FORMAT JSON) |
| Conversion | from_json(json_string, schema) to convert into typed columns |
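The difference between the two path operators is easiest to see side by side. The sketch below is not the extension's code: it uses Python's json module and two hypothetical helpers, extract and extract_string, restricted to top-level object keys. It assumes the PostgreSQL convention that the table refers to, where -> yields JSON text and ->> yields plain text, with a JSON null mapping to a SQL NULL.

```python
import json
from typing import Optional


def extract(doc: str, key: str) -> Optional[str]:
    """Mimic `doc -> key`: return the member as JSON text, or None if absent."""
    value = json.loads(doc)
    if not isinstance(value, dict) or key not in value:
        return None
    return json.dumps(value[key])


def extract_string(doc: str, key: str) -> Optional[str]:
    """Mimic `doc ->> key`: return the member as plain text.

    Strings lose their quotes; a JSON null becomes None (SQL NULL),
    following the PostgreSQL convention assumed in the lead-in.
    """
    value = json.loads(doc)
    if not isinstance(value, dict) or key not in value:
        return None
    member = value[key]
    if member is None:
        return None
    return member if isinstance(member, str) else json.dumps(member)
```

With the document '{"name": "duck"}', extract returns the four-character... quoted JSON string "duck" including its quotes, while extract_string returns duck without them.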
A replacement scan registers '.json', '.ndjson', '.jsonl' patterns so that SELECT * FROM 'data.ndjson' works without a function call.
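As a rough illustration of the suffix check a replacement scan performs, the following sketch maps a bare table name to a table-function call. The helper name replacement_for is hypothetical, and rewriting to read_json_auto is an assumption made for illustration; only the three suffixes come from the text above.

```python
from typing import Optional

# Suffixes listed in the text above; everything else here is illustrative.
JSON_SUFFIXES = (".json", ".ndjson", ".jsonl")


def replacement_for(table_name: str) -> Optional[str]:
    """Return the function call a bare file name would be rewritten to.

    Hypothetical helper: returns None when the name does not look like a
    JSON file, so that another replacement scan (or a catalog error) can
    take over.
    """
    if table_name.lower().endswith(JSON_SUFFIXES):
        escaped = table_name.replace("'", "''")  # SQL-style quote escaping
        return f"read_json_auto('{escaped}')"
    return None
```

Under these assumptions, SELECT * FROM 'data.ndjson' behaves like SELECT * FROM read_json_auto('data.ndjson'), while a name such as data.csv is left for other handlers.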
How it works
graph LR
File[JSON file path] -->|VFS| Reader[json_reader.cpp]
Reader -->|yyjson parse| AST[yyjson_doc]
AST -->|schema infer or supplied| Schema[Per-column LogicalType]
Schema --> Convert[AST -> Vector]
Convert --> Chunk[DataChunk]
Reading
json_reader.cpp handles three input shapes:
- Newline-delimited JSON (NDJSON): one record per line.
- JSON array: one large array of records.
- JSON object stream: a sequence of objects, not necessarily one per line.
read_json_auto peeks at the file, infers a schema from the first N records, and re-reads the file using the inferred schema. read_json accepts an explicit columns argument when you already know the schema.
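The read path described above can be sketched in a few lines: normalise the three input shapes into a stream of records, then infer a per-column type from a sample. This is a simplified stand-in, not the logic in json_reader.cpp. The function names, the type-merging rules (including the fallback to JSON on conflicting types) and the tiny default sample size are all assumptions made for illustration, and Python's json module stands in for yyjson.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_records(text: str) -> Iterator[Any]:
    """Yield records from NDJSON, a top-level array, or concatenated objects."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        # JSON array: one large array of records.
        yield from json.loads(stripped)
        return
    # NDJSON and object streams: decode one value at a time.
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(stripped):
        value, pos = decoder.raw_decode(stripped, pos)
        yield value
        # Skip whitespace (including newlines) between values.
        while pos < len(stripped) and stripped[pos].isspace():
            pos += 1


def type_name(value: Any) -> str:
    """Map a Python value to an illustrative SQL type name."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    return "JSON"  # nested values are left as JSON in this sketch


def merge(a: str, b: str) -> str:
    """Combine two observed types for the same column (assumed rules)."""
    if a == b:
        return a
    if a == "NULL":
        return b
    if b == "NULL":
        return a
    if {a, b} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "JSON"  # conflicting types fall back to JSON in this sketch


def infer_schema(records: Iterable[Dict[str, Any]], sample_size: int = 3) -> Dict[str, str]:
    """Infer column types from the first sample_size object records."""
    schema: Dict[str, str] = {}
    for i, record in enumerate(records):
        if i >= sample_size:
            break
        for key, value in record.items():
            schema[key] = merge(schema.get(key, "NULL"), type_name(value))
    return schema
```

Because inference only looks at the sample, a column whose type changes after the sampled records would be mistyped; supplying explicit columns avoids that.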
Writing
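COPY ... TO 'out.json' (FORMAT JSON) buffers its output and writes through the file system in the same style as csv_writer.cpp. By default it emits NDJSON (one record per line); with the array setting it emits a single JSON array instead. Settings: array, compression, dateformat.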
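The two output shapes differ only in framing, as the following sketch shows. The function write_json and its array parameter are hypothetical and mirror only the array setting named above; compression and dateformat are not modelled, and the exact whitespace DuckDB emits is not reproduced.

```python
import io
import json
from typing import Any, Dict, Iterable, TextIO


def write_json(rows: Iterable[Dict[str, Any]], out: TextIO, array: bool = False) -> None:
    """Write rows as NDJSON, or as one JSON array when array is True."""
    if array:
        out.write("[\n")
        for i, row in enumerate(rows):
            if i:
                out.write(",\n")  # separator goes before every row but the first
            out.write("\t" + json.dumps(row))
        out.write("\n]\n")
    else:
        for row in rows:
            out.write(json.dumps(row) + "\n")  # one record per line


# Usage: write two rows into an in-memory buffer in each mode.
rows = [{"a": 1}, {"a": 2}]
ndjson_buf, array_buf = io.StringIO(), io.StringIO()
write_json(rows, ndjson_buf)
write_json(rows, array_buf, array=True)
```

NDJSON can be appended to and split by line for parallel reads; the array form is a single valid JSON document, which some consumers require.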
JSON type
The JSON logical type is a string alias whose values are validated to be parseable JSON. It is registered by json_extension.cpp via LogicalType::SetAlias("JSON") plus a custom cast.
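The "string alias plus validating cast" idea can be illustrated as follows. This is a sketch under assumptions, not the extension's cast: cast_to_json is a hypothetical name, and Python's json module stands in for yyjson, so the two parsers may disagree on edge cases.

```python
import json


def cast_to_json(text: str) -> str:
    """Validate that text parses as JSON and return it unchanged.

    The value stays a string; only the validation distinguishes this
    from a plain string-to-string cast.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc.msg}") from exc
    return text
```

Since the stored representation is still text, the original formatting and key order of the input are preserved after the cast.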
Plan serialization
json_serializer.cpp and json_deserializer.cpp provide a JSON serializer used by EXPLAIN (FORMAT JSON) and by the plan-storage feature. They share the Serializer/Deserializer interfaces from src/common/serializer/.
Integration points
- File system: All reads and writes go through FileSystem, so httpfs/S3 work transparently.
- Multi-file: json_multi_file_info.cpp plugs into src/common/multi_file/ for hive partitions and globs.
- Function registration: json_functions.cpp registers all functions under one extension load.
- Replacement scans: Hooked in json_extension.cpp.
- Vendored parser: third_party/yyjson/ (small, header-only-ish C library).
Entry points for modification
- Adding a new JSON function: drop a file in json_functions/, register it in json_functions.cpp.
- Improving schema inference: see json_reader.cpp's sampling pass.
- Bug fixes in JSON path: shared helpers live in json_common.cpp.
- Tests: test/sql/json/.
Key source files
| File | Purpose |
|---|---|
| extension/json/json_extension.cpp | Registration entry. |
| extension/json/json_reader.cpp | Read path. |
| extension/json/json_functions.cpp | Function registry. |
| extension/json/json_common.cpp | Parsing helpers. |
| extension/json/json_serializer.cpp | Plan serializer. |
| extension/json/json_deserializer.cpp | Plan deserializer. |
See extensions/parquet for the analogous columnar-format extension and systems/common for the multi-file / file-system primitives reused here.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.