duckdb/duckdb

JSON extension

Active contributors: Mytherin, Laurens Kuiper, Tishj

Purpose

extension/json/ adds a JSON logical type, JSON-path operators, the read_json / read_json_auto table functions, the to_json / from_json functions, and COPY ... TO '...' (FORMAT JSON). It builds on top of the bundled yyjson parser in third_party/yyjson/.

Directory layout

extension/json/
├── json_extension.cpp           Registration entry
├── json_common.cpp              Shared parsing / formatting helpers
├── json_enums.cpp               JSON-related enum tables
├── json_functions.cpp           Top-level function registrar
├── json_reader.cpp              read_json / read_json_auto / read_ndjson
├── json_scan.cpp                Scan-state plumbing
├── json_multi_file_info.cpp     Multi-file scan support
├── json_serializer.cpp          Serialize plans to JSON (used by EXPLAIN)
├── json_deserializer.cpp        Deserialize plans from JSON
├── serialize_json.cpp           Bind-data serialization
├── json_functions/              Per-function implementations
└── include/                     Public headers

json_functions/ contains one file per built-in JSON function, e.g. json_extract.cpp, json_array_length.cpp, json_keys.cpp, json_type.cpp, json_each.cpp, json_create.cpp, from_json.cpp, to_json.cpp.

What it provides

Capability        Functions
Parsing & access  json, json_extract, json_extract_string, json_value, json_keys, json_type, json_array_length, json_structure
Construction      json_object, json_array, json_quote, json_merge_patch, to_json, array_to_json, row_to_json
Path              ->, ->> operators (analogous to PostgreSQL's JSON operators)
Reading           read_json('file', auto_detect=true), read_json_objects, read_ndjson
Writing           COPY ... TO 'out.ndjson' (FORMAT JSON)
Conversion        from_json(json_string, schema) to convert into typed columns
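
The two path operators differ mainly in what they return: -> yields JSON, while ->> yields plain text. The Python sketch below only mimics that distinction with the standard json module. The helper names extract and extract_string are hypothetical, the path is a plain sequence of keys and indexes rather than full JSON-path syntax, and none of this is the extension's C++ code.

```python
import json

def extract(doc, *path):
    """Mimic `->`: walk the path and return the match as JSON text."""
    node = json.loads(doc)
    for step in path:
        if isinstance(node, dict) and isinstance(step, str) and step in node:
            node = node[step]
        elif isinstance(node, list) and isinstance(step, int) and 0 <= step < len(node):
            node = node[step]
        else:
            return None  # path does not exist: the sketch's stand-in for NULL
    return json.dumps(node, separators=(",", ":"))

def extract_string(doc, *path):
    """Mimic `->>`: like extract, but a JSON string comes back unquoted."""
    out = extract(doc, *path)
    if out is None:
        return None
    value = json.loads(out)
    return value if isinstance(value, str) else out

doc = '{"user": {"name": "Ada", "tags": ["x", "y"]}}'
print(extract(doc, "user", "name"))         # "Ada"  (still JSON, quotes kept)
print(extract_string(doc, "user", "name"))  # Ada    (plain text)
```

Keeping the -> result as JSON is what allows the operators to be chained; the text-returning form is typically the last step of a chain.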

A replacement scan is registered for paths ending in .json, .ndjson, or .jsonl, so that SELECT * FROM 'data.ndjson' works without an explicit table-function call.

How it works

graph LR
    File[JSON file path] -->|VFS| Reader[json_reader.cpp]
    Reader -->|yyjson parse| AST[yyjson_doc]
    AST -->|schema infer or supplied| Schema[Per-column LogicalType]
    Schema --> Convert[AST -> Vector]
    Convert --> Chunk[DataChunk]
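
The last stages of the graph above (AST -> Vector -> DataChunk) amount to turning row-shaped records into typed column vectors. A minimal Python sketch of that idea follows. The function name to_chunk, the cast table, and the dict-of-lists result are all hypothetical stand-ins; the real conversion works on yyjson values and DuckDB vectors in C++.

```python
import json

# Hypothetical mapping from a type name to a Python conversion function.
CASTS = {"BIGINT": int, "DOUBLE": float, "VARCHAR": str}

def to_chunk(lines, schema):
    """Convert NDJSON lines into one list per column, following `schema`."""
    columns = {name: [] for name in schema}
    for line in lines:
        record = json.loads(line)
        for name, type_name in schema.items():
            value = record.get(name)  # missing keys become None (NULL)
            columns[name].append(None if value is None else CASTS[type_name](value))
    return columns

chunk = to_chunk(
    ['{"id": 1, "score": 2}', '{"id": 2}'],
    {"id": "BIGINT", "score": "DOUBLE"},
)
print(chunk)
```

Every column ends up with the same length, one slot per record, which is the property a columnar chunk needs.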

Reading

json_reader.cpp handles three input shapes:

  • Newline-delimited JSON (NDJSON): one record per line.
  • JSON array: one large array of records.
  • JSON object stream: a sequence of objects written one after another, not necessarily one per line.
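
To make the three shapes concrete, here is a small Python sketch that yields records from each of them. The labels "ndjson", "array", and "stream" and the function iter_records are hypothetical names chosen for this example; json_reader.cpp does the equivalent work with yyjson over buffered file reads.

```python
import json

def iter_records(text, shape):
    """Yield parsed records from `text` for one of three input shapes."""
    if shape == "ndjson":
        # One record per line; blank lines are skipped.
        for line in text.splitlines():
            if line.strip():
                yield json.loads(line)
    elif shape == "array":
        # The whole input is a single JSON array of records.
        yield from json.loads(text)
    elif shape == "stream":
        # Objects follow each other, possibly spanning several lines.
        decoder = json.JSONDecoder()
        pos = 0
        while pos < len(text):
            if text[pos].isspace():
                pos += 1  # raw_decode does not skip leading whitespace
                continue
            value, pos = decoder.raw_decode(text, pos)
            yield value
    else:
        raise ValueError(f"unknown shape: {shape}")

print(list(iter_records('{"a":1} {"a":\n2}\n', "stream")))
```

NDJSON is the only shape that can be split at line boundaries without parsing, which is one reason it is the friendliest shape for chunked or parallel reading.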

read_json_auto peeks at the file, infers a schema from the first N records, and re-reads the file using the inferred schema. read_json accepts an explicit columns argument when you already know the schema.
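
The sampling idea can be sketched in a few lines of Python. This is a simplification under stated assumptions: the type names, the merge rules (integer plus float widens to DOUBLE, any other conflict falls back to JSON), and the helper names are choices made for this example, and nested structures are not inferred at all.

```python
import json
from itertools import islice

def type_of(value):
    """Map a parsed JSON value to a (simplified) column type name."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    return "JSON"  # objects and arrays are left uninterpreted in this sketch

def merge(a, b):
    """Combine two candidate types seen for the same key."""
    if a == b:
        return a
    if a == "NULL":
        return b
    if b == "NULL":
        return a
    if {a, b} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "JSON"  # assumption of this sketch: conflicts fall back to JSON

def infer_schema(lines, sample_size):
    """Infer a schema from at most `sample_size` leading records."""
    schema = {}
    for line in islice(lines, sample_size):
        for key, value in json.loads(line).items():
            schema[key] = merge(schema.get(key, "NULL"), type_of(value))
    return schema

lines = ['{"id": 1, "score": 1}', '{"id": 2, "score": 2.5}', '{"id": "x"}']
print(infer_schema(lines, sample_size=2))
print(infer_schema(lines, sample_size=3))
```

With a sample of two records the third record's string id is never seen, so the inferred type stays BIGINT. That is the general trade-off of sampling, and a reason to pass an explicit columns argument when the schema is already known.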

Writing

COPY ... TO 'out.json' (FORMAT JSON) performs buffered writes through the file system, in the same style as csv_writer.cpp, and emits either NDJSON or a single JSON array. Supported settings: array, compression, dateformat.
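
The difference between the two output layouts can be illustrated with a short Python sketch. The function write_json and its array flag are hypothetical names that only echo the array setting mentioned above; compression and date formatting are left out, and the exact whitespace DuckDB emits is not reproduced.

```python
import io
import json

def write_json(rows, out, array=False):
    """Write rows either as NDJSON (default) or as one JSON array."""
    if array:
        out.write("[\n")
        for i, row in enumerate(rows):
            if i:
                out.write(",\n")  # separator goes between records, not after the last
            out.write("\t" + json.dumps(row))
        out.write("\n]\n")
    else:
        for row in rows:
            out.write(json.dumps(row) + "\n")

rows = [{"a": 1}, {"a": 2}]
buf = io.StringIO()
write_json(rows, buf, array=True)
print(buf.getvalue())
```

NDJSON output can be appended to and streamed record by record, whereas the array layout needs the closing bracket before the file is valid JSON.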

JSON type

The JSON logical type is a string (VARCHAR) alias whose values are validated as parseable JSON. json_extension.cpp registers it via LogicalType::SetAlias("JSON") together with a custom cast.
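
Conceptually, a validating cast of this kind leaves the text unchanged and only rejects input that does not parse. A minimal Python sketch follows, assuming nothing about the real cast beyond that behaviour; the function name cast_varchar_to_json and the error message are hypothetical.

```python
import json

def _reject_constant(name):
    # Python's parser accepts NaN/Infinity by default; strict JSON does not.
    raise ValueError(f"invalid JSON constant: {name}")

def cast_varchar_to_json(text):
    """Return `text` unchanged if it is valid JSON, else raise ValueError."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        raise ValueError(f"malformed JSON: {text!r}") from exc
    return text  # storage stays a plain string; only validity was checked

print(cast_varchar_to_json('{"a": [1, 2]}'))
```

Because the stored value is still a string, casting back to a string type needs no conversion at all.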

Plan serialization

json_serializer.cpp and json_deserializer.cpp provide a JSON serializer and deserializer, used by EXPLAIN (FORMAT JSON) and by the plan-storage feature. Both build on the Serializer/Deserializer interfaces from src/common/serializer/.

Integration points

  • File system: All reads and writes go through FileSystem, so httpfs/S3 work transparently.
  • Multi-file: json_multi_file_info.cpp plugs into src/common/multi_file/ for hive partitions and globs.
  • Function registration: json_functions.cpp registers all functions under one extension load.
  • Replacement scans: Hooked in json_extension.cpp.
  • Vendored parser: third_party/yyjson/ (a small C library).

Entry points for modification

  • Adding a new JSON function: add a file to json_functions/ and register it in json_functions.cpp.
  • Improving schema inference: see the sampling pass in json_reader.cpp.
  • Fixing JSON path bugs: shared helpers live in json_common.cpp.
  • Tests: test/sql/json/.

Key source files

File                                    Purpose
extension/json/json_extension.cpp       Registration entry.
extension/json/json_reader.cpp          Read path.
extension/json/json_functions.cpp       Function registry.
extension/json/json_common.cpp          Parsing helpers.
extension/json/json_serializer.cpp      Plan serializer.
extension/json/json_deserializer.cpp    Plan deserializer.

See extensions/parquet for the analogous columnar-format extension and systems/common for the multi-file / file-system primitives reused here.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.