ggml-org/llama.cpp

Grammar

Active contributors: Georgi Gerganov

Grammar-constrained decoding masks the logits at every sampling step so the model can only produce tokens that keep the output valid under a given grammar. llama.cpp ships its own BNF dialect, GBNF ("GGML BNF"), and can optionally use llguidance, a Rust-based engine, as an alternative.

Purpose

Force model output to conform to a grammar (JSON, SQL, your own DSL, ...).
Convert JSON Schemas into GBNF so structured output is one CLI flag away.
Plug a grammar into the sampler chain via llama_sampler_init_grammar.

GBNF parser

src/llama-grammar.cpp is the in-tree GBNF engine. It compiles a GBNF source string into a stack-machine state, then on every sampling step computes the set of vocabulary tokens that can extend the current state and masks the rest.

Type	Role	File
`llama_grammar`	Compiled grammar state	`src/llama-grammar.h`
`llama_grammar_element`	One rule element (char, char range, alternation, end)	`src/llama-grammar.h`
`llama_grammar_stack`	Stack-machine snapshot	`src/llama-grammar.cpp`
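
As a rough illustration of how a character class could be flattened into a sequence of rule elements, the sketch below encodes `[0-9a-f]` and matches one code point against it. The `Elem` and `ElemType` names and the exact layout are hypothetical simplifications for this page, not the definitions in src/llama-grammar.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified element kinds (not the real enum).
enum class ElemType { End, Alt, RuleRef, Char, CharRangeUpper, CharAlt };

struct Elem {
    ElemType type;
    uint32_t value; // code point, or a rule index for RuleRef
};

// Matches code point `c` against the character class that starts at
// elems[pos]. Precondition: elems[pos].type == ElemType::Char.
// Assumed layout: Char lo [CharRangeUpper hi] (CharAlt lo [CharRangeUpper hi])*
bool match_char_class(const std::vector<Elem> & elems, size_t pos, uint32_t c) {
    bool found = false;
    do {
        const uint32_t lo = elems[pos].value;
        uint32_t hi = lo; // a single character is a range of width one
        if (pos + 1 < elems.size() && elems[pos + 1].type == ElemType::CharRangeUpper) {
            hi = elems[pos + 1].value;
            pos += 2;
        } else {
            pos += 1;
        }
        found = found || (lo <= c && c <= hi);
    } while (pos < elems.size() && elems[pos].type == ElemType::CharAlt);
    return found;
}
```

Under this assumed layout, `[0-9a-f]` becomes `Char '0'`, `CharRangeUpper '9'`, `CharAlt 'a'`, `CharRangeUpper 'f'`, `End`, and `match_char_class(elems, 0, 'c')` returns true.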

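API surface. The sampler constructors are public (include/llama.h); the _impl functions are internal and declared in src/llama-grammar.h:

llama_grammar_init_impl: compile a GBNF string into a llama_grammar
llama_grammar_accept_impl: advance state by an accepted token
llama_grammar_free_impl: release a compiled grammar
llama_sampler_init_grammar / llama_sampler_init_grammar_lazy_patterns: sampler-side wrappers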

Sample grammars live in grammars/ (JSON, JSON arrays, list, chess, C-like, ...). They are short and serve as canonical examples.

GBNF lazy patterns

A "lazy" grammar only activates after a trigger pattern fires. This is used for tool-call extraction: the grammar starts permissive and clamps down only once it has seen <tool_call> (or whatever the trigger is for the model's chat template). See llama_sampler_init_grammar_lazy_patterns.
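
The trigger behaviour described above can be sketched with a small gate that buffers generated text and switches on once a pattern matches. `LazyGate` is a hypothetical helper written for this page; it does not mirror the internals of llama_sampler_init_grammar_lazy_patterns.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: stays inactive until the trigger pattern has been
// seen anywhere in the text generated so far.
class LazyGate {
public:
    explicit LazyGate(const std::string & trigger_pattern) : trigger_(trigger_pattern) {}

    // Feed the text of each accepted token. Returns true once the grammar
    // should start constraining the output.
    bool accept(const std::string & piece) {
        if (!active_) {
            buffer_ += piece; // buffering lets a trigger span several tokens
            active_ = std::regex_search(buffer_, trigger_);
        }
        return active_;
    }

    bool active() const { return active_; }

private:
    std::regex  trigger_;
    std::string buffer_;
    bool        active_ = false;
};
```

Buffering the text rather than testing each token on its own matters because a trigger such as `<tool_call>` is often split across several tokens.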

llguidance integration

vendor/llguidance/ (a Rust workspace) is an alternative grammar engine maintained by Microsoft. It supports a richer feature set (lookahead, regex, native JSON Schema). llama.cpp can use it when configured with the CMake option -DLLAMA_LLGUIDANCE=ON.

File	Purpose
`common/llguidance.cpp`	C++ shim that constructs an llguidance grammar from a string and exposes it as a `llama_sampler`
`vendor/llguidance/`	The Rust crate (vendored)
`docs/llguidance.md`	User-facing notes on enabling and using it

The CLI accepts --grammar / --grammar-file for both engines; build flags select which one is active.

JSON Schema → GBNF

JSON Schema is the most common way users actually specify a grammar. There are two converters in the tree:

Converter	Used by	File
C++ converter	`llama-cli`, `llama-server` (`/v1/chat/completions` with `response_format`)	`common/json-schema-to-grammar.cpp`
Python converter	Standalone scripts and JSON Schema tooling	`examples/json_schema_to_grammar.py`

Both produce GBNF that src/llama-grammar.cpp can consume.
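
To give a feel for what such a conversion produces, here is a minimal sketch that turns a flat list of required properties into GBNF. It is not the algorithm in common/json-schema-to-grammar.cpp: `Prop`, `PropType`, and `object_to_gbnf` are hypothetical names, property order is fixed, key names are assumed to need no escaping, and the `string` rule ignores JSON escape sequences.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, tiny stand-in for a JSON Schema "properties" block.
enum class PropType { String, Integer, Boolean };
struct Prop { std::string name; PropType type; };

// GBNF literal that matches the JSON key, quotes included: "\"name\""
static std::string key_literal(const std::string & name) {
    return R"("\")" + name + R"(\"")";
}

static const char * type_rule(PropType type) {
    switch (type) {
        case PropType::String:  return "string";
        case PropType::Integer: return "integer";
        case PropType::Boolean: return "boolean";
    }
    return "string"; // unreachable, keeps compilers quiet
}

// Shared terminal rules referenced by the generated root rule.
static const char * k_shared_rules = R"GBNF(string  ::= "\"" [^"\\]* "\""
integer ::= "-"? [0-9]+
boolean ::= "true" | "false"
ws      ::= [ \t\n]*
)GBNF";

// Emits a grammar for an object whose properties appear in the given order.
std::string object_to_gbnf(const std::vector<Prop> & props) {
    std::string rule = "root ::= \"{\" ws";
    for (size_t i = 0; i < props.size(); ++i) {
        if (i > 0) {
            rule += " \",\" ws";
        }
        rule += " " + key_literal(props[i].name) + " ws \":\" ws " + type_rule(props[i].type) + " ws";
    }
    rule += " \"}\"\n";
    return rule + k_shared_rules;
}
```

For the properties `name` (string) and `age` (integer), the generated root rule is `root ::= "{" ws "\"name\"" ws ":" ws string ws "," ws "\"age\"" ws ":" ws integer ws "}"`.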

How it works

graph TD
    Source["GBNF text or JSON schema"] --> Compile["llama_grammar_init_impl / llguidance compile"]
    Compile --> Sampler["llama_sampler_init_grammar wraps state"]
    Sampler -->|apply| Mask["mask logits not allowed by current state"]
    Mask --> Pick["final sampler picks an allowed token"]
    Pick --> Accept["llama_grammar_accept_impl advances state"]
    Accept --> Sampler
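
The apply, mask, pick, accept loop in the graph can be sketched with a toy grammar that accepts exactly one of a fixed set of strings. Everything here (`ToyGrammar`, `generate`, the prefix-matching rule, the fixed logits) is an assumption made for illustration; the real engine tracks parse stacks rather than comparing string prefixes.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Toy "grammar": the output must equal one of `alternatives`.
struct ToyGrammar {
    std::vector<std::string> alternatives;
    std::string text; // everything accepted so far

    // A token is allowed if text + piece is still a prefix of an alternative.
    bool allows(const std::string & piece) const {
        const std::string next = text + piece;
        for (const auto & alt : alternatives) {
            if (alt.compare(0, next.size(), next) == 0) return true;
        }
        return false;
    }

    void accept(const std::string & piece) { text += piece; }

    bool done() const {
        for (const auto & alt : alternatives) {
            if (alt == text) return true;
        }
        return false;
    }
};

// Greedy decoding under the toy grammar. `model_logits` stands in for the
// model and is reused unchanged at every step.
std::string generate(const std::vector<std::string> & alternatives,
                     const std::vector<std::string> & vocab,
                     const std::vector<float> & model_logits) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    ToyGrammar grammar{alternatives, ""};
    while (!grammar.done()) {
        std::vector<float> logits = model_logits;
        // apply + mask: tokens the grammar cannot accept get -inf
        for (size_t i = 0; i < vocab.size(); ++i) {
            if (!grammar.allows(vocab[i])) logits[i] = neg_inf;
        }
        // pick: highest remaining logit
        int best = -1;
        for (size_t i = 0; i < vocab.size(); ++i) {
            if (logits[i] == neg_inf) continue;
            if (best < 0 || logits[i] > logits[best]) best = (int) i;
        }
        if (best < 0) break; // no token can extend the state
        // accept: advance the grammar state
        grammar.accept(vocab[best]);
    }
    return grammar.text;
}
```

With alternatives `{"yes", "no"}`, vocabulary `{"maybe", "y", "es", "n", "o"}`, and logits `{9, 2, 1, 3, 0.5}`, the unconstrained favourite `"maybe"` is masked at the first step and the loop returns `"no"`.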

Tool / function calling layer

Tool calls are a structured-output use case but go a little further: the model's free-form text needs to be parsed into a structured call. The autoparser in common/chat-auto-parser*.cpp handles this on top of the PEG parser (common/peg-parser.cpp). See Chat templates and docs/function-calling.md.
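
A much simplified picture of the parsing half: once the output is known to contain a well-formed call, the payload can be cut out of the surrounding free-form text. The function below is a hypothetical sketch, not the autoparser or PEG parser; it assumes the call is wrapped in `<tool_call>` and a matching `</tool_call>` (the closing tag is an assumption, and real chat templates differ per model).

```cpp
#include <cstddef>
#include <string>

// Copies the text between the first <tool_call> and the next </tool_call>
// into `payload`. Returns false (leaving `payload` untouched) if either tag
// is missing.
bool extract_tool_call(const std::string & output, std::string & payload) {
    static const std::string open_tag  = "<tool_call>";
    static const std::string close_tag = "</tool_call>";

    const size_t begin = output.find(open_tag);
    if (begin == std::string::npos) return false;

    const size_t start = begin + open_tag.size();
    const size_t end   = output.find(close_tag, start);
    if (end == std::string::npos) return false;

    payload = output.substr(start, end - start);
    return true;
}
```

Because the grammar has already constrained the text between the tags, a later step can parse `payload` as structured arguments without defensive recovery from malformed output.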

Integration points

Sampler. Wraps a llama_grammar as a llama_sampler node — see Sampler.
Server. tools/server accepts grammar and json_schema request fields in /v1/chat/completions and /v1/completions.
CLI. llama-cli --grammar-file path or --grammar 'root ::= ...'.
Tools/function calling. Autoparser uses GBNF to constrain the model's output to look like a call, then PEG-parses the actual arguments out of the resulting text.

Entry points for modification

GBNF features. src/llama-grammar.cpp is self-contained — add tokens to the GBNF lexer, then teach the parser/state machine. Add tests in tests/test-grammar-parser.cpp and tests/test-grammar-integration.cpp.
JSON Schema features. Edit common/json-schema-to-grammar.cpp and update tests/test-json-schema-to-grammar.cpp.
llguidance bump. llguidance is developed upstream, so changes under vendor/llguidance/ should come from syncing a newer upstream version rather than from in-tree edits.

Tests

tests/test-grammar-parser.cpp — GBNF lexing/parsing.
tests/test-grammar-integration.cpp — end-to-end sampling with a grammar.
tests/test-grammar-llguidance.cpp — llguidance integration smoke tests (compiled when enabled).
tests/test-json-schema-to-grammar.cpp — schema conversion correctness.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.