ggml-org/llama.cpp
Grammar
Active contributors: Georgi Gerganov
Grammar-constrained decoding masks token logits at sampling time so the model can only produce tokens that keep its output consistent with a given grammar. llama.cpp ships a custom BNF dialect called GBNF ("GGML BNF") and optionally integrates llguidance, a Rust-based engine, as an alternative.
Purpose
- Force model output to conform to a grammar (JSON, SQL, your own DSL, ...).
- Convert JSON Schemas into GBNF so structured output is one CLI flag away.
- Plug a grammar into the sampler chain via llama_sampler_grammar.
GBNF parser
src/llama-grammar.cpp is the in-tree GBNF engine. It compiles a GBNF source string into a stack-machine state, then on every sampling step computes the set of vocabulary tokens that can extend the current state and masks the rest.
| Type | Role | File |
|---|---|---|
| llama_grammar | Compiled grammar state | src/llama-grammar.h |
| llama_grammar_element | One rule element (char, char range, alternation, end) | src/llama-grammar.h |
| llama_grammar_stack | Stack-machine snapshot | src/llama-grammar.cpp |
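The mask-then-accept cycle described above can be illustrated with a toy model. This is a minimal sketch, not the in-tree engine: the "grammar" is just a fixed set of allowed strings (standing in for a compiled GBNF grammar such as root ::= "yes" | "no"), the grammar state is the text accepted so far, and the names token_allowed, mask_logits and constrained_greedy_decode are hypothetical.

```python
import math

# Toy "grammar": the output must be exactly one of these strings.
# It stands in for a compiled GBNF grammar like: root ::= "yes" | "no"
ALTERNATIVES = ("yes", "no")


def token_allowed(prefix, token):
    """A token is allowed if prefix + token can still grow into an alternative."""
    candidate = prefix + token
    return token != "" and any(alt.startswith(candidate) for alt in ALTERNATIVES)


def mask_logits(prefix, vocab, logits):
    """Set the logit of every disallowed token to -inf; keep the rest unchanged."""
    return [
        logit if token_allowed(prefix, tok) else -math.inf
        for tok, logit in zip(vocab, logits)
    ]


def constrained_greedy_decode(vocab, logits_fn):
    """Mask, pick the best allowed token, accept it, repeat until complete."""
    prefix = ""
    while prefix not in ALTERNATIVES:
        masked = mask_logits(prefix, vocab, logits_fn(prefix))
        best = max(range(len(vocab)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            raise ValueError("no vocabulary token can extend the current state")
        prefix += vocab[best]  # "accept": advance the toy grammar state
    return prefix
```

Note that tokens may span several characters, so the check is made on the whole candidate string rather than one character at a time. Even when the model strongly prefers a token outside the grammar, the mask removes it before the final pick.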
Public API in include/llama.h:
- llama_grammar_init / llama_grammar_init_impl
- llama_grammar_accept_impl: advance state by an accepted token
- llama_grammar_free
- llama_sampler_init_grammar / llama_sampler_init_grammar_lazy_patterns: sampler-side wrappers
Sample grammars live in grammars/ (JSON, JSON arrays, list, chess, C-like, ...). They are short and serve as canonical examples.
GBNF lazy patterns
A "lazy" grammar only activates after a trigger pattern fires. This is used for tool-call extraction: the grammar starts permissive and clamps down only once it has seen <tool_call> (or whatever the trigger is for the model's chat template). See llama_sampler_init_grammar_lazy_patterns.
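The trigger behaviour can be sketched as a small wrapper around any constraint check. This is an illustrative toy under stated assumptions, not the llama.cpp implementation: the class LazyConstraint and its is_allowed callback are hypothetical, and the choice to constrain only the text that follows the trigger is a simplification made for this sketch.

```python
import re


class LazyConstraint:
    """Permissive until a trigger pattern appears in the output, then strict."""

    def __init__(self, trigger_pattern, is_allowed):
        self.trigger = re.compile(trigger_pattern)
        self.is_allowed = is_allowed  # callable(constrained_text, token) -> bool
        self.text = ""                # everything generated so far
        self.start = None             # index where the constrained region begins

    @property
    def active(self):
        return self.start is not None

    def allowed(self, token):
        if not self.active:
            return True  # before the trigger fires, nothing is masked
        return self.is_allowed(self.text[self.start:], token)

    def accept(self, token):
        self.text += token
        if not self.active:
            match = self.trigger.search(self.text)
            if match:
                # Assumption of this sketch: only text after the trigger is constrained.
                self.start = match.end()
```

The trigger is searched in the accumulated text rather than in single tokens, because a tag such as <tool_call> is often split across several tokens.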
llguidance integration
vendor/llguidance/ (a Rust workspace) is an alternative grammar engine maintained by Microsoft. It supports a richer feature set (lookahead, regex, native JSON Schema support). llama.cpp can use it when configured with the CMake option -DLLAMA_LLGUIDANCE=ON.
| File | Purpose |
|---|---|
| common/llguidance.cpp | C++ shim that constructs an llguidance grammar from a string and exposes it as a llama_sampler |
| vendor/llguidance/ | The Rust crate (vendored) |
| docs/llguidance.md | User-facing notes on enabling and using it |
The CLI accepts --grammar / --grammar-file for both engines; build flags select which one is active.
JSON Schema → GBNF
JSON Schema is the most common way users actually specify a grammar. There are two converters in the tree:
| Converter | Used by | File |
|---|---|---|
| C++ converter | llama-cli, llama-server (/v1/chat/completions with response_format) | common/json-schema-to-grammar.cpp |
| Python converter | Standalone scripts and JSON Schema tooling | examples/json_schema_to_grammar.py |
Both produce GBNF that src/llama-grammar.cpp can consume.
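To give a feel for what such a conversion involves, here is a deliberately tiny sketch that handles only string, integer, boolean and flat or nested object schemas. It is not the in-tree converter: the function schema_to_gbnf is hypothetical, it assumes every property is required and emitted in declaration order with no whitespace, and it assumes property names are usable as GBNF rule-name fragments.

```python
import json

# GBNF bodies for a few primitive JSON types (simplified: no string escapes).
PRIMITIVES = {
    "string": r'"\"" [^"\\]* "\""',
    "integer": '"-"? [0-9]+',
    "boolean": '"true" | "false"',
}


def schema_to_gbnf(schema):
    """Convert a tiny subset of JSON Schema into GBNF rule text."""
    rules = {}

    def visit(node, name):
        kind = node["type"]
        if kind in PRIMITIVES:
            rules[name] = PRIMITIVES[kind]
        elif kind == "object":
            parts = []
            for i, (key, sub) in enumerate(node["properties"].items()):
                child = f"{name}-{key}"
                visit(sub, child)
                sep = '"," ' if i else ""
                # json.dumps twice: once to form the JSON key, once more to
                # quote that key as a GBNF string literal.
                parts.append(f'{sep}{json.dumps(json.dumps(key))} ":" {child}')
            rules[name] = '"{" ' + " ".join(parts) + ' "}"'
        else:
            raise ValueError(f"unsupported type: {kind}")

    visit(schema, "root")
    return "\n".join(f"{name} ::= {body}" for name, body in rules.items())
```

For a schema with a string property name and an integer property age, this emits one rule per property (root-name, root-age) plus a root rule that sequences them between braces. A real converter also has to deal with optional properties, arrays, enums, references and whitespace, which this sketch leaves out.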
How it works
graph TD
Source["GBNF text or JSON schema"] --> Compile[llama_grammar_init / llguidance compile]
Compile --> Sampler[llama_sampler_grammar wraps state]
Sampler -->|apply| Mask[mask logits not allowed by current state]
Mask --> Pick[final sampler picks an allowed token]
Pick --> Accept[llama_grammar_accept advances state]
Accept --> Sampler
Tool / function calling layer
Tool calls are a structured-output use case but go a little further: the model's free-form text needs to be parsed into a structured call. The autoparser in common/chat-auto-parser*.cpp handles this on top of the PEG parser (common/peg-parser.cpp). See Chat templates and docs/function-calling.md.
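The "parse free-form text into a structured call" step can be sketched as follows. This toy uses a regular expression instead of a PEG parser, and it assumes a hypothetical wire format: calls are wrapped in <tool_call> ... </tool_call> tags and carry a JSON object with name and arguments keys. Real chat templates differ per model, so treat the format and the helper extract_tool_calls as assumptions.

```python
import json
import re

# Assumed format for this sketch: <tool_call>{...JSON...}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Split model output into plain content and a list of parsed tool calls."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        payload = json.loads(match.group(1))
        calls.append(
            {"name": payload["name"], "arguments": payload.get("arguments", {})}
        )
    # Whatever is left after removing the call blocks is ordinary content.
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, calls
```

Constraining generation with a grammar first is what makes the json.loads call safe to rely on: without it, the text between the tags is not guaranteed to be valid JSON.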
Integration points
- Sampler. Wraps a llama_grammar as a llama_sampler node; see Sampler.
- Server. tools/server accepts grammar and json_schema request fields in /v1/chat/completions and /v1/completions.
- CLI. llama-cli --grammar-file path or --grammar 'root ::= ...'.
- Tools/function calling. Autoparser uses GBNF to constrain the model's output to look like a call, then PEG-parses the actual arguments out of the resulting text.
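As an illustration of the server integration point, the sketch below builds (but does not send) a request for /v1/completions carrying either a grammar or a json_schema field. The helper build_completion_request is hypothetical, the prompt field and the http://localhost:8080 base URL are assumptions about a typical local setup, and requiring exactly one of the two fields is a choice made by this sketch.

```python
import json
import urllib.request


def build_completion_request(prompt, *, grammar=None, json_schema=None,
                             base_url="http://localhost:8080"):
    """Build a POST request whose body constrains output via grammar or schema."""
    if (grammar is None) == (json_schema is None):
        raise ValueError("pass exactly one of grammar or json_schema")
    body = {"prompt": prompt}
    if grammar is not None:
        body["grammar"] = grammar          # GBNF source text
    else:
        body["json_schema"] = json_schema  # JSON Schema as a dict
    return urllib.request.Request(
        base_url + "/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a server running, the request could be sent with urllib.request.urlopen(req) and the response body decoded as JSON.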
Entry points for modification
- GBNF features. src/llama-grammar.cpp is self-contained: add tokens to the GBNF lexer, then teach the parser/state machine. Add tests in tests/test-grammar-parser.cpp and tests/test-grammar-integration.cpp.
- JSON Schema features. Edit common/json-schema-to-grammar.cpp and update tests/test-json-schema-to-grammar.cpp.
- llguidance bump. vendor/llguidance/ is updated by the Rust upstream maintainers.
Tests
- tests/test-grammar-parser.cpp: GBNF lexing/parsing.
- tests/test-grammar-integration.cpp: end-to-end sampling with a grammar.
- tests/test-grammar-llguidance.cpp: llguidance integration smoke tests (compiled when enabled).
- tests/test-json-schema-to-grammar.cpp: schema conversion correctness.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.