Factory.ai

Open-Source Wikis

/

llama.cpp

/

Systems

/

Grammar

ggml-org/llama.cpp

Grammar

Active contributors: Georgi Gerganov

Grammar-constrained decoding masks the sampler so the model can only produce tokens that match a given grammar. llama.cpp ships a custom BNF dialect called GBNF ("GGML BNF") and integrates an optional Rust-based engine called llguidance as an alternative.

Purpose

  • Force model output to conform to a grammar (JSON, SQL, your own DSL, ...).
  • Convert JSON Schemas into GBNF so structured output is one CLI flag away.
  • Plug a grammar into the sampler chain via llama_sampler_grammar.

GBNF parser

src/llama-grammar.cpp is the in-tree GBNF engine. It compiles a GBNF source string into a stack-machine state, then on every sampling step computes the set of vocabulary tokens that can extend the current state and masks the rest.

Type Role File
llama_grammar Compiled grammar state src/llama-grammar.h
llama_grammar_element One rule element (char, char range, alternation, end) src/llama-grammar.h
llama_grammar_stack Stack-machine snapshot src/llama-grammar.cpp

Public API in include/llama.h:

  • llama_grammar_init / llama_grammar_init_impl
  • llama_grammar_accept_impl — advance state by an accepted token
  • llama_grammar_free
  • llama_sampler_init_grammar / _lazy_patterns — sampler-side wrappers

Sample grammars live in grammars/ (JSON, JSON arrays, list, chess, C-like, ...). They are short and serve as canonical examples.

GBNF lazy patterns

A "lazy" grammar only activates after a trigger pattern fires. This is used for tool-call extraction: the grammar starts permissive and clamps down only once it has seen <tool_call> (or whatever the trigger is for the model's chat template). See llama_sampler_init_grammar_lazy_patterns.

llguidance integration

vendor/llguidance/ (a Rust workspace) is an alternative grammar engine maintained by Microsoft. It supports a richer feature set (lookahead, regex, JSON Schema natively). llama.cpp can use it when built with LLAMA_LLGUIDANCE=ON.

File Purpose
common/llguidance.cpp C++ shim that constructs an llguidance grammar from a string and exposes it as a llama_sampler
vendor/llguidance/ The Rust crate (vendored)
docs/llguidance.md User-facing notes on enabling and using it

The CLI accepts --grammar / --grammar-file for both engines; build flags select which one is active.

JSON Schema → GBNF

JSON Schema is the most common way users actually specify a grammar. There are two converters in the tree:

Converter Used by File
C++ converter llama-cli, llama-server (/v1/chat/completions with response_format) common/json-schema-to-grammar.cpp
Python converter Standalone scripts and JSON Schema tooling examples/json_schema_to_grammar.py

Both produce GBNF that src/llama-grammar.cpp can consume.

How it works

graph TD
    Source["GBNF text or JSON schema"] --> Compile[llama_grammar_init / llguidance compile]
    Compile --> Sampler[llama_sampler_grammar wraps state]
    Sampler -->|apply| Mask[mask logits not allowed by current state]
    Mask --> Pick[final sampler picks an allowed token]
    Pick --> Accept[llama_grammar_accept advances state]
    Accept --> Sampler

Tool / function calling layer

Tool calls are a structured-output use case but go a little further: the model's free-form text needs to be parsed into a structured call. The autoparser in common/chat-auto-parser*.cpp handles this on top of the PEG parser (common/peg-parser.cpp). See Chat templates and docs/function-calling.md.

Integration points

  • Sampler. Wraps a llama_grammar as a llama_sampler node — see Sampler.
  • Server. tools/server accepts grammar and json_schema request fields in /v1/chat/completions and /v1/completions.
  • CLI. llama-cli --grammar-file path or --grammar 'root ::= ...'.
  • Tools/function calling. Autoparser uses GBNF to constrain the model's output to look like a call, then PEG-parses the actual arguments out of the resulting text.

Entry points for modification

  • GBNF features. src/llama-grammar.cpp is self-contained — add tokens to the GBNF lexer, then teach the parser/state machine. Add tests in tests/test-grammar-parser.cpp and tests/test-grammar-integration.cpp.
  • JSON Schema features. Edit common/json-schema-to-grammar.cpp and update tests/test-json-schema-to-grammar.cpp.
  • llguidance bump. vendor/llguidance/ is updated by the Rust upstream maintainers.

Tests

  • tests/test-grammar-parser.cpp — GBNF lexing/parsing.
  • tests/test-grammar-integration.cpp — end-to-end sampling with a grammar.
  • tests/test-grammar-llguidance.cpp — llguidance integration smoke tests (compiled when enabled).
  • tests/test-json-schema-to-grammar.cpp — schema conversion correctness.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.

Grammar – llama.cpp wiki | Factory