ggml-org/llama.cpp

Other tools

Single-page summary of the smaller binaries under tools/ plus the most useful demos under examples/. For per-binary CLI documentation, see each tool's README.md.

`llama-gguf-split`

Splits a large GGUF into multiple files, or merges splits back together. Handy for keeping big models within Hugging Face's 50 GB upload limit, or for reducing peak memory during quantization. Source: tools/gguf-split/gguf-split.cpp. README: tools/gguf-split/README.md. The split naming convention is <name>-NNNNN-of-MMMMM.gguf; the model loader (llama-model-loader) understands this layout natively.
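To make the naming convention concrete, here is a minimal Python sketch that builds and parses shard names of the form <name>-NNNNN-of-MMMMM.gguf. The helper names `split_name` and `parse_split_name` are hypothetical, and the sketch assumes a five-digit, zero-padded, 1-based shard index; it is an illustration of the layout, not the tool's own code.

```python
import re

# Assumed pattern for <name>-NNNNN-of-MMMMM.gguf:
# five-digit, zero-padded shard index and shard count.
SPLIT_RE = re.compile(
    r"^(?P<prefix>.+)-(?P<index>\d{5})-of-(?P<count>\d{5})\.gguf$"
)


def split_name(prefix: str, index: int, count: int) -> str:
    """Build a shard file name; index is assumed to be 1-based."""
    if not 1 <= index <= count:
        raise ValueError("index must be in 1..count")
    return f"{prefix}-{index:05d}-of-{count:05d}.gguf"


def parse_split_name(name: str):
    """Return (prefix, index, count), or None if name is not a shard."""
    m = SPLIT_RE.match(name)
    if m is None:
        return None
    return m["prefix"], int(m["index"]), int(m["count"])
```

A script that post-processes split models could use `parse_split_name` to check that all shards of a set are present before uploading or merging them.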

`rpc-server`

A small TCP server that exposes a remote ggml_backend so a client process can offload tensors over the network. Pairs with GGML_RPC=ON builds. Source: tools/rpc/. The corresponding backend lives at ggml/src/ggml-rpc/.

`llama-tokenize`

Tokenize and detokenize on the command line — useful for verifying tokenizer behavior or converting between text and ids in scripts. Source: tools/tokenize/tokenize.cpp.

`llama-tts`

Text-to-speech driver. Wraps a small LLM that produces speech tokens, then decodes those into audio. Source: tools/tts/. Examples include the OuteTTS pipeline.

`llama-completion`

A FIM ("fill in the middle") completion harness used by the editor plugins (examples/llama.vim, llama.vscode). Source: tools/completion/.

`llama-parser`

Standalone CLI that exercises the PEG parser and autoparser. Useful when iterating on tool-call extraction. Source: tools/parser/. Maintainer: @pwilkin.

`llama-batched-bench`

Microbench dedicated to batched-decoding behavior — varies batch size, sequence count, prompt length to study batching efficiency. Source: tools/batched-bench/.

`llama-cvector-generator`

Generates control vectors (per-layer biases) from contrastive prompts. Output is a GGUF that --control-vector can apply to any context. Source: tools/cvector-generator/.
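The idea of a per-layer bias derived from contrastive prompts can be sketched in a few lines of Python. This is a toy illustration that assumes the simplest possible derivation, a difference of mean hidden states; the actual tool may use a different method. The function names and the dictionary-of-lists data layout are hypothetical.

```python
def mean_vector(rows):
    """Element-wise mean of equally sized vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def control_vector(pos_states, neg_states):
    """Per-layer direction: mean(positive) - mean(negative).

    pos_states / neg_states map a layer index to a list of hidden-state
    vectors (hypothetical layout, chosen only for illustration).
    """
    return {
        layer: [
            p - n
            for p, n in zip(
                mean_vector(pos_states[layer]),
                mean_vector(neg_states[layer]),
            )
        ]
        for layer in pos_states
    }


def apply_control(hidden, direction, strength=1.0):
    """Add the scaled direction to one layer's hidden state, like a bias."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy hidden states for a single layer and two prompt pairs.
pos = {0: [[1.0, 2.0], [3.0, 2.0]]}
neg = {0: [[0.0, 1.0], [2.0, 1.0]]}
cv = control_vector(pos, neg)
steered = apply_control([0.5, 0.5], cv[0], strength=2.0)
```

The `strength` parameter plays the role of a scaling factor: larger values push the hidden state further along the contrastive direction.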

`llama-export-lora`

Merges a LoRA adapter into a base GGUF, producing a single new model file. Source: tools/export-lora/.
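Conceptually, merging folds the low-rank update into each affected weight matrix. The sketch below uses the standard LoRA formulation, W' = W + (alpha / r) * B A, on plain Python lists. It is an assumption-labelled illustration of the arithmetic only: it does not read GGUF files or handle quantized tensors, and `merge_lora` is a hypothetical name.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


def merge_lora(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    Assumes the standard LoRA shapes: A is r x n_in, B is n_out x r.
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [wij + scale * dij for wij, dij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
a = [[1.0, 2.0]]              # rank-1 A matrix (1x2)
b = [[0.5], [1.0]]            # rank-1 B matrix (2x1)
merged = merge_lora(w, a, b, alpha=2.0)
```

After merging, the adapter is no longer needed at inference time, which is why the output is a single self-contained model file.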

`llama-fit-params`

Fits runtime parameters, such as context size and the number of GPU-offloaded layers, to the memory available on the target devices, and reports the resulting settings. Source: tools/fit-params/.

`tools/results/`

Helper for storing and comparing benchmark output across runs. Not a binary on its own.

Notable directories under `examples/`

The examples/ tree is bigger than tools/ and covers smaller demos plus platform integrations. Highlights:

| Path | Purpose |
| --- | --- |
| `examples/simple/`, `examples/simple-chat/` | Minimal C examples of the libllama API |
| `examples/simple-cmake-pkg/` | Smallest possible downstream consumer using `find_package(llama)` |
| `examples/llama.android/` | Android NDK app + JNI binding |
| `examples/llama.swiftui/`, `examples/batched.swift/` | Swift/SwiftUI iOS/macOS integration |
| `examples/llama.vim` | Neovim plugin for FIM completion |
| `examples/embedding/`, `examples/retrieval/` | Embedding-focused demos |
| `examples/parallel/`, `examples/passkey/` | Multi-sequence and long-context demos |
| `examples/eval-callback/` | Hook into per-tensor evaluation |
| `examples/save-load-state/` | Serialize and restore a `llama_context` |
| `examples/speculative/`, `examples/speculative-simple/`, `examples/lookup/`, `examples/lookahead/` | Speculative decoding demos with different drafters |
| `examples/diffusion/` | Discrete diffusion text generation experiments |
| `examples/training/` | Optimizer / training experiments on top of `ggml-opt` |
| `examples/sycl/` | SYCL backend smoke tests |
| `examples/gguf/`, `examples/gguf-hash/` | GGUF inspection utilities |
| `examples/model-conversion/` | End-to-end conversion + verification harness for new models |
| `examples/json_schema_to_grammar.py`, `examples/pydantic_models_to_grammar.py` | Python equivalents of `common/json-schema-to-grammar.cpp` |
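For readers new to the speculative decoding demos listed in the table, the core loop can be sketched as follows. This is a toy, greedy version that assumes two hypothetical callables, `target_next` and `draft_next`, standing in for the large and small model; it is not taken from the examples themselves, which work on logits and batched evaluation.

```python
def speculative_step(target_next, draft_next, context, k):
    """One greedy speculative step; returns the tokens accepted.

    target_next / draft_next map a token list to the next token
    (hypothetical stand-ins for the large and the small model).
    """
    # 1. The cheap drafter proposes k tokens autoregressively.
    proposal = []
    for _ in range(k):
        proposal.append(draft_next(context + proposal))

    # 2. The target checks each proposed position. A real implementation
    #    would score all positions in one batched forward pass.
    accepted = []
    for tok in proposal:
        expected = target_next(context + accepted)
        if tok != expected:
            accepted.append(expected)  # fix the first mismatch and stop
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target_next(context + accepted))
    return accepted


# Toy models: the target counts upward; the drafter goes wrong after 3.
target = lambda toks: toks[-1] + 1
draft = lambda toks: toks[-1] + 1 if toks[-1] < 3 else 0

out = speculative_step(target, draft, [1], k=4)
```

In this toy run the drafter gets two tokens right before diverging, so the step yields three tokens for what would be one batched target evaluation. The output always matches what the target alone would have produced greedily.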

examples/deprecation-warning/ is a special case: it produces binaries like main and server that print a friendly "renamed to llama-cli / llama-server" message.

Tests under `tests/`

While not "tools" per se, several test binaries are useful as exploratory entry points:

- tests/test-backend-ops: manually invoke specific ops/backends.
- tests/test-tokenizer-*: tokenizer fixtures.
- tests/test-chat, tests/test-chat-parser: chat templating harnesses.
- tests/peg-parser/: PEG snapshots for the autoparser.

See Testing for the full list.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.