ggml-org/llama.cpp
Other tools
Single-page summary of the smaller binaries under tools/ plus the most useful demos under examples/. For per-binary CLI documentation, see each tool's README.md.
Binaries under tools/
llama-gguf-split
Splits a large GGUF into multiple files, or merges splits back into one. Handy for keeping each shard of a big model under Hugging Face's 50 GB upload limit, or for reducing peak memory during quantization. Source: tools/gguf-split/gguf-split.cpp. README: tools/gguf-split/README.md. The split naming convention is <name>-NNNNN-of-MMMMM.gguf; the llama-model-loader understands this layout natively.
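The naming convention is easy to reproduce in scripts that upload or verify shards. The sketch below is illustrative only: the helper names `split_name` and `parse_split_name` are hypothetical, and it assumes the shard index is 1-based and both numbers are zero-padded to five digits, as the NNNNN/MMMMM pattern suggests.

```python
import re

# Matches "<name>-NNNNN-of-MMMMM.gguf" (five-digit, zero-padded fields).
SPLIT_RE = re.compile(r"^(?P<name>.+)-(?P<index>\d{5})-of-(?P<count>\d{5})\.gguf$")


def split_name(name: str, index: int, count: int) -> str:
    """Build the file name of shard `index` out of `count` (assumed 1-based)."""
    if not 1 <= index <= count:
        raise ValueError("index must be in 1..count")
    return f"{name}-{index:05d}-of-{count:05d}.gguf"


def parse_split_name(filename: str):
    """Return (name, index, count), or None if the name is not a split shard."""
    m = SPLIT_RE.match(filename)
    if m is None:
        return None
    return m["name"], int(m["index"]), int(m["count"])


if __name__ == "__main__":
    for i in range(1, 4):
        print(split_name("model-q4_k_m", i, 3))
```

A script can use `parse_split_name` to check that all `count` shards are present before pointing a loader at the first one.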
rpc-server
A small TCP server that exposes a remote ggml_backend so a client process can offload tensors over the network. Pairs with GGML_RPC=ON builds. Source: tools/rpc/. The corresponding backend lives at ggml/src/ggml-rpc/.
llama-tokenize
Tokenize and detokenize on the command line — useful for verifying tokenizer behavior or converting between text and ids in scripts. Source: tools/tokenize/tokenize.cpp.
llama-tts
Text-to-speech driver. Wraps a small LLM that produces speech tokens, then decodes those into audio. Source: tools/tts/. Examples include the OuteTTS pipeline.
llama-completion
A FIM ("fill in the middle") completion harness used by the editor plugins (examples/llama.vim, llama.vscode). Source: tools/completion/.
llama-parser
Standalone CLI that exercises the PEG parser and autoparser. Useful when iterating on tool-call extraction. Source: tools/parser/. Maintainer: @pwilkin.
llama-batched-bench
Microbenchmark dedicated to batched decoding: it varies batch size, sequence count, and prompt length to study batching efficiency. Source: tools/batched-bench/.
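To make the idea of sweeping these dimensions concrete, here is a minimal sketch of enumerating a benchmark grid. It does not call the tool. The function and field names are hypothetical, and the KV-cache sizing formula (every sequence holds its own prompt plus generated tokens, unless the prompt is shared) is an assumption made for illustration.

```python
from itertools import product


def bench_grid(prompt_lengths, gen_lengths, parallel_counts, shared_prompt=False):
    """Yield one configuration per (prompt, generation, parallel) combination."""
    for pp, tg, pl in product(prompt_lengths, gen_lengths, parallel_counts):
        if shared_prompt:
            # Assumed: one shared prompt, plus generated tokens per sequence.
            n_kv = pp + pl * tg
        else:
            # Assumed: each sequence stores its own prompt and generation.
            n_kv = pl * (pp + tg)
        yield {"pp": pp, "tg": tg, "pl": pl, "n_kv": n_kv}


if __name__ == "__main__":
    for cfg in bench_grid([128, 512], [32], [1, 4, 8]):
        print(cfg)
```

Listing the grid up front makes it easy to see which configurations would exceed a given context size before running anything.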
llama-cvector-generator
Generates control vectors (per-layer biases) from contrastive prompts. Output is a GGUF that --control-vector can apply to any context. Source: tools/cvector-generator/.
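As a rough illustration of what a per-layer bias derived from contrastive prompts means, the sketch below takes the difference of mean hidden states for one layer and adds it to an activation. This is a conceptual sketch under stated assumptions, not the tool's implementation: the function names are hypothetical, the hidden states are made-up numbers, and the tool's actual extraction method may differ.

```python
def mean_difference(positive, negative):
    """Per-dimension mean(positive) - mean(negative) for a single layer.

    `positive` and `negative` are lists of hidden-state vectors collected
    from the two sides of the contrastive prompt pairs.
    """
    dim = len(positive[0])
    pos_mean = [sum(v[d] for v in positive) / len(positive) for d in range(dim)]
    neg_mean = [sum(v[d] for v in negative) / len(negative) for d in range(dim)]
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_control_vector(hidden, vector, strength=1.0):
    """Add the scaled control vector to one layer's hidden state."""
    return [h + strength * v for h, v in zip(hidden, vector)]


if __name__ == "__main__":
    pos = [[1.0, 2.0], [3.0, 4.0]]
    neg = [[0.0, 0.0], [2.0, 2.0]]
    vec = mean_difference(pos, neg)
    print(vec)
    print(apply_control_vector([0.5, 0.5], vec, strength=2.0))
```

The `strength` parameter stands in for the scaling a user would choose when applying the vector; a negative value would steer in the opposite direction.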
llama-export-lora
Merges a LoRA adapter into a base GGUF, producing a single new model file. Source: tools/export-lora/.
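Conceptually, merging folds the low-rank update into each affected weight matrix. The sketch below shows the standard LoRA merge formula, W' = W + (alpha / r) * B * A, on plain Python lists. It is an illustration only: the helper names are hypothetical, and whether the tool applies exactly this scaling to every tensor is an assumption.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * B @ A.

    w:      out x in base weight
    lora_a: r x in   (the "A" matrix)
    lora_b: out x r  (the "B" matrix)
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # out x in low-rank update
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    a = [[1.0, 2.0]]    # rank 1, in_features 2
    b = [[1.0], [3.0]]  # out_features 2, rank 1
    print(merge_lora(base, a, b, alpha=2.0))
```

After merging, the adapter is no longer needed at inference time, which is why the output is a single model file.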
llama-fit-params
Fits runtime parameters, such as context size and the number of layers offloaded to the GPU, to the device memory that is actually free, and reports the resulting settings. Source: tools/fit-params/.
tools/results/
Helper for storing and comparing benchmark output across runs. Not a binary on its own.
Notable directories under examples/
The examples/ tree is bigger than tools/ and covers smaller demos plus platform integrations. Highlights:
| Path | Purpose |
|---|---|
| examples/simple/, examples/simple-chat/ | Minimal C examples of the libllama API |
| examples/simple-cmake-pkg/ | Smallest possible downstream consumer using find_package(llama) |
| examples/llama.android/ | Android NDK app + JNI binding |
| examples/llama.swiftui/, examples/batched.swift/ | Swift/SwiftUI iOS/macOS integration |
| examples/llama.vim | Neovim plugin for FIM completion |
| examples/embedding/, examples/retrieval/ | Embedding-focused demos |
| examples/parallel/, examples/passkey/ | Multi-sequence and long-context demos |
| examples/eval-callback/ | Hook into per-tensor evaluation |
| examples/save-load-state/ | Serialize and restore a llama_context |
| examples/speculative/, examples/speculative-simple/, examples/lookup/, examples/lookahead/ | Speculative decoding demos with different drafters |
| examples/diffusion/ | Discrete diffusion text generation experiments |
| examples/training/ | Optimizer / training experiments on top of ggml-opt |
| examples/sycl/ | SYCL backend smoke tests |
| examples/gguf/, examples/gguf-hash/ | GGUF inspection utilities |
| examples/model-conversion/ | End-to-end conversion + verification harness for new models |
| examples/json_schema_to_grammar.py, examples/pydantic_models_to_grammar.py | Python equivalents of common/json-schema-to-grammar.cpp |
examples/deprecation-warning/ is a special case: it produces binaries like main and server that print a friendly "renamed to llama-cli / llama-server" message.
Tests under tests/
While not "tools" per se, several test binaries are useful as exploratory entry points:
- tests/test-backend-ops: manually invoke specific ops/backends.
- tests/test-tokenizer-*: tokenizer fixtures.
- tests/test-chat, tests/test-chat-parser: chat templating harnesses.
- tests/peg-parser/: PEG snapshots for the autoparser.
See Testing for the full list.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.