ggml-org/llama.cpp
Tools
Everything in tools/ is a standalone binary built against libllama plus common/. Each tool is a small program (typically one or two .cpp files) with its own CMakeLists.txt and README.md. Some tools also reach further into ggml: llama-bench benchmarks every available backend, and rpc-server (listed in the index below) exposes a ggml backend over the network.
This wiki uses the tools lens (matching the repo's own directory name) for these binaries; collectively they are the project's "applications". For the underlying library subsystems they exercise, see Systems.
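To make the "small program with its own CMakeLists.txt" layout concrete, here is a minimal sketch of what a per-tool build file might look like. The tool name llama-mytool and the source file mytool.cpp are hypothetical, and the sketch assumes the library targets are named llama and common, matching the libllama plus common/ description above; check an existing tool's CMakeLists.txt in the repository for the exact conventions.

```cmake
# Hypothetical tools/mytool/CMakeLists.txt (illustrative sketch only)

# Name of the resulting binary; "llama-mytool" is a made-up example.
set(TARGET llama-mytool)

# A tool is typically one or two .cpp files.
add_executable(${TARGET} mytool.cpp)

# Link against the shared helper code and the core library.
# Assumption: these CMake targets are called "common" and "llama".
target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})

# Assumption: the tool is compiled as C++17.
target_compile_features(${TARGET} PRIVATE cxx_std_17)

# Optional: install the binary alongside the other tools.
install(TARGETS ${TARGET} RUNTIME)
```

With a file like this in place, the tool would be built as part of the normal CMake build, or on its own by passing its target name to `cmake --build <build-dir> --target llama-mytool`.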
Pages
- llama-cli — the primary chat / completion CLI.
- llama-server — OpenAI-compatible HTTP server with WebUI.
- llama-quantize — produce quantized GGUFs.
- llama-imatrix — produce importance-matrix files for IQ-quants.
- llama-perplexity — perplexity / KL-div / HellaSwag evaluator.
- llama-bench — throughput benchmarks.
- llama-mtmd-cli and clip — multimodal CLI plus the CLIP-based vision encoder.
- Other tools — gguf-split, gguf, batched-bench, rpc-server, tokenize, tts, completion, parser, fit-params, cvector-generator, export-lora, results.
Tool index
| Binary | Source dir | Owner (CODEOWNERS) | Brief |
|---|---|---|---|
| llama-cli | tools/cli/ | @ngxson | Interactive chat / single-shot generation |
| llama-server | tools/server/ | @ggml-org/llama-server | OpenAI-compatible HTTP server + WebUI |
| llama-quantize | tools/quantize/ | @ggerganov | Quantize GGUFs |
| llama-imatrix | tools/imatrix/ | (no listed owner) | Importance matrix generator |
| llama-perplexity | tools/perplexity/ | @ggerganov | Perplexity / quality evaluator |
| llama-bench | tools/llama-bench/ | (no listed owner) | Throughput benchmark |
| llama-batched-bench | tools/batched-bench/ | @ggerganov | Batched-decoding microbench |
| llama-mtmd-cli | tools/mtmd/ | @ggml-org/llama-mtmd | Multimodal (vision/audio) CLI |
| llama-gguf-split | tools/gguf-split/ | (no listed owner) | Split / merge GGUFs |
| rpc-server | tools/rpc/ | @ggml-org/ggml-rpc | Remote ggml backend server |
| llama-tokenize | tools/tokenize/ | @ggerganov | CLI tokenizer / detokenizer |
| llama-tts | tools/tts/ | @ggerganov | Text-to-speech driver (uses an LLM + audio decoder) |
| llama-completion | tools/completion/ | @ggerganov | FIM-style completion harness |
| llama-parser | tools/parser/ | @pwilkin | Standalone PEG/autoparser CLI |
| llama-fit-params | tools/fit-params/ | (no listed owner) | Fits runtime parameters (e.g. context size, GPU offload) to available device memory |
| llama-cvector-generator | tools/cvector-generator/ | (no listed owner) | Build control vectors |
| llama-export-lora | tools/export-lora/ | (no listed owner) | Merge LoRA into a GGUF |
| (none listed) | tools/results/ | (no listed owner) | Helper for storing/parsing benchmark results |
The examples/ directory has additional smaller demos (Android, Swift, vim plugin, simple-chat, retrieval, ...). They are not "shipped tools" in the same sense, so they are not enumerated individually here, but the most useful ones are referenced where they relate to a tool.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.