ggml-org/llama.cpp

Tools

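Everything in tools/ is a standalone binary built against libllama plus the shared helper code in common/. Each tool has its own CMakeLists.txt and README.md, and most are small programs (typically one or two .cpp files); the larger exceptions include llama-server, which also bundles a WebUI, and the multimodal code in tools/mtmd/. Some tools also lean on ggml functionality beyond plain inference: llama-bench can benchmark every compiled-in backend, and rpc-server is built on the ggml RPC backend and ships its own networking.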
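The directory convention above can be checked mechanically. The Python sketch below walks a local tools/ checkout, flags subdirectories that lack a CMakeLists.txt or README.md, and pulls the binary name out of each CMakeLists.txt. It assumes targets are declared with a `set(TARGET <name>)` line; the function name `scan_tools` and that convention are assumptions for illustration, and a directory that defines several targets (or names them some other way) reports only the first match, or None.

```python
import re
from pathlib import Path

# Assumption: a tool's CMakeLists.txt names its binary with a line such as
# `set(TARGET rpc-server)`. This is a common pattern, not a guarantee.
TARGET_RE = re.compile(r"set\(\s*TARGET\s+([A-Za-z0-9_.-]+)\s*\)")

REQUIRED_FILES = ("CMakeLists.txt", "README.md")


def scan_tools(tools_dir):
    """Return {subdir: {"target": str | None, "missing": [file, ...]}}.

    Only immediate subdirectories of tools_dir are inspected; plain files
    at the top level are skipped.
    """
    report = {}
    for sub in sorted(Path(tools_dir).iterdir()):
        if not sub.is_dir():
            continue
        # Record which of the expected per-tool files are absent.
        missing = [name for name in REQUIRED_FILES if not (sub / name).is_file()]
        target = None
        cmake = sub / "CMakeLists.txt"
        if cmake.is_file():
            text = cmake.read_text(encoding="utf-8", errors="replace")
            match = TARGET_RE.search(text)  # first set(TARGET ...) only
            if match:
                target = match.group(1)
        report[sub.name] = {"target": target, "missing": missing}
    return report
```

Called as `scan_tools("tools")` from the repository root, it returns one entry per tool directory; comparing the reported targets with the Binary column of the index below is a quick way to spot drift, within the limits of the regex assumption.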

This wiki groups these binaries under "Tools", matching the repository's own directory name; collectively they are the project's applications. For the library subsystems they exercise, see Systems.

Pages

  • llama-cli — the primary chat / completion CLI.
  • llama-server — OpenAI-compatible HTTP server with WebUI.
  • llama-quantize — produce quantized GGUFs.
  • llama-imatrix — produce importance-matrix files for IQ-quants.
  • llama-perplexity — perplexity / KL-div / HellaSwag evaluator.
  • llama-bench — throughput benchmarks.
  • llama-mtmd-cli and clip — multimodal CLI plus the CLIP-based vision encoder.
  • Other tools — gguf-split, gguf, batched-bench, rpc-server, tokenize, tts, completion, parser, fit-params, cvector-generator, export-lora, results.

Tool index

Binary | Source dir | Owner (CODEOWNERS) | Brief
--- | --- | --- | ---
llama-cli | tools/cli/ | @ngxson | Interactive chat / single-shot generation
llama-server | tools/server/ | @ggml-org/llama-server | OpenAI-compatible HTTP server + WebUI
llama-quantize | tools/quantize/ | @ggerganov | Quantize GGUFs
llama-imatrix | tools/imatrix/ | (no listed owner) | Importance matrix generator
llama-perplexity | tools/perplexity/ | @ggerganov | Perplexity / quality evaluator
llama-bench | tools/llama-bench/ | (no listed owner) | Throughput benchmark
llama-batched-bench | tools/batched-bench/ | @ggerganov | Batched-decoding microbench
llama-mtmd-cli | tools/mtmd/ | @ggml-org/llama-mtmd | Multimodal (vision/audio) CLI
llama-gguf-split | tools/gguf-split/ | (no listed owner) | Split / merge GGUFs
rpc-server | tools/rpc/ | @ggml-org/ggml-rpc | Remote ggml backend server
llama-tokenize | tools/tokenize/ | @ggerganov | CLI tokenizer / detokenizer
llama-tts | tools/tts/ | @ggerganov | Text-to-speech driver (uses an LLM + audio decoder)
llama-completion | tools/completion/ | @ggerganov | FIM-style completion harness
llama-parser | tools/parser/ | @pwilkin | Standalone PEG/autoparser CLI
llama-fit-params | tools/fit-params/ | (no listed owner) | Curve-fits sampling/quant parameters
llama-cvector-generator | tools/cvector-generator/ | (no listed owner) | Build control vectors
llama-export-lora | tools/export-lora/ | (no listed owner) | Merge LoRA into a GGUF
(not listed) | tools/results/ | (no listed owner) | Helper for storing/parsing benchmark results
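The Owner column reflects the repository's CODEOWNERS file. The following is a simplified Python sketch of how an owner could be resolved for a path: rules are read top to bottom and, as in GitHub's CODEOWNERS semantics, the last matching pattern wins, so a path with no matching rule maps to an empty list (shown in the table as "no listed owner"). The helper names are hypothetical, and the sketch assumes every pattern is an anchored literal path or directory prefix; real CODEOWNERS files also support gitignore-style globs, unanchored patterns, and escaped `\#`, none of which are handled here.

```python
from typing import List, Tuple

Rule = Tuple[str, List[str]]


def parse_codeowners(text: str) -> List[Rule]:
    """Parse CODEOWNERS text into (pattern, owners) pairs, in file order."""
    rules: List[Rule] = []
    for raw in text.splitlines():
        # Simplification: treat every '#' as the start of a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def _matches(pattern: str, path: str) -> bool:
    """Simplified matcher: anchored literal paths only, no globs."""
    prefix = pattern.lstrip("/")
    if prefix.endswith("/"):
        # Directory rule, e.g. "/tools/cli/" covers everything beneath it.
        return path.startswith(prefix)
    # File rule, or a directory written without the trailing slash.
    return path == prefix or path.startswith(prefix + "/")


def owners_for(rules: List[Rule], path: str) -> List[str]:
    """Return owners for a repo-relative path; the last matching rule wins."""
    owners: List[str] = []
    for pattern, rule_owners in rules:
        if _matches(pattern, path):
            owners = rule_owners
    return owners
```

Because later rules override earlier ones, a broad rule near the top of the file can be narrowed by a more specific rule further down; reading the file bottom-up is therefore the quickest way to answer "who owns this tool?" by hand.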

The examples/ directory has additional smaller demos (Android, Swift, vim plugin, simple-chat, retrieval, ...). They are not "shipped tools" in the same sense, so they are not enumerated individually here, but the most useful ones are referenced where they relate to a tool.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
