ggml-org/llama.cpp

Other tools

Single-page summary of the smaller binaries under tools/ plus the most useful demos under examples/. For per-binary CLI documentation, see each tool's README.md.

Binaries under tools/

llama-gguf-split

Splits a large GGUF into multiple files, or merges splits back into one. Handy for keeping each file of a big model under Hugging Face's 50 GB upload limit, or for reducing peak memory during quantization. Source: tools/gguf-split/gguf-split.cpp. README: tools/gguf-split/README.md. The split naming convention is <name>-NNNNN-of-MMMMM.gguf; llama-model-loader understands this layout natively.
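To make the naming convention concrete, here is a minimal Python sketch that builds and parses shard names of the form <name>-NNNNN-of-MMMMM.gguf. The helper names (split_name, parse_split_name, all_shards) are hypothetical and not part of llama.cpp; the sketch also assumes shard indices start at 1 and are zero-padded to five digits.

```python
import re

# Shard naming convention: <name>-NNNNN-of-MMMMM.gguf
# Assumption: five-digit, zero-padded fields, with indices starting at 1.
SPLIT_RE = re.compile(r"^(?P<name>.+)-(?P<idx>\d{5})-of-(?P<count>\d{5})\.gguf$")


def split_name(name: str, idx: int, count: int) -> str:
    """Build the file name of shard `idx` out of `count` shards."""
    if not 1 <= idx <= count:
        raise ValueError("shard index out of range")
    return f"{name}-{idx:05d}-of-{count:05d}.gguf"


def parse_split_name(filename: str):
    """Return (name, idx, count), or None if the name is not a shard name."""
    m = SPLIT_RE.match(filename)
    if m is None:
        return None
    return m["name"], int(m["idx"]), int(m["count"])


def all_shards(any_shard: str):
    """List every expected shard name, given any one shard of the set."""
    parsed = parse_split_name(any_shard)
    if parsed is None:
        return [any_shard]  # not a split model: treat as a single file
    name, _, count = parsed
    return [split_name(name, i, count) for i in range(1, count + 1)]
```

A script that downloads or verifies a split model could call all_shards() on the first file name to learn which sibling files it should expect.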

rpc-server

A small TCP server that exposes a remote ggml_backend so a client process can offload tensors over the network. Pairs with GGML_RPC=ON builds. Source: tools/rpc/. The corresponding backend lives at ggml/src/ggml-rpc/.

llama-tokenize

Tokenizes and detokenizes text on the command line. Useful for verifying tokenizer behavior or for converting between text and token ids in scripts. Source: tools/tokenize/tokenize.cpp.
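For scripting, one option is to wrap the binary from Python. The sketch below assumes that llama-tokenize is on PATH, that it accepts -m, -p, --ids and --log-disable, and that --ids prints the ids as a bracketed list such as [1, 2, 3]; check the tool's --help output before relying on any of this. The function names are hypothetical.

```python
import json
import subprocess


def tokenize_argv(model_path: str, prompt: str, binary: str = "llama-tokenize"):
    """Build the command line (assumed flags: -m, -p, --ids, --log-disable)."""
    return [binary, "-m", model_path, "-p", prompt, "--ids", "--log-disable"]


def parse_ids(stdout: str):
    """Parse the last non-empty output line as a list of integer token ids."""
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("no output from tokenizer")
    ids = json.loads(lines[-1])  # assumed format: [1, 2, 3]
    if not all(isinstance(i, int) for i in ids):
        raise ValueError("unexpected tokenizer output")
    return ids


def tokenize(model_path: str, prompt: str, binary: str = "llama-tokenize"):
    """Run the tokenizer binary and return the token ids it prints."""
    result = subprocess.run(
        tokenize_argv(model_path, prompt, binary),
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_ids(result.stdout)
```

Keeping the parsing in its own function makes it easy to test without a model file on disk.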

llama-tts

Text-to-speech driver. Wraps a small LLM that produces speech tokens, then decodes those into audio. Source: tools/tts/. Examples include the OuteTTS pipeline.

llama-completion

A FIM ("fill in the middle") completion harness used by the editor plugins (examples/llama.vim, llama.vscode). Source: tools/completion/.

llama-parser

Standalone CLI that exercises the PEG parser and autoparser. Useful when iterating on tool-call extraction. Source: tools/parser/. Maintainer: @pwilkin.

llama-batched-bench

Microbenchmark dedicated to batched-decoding behavior. It varies batch size, sequence count, and prompt length to study batching efficiency. Source: tools/batched-bench/.
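As a rough illustration of what such a benchmark measures, the sketch below derives throughput figures from the timings of a single run. The field names (pp, tg, b, t_pp, t_tg) and the formulas are assumptions modeled on common batched-benchmark reporting, not a transcription of the tool's output; consult tools/batched-bench/ for the exact columns it prints.

```python
from dataclasses import dataclass


@dataclass
class BatchedRun:
    pp: int        # prompt tokens per sequence
    tg: int        # generated tokens per sequence
    b: int         # number of parallel sequences
    t_pp: float    # seconds spent on prompt processing
    t_tg: float    # seconds spent on generation
    shared_prompt: bool = False  # True if one prompt is shared by all sequences

    @property
    def prompt_tokens(self) -> int:
        """Prompt tokens actually processed across the batch."""
        return self.pp if self.shared_prompt else self.b * self.pp

    @property
    def n_kv(self) -> int:
        """KV cache cells needed to hold the prompts plus all generated tokens."""
        return self.prompt_tokens + self.b * self.tg

    @property
    def s_pp(self) -> float:
        """Prompt-processing throughput in tokens per second."""
        return self.prompt_tokens / self.t_pp

    @property
    def s_tg(self) -> float:
        """Generation throughput in tokens per second, summed over sequences."""
        return self.b * self.tg / self.t_tg

    @property
    def s_total(self) -> float:
        """Overall throughput: all tokens divided by total time."""
        return self.n_kv / (self.t_pp + self.t_tg)
```

Comparing s_tg across increasing values of b shows how much aggregate generation throughput batching buys before the backend saturates.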

llama-cvector-generator

Generates control vectors (per-layer biases) from contrastive prompts. Output is a GGUF that --control-vector can apply to any context. Source: tools/cvector-generator/.
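The idea of a control vector as a per-layer bias derived from contrastive prompts can be sketched in a few lines of Python. This toy version uses a plain difference of means over hidden states; that choice is an assumption for illustration, and the tool itself may use a different reduction (for example PCA). All function names here are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def control_vectors(positive, negative):
    """Compute one direction per layer from contrastive hidden states.

    `positive` and `negative` map layer index -> list of hidden-state vectors
    collected from the positive and negative prompts respectively.
    """
    result = {}
    for layer in positive:
        pos_mean = mean_vector(positive[layer])
        neg_mean = mean_vector(negative[layer])
        result[layer] = [p - n for p, n in zip(pos_mean, neg_mean)]
    return result


def apply_control(hidden, direction, strength=1.0):
    """Add the scaled direction to a hidden state, i.e. a per-layer bias."""
    return [h + strength * d for h, d in zip(hidden, direction)]
```

At inference time the same direction is added at the matching layer for every token, and the strength (possibly negative) controls how hard the model is steered.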

llama-export-lora

Merges a LoRA adapter into a base GGUF, producing a single new model file. Source: tools/export-lora/.
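Conceptually, merging a LoRA adapter folds the low-rank update into each affected weight matrix: W' = W + scale * (B @ A), where scale is commonly alpha / rank. The pure-Python sketch below shows that arithmetic on small nested lists. It illustrates the standard LoRA merge formula under those assumptions and is not a description of the tool's internals; the function names are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def merge_lora(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A).

    Shapes: w is (out x in), b is (out x rank), a is (rank x in).
    """
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
```

After the merge the adapter is no longer needed at inference time, at the cost of no longer being able to toggle or rescale it without the original base weights.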

llama-fit-params

Curve-fits sampling and quantization parameters from observed data. Used by maintainers when tuning defaults. Source: tools/fit-params/.

tools/results/

Helper for storing and comparing benchmark output across runs. Not a binary on its own.

Notable directories under examples/

The examples/ tree is bigger than tools/ and covers smaller demos plus platform integrations. Highlights:

Path | Purpose
--- | ---
examples/simple/, examples/simple-chat/ | Minimal C examples of the libllama API
examples/simple-cmake-pkg/ | Smallest possible downstream consumer using find_package(llama)
examples/llama.android/ | Android NDK app + JNI binding
examples/llama.swiftui/, examples/batched.swift/ | Swift/SwiftUI iOS/macOS integration
examples/llama.vim | Neovim plugin for FIM completion
examples/embedding/, examples/retrieval/ | Embedding-focused demos
examples/parallel/, examples/passkey/ | Multi-sequence and long-context demos
examples/eval-callback/ | Hook into per-tensor evaluation
examples/save-load-state/ | Serialize and restore a llama_context
examples/speculative/, examples/speculative-simple/, examples/lookup/, examples/lookahead/ | Speculative decoding demos with different drafters
examples/diffusion/ | Discrete diffusion text generation experiments
examples/training/ | Optimizer / training experiments on top of ggml-opt
examples/sycl/ | SYCL backend smoke tests
examples/gguf/, examples/gguf-hash/ | GGUF inspection utilities
examples/model-conversion/ | End-to-end conversion + verification harness for new models
examples/json_schema_to_grammar.py, examples/pydantic_models_to_grammar.py | Python equivalents of common/json-schema-to-grammar.cpp
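The speculative decoding demos listed above share one core loop: a cheap drafter proposes several tokens and the target model keeps the longest prefix it agrees with. The toy Python sketch below shows a greedy version of that loop with models represented as plain functions from a token list to the next token. It is an illustration only: the names are hypothetical, and a real implementation verifies all drafted tokens in a single batched evaluation of the target model rather than one call per token.

```python
def speculative_step(target_next, draft_next, context, k):
    """Run one greedy speculative step and return the newly accepted tokens.

    target_next / draft_next: functions mapping a token list to the next token.
    context: tokens generated so far. k: number of tokens to draft.
    """
    # 1. The drafter proposes k tokens autoregressively.
    proposal = []
    ctx = list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target checks each drafted token against its own greedy choice.
    accepted = []
    ctx = list(context)
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one extra token.
    accepted.append(target_next(ctx))
    return accepted
```

Each step therefore yields between 1 and k + 1 tokens, and the output is identical to what greedy decoding with the target alone would produce; the drafter only affects speed.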

examples/deprecation-warning/ is a special case: it produces binaries like main and server that print a friendly "renamed to llama-cli / llama-server" message.

Tests under tests/

While not "tools" per se, several test binaries are useful as exploratory entry points:

  • tests/test-backend-ops — manually invoke specific ops/backends.
  • tests/test-tokenizer-* — tokenizer fixtures.
  • tests/test-chat, tests/test-chat-parser — chat templating harnesses.
  • tests/peg-parser/ — PEG snapshots for the autoparser.

See Testing for the full list.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
