ggml-org/llama.cpp

Testing

Active contributors: Georgi Gerganov, Johannes Gäßler

llama.cpp tests run through CMake's CTest integration. The fast suite targets correctness; a long-form CI under ci/ exercises end-to-end performance and quality on self-hosted runners.

Fast tests (CTest)

Configure with tests enabled, then run:

cmake -B build -DLLAMA_BUILD_TESTS=ON
cmake --build build --config Release -j
ctest --test-dir build --output-on-failure

The tests live in tests/. Highlights:

| Test | What it covers |
|---|---|
| test-tokenizer-0, test-tokenizer-1-* | BPE / SPM / WPM / UGM / RWKV round-trips against a reference tokenizer |
| test-vocab | Vocab loader edge cases |
| test-grammar-parser, test-grammar-integration, test-grammar-llguidance | GBNF parser and the optional llguidance backend |
| test-json-schema-to-grammar | Schema → GBNF conversion |
| test-chat, test-chat-template, test-chat-parser | Chat templating and tool/function-call parsing |
| test-llama-archs | Architecture switch in llama-arch.cpp |
| test-sampling, test-mtmd-c-api | Sampler chains, MTMD C-API |
| test-quantize-fns, test-quantize-perf, test-quantize-stats | Quantization correctness and perf vs reference |
| test-backend-ops | The headline test: every ggml op compared across every loaded backend against the CPU reference |
| test-thread-safety, test-mmap, test-arg-parser, test-log | Misc plumbing |
| tests/peg-parser/ | PEG parser snapshots and behavior |

tests/snapshots/ and tests/peg-parser/snapshots/ hold golden outputs for the parser tests.
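
Golden-output testing of this kind can be sketched in a few lines. The example below is a minimal illustration in Python, not the project's actual harness: the function name diff_against_golden and the sample strings are hypothetical, and it assumes snapshots are plain text compared line by line.

```python
import difflib


def diff_against_golden(actual: str, golden: str) -> list[str]:
    """Return unified-diff lines between golden and actual text.

    An empty list means the output matches the stored snapshot.
    """
    return list(
        difflib.unified_diff(
            golden.splitlines(),
            actual.splitlines(),
            fromfile="golden",
            tofile="actual",
            lineterm="",
        )
    )


# Hypothetical parser output compared with a hypothetical stored snapshot.
golden_text = "root ::= item+\nitem ::= [a-z]"
actual_text = "root ::= item+\nitem ::= [a-z0-9]"

# Print the diff a failing snapshot test would typically report.
for line in diff_against_golden(actual_text, golden_text):
    print(line)
```

When a parser change is intentional, the usual workflow with golden files is to regenerate the snapshot and review the resulting diff in the PR.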

test-backend-ops

tests/test-backend-ops.cpp is the cross-backend conformance test. It enumerates every ggml_op, runs it through the CPU implementation and through every other registered backend, and compares results within a numerical tolerance. If you change a backend kernel, run this test against at least two backends to catch regressions:

GGML_BACKEND_LOG_LEVEL=info ./build/bin/test-backend-ops

CONTRIBUTING.md calls this test out explicitly: "If you modified a ggml operator or added a new one, add the corresponding test cases to test-backend-ops."
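
The idea of comparing a backend's output to the CPU reference "within a numerical tolerance" can be sketched as follows. This is a minimal illustration in Python, assuming a normalized mean squared error (NMSE) metric; the function names, the sample values, and the 1e-7 threshold are assumptions for illustration and are not taken from tests/test-backend-ops.cpp.

```python
from typing import Sequence


def nmse(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Squared error between the two outputs, normalized by the reference energy."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    err = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    ref = sum(r * r for r in reference)
    if ref == 0.0:
        # All-zero reference: only an exact match counts as zero error.
        return 0.0 if err == 0.0 else float("inf")
    return err / ref


def within_tolerance(
    reference: Sequence[float],
    candidate: Sequence[float],
    max_nmse: float = 1e-7,  # illustrative threshold, an assumption
) -> bool:
    """True when the candidate backend agrees with the reference closely enough."""
    return nmse(reference, candidate) <= max_nmse


# Hypothetical outputs of the same op on the CPU and on another backend.
cpu_out = [1.0, -2.0, 3.5, 0.25]
gpu_out = [1.0, -2.0, 3.5000002, 0.25]
print(within_tolerance(cpu_out, gpu_out))
```

Normalizing by the reference energy keeps the threshold meaningful across ops whose outputs differ by orders of magnitude, which an absolute error bound would not.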

Long-form CI (ci/)

ci/ houses scripts for the self-hosted ggml-ci runners that exercise:

  • Multi-backend builds (CUDA, Metal, Vulkan, SYCL, HIP, ...)
  • Real model downloads
  • Perplexity runs against reference models
  • Throughput benchmarks via llama-bench

ci/README.md documents the entry points. The relevant runner labels appear in .github/workflows/. Collaborators can trigger the long CI on a branch by including the ggml-ci keyword in a commit message.

Server tests

tools/server/tests/ is a Python pytest suite that boots a llama-server, hits its HTTP endpoints, and verifies behavior end-to-end. Run it after changes to anything under tools/server/:

cd tools/server/tests
pip install -r requirements.txt
pytest -x -v
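
The "boot a server, then hit its endpoints" pattern usually starts by waiting until the server reports ready. The sketch below shows that step in Python using only the standard library. It is not code from the test suite: the helper name wait_for_health and its parameters are hypothetical, and it assumes the server exposes a /health endpoint that returns HTTP 200 once it can serve requests.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll <base_url>/health until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not listening yet, or still loading (non-2xx response).
            pass
        time.sleep(interval_s)
    return False
```

A test fixture could call wait_for_health("http://127.0.0.1:8080") after spawning the server process, assuming that host and port, and fail fast if it returns False.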

Multimodal tests

tools/mtmd/tests.sh is a shell driver that downloads a small multimodal model, runs llama-mtmd-cli against tools/mtmd/test-1.jpeg and test-2.mp3, and checks the output. Use it after changes under tools/mtmd/.

Performance & quality benchmarks

Two binaries are the standard yardsticks, and PRs that touch numerical code are expected to include results from both:

  • llama-bench — token-throughput, prompt-processing, and generation benchmarks, with multi-GPU and per-backend modes. See tools/llama-bench/README.md.
  • llama-perplexity — perplexity, KL divergence, and HellaSwag-style accuracy on a reference dataset. See tools/perplexity/README.md.

When you change quantization or any kernel that affects numerics, post llama-perplexity and llama-bench numbers in your PR.
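
For readers unfamiliar with the metric, perplexity is the exponential of the negative mean per-token log-likelihood. The Python sketch below illustrates that definition and a simple before/after comparison of the kind a PR might report. It is not llama-perplexity's implementation; the function names and the sample log-probabilities are made up for illustration.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp(-mean(log p)) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate relative to baseline (positive = higher)."""
    return (candidate - baseline) / baseline


# Made-up example: every token predicted with probability 0.5.
uniform = [math.log(0.5)] * 4
print(perplexity(uniform))

# Made-up example: the same tokens predicted slightly worse after a change.
degraded = [math.log(0.48)] * 4
print(relative_change(perplexity(uniform), perplexity(degraded)))
```

Lower perplexity is better, so a positive relative change after a quantization or kernel change indicates a quality regression worth explaining in the PR.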

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
