ggml-org/llama.cpp

Tooling

Active contributors: Georgi Gerganov, Daniel Bevenius

The build, lint, and codegen tools shipped with llama.cpp.

Build

  • CMake is the only supported build system. The top-level CMakeLists.txt pulls in sub-projects from ggml/CMakeLists.txt, src/CMakeLists.txt, common/CMakeLists.txt, tools/<tool>/CMakeLists.txt, tests/CMakeLists.txt, and examples/<example>/CMakeLists.txt.
  • CMakePresets.json defines canned configurations (Debug/Release × backend combinations).
  • Makefile at the repo root no longer builds anything. The legacy hand-rolled Makefile build from early in the project has been retired, and the remaining stub only directs users to the CMake build instructions.
  • cmake/ holds the project's installable CMake config (llama-config.cmake.in, llama.pc.in) for downstream consumers using find_package(llama).
  • flake.nix gives Nix users a reproducible build environment.
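
As a rough illustration of how presets are organized, the sketch below parses a small document in the CMakePresets.json format and lists the configure presets a user can actually select (those not marked hidden). The preset names in the sample are hypothetical and are not copied from the repository; only the schema keys (configurePresets, name, hidden, inherits, cacheVariables) come from CMake's preset format.

```python
import json

# Hypothetical excerpt in the CMakePresets.json format. The preset names are
# made up for illustration and are not taken from the llama.cpp repository.
SAMPLE_PRESETS = """
{
  "version": 4,
  "configurePresets": [
    {"name": "base", "hidden": true, "generator": "Ninja",
     "binaryDir": "${sourceDir}/build-${presetName}"},
    {"name": "debug", "hidden": true,
     "cacheVariables": {"CMAKE_BUILD_TYPE": "Debug"}},
    {"name": "release", "hidden": true,
     "cacheVariables": {"CMAKE_BUILD_TYPE": "Release"}},
    {"name": "example-cpu-debug", "inherits": ["base", "debug"]},
    {"name": "example-cpu-release", "inherits": ["base", "release"]}
  ]
}
"""


def selectable_presets(presets_text: str) -> list[str]:
    """Return the names of configure presets that are not marked hidden."""
    data = json.loads(presets_text)
    return [
        preset["name"]
        for preset in data.get("configurePresets", [])
        if not preset.get("hidden", False)
    ]


if __name__ == "__main__":
    # Print one selectable preset name per line.
    for name in selectable_presets(SAMPLE_PRESETS):
        print(name)
```

Against a real checkout, the same function could be fed the contents of CMakePresets.json. CMake itself reports this list with `cmake --list-presets`, and `cmake --preset <name>` configures a build from one of them.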

Linting and formatting

| Tool | Config | What it checks |
| --- | --- | --- |
| clang-format | .clang-format | C/C++ style; brackets, alignment, indentation |
| clang-tidy | .clang-tidy | Static analysis subset that maintainers care about |
| editorconfig-checker | .editorconfig, .ecrc | Whitespace/trailing rules |
| flake8 | .flake8 | Python style (mostly conversion scripts) |
| mypy | mypy.ini | Python types in gguf-py/ |
| pyright | pyrightconfig.json | Stricter Python type checks |
| ty | ty.toml | Type checker config (used in CI) |
| pre-commit | .pre-commit-config.yaml | Local + CI runner that wires the above into git hooks |

Install the hooks once, then run every check against the whole tree:

pip install pre-commit
pre-commit install
pre-commit run --all-files
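
To give a feel for the kind of whitespace rule these hooks enforce, here is a minimal sketch of a trailing-whitespace and final-newline check. It is an illustration only: it is not how editorconfig-checker is implemented, the function name is hypothetical, and the real rules are whatever .editorconfig declares for each file type.

```python
def whitespace_problems(text: str) -> list[str]:
    """Report trailing whitespace and a missing final newline in text."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        # A line has trailing whitespace if stripping spaces/tabs changes it.
        if line != line.rstrip(" \t"):
            problems.append(f"line {number}: trailing whitespace")
    # Non-empty files are expected to end with a newline character.
    if text and not text.endswith("\n"):
        problems.append("missing final newline")
    return problems


if __name__ == "__main__":
    for problem in whitespace_problems("int x; \nreturn 0;"):
        print(problem)
```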

Code generation

  • convert_hf_to_gguf_update.py — regenerates lookup tables (mostly tokenizer pre-tokenizer hashes) used by convert_hf_to_gguf.py. Run it when adding a new tokenizer.
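  • examples/gen-docs/ — generates Markdown reference documentation for the command-line arguments from the shared argument parser.
  • scripts/create_ops_docs.py — regenerates docs/ops.md from backend op-support data so the op coverage table stays accurate.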
  • scripts/gen-* — assorted small generators (gen-authors.sh, gen-build-info.sh, etc.).
  • scripts/sync-ggml* — bidirectional sync with the standalone ggml-org/ggml repository.
  • common/build-info.cpp.in — configured at build time to embed the git commit and build flags.
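
The pre-tokenizer lookup mentioned for convert_hf_to_gguf_update.py can be pictured as a fingerprint table: tokenize a fixed probe string, hash the result, and map the hash to a known pre-tokenizer name. The sketch below shows that idea with two toy tokenizers. The probe text, tokenizer functions, and names are all hypothetical, and the hashing details of the real script may differ.

```python
import re
from hashlib import sha256

# Hypothetical probe string; the real scripts use their own test text.
PROBE_TEXT = "Hello, world! 123 foo_bar"


def split_on_whitespace(text: str) -> list[str]:
    """Toy pre-tokenizer: split on runs of whitespace."""
    return text.split()


def split_words_and_punctuation(text: str) -> list[str]:
    """Toy pre-tokenizer: separate words from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def fingerprint(tokenize) -> str:
    """Hash the tokenization of the probe text into a stable hex digest."""
    tokens = tokenize(PROBE_TEXT)
    return sha256(str(tokens).encode()).hexdigest()


# The generated lookup table: digest -> pre-tokenizer name (names are made up).
KNOWN = {
    fingerprint(split_on_whitespace): "toy-whitespace",
    fingerprint(split_words_and_punctuation): "toy-punct",
}


def identify(tokenize) -> str:
    """Look up a tokenizer by fingerprint; unknown ones need a new table entry."""
    digest = fingerprint(tokenize)
    if digest not in KNOWN:
        raise NotImplementedError(f"unrecognized pre-tokenizer: {digest}")
    return KNOWN[digest]


if __name__ == "__main__":
    print(identify(split_words_and_punctuation))
```

A tokenizer whose fingerprint is missing from the table raises an error. That corresponds to the "run it when adding a new tokenizer" step: the table has to be regenerated before conversion can recognize the newcomer.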

Web UI build

tools/server/webui/ is a separate JavaScript project (npm-based) bundled into the llama-server binary at build time. See its own package.json and the relevant CI workflow for how it's tested. The maintainers responsible for the WebUI are listed under ggml-org/llama-webui in CODEOWNERS.
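
One common way to bundle a web asset into a native binary is to turn the built file into a C array that gets compiled in, similar to what `xxd -i` produces. The sketch below shows that transformation. Whether the llama-server build does exactly this is an assumption, and the function and symbol names are hypothetical.

```python
def to_c_array(name: str, data: bytes, per_line: int = 12) -> str:
    """Render bytes as a C array definition plus a length variable."""
    if not data:
        raise ValueError("refusing to emit an empty array")
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        # Each byte becomes a two-digit hex literal such as 0x3c.
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        f"unsigned char {name}[] = {{\n{body}\n}};\n"
        f"unsigned int {name}_len = {len(data)};\n"
    )


if __name__ == "__main__":
    # A tiny stand-in for the bundled front-end output.
    print(to_c_array("index_html", b"<html></html>"))
```

The generated text can be written to a header and included by the server source, so the UI ships inside the executable rather than as loose files.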

CI

  • .github/workflows/build.yml — matrix build on Linux/macOS/Windows × multiple backends.
  • .github/workflows/server.yml — server-specific test pipeline.
  • .github/workflows/release.yml — produces the binaries attached to GitHub releases.
  • .github/workflows/python-* — run flake8, mypy, and pyright on the Python code.
  • .github/workflows/docker.yml — builds and pushes the images defined under .devops/.
  • ci/run.sh — entry point used by the self-hosted ggml-ci runners (long-form, multi-backend, multi-GPU).
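
A matrix build is the cross product of its axes, minus any excluded combinations. The sketch below expands an operating-system × backend matrix into individual jobs. The backend list and the exclusion are hypothetical and do not reflect the actual contents of build.yml.

```python
from itertools import product

OPERATING_SYSTEMS = ["linux", "macos", "windows"]
BACKENDS = ["cpu", "vulkan"]       # hypothetical subset of backends
EXCLUDED = {("macos", "vulkan")}   # hypothetical excluded combination


def expand_matrix(systems, backends, excluded=frozenset()):
    """Return one job description per (os, backend) pair that is not excluded."""
    return [
        {"os": os_name, "backend": backend}
        for os_name, backend in product(systems, backends)
        if (os_name, backend) not in excluded
    ]


if __name__ == "__main__":
    for job in expand_matrix(OPERATING_SYSTEMS, BACKENDS, EXCLUDED):
        print(f"{job['os']}-{job['backend']}")
```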

The .github/actions/ directory contains reusable composite actions used by the workflows.

Profiling helpers

  • examples/eval-callback/ — registers a callback that runs after every tensor evaluation.
  • tools/llama-bench/ (formerly examples/llama-bench/) — throughput benchmarking.
  • examples/gguf-hash/ — hashes a GGUF file's tensor data so you can verify it hasn't changed.
  • pocs/vdot/ — proof-of-concept dot-product microbenchmarks.
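
The idea behind hash-based verification, as in gguf-hash, can be sketched as: hash each tensor's raw bytes, then compare the digests from two runs. The example below works on in-memory byte strings rather than real GGUF files. The tensor names and helper functions are hypothetical, and SHA-256 is chosen here for illustration.

```python
from hashlib import sha256


def hash_tensors(tensors: dict[str, bytes]) -> dict[str, str]:
    """Map each tensor name to the SHA-256 hex digest of its raw data."""
    return {name: sha256(data).hexdigest() for name, data in tensors.items()}


def changed_tensors(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return sorted names whose digest differs, or that exist on one side only."""
    names = sorted(set(before) | set(after))
    return [name for name in names if before.get(name) != after.get(name)]


if __name__ == "__main__":
    # Hypothetical tensors standing in for data read from a model file.
    original = {"tok_embd.weight": b"\x00\x01\x02", "output.weight": b"\x10\x11"}
    modified = {"tok_embd.weight": b"\x00\x01\x02", "output.weight": b"\x10\x12"}
    print(changed_tensors(hash_tensors(original), hash_tensors(modified)))
```

Hashing per tensor rather than per file pinpoints which tensor changed and ignores metadata-only edits.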

Docker

.devops/ holds Dockerfiles, one per backend variant: cpu.Dockerfile, cuda.Dockerfile, vulkan.Dockerfile, intel.Dockerfile, rocm.Dockerfile, musa.Dockerfile, plus a Nix-based image. See docs/docker.md for usage.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
