ggml-org/llama.cpp

Tooling

Active contributors: Georgi Gerganov, Daniel Bevenius

The build, lint, code-generation, CI, profiling, and Docker tooling shipped with llama.cpp.

Build

CMake is the only supported build system. The top-level CMakeLists.txt includes sub-projects in ggml/CMakeLists.txt, src/CMakeLists.txt, common/CMakeLists.txt, tools/<tool>/CMakeLists.txt, tests/CMakeLists.txt, and examples/<example>/CMakeLists.txt.
CMakePresets.json defines canned configure presets (Debug/Release × backend combinations).
Makefile at the repo root is a convenience wrapper that calls into CMake. The legacy hand-rolled Makefile from early in the project is gone.
cmake/ holds the project's installable package files for downstream consumers: llama-config.cmake.in (for find_package(llama)) and llama.pc.in (for pkg-config).
flake.nix provides Nix users with a reproducible build environment.
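To make the "Debug/Release × backend" composition concrete, here is a minimal Python sketch of how configure presets combine through `inherits`. The preset names and JSON are hypothetical, not taken from llama.cpp's CMakePresets.json; the merge order follows the precedence rule in the CMake presets documentation (a preset's own values win, then earlier parents over later ones). Only plain string cache variables are handled.

```python
from __future__ import annotations

import json

# Hypothetical preset file; names and values are illustrative and not copied
# from llama.cpp's CMakePresets.json.
PRESETS_JSON = """
{
  "version": 4,
  "configurePresets": [
    {"name": "base", "hidden": true, "generator": "Ninja",
     "binaryDir": "${sourceDir}/build-${presetName}"},
    {"name": "debug", "hidden": true,
     "cacheVariables": {"CMAKE_BUILD_TYPE": "Debug"}},
    {"name": "release", "hidden": true,
     "cacheVariables": {"CMAKE_BUILD_TYPE": "Release"}},
    {"name": "vulkan", "hidden": true,
     "cacheVariables": {"GGML_VULKAN": "ON"}},
    {"name": "vulkan-debug", "inherits": ["base", "vulkan", "debug"]},
    {"name": "vulkan-release", "inherits": ["base", "vulkan", "release"]}
  ]
}
"""


def resolve_cache_vars(presets: dict[str, dict], name: str) -> dict[str, str]:
    """Merge cacheVariables along the inherits chain (string values only).

    The preset's own values win; among parents, an earlier entry in
    "inherits" wins over a later one.
    """
    preset = presets[name]
    parents = preset.get("inherits", [])
    if isinstance(parents, str):  # "inherits" may be a single name
        parents = [parents]
    merged: dict[str, str] = {}
    # Apply parents last-to-first so earlier parents overwrite later ones.
    for parent in reversed(parents):
        merged.update(resolve_cache_vars(presets, parent))
    merged.update(preset.get("cacheVariables", {}))
    return merged


def visible_presets(doc: dict) -> dict[str, dict[str, str]]:
    """Resolved cache variables for every preset not marked hidden."""
    presets = {p["name"]: p for p in doc["configurePresets"]}
    return {
        name: resolve_cache_vars(presets, name)
        for name, p in presets.items()
        if not p.get("hidden", False)
    }


if __name__ == "__main__":
    for name, cache in visible_presets(json.loads(PRESETS_JSON)).items():
        print(name, cache)
```

In practice CMake does this resolution itself: `cmake --list-presets` lists the visible presets and `cmake --preset <name>` configures with one.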

Linting and formatting

Tool	Config	What it checks
`clang-format`	`.clang-format`	C/C++ style; brackets, alignment, indentation
`clang-tidy`	`.clang-tidy`	A curated subset of static-analysis checks
`editorconfig-checker`	`.editorconfig`, `.ecrc`	Whitespace rules: trailing whitespace, final newlines, indentation
`flake8`	`.flake8`	Python style (mostly conversion scripts)
`mypy`	`mypy.ini`	Python types in `gguf-py/`
`pyright`	`pyrightconfig.json`	Stricter Python type checks
`ty`	`ty.toml`	Python type checks with Astral's ty (used in CI)
`pre-commit`	`.pre-commit-config.yaml`	Local + CI runner that wires the above into git hooks

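Install the hooks once, then run them against the whole tree:

```shell
pip install pre-commit
pre-commit install          # register the git hook for this clone
pre-commit run --all-files  # run every configured hook on all files
```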
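Checks like editorconfig-checker's whitespace rules reduce to simple per-line tests. The sketch below is a hypothetical stand-in for two such rules (trailing whitespace and a missing final newline), written the way a pre-commit hook receives its work: file paths as arguments, a non-zero exit status on failure. It is not the implementation of either tool.

```python
from __future__ import annotations

import sys


def whitespace_problems(path: str, text: str) -> list[str]:
    """Return human-readable problems for two simplified whitespace rules."""
    problems: list[str] = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        body = line[:-1] if line.endswith("\r") else line  # tolerate CRLF
        if body != body.rstrip(" \t"):
            problems.append(f"{path}:{lineno}: trailing whitespace")
    if text and not text.endswith("\n"):
        problems.append(f"{path}: missing final newline")
    return problems


def main(paths: list[str]) -> int:
    failed = False
    for path in paths:
        # newline="" keeps line endings exactly as stored on disk.
        with open(path, encoding="utf-8", newline="") as f:
            problems = whitespace_problems(path, f.read())
        for problem in problems:
            print(problem)
        failed = failed or bool(problems)
    # pre-commit treats any non-zero exit status as a failed hook.
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```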

Code generation

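convert_hf_to_gguf_update.py — regenerates lookup tables used by convert_hf_to_gguf.py, chiefly the table that identifies a model's pre-tokenizer from a hash of the token IDs its tokenizer produces for a fixed reference string. Run it when adding a new tokenizer.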
examples/gen-docs/ — generates docs/ops.md from the live ggml_op enum so the op coverage table stays accurate.
scripts/gen-* — assorted small generators (gen-authors.sh, gen-build-info.sh, etc.).
scripts/sync-ggml* — bidirectional sync with the standalone ggml-org/ggml repository.
common/build-info.cpp.in — configured at build time to embed the git commit and build flags.
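The pre-tokenizer lookup maintained by convert_hf_to_gguf_update.py can be sketched as follows, assuming (as the current conversion script appears to do) that the key is a SHA-256 of the string form of the token-ID list a tokenizer produces for the reference text. The tokenizer names and token IDs below are invented; only the hashing idea is the point.

```python
from __future__ import annotations

from hashlib import sha256
from typing import Optional


def token_hash(token_ids: list[int]) -> str:
    # Hash the printed list of IDs, so any difference in how the reference
    # text is split into tokens (or numbered) changes the digest.
    return sha256(str(token_ids).encode()).hexdigest()


def identify_pre_tokenizer(token_ids: list[int], table: dict[str, str]) -> Optional[str]:
    # None means "unknown tokenizer": the table needs regenerating.
    return table.get(token_hash(token_ids))


# Hypothetical encodings of one reference string by two made-up tokenizers.
# The real update script obtains these by running each model's tokenizer.
FAKE_ENCODINGS = {
    "example-bpe": [1, 15043, 29892, 3186],
    "example-spm": [1, 10994, 29892, 3186],
}
TABLE = {token_hash(ids): name for name, ids in FAKE_ENCODINGS.items()}
```

Hashing the token output rather than comparing tokenizer configs means two models share an entry only if they split the reference text identically.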
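Configuring a template such as common/build-info.cpp.in means substituting placeholders with values known at build time. Below is a minimal Python model of CMake's `configure_file(... @ONLY)` rule: only `@VAR@` references are replaced, `${VAR}` is left alone, and undefined variables become empty strings. Whether llama.cpp configures the file in exactly this mode is not asserted here, and the template and variable names are illustrative.

```python
from __future__ import annotations

import re

# Simplified: CMake accepts a few more characters in variable names.
_AT_VAR = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def configure_at_only(template: str, variables: dict[str, str]) -> str:
    """Substitute @VAR@ placeholders; leave ${VAR} untouched.

    Undefined variables expand to an empty string.
    """
    return _AT_VAR.sub(lambda m: variables.get(m.group(1), ""), template)


# Illustrative template in the spirit of a build-info source file.
TEMPLATE = '''int BUILD_NUMBER = @BUILD_NUMBER@;
char const * BUILD_COMMIT = "@BUILD_COMMIT@";
'''

if __name__ == "__main__":
    print(configure_at_only(TEMPLATE, {"BUILD_NUMBER": "42", "BUILD_COMMIT": "abc1234"}))
```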

Web UI build

tools/server/webui/ is a separate JavaScript project (npm-based) bundled into the llama-server binary at build time. See its own package.json and the relevant CI workflow for how it's tested. The maintainers responsible for the WebUI are listed under ggml-org/llama-webui in CODEOWNERS.
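Bundling a front-end into a binary usually means compressing the built assets and emitting them as a byte array that the C++ code compiles in. The sketch below shows that general technique in Python; the gzip step and the array name are assumptions for illustration, and llama.cpp's own build step may name and format things differently.

```python
from __future__ import annotations

import gzip


def to_c_array(data: bytes, name: str, per_line: int = 12) -> str:
    """Render bytes as a C/C++ array definition plus a length variable."""
    rows = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    header = f"unsigned char {name}[] = {{\n"
    footer = f"}};\nunsigned int {name}_len = {len(data)};\n"
    return header + "\n".join(rows) + "\n" + footer


if __name__ == "__main__":
    page = b"<!doctype html><title>webui</title>"  # stand-in for built assets
    # mtime=0 keeps the gzip header, and so the generated source, reproducible.
    print(to_c_array(gzip.compress(page, mtime=0), "index_html_gz"))
```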

CI

.github/workflows/build.yml — matrix build on Linux/macOS/Windows × multiple backends.
.github/workflows/server.yml — server-specific test pipeline.
.github/workflows/release.yml — produces the binaries attached to GitHub releases.
.github/workflows/python-* — runs flake8, mypy, pyright on the Python code.
.github/workflows/docker.yml — builds and pushes the images defined under .devops/.
ci/run.sh — entry point used by the self-hosted ggml-ci runners (longer-running, multi-backend, multi-GPU jobs).
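A GitHub Actions `matrix` expands into the cartesian product of its axes, minus any combinations listed under `exclude` (a partial match is enough to exclude a job). A minimal Python sketch of that expansion, ignoring `include`, with hypothetical OS and backend axes that are not taken from build.yml:

```python
from __future__ import annotations

from itertools import product


def expand_matrix(axes: dict[str, list[str]], exclude: list[dict[str, str]]) -> list[dict[str, str]]:
    """Cartesian product of the axes, minus combinations matched by exclude."""
    keys = list(axes)
    jobs = []
    for values in product(*(axes[k] for k in keys)):
        job = dict(zip(keys, values))
        # An exclude entry removes a job when every key it lists matches.
        if any(all(job.get(k) == v for k, v in ex.items()) for ex in exclude):
            continue
        jobs.append(job)
    return jobs


# Hypothetical axes for illustration only.
AXES = {
    "os": ["ubuntu-latest", "macos-latest", "windows-latest"],
    "backend": ["cpu", "vulkan", "metal"],
}
EXCLUDE = [
    {"os": "ubuntu-latest", "backend": "metal"},  # Metal is Apple-only
    {"os": "windows-latest", "backend": "metal"},
]

if __name__ == "__main__":
    for job in expand_matrix(AXES, EXCLUDE):
        print(job)
```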

The .github/actions/ directory contains reusable composite actions used by the workflows.

Profiling helpers

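examples/eval-callback/ — registers a callback that the graph scheduler invokes as each tensor is evaluated, e.g. to print intermediate results.
tools/llama-bench/ (formerly examples/llama-bench/) — throughput benchmarking (prompt processing and token generation speed).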
examples/gguf-hash/ — verify a GGUF file's tensor data hasn't changed.
pocs/vdot/ — proof-of-concept dot-product microbenchmarks.
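The idea behind gguf-hash can be sketched as hashing each tensor's data separately plus one digest over all tensors, then comparing against a stored manifest. This is a hypothetical Python model: tensors are passed in as a name-to-bytes dict rather than parsed from a GGUF file, SHA-256 stands in for whichever hash the tool is asked to use, and the manifest layout is invented.

```python
from __future__ import annotations

import hashlib

ALL_KEY = "<all tensors>"  # hypothetical manifest key for the whole-model digest


def tensor_manifest(tensors: dict[str, bytes]) -> dict[str, str]:
    """Per-tensor SHA-256 digests plus one digest over every tensor in order."""
    manifest: dict[str, str] = {}
    overall = hashlib.sha256()
    for name, data in tensors.items():  # dict order stands in for file order
        manifest[name] = hashlib.sha256(data).hexdigest()
        overall.update(data)
    manifest[ALL_KEY] = overall.hexdigest()
    return manifest


def changed_tensors(expected: dict[str, str], tensors: dict[str, bytes]) -> list[str]:
    """Keys whose digest differs from, or is missing in, the manifest."""
    actual = tensor_manifest(tensors)
    keys = expected.keys() | actual.keys()
    return sorted(k for k in keys if expected.get(k) != actual.get(k))
```

Because only tensor bytes go into the digests, editing metadata alone would not register as a change in this model.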
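eval-callback relies on ggml's scheduler evaluation callback, which ggml-backend.h declares as a function taking the tensor, an `ask` flag, and a user-data pointer: with `ask` true the scheduler asks whether you want to observe a node, with `ask` false it hands you the computed tensor, and returning false cancels the rest of the graph. The Python model below simulates only that two-phase protocol on a toy graph; `Node`, `run_graph`, and the graph itself are hypothetical, and the real scheduler batches unobserved nodes together instead of stepping one node at a time.

```python
from __future__ import annotations

from typing import Callable, List, Optional, Set, Tuple


class Node:
    """Toy stand-in for a graph tensor: a name plus a function computing it."""

    def __init__(self, name: str, fn: Callable[[], float]) -> None:
        self.name = name
        self.fn = fn
        self.value: Optional[float] = None


# (node, ask) -> bool, mirroring the shape of the C callback minus user_data.
EvalCallback = Callable[[Node, bool], bool]


def run_graph(nodes: List[Node], cb: EvalCallback) -> bool:
    """Evaluate nodes in order; return False if the callback cancels."""
    for node in nodes:
        observe = cb(node, True)  # ask phase: does the user want this node?
        node.value = node.fn()  # "compute" the node
        if observe and not cb(node, False):  # observe phase
            return False  # callback asked to stop the graph
    return True


def build_toy_graph() -> List[Node]:
    a = Node("a", lambda: 2.0)
    b = Node("b", lambda: 3.0)
    c = Node("c", lambda: a.value * b.value)
    d = Node("d", lambda: c.value + 1.0)
    return [a, b, c, d]


def make_logger(log: List[Tuple[str, float]], names: Set[str]) -> EvalCallback:
    """Observe only the named nodes and record their computed values."""
    def cb(node: Node, ask: bool) -> bool:
        if ask:
            return node.name in names
        log.append((node.name, node.value))
        return True  # keep computing
    return cb
```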

Docker

.devops/ holds Dockerfiles, one per backend variant: cpu.Dockerfile, cuda.Dockerfile, vulkan.Dockerfile, intel.Dockerfile, rocm.Dockerfile, musa.Dockerfile, plus a Nix-based image. See docs/docker.md for usage.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
