ggml-org/llama.cpp

Development workflow

Active contributors: Georgi Gerganov, Johannes Gäßler, Sigbjørn Skjæret

This page distills the day-to-day flow of "I want to land a change in llama.cpp."

Branch

llama.cpp uses a single master branch. Fork the repository, create a feature branch in your fork, and push your work there. There is no long-lived release branch; releases are tags cut from master (see .github/workflows/release.yml).

git clone https://github.com/<you>/llama.cpp
cd llama.cpp
git checkout -b feat/whatever

Build

The CMake build is the only supported path. The legacy Makefile build has been replaced by CMake; the root Makefile no longer builds anything and only points you to the CMake instructions in docs/build.md.

cmake -B build -DLLAMA_BUILD_TESTS=ON
cmake --build build --config Release -j

Backend-specific flags and presets live in ggml/CMakeLists.txt and CMakePresets.json. See Getting started for the common flags.

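common/build-info.cpp.in is a template that is configured at build time into a generated build-info.cpp, which embeds git commit information into the binaries. Regenerating the build directory refreshes the generated file; the .in template itself is not modified.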

Iterate

Most subsystems can be exercised with one of the in-tree tools:

Subsystem you changed, and a quick smoke test for it:

  • Tokenizer (src/llama-vocab.cpp): ./build/bin/llama-tokenize -m model.gguf -p "your text"
  • Sampler (src/llama-sampler.cpp): ./build/bin/llama-cli -m model.gguf -p "..." --top-k 40 --temp 0.7 ...
  • Grammar (src/llama-grammar.cpp): ./build/bin/llama-cli ... --grammar-file grammars/json.gbnf
  • Chat template (src/llama-chat.cpp or common/chat.cpp): ./build/bin/llama-cli -cnv -m model.gguf
  • GGUF reader/writer (ggml/src/gguf.cpp): ./build/bin/llama-gguf or ./build/bin/llama-gguf-split
  • Quantization (src/llama-quant.cpp): ./build/bin/llama-quantize in.gguf out.gguf Q4_K_M
  • Server endpoints (tools/server/): ./build/bin/llama-server -m model.gguf, then send requests to the default port 8080
  • Backend op (ggml/src/ggml-*/): ./build/bin/test-backend-ops
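If you touched several files, the path-to-tool mapping above can be encoded as a small shell function. This is a hypothetical helper written for this page (the name smoke_tool_for is made up and is not shipped in the repository); it only prints the suggested tool name, so you still supply the model and arguments yourself. Paths that match none of the listed subsystems fall back to running the full ctest suite.

```shell
# Hypothetical helper (not shipped with llama.cpp): print the in-tree tool
# suggested above for smoke-testing a change to the given source path.
smoke_tool_for() {
    case "$1" in
        src/llama-vocab.cpp)                echo "llama-tokenize" ;;
        src/llama-sampler.cpp)              echo "llama-cli" ;;       # try --top-k / --temp
        src/llama-grammar.cpp)              echo "llama-cli" ;;       # try --grammar-file
        src/llama-chat.cpp|common/chat.cpp) echo "llama-cli" ;;       # try -cnv
        ggml/src/gguf.cpp)                  echo "llama-gguf" ;;      # or llama-gguf-split
        src/llama-quant.cpp)                echo "llama-quantize" ;;
        tools/server/*)                     echo "llama-server" ;;
        ggml/src/ggml-*/*)                  echo "test-backend-ops" ;;
        *)                                  echo "ctest" ;;           # no specific tool: run the suite
    esac
}

# Example: list the tools to try for everything changed on your branch.
#   git diff --name-only master...HEAD | while read -r f; do smoke_tool_for "$f"; done | sort -u
```

In a case statement, * also matches slashes, so tools/server/* covers nested files under that directory.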

Test

ctest --test-dir build --output-on-failure

See Testing for the detailed list of tests, and ci/README.md for the long-form CI runs.

Format

The repo enforces basic style with .clang-format, .editorconfig, and pre-commit. Install the hooks once:

pip install pre-commit
pre-commit install

pre-commit run --all-files reproduces the CI check.

Commit and push

Keep your commits logically organized; the maintainers squash the whole PR into a single commit on merge anyway. Use the commit title format described in CONTRIBUTING.md:

<module> : <short title> (#<issue_number>)

Examples from the project history: common : check for null getpwuid in hf-cache (#22550) and ggml : fix bug in some quants (#xxxx).
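To check a title locally before pushing, the format can be approximated with an extended regular expression. The helper name is_valid_title and the exact regex are assumptions written for this page, not a check that the llama.cpp CI runs. It requires the literal space-colon-space separator after the module and a numeric (#NNNN) suffix, so a placeholder such as (#xxxx) is rejected.

```shell
# Hypothetical helper (not part of llama.cpp or its CI): approximate check
# for the "<module> : <short title> (#<issue_number>)" convention.
is_valid_title() {
    # <module>: one or more non-colon characters, then the literal " : ",
    # a non-empty title, and a numeric "(#NNNN)" suffix at the very end.
    printf '%s\n' "$1" | grep -Eq '^[^:]+ : .+ \(#[0-9]+\)$'
}

# Example: check the subject line of your latest commit.
#   is_valid_title "$(git log -1 --format=%s)" || echo "title does not follow the convention"
```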

Pull request

  • Open against master.
  • Fill in .github/pull_request_template.md.
  • Search existing PRs and issues first.
  • Limit yourself to one PR at a time if you are a new contributor.
  • Allow maintainers to push to your branch when reasonable; it speeds up review.

After merge

  • Watch for follow-up issues mentioning your change.
  • If the change touched a CODEOWNERS path, expect to be looped in on related future PRs.
  • If you added a model architecture and plan to maintain it, consider adding yourself to CODEOWNERS.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
