ggml-org/llama.cpp
Dependencies
llama.cpp is intentionally lean. The CONTRIBUTING guide tells contributors to "avoid adding third-party dependencies." Most "dependencies" are vendored single-header libraries under vendor/; only a handful are real, system-level packages, and they are all gated by build flags.
System (build-time, optional)
| Dependency | Required when | Notes |
|---|---|---|
| CMake ≥ 3.14 | Always | Build system |
| C++17 compiler | Always | gcc / clang / MSVC / Apple clang |
| Python ≥ 3.9 | Building Python tooling | convert_*.py, gguf-py |
| CUDA Toolkit | -DGGML_CUDA=ON | NVIDIA backend |
| HIP / ROCm | -DGGML_HIP=ON | AMD backend |
| Metal SDK | macOS by default | Apple backend |
| Vulkan SDK | -DGGML_VULKAN=ON | glslc for shader compilation |
| Intel oneAPI / DPC++ | -DGGML_SYCL=ON | SYCL backend |
| MUSA SDK | -DGGML_MUSA=ON | Moore Threads |
| OpenCL | -DGGML_OPENCL=ON | Adreno path |
| Qualcomm Hexagon SDK | -DGGML_HEXAGON=ON | DSP path |
| Huawei CANN | -DGGML_CANN=ON | Ascend NPU |
| OpenVINO Runtime | -DGGML_OPENVINO=ON | Intel CPU/iGPU/NPU |
| WebGPU runtime (Dawn / wgpu-native) | -DGGML_WEBGPU=ON | WebGPU |
| OpenMP | -DGGML_OPENMP=ON | Optional CPU threading |
| BLAS (OpenBLAS / MKL / Accelerate) | -DGGML_BLAS=ON | Prompt-processing matmul |
| libcurl | -DLLAMA_CURL=ON | `-hf <repo>` downloads |
| OpenSSL | -DLLAMA_SERVER_SSL=ON | HTTPS server |
| zDNN library | -DGGML_ZDNN=ON | IBM Z |
| ZenDNN library | -DGGML_ZENDNN=ON | AMD Zen CPU |
With only CMake and a C++17 compiler, and none of the optional flags above enabled, you still get a fully working CPU build.
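Because each optional dependency above is tied to a single CMake option, the configure step is easy to script. The Python sketch below is a hypothetical helper (`BACKEND_FLAGS` and `configure_command` are illustrative names, not part of llama.cpp). It only assembles a `cmake -B build -D<OPTION>=ON` argument list from the option names in the table, and it leaves Metal out on the assumption that it is already on by default on macOS.

```python
# Hypothetical mapping from a short feature name to the CMake option in the table above.
# Metal is intentionally absent: the table lists it as on by default on macOS.
BACKEND_FLAGS = {
    "cuda": "GGML_CUDA",
    "hip": "GGML_HIP",
    "vulkan": "GGML_VULKAN",
    "sycl": "GGML_SYCL",
    "musa": "GGML_MUSA",
    "opencl": "GGML_OPENCL",
    "hexagon": "GGML_HEXAGON",
    "cann": "GGML_CANN",
    "openvino": "GGML_OPENVINO",
    "webgpu": "GGML_WEBGPU",
    "openmp": "GGML_OPENMP",
    "blas": "GGML_BLAS",
    "curl": "LLAMA_CURL",
    "server-ssl": "LLAMA_SERVER_SSL",
    "zdnn": "GGML_ZDNN",
    "zendnn": "GGML_ZENDNN",
}


def configure_command(features, build_dir="build", build_type="Release"):
    """Return a cmake configure argv that enables only the requested optional features."""
    args = ["cmake", "-B", build_dir, f"-DCMAKE_BUILD_TYPE={build_type}"]
    # Deduplicate and sort so the command line is stable regardless of input order.
    for name in sorted(set(features)):
        try:
            option = BACKEND_FLAGS[name]
        except KeyError:
            raise ValueError(f"unknown feature: {name!r}") from None
        args.append(f"-D{option}=ON")
    return args


if __name__ == "__main__":
    # No features: the plain CPU build.
    print(" ".join(configure_command([])))
    # → cmake -B build -DCMAKE_BUILD_TYPE=Release
    print(" ".join(configure_command(["cuda", "curl"])))
    # → cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON -DLLAMA_CURL=ON
```

Passing the returned list to `subprocess.run(..., check=True)` would run the configure step, after which `cmake --build build --config Release` compiles. Keeping the mapping in one table means an unknown name fails loudly instead of silently producing a CPU-only build.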
Vendored (vendor/)
These ship in-tree as single-header or minimal-source libraries. None require external installation.
| Library | Purpose |
|---|---|
| vendor/nlohmann/json.hpp | JSON parsing/printing for the server, autoparser, and tooling |
| vendor/cpp-httplib/httplib.h | Single-header HTTP server used by tools/server and tools/rpc |
| vendor/minja/minja.hpp | Jinja2-compatible template engine used by common/chat.cpp |
| vendor/stb/stb_image.h | Image loading for tools/mtmd |
| vendor/miniaudio/miniaudio.h | Audio I/O for tools/tts and tools/mtmd-audio |
| vendor/llguidance/ | Optional Rust grammar engine (build-gated by LLAMA_LLGUIDANCE) |
| vendor/cpp-jsonschema/ | JSON Schema validation helpers |
Third-party licenses are recorded in licenses/.
Python (requirements*.txt, pyproject.toml)
Python tooling lives in gguf-py/ and the top-level convert_*.py scripts. Requirements are split into focused files under requirements/ and aggregated by requirements.txt at the repo root. gguf-py/pyproject.toml is poetry-managed.
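The aggregation is assumed here to use pip's `-r` include syntax (check the root requirements.txt in your checkout). The sketch below is a hypothetical `flatten_requirements` helper, not part of the repository. It expands those includes recursively, resolving each nested path relative to the file that includes it (as pip does), so the full set of pins can be read as one list. It only understands `-r path` and `--requirement path`; any other option line is passed through unchanged.

```python
import re
from pathlib import Path

# pip treats "#" as a comment only at the start of a line or after whitespace.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")
INCLUDE_PREFIXES = ("-r ", "--requirement ")


def flatten_requirements(path, _seen=None):
    """Recursively expand -r/--requirement includes into a flat list of lines."""
    path = Path(path).resolve()
    seen = set() if _seen is None else _seen
    if path in seen:  # skip files already visited (guards against include cycles)
        return []
    seen.add(path)

    specs = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = COMMENT_RE.sub("", raw).strip()
        if not line:
            continue  # blank or comment-only line
        prefix = next((p for p in INCLUDE_PREFIXES if line.startswith(p)), None)
        if prefix is None:
            specs.append(line)  # a requirement spec or some other pip option
        else:
            # Nested paths are resolved relative to the including file's directory.
            nested = path.parent / line[len(prefix):].strip()
            specs.extend(flatten_requirements(nested, seen))
    return specs
```

Run from the repository root, `print("\n".join(flatten_requirements("requirements.txt")))` would list every spec the aggregated file pulls in.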
Top-level packages (subset):
- numpy: tensor math during conversion.
- torch: used by some conversion paths to load HF checkpoints.
- safetensors: reads modern HF checkpoints.
- sentencepiece, tokenizers: reference tokenizers used during conversion / vocab generation.
- protobuf: sentencepiece model files.
- huggingface_hub: used by some conversion helpers.
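Before running a convert_*.py script, it can help to know which of these packages are already installed. The sketch below is a hypothetical probe, not part of gguf-py. It uses `importlib.util.find_spec`, which locates a top-level module without importing it, so heavy packages such as torch are not loaded. Two details are assumptions worth checking: protobuf is imported under the name `google.protobuf`, and for a dotted name `find_spec` imports the parent package first (raising `ModuleNotFoundError` if that parent is absent).

```python
import importlib.util

# PyPI package name -> import name. protobuf is the one that differs.
CONVERSION_MODULES = {
    "numpy": "numpy",
    "torch": "torch",
    "safetensors": "safetensors",
    "sentencepiece": "sentencepiece",
    "tokenizers": "tokenizers",
    "protobuf": "google.protobuf",
    "huggingface_hub": "huggingface_hub",
}


def probe(modules=CONVERSION_MODULES):
    """Return {package name: bool} saying whether each import name can be found."""
    status = {}
    for package, module in modules.items():
        try:
            status[package] = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package (e.g. "google") is missing.
            status[package] = False
    return status


if __name__ == "__main__":
    for name, ok in probe().items():
        print(f"{name:16} {'ok' if ok else 'missing'}")
```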
For development:
- pytest: server tests, gguf-py tests.
- flake8, mypy, pyright, ty: linters / type checkers.
- pre-commit: local hook runner.
Build artifacts
A standard Release build produces (with default flags):
- build/bin/llama-cli
- build/bin/llama-server
- build/bin/llama-quantize
- build/bin/llama-bench
- build/bin/llama-imatrix
- build/bin/llama-perplexity
- build/bin/llama-tokenize
- build/bin/llama-mtmd-cli
- build/bin/llama-gguf-split
- build/bin/test-backend-ops (with -DLLAMA_BUILD_TESTS=ON)
- ... plus example binaries and any backend plugins built as shared libraries
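A quick post-build sanity check is to confirm that the expected executables exist. The sketch below is a hypothetical helper whose name list is copied from the artifacts above. It assumes a `.exe` suffix on Windows and takes the binary directory as a parameter, since multi-config generators such as Visual Studio may place binaries in a per-configuration subdirectory (for example build/bin/Release) rather than build/bin.

```python
import sys
from pathlib import Path

# Taken from the artifact list above; test-backend-ops is left out because it
# only exists when tests are enabled.
EXPECTED = [
    "llama-cli", "llama-server", "llama-quantize", "llama-bench",
    "llama-imatrix", "llama-perplexity", "llama-tokenize",
    "llama-mtmd-cli", "llama-gguf-split",
]


def missing_binaries(bin_dir="build/bin", names=EXPECTED, platform=sys.platform):
    """Return the expected executables that are not present in bin_dir."""
    suffix = ".exe" if platform.startswith("win") else ""
    root = Path(bin_dir)
    return [name for name in names if not (root / f"{name}{suffix}").is_file()]


if __name__ == "__main__":
    missing = missing_binaries()
    print("all present" if not missing else "missing: " + ", ".join(missing))
```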
Installing with CMake (cmake --install build, or make install when the Makefile generator is used) copies libllama, libggml, the public headers, and the CMake package config under ${CMAKE_INSTALL_PREFIX}.
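To confirm an install landed where a consumer expects it, you can inspect the prefix directly. The helper below is hypothetical, and the layout it checks is an assumption based on common CMake GNUInstallDirs conventions: headers in include/, libraries in lib/ or lib64/, and the package config at lib/cmake/llama/llama-config.cmake. On Windows, DLLs usually go to bin/, which this sketch does not look at.

```python
from pathlib import Path


def check_install(prefix):
    """Report which expected artifacts exist under an install prefix (assumed layout)."""
    root = Path(prefix)
    libdirs = [d for d in (root / "lib", root / "lib64") if d.is_dir()]
    return {
        "llama.h": (root / "include" / "llama.h").is_file(),
        "ggml.h": (root / "include" / "ggml.h").is_file(),
        # Loose match for libllama.so / libllama.a / libllama.dylib / llama.lib, etc.
        "libllama": any(any(d.glob("*llama*")) for d in libdirs),
        "libggml": any(any(d.glob("*ggml*")) for d in libdirs),
        "cmake config": any(
            (d / "cmake" / "llama" / "llama-config.cmake").is_file() for d in libdirs
        ),
    }


if __name__ == "__main__":
    # /usr/local is CMake's default CMAKE_INSTALL_PREFIX on Unix-like systems.
    for item, ok in check_install("/usr/local").items():
        print(f"{item:14} {'found' if ok else 'missing'}")
```

If everything reports as found, a downstream CMake project would typically locate the package with find_package(llama).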
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.