ggml-org/llama.cpp

By the numbers

Data collected on 2026-04-30 from master at commit beb42fffa.

Size

Approximate source line counts, computed by wc -l on tracked files (excluding .git/ and vendor/). Because wc -l counts newlines, blank lines are included. The codebase is overwhelmingly C++, built on a plain-C core in ggml/, with a substantial Python conversion stack.

Chart: Lines of code by language (horizontal bar chart of the table below).

| Language | Lines |
|---|---|
| C++ (`.cpp`) | 319,369 |
| Python (`.py`) | 53,402 |
| C (`.c`) | 52,509 |
| C/C++ headers (`.h`) | 49,937 |
| CUDA (`.cu`) | 21,365 |
| C++ headers (`.hpp`) | 16,217 |
| Metal (`.metal`) | 10,627 |
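Counts like these can be approximated with a short script. The following is a minimal sketch, not the tool that produced the table. It assumes Python 3.9+, a git checkout, and that "tracked files" means the output of git ls-files. It reproduces wc -l semantics (newlines, blank lines included) and can optionally skip blank lines, because the two conventions produce different totals. All function names are hypothetical.

```python
import subprocess
from collections import Counter
from pathlib import Path

# Extensions reported in the table above; other files are ignored.
EXTENSIONS = {".cpp", ".py", ".c", ".h", ".cu", ".hpp", ".metal"}


def count_lines(text: str, skip_blank: bool = False) -> int:
    """Count newlines like `wc -l`, or count only non-blank lines."""
    if skip_blank:
        return sum(1 for line in text.splitlines() if line.strip())
    return text.count("\n")


def is_counted(path: str) -> bool:
    """Keep files with a listed extension, skipping vendored code."""
    return not path.startswith("vendor/") and Path(path).suffix in EXTENSIONS


def lines_by_extension(files: dict[str, str], skip_blank: bool = False) -> Counter:
    """Aggregate line counts per extension from a {path: contents} mapping."""
    totals: Counter = Counter()
    for path, text in files.items():
        if is_counted(path):
            totals[Path(path).suffix] += count_lines(text, skip_blank)
    return totals


def read_tracked_sources(repo: str = ".") -> dict[str, str]:
    """Read tracked source files; `git ls-files` never lists anything under .git/."""
    out = subprocess.run(
        ["git", "ls-files", "-z"], cwd=repo, capture_output=True, text=True, check=True
    ).stdout
    root = Path(repo)
    return {
        p: (root / p).read_text(encoding="utf-8", errors="replace")
        for p in out.split("\0")
        if p and is_counted(p) and (root / p).is_file()
    }


# Usage from a llama.cpp checkout (requires git on PATH):
#   for ext, n in lines_by_extension(read_tracked_sources(".")).most_common():
#       print(ext, n)
```

Keeping the counting logic separate from the git call lets you check the arithmetic on small in-memory inputs before running it over the whole tree.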

The five largest individual source files give a sense of where complexity concentrates.

| File | Size (bytes) |
|---|---|
| `convert_hf_to_gguf.py` | 651,450 |
| `src/llama-model.cpp` | 546,186 |
| `ggml/src/ggml.c` | 247,981 |
| `ggml/src/ggml-quants.c` | 222,579 |
| `common/arg.cpp` | 188,905 |
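A ranking like this one can be produced by sorting tracked files by byte size. This is a minimal sketch that assumes Python 3.9+. It also assumes that "source file" means one of the extensions in the language table, because the original filter is not stated. The function names are hypothetical.

```python
import heapq
import subprocess
from pathlib import Path

# Assumed definition of "source file": the extensions from the language table.
SOURCE_SUFFIXES = (".cpp", ".py", ".c", ".h", ".cu", ".hpp", ".metal")


def largest_files(sizes: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n largest (path, size_in_bytes) pairs, largest first."""
    return heapq.nlargest(n, sizes.items(), key=lambda item: item[1])


def tracked_file_sizes(repo: str = ".", suffixes: tuple[str, ...] = SOURCE_SUFFIXES) -> dict[str, int]:
    """Byte size of each tracked file ending in one of `suffixes`.

    Sizes come from the working tree, so they match the commit only if the checkout is clean.
    """
    out = subprocess.run(
        ["git", "ls-files", "-z"], cwd=repo, capture_output=True, text=True, check=True
    ).stdout
    root = Path(repo)
    return {
        p: (root / p).stat().st_size
        for p in out.split("\0")
        if p.endswith(suffixes) and (root / p).is_file()
    }


# Usage from a llama.cpp checkout: largest_files(tracked_file_sizes("."))
```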

Activity

The repo has moved at a steady, high pace since its first commit on 2023-03-10 ("Initial release"). A snapshot of selected metrics:

| Metric | Value |
|---|---|
| Total commits on `master` | 8,991 |
| Unique authors (all-time) | 1,600 |
| Commits in the last 90 days | 1,096 |
| Daily commit count, last 30 days | typically 8–19 |
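Metrics like these can be recomputed from git itself. This is a minimal sketch that assumes Python 3.9+ and a checkout with a local master branch. It uses committer dates, although the table does not say which date was used. The table also does not define "typically", so the sketch only produces zero-filled per-day counts and leaves the summary statistic to the reader. All function names are hypothetical.

```python
import subprocess
from collections import Counter
from datetime import date, timedelta


def git(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its stdout."""
    return subprocess.run(
        ["git", *args], cwd=repo, capture_output=True, text=True, check=True
    ).stdout


def total_commits(repo: str = ".", branch: str = "master") -> int:
    """Number of commits reachable from `branch`."""
    return int(git(repo, "rev-list", "--count", branch))


def commit_dates(repo: str = ".", branch: str = "master", since: str = "90 days ago") -> list[str]:
    """Committer dates (YYYY-MM-DD) of recent commits on `branch`."""
    return git(repo, "log", branch, f"--since={since}", "--pretty=format:%cd", "--date=short").split()


def commits_per_day(dates: list[str], end: date, days: int) -> dict[date, int]:
    """Commits on each of the `days` calendar days ending at `end`, zero-filled."""
    counts = Counter(date.fromisoformat(d) for d in dates)
    start = end - timedelta(days=days - 1)
    return {start + timedelta(days=i): counts[start + timedelta(days=i)] for i in range(days)}


# Usage from a llama.cpp checkout (requires git on PATH):
#   recent = commit_dates()          # len(recent) is the 90-day commit count
#   per_day = commits_per_day(recent, date.today(), 30)
```

Zero-filling matters for the daily figure: a day with no commits should lower the range rather than silently disappear from it.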

Top commit-count contributors (all-time, derived from git log --pretty=%an):

| Author | Commits |
|---|---|
| Georgi Gerganov | 1,731 |
| Johannes Gäßler | 370 |
| Xuan-Son Nguyen | 302 |
| Jeff Bolz | 270 |
| Sigbjørn Skjæret | 266 |
| Daniel Bevenius | 253 |
| slaren | 214 |
| Diego Devesa | 141 |

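These are commit counts only; no judgement is implied about who is "best". The counts are grouped by author name as recorded in git, so a contributor who has committed under more than one name is split across rows. For ownership, see Maintainers.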
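The per-author tally can be sketched in a few lines. This version assumes Python 3.9+ and uses hypothetical function names. It reads %an, the raw author name. Switching to %aN applies the repository's .mailmap, if one exists, which can merge one person's different spellings into a single name. git shortlog -sn produces similar counts directly, but it also applies .mailmap.

```python
import subprocess
from collections import Counter


def top_authors(names: list[str], n: int = 8) -> list[tuple[str, int]]:
    """Most frequent author names with their commit counts, highest first."""
    return Counter(names).most_common(n)


def author_names(repo: str = ".", branch: str = "master", use_mailmap: bool = False) -> list[str]:
    """One author name per commit on `branch`.

    %an is the raw author name; %aN maps names through .mailmap when present.
    """
    placeholder = "%aN" if use_mailmap else "%an"
    out = subprocess.run(
        ["git", "log", branch, f"--pretty=format:{placeholder}"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


# Usage from a llama.cpp checkout:
#   names = author_names()
#   len(set(names))      # unique authors
#   top_authors(names)   # top contributors by commit count
```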

Bot-attributed commits

A grep over the last 90 days of git history for [bot] co-author trailers and bot author names finds 0 bot-attributed commits. This is consistent with the project's AI policy, which forbids fully AI-generated submissions and AI-written PR descriptions or commit messages. Inline AI assistance is permitted but typically leaves no trace in git history. This count is therefore only a lower bound on AI-assisted work, not a measure of it.
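A check of this kind can be sketched as follows. This is a reconstruction under assumptions, not the script behind the figure. It counts a commit as bot-attributed if the author name ends in [bot], which is the suffix GitHub gives app accounts such as dependabot[bot], or if any Co-authored-by trailer names a [bot] account. The record format and function names are hypothetical. The sketch assumes Python 3.9+.

```python
import re
import subprocess

# A "Co-authored-by:" trailer line that mentions a "[bot]" account.
BOT_TRAILER = re.compile(r"^co-authored-by:.*\[bot\]", re.IGNORECASE | re.MULTILINE)

# %x00 separates author from full message (%B); %x1e (record separator) ends each commit.
LOG_FORMAT = "--pretty=format:%an%x00%B%x1e"


def is_bot_commit(author: str, message: str) -> bool:
    """True if the author, or any Co-authored-by trailer, is a [bot] account."""
    return author.strip().endswith("[bot]") or BOT_TRAILER.search(message) is not None


def parse_log(raw: str) -> list[tuple[str, str]]:
    """Split LOG_FORMAT output into (author, message) pairs."""
    records = []
    for chunk in raw.split("\x1e"):
        chunk = chunk.strip("\n")
        if chunk:
            author, _, message = chunk.partition("\x00")
            records.append((author, message))
    return records


def count_bot_commits(repo: str = ".", since: str = "90 days ago") -> int:
    """Number of bot-attributed commits on master since `since`."""
    raw = subprocess.run(
        ["git", "log", "master", f"--since={since}", LOG_FORMAT],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return sum(is_bot_commit(a, m) for a, m in parse_log(raw))
```

Using non-printing separators rather than newlines keeps multi-line commit messages intact when the log is split back into records.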

Complexity

A few code volume signals worth noting (all sizes are wc -l on tracked files):

| Subsystem | Notable concentration |
|---|---|
| `src/llama-model.cpp` | ~10.7k LOC; a single file containing tensor allocation for every supported text architecture |
| `src/models/` | ~70 per-architecture graph builders, one file per LLM family |
| `ggml/src/ggml.c` | Core CPU compute kernels and graph machinery; ~22k LOC |
| `ggml/src/ggml-quants.c` | Reference quantization kernels for every quantized `ggml_type`; ~13k LOC |
| `ggml/src/ggml-cuda/` | Largest backend by file count; per-op kernels spread across dozens of files |
| `tools/server/server-context.cpp` | ~5k LOC scheduler and slot loop driving the HTTP server |
| `convert_hf_to_gguf.py` | ~16k LOC; a switch over every supported HuggingFace model class |

Test surface

Tests live under tests/. They run via ctest and cover:

Tokenizer round-trips for BPE, SPM, WPM, UGM, RWKV
GGUF reader/writer
Sampling chains and grammar
Chat parser, PEG parser, autoparser
A backend-ops conformance suite (tests/test-backend-ops.cpp) that compares each backend's kernel implementation against the CPU reference for every ggml op

See Testing for how to run them.
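The backend-ops conformance suite works by running the same op on the CPU reference and on another backend, then accepting the backend's output if the two results are close. The sketch below illustrates that idea. It assumes a normalized-mean-squared-error (NMSE) criterion with a hypothetical default threshold of 1e-7. The suite's actual metric, thresholds, per-op overrides, and NaN/Inf handling are defined in tests/test-backend-ops.cpp and may differ. Function names are hypothetical.

```python
import math


def nmse(reference: list[float], candidate: list[float]) -> float:
    """Normalized mean squared error: sum((ref - cand)^2) / sum(ref^2)."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    err = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    norm = sum(r * r for r in reference)
    # An all-zero reference would divide by zero; fall back to the raw error.
    return err / norm if norm > 0 else err


def backend_matches(reference: list[float], candidate: list[float], max_nmse: float = 1e-7) -> bool:
    """Accept a backend's output if it is finite and within max_nmse of the CPU reference."""
    if not all(math.isfinite(c) for c in candidate):
        return False
    return nmse(reference, candidate) <= max_nmse
```

Dividing by the reference's energy makes the tolerance scale-invariant, so a single threshold can serve ops whose outputs are very small or very large.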

Dependencies

llama.cpp prides itself on minimal dependencies. Third-party code under vendor/ consists of single-file libraries copied into the tree (stb_image.h, nlohmann/json.hpp, httplib.h, minja.hpp, etc.). Optional system dependencies are controlled through CMake options prefixed GGML_* and LLAMA_*. For instance, LLAMA_CURL enables HuggingFace downloads, GGML_BLAS enables system BLAS, and GGML_CUDA, GGML_HIP, or GGML_METAL enable GPU backends. See Reference → Dependencies.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.