ggml-org/llama.cpp
By the numbers
Data collected on 2026-04-30 from master at commit beb42fffa.
Size
Approximate non-blank source line counts, computed by wc -l on tracked files (excluding .git/ and vendor/). The codebase is overwhelmingly C++ on top of plain C in ggml/, with a substantial Python conversion stack.
Lines of code by language:
| Language | Lines |
|---|---|
| C++ (.cpp) | 319,369 |
| Python (.py) | 53,402 |
| C (.c) | 52,509 |
| C/C++ headers (.h) | 49,937 |
| CUDA (.cu) | 21,365 |
| C++ headers (.hpp) | 16,217 |
| Metal (.metal) | 10,627 |
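The counting method described above (wc -l over tracked files, skipping vendor/) can be approximated with a short script. This is a minimal sketch, not the tool that produced the table: the function names and the `EXCLUDED_PREFIXES` constant are hypothetical, and it groups by file suffix only, so the exact numbers may differ.

```python
import subprocess
from collections import Counter
from pathlib import Path

# Assumption: mirrors the vendor/ exclusion described in the text.
EXCLUDED_PREFIXES = ("vendor/",)


def count_lines_by_extension(paths, read_bytes):
    """Sum newline counts per file suffix, like `wc -l` grouped by extension."""
    totals = Counter()
    for path in paths:
        if path.startswith(EXCLUDED_PREFIXES):
            continue
        suffix = Path(path).suffix
        if not suffix:
            continue  # skip extensionless files such as Makefile
        totals[suffix] += read_bytes(path).count(b"\n")
    return totals


def tracked_files(repo="."):
    """List files tracked by git (NUL-separated to survive odd file names)."""
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "-z"],
        check=True,
        capture_output=True,
    ).stdout
    return [p.decode("utf-8", "surrogateescape") for p in out.split(b"\0") if p]


def read_file(path):
    """Return file contents, or empty bytes for non-files (e.g. submodules)."""
    p = Path(path)
    return p.read_bytes() if p.is_file() else b""


# Usage, from the repository root:
#   totals = count_lines_by_extension(tracked_files(), read_file)
#   for suffix, lines in totals.most_common(10):
#       print(f"{suffix:10} {lines:>10,}")
```

Passing the reader as a parameter keeps the counting logic independent of the filesystem, which makes it easy to check against a small in-memory fixture.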
The five largest individual source files give a sense of where complexity concentrates.
| File | Bytes |
|---|---|
| convert_hf_to_gguf.py | 651,450 |
| src/llama-model.cpp | 546,186 |
| ggml/src/ggml.c | 247,981 |
| ggml/src/ggml-quants.c | 222,579 |
| common/arg.cpp | 188,905 |
Activity
The repo has moved at a steady, high pace since the very first commit on 2023-03-10 ("Initial release"). Selected snapshot:
| Metric | Value |
|---|---|
| Total commits on master | 8,991 |
| Unique authors (all-time) | 1,600 |
| Commits in the last 90 days | 1,096 |
| Daily commit count, last 30 days | typically 8–19 |
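A per-day series like the one summarized in the last row can be derived by bucketing commit dates. The following is a rough sketch under stated assumptions: `daily_counts` is a hypothetical helper, and the input is assumed to be a list of `datetime.date` values, one per commit.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(commit_dates, today, window_days=30):
    """Return commits per day for the `window_days` days ending at `today`.

    The result is ordered oldest to newest; days without commits are 0.
    """
    start = today - timedelta(days=window_days)
    counts = Counter(d for d in commit_dates if start < d <= today)
    return [
        counts.get(start + timedelta(days=i), 0)
        for i in range(1, window_days + 1)
    ]
```

The dates could come from `git log --date=short --pretty=%cd`, with each output line parsed by `date.fromisoformat`. The minimum and maximum of the returned list then give a range comparable to the "typically 8–19" figure.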
Top commit-count contributors (all-time, derived from git log --pretty=%an):
| Author | Commits |
|---|---|
| Georgi Gerganov | 1,731 |
| Johannes Gäßler | 370 |
| Xuan-Son Nguyen | 302 |
| Jeff Bolz | 270 |
| Sigbjørn Skjæret | 266 |
| Daniel Bevenius | 253 |
| slaren | 214 |
| Diego Devesa | 141 |
These are commit counts only — no opinion implied about who is "best". For ownership see Maintainers.
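The ranking above is described as derived from git log --pretty=%an. A minimal sketch of that tally follows; `top_authors` and `author_names` are hypothetical names, and the tally groups strictly by the author-name string, so anyone who committed under two different names appears as two rows.

```python
import subprocess
from collections import Counter


def top_authors(author_lines, n=8):
    """Count commits per author name and return the n most frequent."""
    counts = Counter(name.strip() for name in author_lines if name.strip())
    return counts.most_common(n)


def author_names(repo=".", rev="master"):
    """Return one author name per commit reachable from `rev`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=%an", rev],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.splitlines()


# Usage, from inside a clone:
#   for name, commits in top_authors(author_names()):
#       print(f"{name:25} {commits:>6,}")
```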
Bot-attributed commits
A grep over the last 90 days of git history for [bot] co-author trailers and bot author names finds 0 bot-attributed commits. This is consistent with the project's AI policy, which forbids fully AI-generated submissions and AI-written PR descriptions or commit messages. Inline AI assistance is permitted but leaves no trace in git history, so this count is only a lower bound on AI-assisted work.
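The check described here can be sketched as a pair of regular expressions applied to each commit's author name and message. This is an illustration only: the patterns and function names are assumptions, and the actual grep used for this page may have matched a wider set of bot names.

```python
import re

# Assumption: a bot identity is recognized by the literal "[bot]" suffix
# that GitHub appends to app accounts (e.g. "dependabot[bot]").
BOT_MARKER = re.compile(r"\[bot\]", re.IGNORECASE)

# Matches "Co-authored-by: Name <email>" trailer lines in a commit message.
CO_AUTHOR_TRAILER = re.compile(
    r"^co-authored-by:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE
)


def is_bot_attributed(author, message):
    """True if the author name or any co-author trailer carries a bot marker."""
    if BOT_MARKER.search(author):
        return True
    return any(BOT_MARKER.search(t) for t in CO_AUTHOR_TRAILER.findall(message))


def count_bot_commits(commits):
    """Count bot-attributed commits in an iterable of (author, message) pairs."""
    return sum(1 for author, message in commits if is_bot_attributed(author, message))
```

Restricting the message check to trailer lines avoids counting commits that merely mention "[bot]" in their description.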
Complexity
A few code volume signals worth noting (all sizes are wc -l on tracked files):
| Subsystem | Notable concentration |
|---|---|
| src/llama-model.cpp | ~10.7k LOC; single file containing tensor allocation for every supported text architecture |
| src/models/ | ~70 per-architecture graph builders, one file per LLM family |
| ggml/src/ggml.c | Core CPU compute kernels and graph machinery; ~22k LOC |
| ggml/src/ggml-quants.c | Reference quantization kernels for every ggml_type; ~13k LOC |
| ggml/src/ggml-cuda/ | Largest backend by file count; per-op kernels in dozens of files |
| tools/server/server-context.cpp | ~5k LOC scheduler + slot loop driving the HTTP server |
| convert_hf_to_gguf.py | ~16k LOC; a switch over every supported HuggingFace model class |
Test surface
Tests live under tests/. They run via ctest and cover:
- Tokenizer round-trips for BPE, SPM, WPM, UGM, RWKV
- GGUF reader/writer
- Sampling chains and grammar
- Chat parser, PEG parser, autoparser
- A backend-ops conformance suite (tests/test-backend-ops.cpp) that compares each backend's kernel implementation against the CPU reference for every ggml op
See Testing for how to run them.
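The idea behind a conformance suite of this kind, comparing a backend's output against a reference within a tolerance, can be illustrated in a few lines. This sketch is not taken from tests/test-backend-ops.cpp: the normalized mean squared error metric, the `max_nmse` threshold, and the function names are all assumptions chosen for illustration.

```python
def nmse(reference, candidate):
    """Normalized mean squared error of `candidate` relative to `reference`."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    err = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    norm = sum(r * r for r in reference)
    if norm == 0:
        # An all-zero reference only matches an all-zero candidate.
        return 0.0 if err == 0 else float("inf")
    return err / norm


def conforms(reference, candidate, max_nmse=1e-7):
    """True if the candidate output is within tolerance of the reference."""
    return nmse(reference, candidate) <= max_nmse
```

A relative metric tolerates the small floating-point differences expected between, say, a CPU and a GPU kernel, while still flagging results that are clearly wrong.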
Dependencies
llama.cpp prides itself on minimal dependencies. Headers under vendor/ are vendored single-file libraries (stb_image.h, nlohmann/json.hpp, httplib.h, minja.hpp, etc.). System dependencies are all optional: they are CMake-driven and gated by GGML_* and LLAMA_* flags, for instance LLAMA_CURL for HuggingFace downloads, GGML_BLAS for system BLAS, and GGML_CUDA/GGML_HIP/GGML_METAL for GPU backends. See Reference → Dependencies.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.