ggml-org/llama.cpp

Patterns and conventions

Active contributors: Georgi Gerganov, Sigbjørn Skjæret, Xuan-Son Nguyen

The conventions here are pulled from CONTRIBUTING.md, .clang-format, .editorconfig, and from how the existing code is written. Reviewers will push back hard on PRs that break these — the project values consistency and minimalism.

Coding rules

Plain C++. No fancy STL, no templates unless they earn their keep, no exception-heavy designs. Use basic for loops and obvious data structures.
No new third-party dependencies. Vendor a single header into vendor/ if absolutely necessary, after discussion.
Cross-platform always. Code must build on Linux, macOS, Windows, and embedded toolchains. No <unistd.h>-only paths without a Windows fallback.
4 spaces for indentation, brackets on the same line. Write void * ptr and int & a, with a space on both sides of the pointer/reference glyph. See .clang-format.
Sized integer types (int32_t, int64_t, uint8_t) in the public API. size_t is fine for byte counts.
struct foo {} not typedef struct foo {} foo. In C++ omit the struct/enum keyword when not needed (llama_context * ctx, not struct llama_context * ctx).
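Tensor convention. Data is stored row-major. Dimension 0 is columns, dimension 1 is rows, dimension 2 is matrices. Matrix multiplication is unconventional: C = ggml_mul_mat(ctx, A, B) computes $C^T = AB^T$, or equivalently $C = BA^T$.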
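The multiplication convention is easy to misread, so here is a minimal reference loop that follows it. The demo_tensor type and the demo_* functions are hypothetical stand-ins written for this sketch; they are not the ggml API and only mirror the documented shape rule for the 2D case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 2D tensor (not the real ggml_tensor).
// ne0 = number of columns (dim 0), ne1 = number of rows (dim 1).
struct demo_tensor {
    int64_t ne0;
    int64_t ne1;
    std::vector<float> data; // row-major: element (row, col) is data[row*ne0 + col]
};

static float demo_tensor_get(const demo_tensor & t, int64_t row, int64_t col) {
    return t.data[row*t.ne0 + col];
}

// Reference loop for C^T = A B^T (equivalently C = B A^T).
// A and B must have the same number of columns.
// The result has A.ne1 columns and B.ne1 rows.
static demo_tensor demo_mul_mat(const demo_tensor & a, const demo_tensor & b) {
    assert(a.ne0 == b.ne0);

    demo_tensor c = { a.ne1, b.ne1, std::vector<float>(static_cast<size_t>(a.ne1*b.ne1), 0.0f) };

    for (int64_t ib = 0; ib < b.ne1; ++ib) {
        for (int64_t ia = 0; ia < a.ne1; ++ia) {
            float sum = 0.0f;
            for (int64_t k = 0; k < a.ne0; ++k) {
                sum += demo_tensor_get(a, ia, k) * demo_tensor_get(b, ib, k);
            }
            // row ib of C is the dot product of row ib of B with every row of A
            c.data[ib*c.ne0 + ia] = sum;
        }
    }

    return c;
}
```

Each output row is a row of B dotted with every row of A, so both inputs are read along their contiguous dimension 0. That is the practical reason the transposed form is convenient for row-major data.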

Naming

snake_case everywhere — types, functions, variables.
Names optimize for longest common prefix: prefer number_small / number_big over small_number / big_number so related symbols sort together.
Enum values are UPPER_CASE and prefixed with the enum name: LLAMA_VOCAB_TYPE_BPE, GGML_OP_MUL_MAT.
Function pattern: <class>_<action>_<noun>; the noun is omitted when the action alone is clear. Examples:
- llama_model_init: class llama_model, action init.
- llama_sampler_chain_remove: class llama_sampler_chain, action remove.
- llama_set_embeddings: class llama_context, action set, noun embeddings. The _context suffix of the class is dropped because the bare llama prefix is unambiguous here.
Use _t suffix only when a type is meant to be opaque to users.
Constructor/destructor pairs are init / free (matching the C-style API).
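Put together, the naming rules might look like the following sketch. Every identifier here (demo_cache and its functions) is hypothetical and exists only to illustrate the conventions; none of them is part of llama.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Enum values: upper case, prefixed with the enum name.
enum demo_cache_type {
    DEMO_CACHE_TYPE_NONE = 0,
    DEMO_CACHE_TYPE_F16  = 1,
};

// snake_case type, declared as `struct foo {}` without a typedef.
struct demo_cache {
    demo_cache_type type;
    int32_t n_slots_used; // longest common prefix: n_slots_used / n_slots_max
    int32_t n_slots_max;
};

// <class>_<action>: init / free form the constructor/destructor pair.
demo_cache * demo_cache_init(demo_cache_type type, int32_t n_slots_max) {
    assert(n_slots_max >= 0);
    return new demo_cache { type, 0, n_slots_max };
}

void demo_cache_free(demo_cache * cache) {
    delete cache;
}

// <class>_<action>_<noun>
bool demo_cache_add_slot(demo_cache * cache) {
    if (cache->n_slots_used >= cache->n_slots_max) {
        return false;
    }
    cache->n_slots_used++;
    return true;
}
```

Because every symbol starts with demo_cache, the type, its enum, and its functions sort next to each other in an index or autocomplete list, which is the point of the longest-common-prefix rule.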

Error handling

Public libllama functions return int32_t status codes or nullptr. They do not throw across the C boundary.
Inside the library, assertions (GGML_ASSERT, and GGML_ABORT for unreachable paths) handle invariant violations. Use them liberally for conditions that should never happen.
In common/ and tool code, std::runtime_error is acceptable; the tools catch it at main.
errno-style global state is avoided.
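One way to honor these rules is to catch everything at the public entry point and translate it into nullptr or a status code. The sketch below assumes internal code may throw; demo_model and all demo_* functions are hypothetical, and a real public header would additionally declare the entry points with C linkage.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical type; not part of libllama.
struct demo_model {
    std::string path;
};

// Internal helper: free to throw, never called directly by users.
static demo_model * demo_model_load_impl(const char * path) {
    if (path == nullptr || path[0] == '\0') {
        throw std::runtime_error("empty model path");
    }
    return new demo_model { path };
}

// Public entry point: exceptions stop here and are reported as nullptr.
demo_model * demo_model_init(const char * path) {
    try {
        return demo_model_load_impl(path);
    } catch (const std::exception & err) {
        std::fprintf(stderr, "%s: failed to load model: %s\n", __func__, err.what());
        return nullptr;
    }
}

void demo_model_free(demo_model * model) {
    delete model;
}

// Public entry point returning a status code: 0 on success, negative on failure.
int32_t demo_model_check(const demo_model * model) {
    if (model == nullptr) {
        return -1;
    }
    // invariant: init never produces a model with an empty path
    assert(!model->path.empty());
    return 0;
}
```

The caller sees only a pointer or an integer, so the result is reported per call and no global error state is needed.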

Public API surface

The C API is the contract. New public functions are added with care.

All public symbols live in include/llama.h (and ggml/include/*.h for libggml).
Breaking API changes are tracked in issue 9289 (libllama) and issue 9291 (server REST API).
C++-only conveniences live in include/llama-cpp.h (smart-pointer typedefs).
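A smart-pointer typedef over a C-style init/free pair can be sketched as follows. The demo_context type, its functions, and the live-object counter are hypothetical and exist only so the sketch compiles on its own; the actual typedefs in include/llama-cpp.h may differ in detail.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style API standing in for an init/free pair from a public header.
struct demo_context {
    int n_threads;
};

static int demo_context_n_live = 0; // counts live objects, only for this sketch

demo_context * demo_context_init(int n_threads) {
    assert(n_threads > 0);
    demo_context_n_live++;
    return new demo_context { n_threads };
}

void demo_context_free(demo_context * ctx) {
    if (ctx == nullptr) {
        return;
    }
    demo_context_n_live--;
    delete ctx;
}

// C++-only convenience: a deleter that forwards to the C-style free function,
// plus a unique_ptr typedef built on it.
struct demo_context_deleter {
    void operator()(demo_context * ctx) { demo_context_free(ctx); }
};

typedef std::unique_ptr<demo_context, demo_context_deleter> demo_context_ptr;
```

Wrapping the result of demo_context_init in a demo_context_ptr releases the object when the pointer goes out of scope, while the C API itself stays free of C++ types.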

Cross-cutting patterns

Memory ownership. llama_model, llama_context, llama_sampler are heap-allocated and freed by their _free counterparts. No automatic ref-counting.
mmap first. Models are loaded with mmap by default (src/llama-mmap.cpp); --no-mmap falls back to read-then-allocate. Quantization runs without mmap.
Backend agnosticism. libllama never references a specific backend type. Tensor placement goes through ggml_backend_sched. New per-architecture code in src/models/ should not need backend-specific paths.
Argument parsing. Every tool builds its argv parser using common/arg.cpp so flags stay consistent. New CLI flags go into common_arg definitions inside arg.cpp whenever they apply to multiple tools.
Logging. Always go through LOG_INF / LOG_WRN / LOG_ERR / LOG_DBG from common/log.h. Do not call printf directly in library or tool code.
JSON. The vendored nlohmann/json (vendor/nlohmann/json.hpp) is the single supported JSON library. Tools and the server use it extensively.
HTTP. The server uses the vendored cpp-httplib single-header (vendor/). No higher-level HTTP framework.

Documenting

Public header comments are the source of truth for API behavior. Don't move them into prose-only docs.
Backend-specific build instructions belong in docs/backend/<backend>.md.
Per-tool docs belong in tools/<tool>/README.md.
If you add a new model architecture, update docs/development/HOWTO-add-model.md if your path differs from what's described there.

Submodule sync

ggml/ is conceptually a sub-project, but it lives in this repo as a plain copy rather than a git submodule. The scripts under scripts/sync-ggml* mirror changes between ggml-org/ggml and llama.cpp. If your change touches ggml/, expect it to be replayed back to the standalone ggml repo eventually.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.