ggml-org/llama.cpp
Patterns and conventions
Active contributors: Georgi Gerganov, Sigbjørn Skjæret, Xuan-Son Nguyen
The conventions here are pulled from CONTRIBUTING.md, .clang-format, .editorconfig, and from how the existing code is written. Reviewers will push back hard on PRs that break these — the project values consistency and minimalism.
Coding rules
- Plain C++. No fancy STL, no templates unless they earn their keep, no exception-heavy designs. Use basic `for` loops and obvious data structures.
- No new third-party dependencies. Vendor a single header into `vendor/` if absolutely necessary, after discussion.
- Cross-platform always. Code must build on Linux, macOS, Windows, and embedded toolchains. No `<unistd.h>`-only paths without a Windows fallback.
- 4 spaces, brackets on the same line. `void * ptr`, `int & a` (the pointer/reference glyph is set off by a space on each side rather than attached to the type or the name). See `.clang-format`.
- Sized integer types (`int32_t`, `int64_t`, `uint8_t`) in the public API. `size_t` is fine for byte counts.
- `struct foo {}`, not `typedef struct foo {} foo`. In C++ omit the `struct`/`enum` keyword when not needed (`llama_context * ctx`, not `struct llama_context * ctx`).
- Tensor convention. Row-major. Dim 0 = columns, dim 1 = rows, dim 2 = matrices. `C = ggml_mul_mat(ctx, A, B)` computes $C^T = AB^T$, which is the same as $C = BA^T$.
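To make the `ggml_mul_mat` convention concrete, here is a minimal reference sketch in plain C++ that does not call ggml. The `demo_mat` struct and the `demo_mul_mat` function are hypothetical names used only for illustration. They follow the shape rule stated above (dim 0 = columns, dim 1 = rows) and assume both inputs have the same number of columns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 2D tensor (not a ggml type).
// ne0 = number of columns (dim 0), ne1 = number of rows (dim 1), row-major storage.
struct demo_mat {
    int64_t ne0;
    int64_t ne1;
    std::vector<float> data; // element (row r, col c) lives at data[r*ne0 + c]
};

// Reference loops for the convention C^T = A * B^T (equivalently C = B * A^T).
// A and B must have the same number of columns; C gets ne0 = A.ne1 and ne1 = B.ne1.
static demo_mat demo_mul_mat(const demo_mat & a, const demo_mat & b) {
    assert(a.ne0 == b.ne0);

    demo_mat c = { a.ne1, b.ne1, std::vector<float>((size_t) (a.ne1 * b.ne1), 0.0f) };

    for (int64_t ib = 0; ib < b.ne1; ++ib) {     // row of B -> row of C
        for (int64_t ia = 0; ia < a.ne1; ++ia) { // row of A -> column of C
            float sum = 0.0f;
            for (int64_t k = 0; k < a.ne0; ++k) {
                sum += a.data[ia*a.ne0 + k] * b.data[ib*b.ne0 + k];
            }
            c.data[ib*c.ne0 + ia] = sum;
        }
    }

    return c;
}
```

With `A` holding 2 rows of 3 columns and `B` holding 1 row of 3 columns, the result has 2 columns and 1 row: each row of `C` holds the dot products of one row of `B` with every row of `A`.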
Naming
- `snake_case` everywhere: types, functions, variables.
- Names optimize for longest common prefix: prefer `number_small`/`number_big` over `small_number`/`big_number` so related symbols sort together.
- Enum values are `UPPER_CASE` and prefixed with the enum name: `LLAMA_VOCAB_TYPE_BPE`, `GGML_OP_MUL_MAT`.
- Function pattern: `<class>_<action>_<noun>`. Examples:
  - `llama_model_init`: class `llama_model`, action `init`.
  - `llama_sampler_chain_remove`: class `llama_sampler_chain`, action `remove`.
  - `llama_set_embeddings`: class `llama_context`, action `set`, noun `embeddings`; the `_context` suffix is dropped because `llama` is unambiguous.
- Use the `_t` suffix only when a type is meant to be opaque to users.
- Constructor/destructor pairs are `init`/`free` (matching the C-style API).
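The naming rules above can be sketched on a small made-up type. Everything named `demo_counter*` or `DEMO_COUNTER_*` below is hypothetical and does not exist in llama.cpp; the sketch only shows how the prefix, enum, and `init`/`free` conventions fit together.

```cpp
#include <cassert>
#include <cstdint>

// All names below are hypothetical and exist only to illustrate the naming rules.

// Enum values: UPPER_CASE, prefixed with the enum name.
enum demo_counter_mode {
    DEMO_COUNTER_MODE_UP,
    DEMO_COUNTER_MODE_DOWN,
};

// snake_case type; fields share a common prefix so related names sort together.
struct demo_counter {
    demo_counter_mode mode;
    int32_t value_min;
    int32_t value_max;
    int32_t value_cur;
};

// <class>_<action>: class "demo_counter", action "init".
static demo_counter * demo_counter_init(demo_counter_mode mode, int32_t value_min, int32_t value_max) {
    assert(value_min <= value_max);

    demo_counter * ctr = new demo_counter;
    ctr->mode      = mode;
    ctr->value_min = value_min;
    ctr->value_max = value_max;
    ctr->value_cur = mode == DEMO_COUNTER_MODE_UP ? value_min : value_max;
    return ctr;
}

// <class>_<action>: class "demo_counter", action "step".
// Moves one step in the configured direction and clamps to [value_min, value_max].
static int32_t demo_counter_step(demo_counter * ctr) {
    if (ctr->mode == DEMO_COUNTER_MODE_UP) {
        if (ctr->value_cur < ctr->value_max) {
            ctr->value_cur++;
        }
    } else {
        if (ctr->value_cur > ctr->value_min) {
            ctr->value_cur--;
        }
    }
    return ctr->value_cur;
}

// <class>_<action>: the destructor that pairs with demo_counter_init.
static void demo_counter_free(demo_counter * ctr) {
    delete ctr;
}
```

Because every symbol starts with `demo_counter`, an alphabetical symbol listing or an editor's autocomplete groups the type, its enum, and its functions together.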
Error handling
- Public `libllama` functions return `int32_t` status codes or `nullptr`. They do not throw across the C boundary.
- Inside the library, asserts (`GGML_ASSERT`, `LLAMA_ASSERT`) handle invariant violations. Use them liberally for "this should never happen."
- In `common/` and tool code, `std::runtime_error` is acceptable; the tools catch it at `main`.
- `errno`-style global state is avoided.
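One way to picture the split between throwing internals and a non-throwing C boundary is the sketch below. Both function names are hypothetical, the status values `-1` and `-2` are arbitrary choices for this example, and the standard `assert` stands in for the project's own assert macros.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical internal helper: C++ code behind the boundary is free to throw.
static int32_t demo_parse_positive(const std::string & text) {
    const int value = std::stoi(text); // throws std::invalid_argument or std::out_of_range
    if (value <= 0) {
        throw std::runtime_error("value must be positive");
    }
    assert(value > 0); // invariant: "this should never happen" if the check above is correct
    return (int32_t) value;
}

// Hypothetical public entry point with C linkage.
// Returns 0 on success and a negative status code on failure; exceptions never escape.
extern "C" int32_t demo_get_positive(const char * text, int32_t * out) {
    if (text == nullptr || out == nullptr) {
        return -1; // invalid argument
    }
    try {
        *out = demo_parse_positive(text);
        return 0;
    } catch (const std::exception &) {
        return -2; // malformed, out-of-range, or non-positive input
    }
}
```

The caller only ever inspects the returned status code, so the same entry point stays usable from C and from language bindings that cannot handle C++ exceptions.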
Public API surface
The C API is the contract. New public functions are added with care.
- All public symbols live in `include/llama.h` (and `ggml/include/*.h` for `libggml`).
- ABI breakage is tracked in issue 9289 (libllama) and issue 9291 (server REST API).
- C++-only conveniences live in `include/llama-cpp.h` (smart-pointer typedefs).
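A smart-pointer typedef over a C-style handle can be sketched as follows. The `demo_model` type and its functions are hypothetical stand-ins rather than the real `libllama` API; the pattern shown is a `std::unique_ptr` whose deleter forwards to the matching `_free` function, which is an assumption about the general shape of such typedefs and not a copy of the header.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style object with an init/free pair (stand-in for a library handle).
struct demo_model {
    int n_layer;
};

// Counts live objects so the effect of the deleter can be observed.
static int demo_model_n_live = 0;

static demo_model * demo_model_init(int n_layer) {
    assert(n_layer > 0);
    demo_model_n_live++;
    return new demo_model { n_layer };
}

static void demo_model_free(demo_model * model) {
    if (model == nullptr) {
        return;
    }
    demo_model_n_live--;
    delete model;
}

// C++-only convenience: a deleter that forwards to the C-style free function.
struct demo_model_deleter {
    void operator()(demo_model * model) const { demo_model_free(model); }
};

// Smart-pointer typedef built on top of the deleter.
typedef std::unique_ptr<demo_model, demo_model_deleter> demo_model_ptr;
```

Wrapping the result of `demo_model_init` in a `demo_model_ptr` frees the object at scope exit, while the C header itself stays free of C++ types.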
Cross-cutting patterns
- Memory ownership. `llama_model`, `llama_context`, `llama_sampler` are heap-allocated and freed by their `_free` counterparts. No automatic ref-counting.
- `mmap` first. Models are loaded with `mmap` by default (`src/llama-mmap.cpp`); `--no-mmap` falls back to read-then-allocate. Quantization runs without mmap.
- Backend agnosticism. `libllama` never references a specific backend type. Tensor placement goes through `ggml_backend_sched`. New per-architecture code in `src/models/` should not need backend-specific paths.
- Argument parsing. Every tool builds its argv parser using `common/arg.cpp` so flags stay consistent. New CLI flags go into `common_arg` definitions inside `arg.cpp` whenever they apply to multiple tools.
- Logging. Always go through `LOG_INF`/`LOG_WRN`/`LOG_ERR`/`LOG_DBG` from `common/log.h`. Do not call `printf` directly in library or tool code.
- JSON. The vendored `nlohmann/json` (`vendor/nlohmann/json.hpp`) is the single supported JSON library. Tools and the server use it extensively.
- HTTP. The server uses the vendored `cpp-httplib` single-header (`vendor/`). No higher-level HTTP framework.
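The idea behind centralized argument parsing, where each flag is defined once and every tool parses against the same table, can be sketched like this. The `demo_arg` and `demo_params` types and the two flags are hypothetical and deliberately much simpler than the real `common_arg` machinery; treat this as an illustration of the pattern, not of the actual API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical parameter struct shared by every tool.
struct demo_params {
    int  n_threads = 1;
    bool verbose   = false;
};

// Hypothetical flag definition: one entry per flag, defined once, reused by all tools.
struct demo_arg {
    std::string name;      // e.g. "--threads"
    bool        has_value; // whether the flag consumes the next argv entry
    std::string help;
    std::function<void(demo_params &, const std::string &)> handler;
};

static std::vector<demo_arg> demo_arg_list() {
    return {
        { "--threads", true,  "number of threads",
          [](demo_params & p, const std::string & v) { p.n_threads = std::stoi(v); } },
        { "--verbose", false, "enable verbose output",
          [](demo_params & p, const std::string &)   { p.verbose = true; } },
    };
}

// Returns true on success, false on an unknown flag or a missing value.
// A malformed number makes std::stoi throw; a tool would catch that in main.
static bool demo_params_parse(int argc, const char ** argv, demo_params & params) {
    assert(argc == 0 || argv != nullptr);

    const std::vector<demo_arg> args = demo_arg_list();
    for (int i = 1; i < argc; ++i) {
        const std::string cur = argv[i];
        bool found = false;
        for (const demo_arg & arg : args) {
            if (arg.name != cur) {
                continue;
            }
            std::string value;
            if (arg.has_value) {
                if (i + 1 >= argc) {
                    return false; // flag expects a value but argv ended
                }
                value = argv[++i];
            }
            arg.handler(params, value);
            found = true;
            break;
        }
        if (!found) {
            return false;
        }
    }
    return true;
}
```

Because the table is the only place a flag is spelled out, adding a flag there makes it available with the same name and help text to every tool that calls the shared parser.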
Documenting
- Public header comments are the source of truth for API behavior. Don't move them into prose-only docs.
- Backend-specific build instructions belong in `docs/backend/<backend>.md`.
- Per-tool docs belong in `tools/<tool>/README.md`.
- If you add a new model architecture, update `docs/development/HOWTO-add-model.md` if your path differs from what's described there.
Submodule sync
`ggml/` is conceptually a sub-project but lives in this repo as a copy. The scripts under `scripts/sync-ggml*` mirror changes between `ggml-org/ggml` and `llama.cpp`. If your change touches `ggml/`, expect it to be replayed back to the standalone ggml repo eventually.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.