ggml-org/llama.cpp

Patterns and conventions

Active contributors: Georgi Gerganov, Sigbjørn Skjæret, Xuan-Son Nguyen

The conventions here are pulled from CONTRIBUTING.md, .clang-format, .editorconfig, and from how the existing code is written. Reviewers will push back hard on PRs that break these — the project values consistency and minimalism.

Coding rules

  • Plain C++. No fancy STL, no templates unless they earn their keep, no exception-heavy designs. Use basic for loops and obvious data structures.
  • No new third-party dependencies. Vendor a single header into vendor/ if absolutely necessary, after discussion.
  • Cross-platform always. Code must build on Linux, macOS, Windows, and embedded toolchains. No <unistd.h>-only paths without a Windows fallback.
  • 4 spaces, brackets on the same line. Write void * ptr and int & a, with a space on both sides of the * or &. See .clang-format.
  • Sized integer types (int32_t, int64_t, uint8_t) in the public API. size_t is fine for byte counts.
  • struct foo {} not typedef struct foo {} foo. In C++ omit the struct/enum keyword when not needed (llama_context * ctx, not struct llama_context * ctx).
  • Tensor convention. Row-major. Dim 0 = columns, dim 1 = rows, dim 2 = matrices. C = ggml_mul_mat(ctx, A, B) computes $C^T = AB^T$.
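The tensor convention is easy to get backwards, so here is a minimal reference sketch of what the multiplication rule above means on plain row-major arrays. It does not use ggml: demo_matrix and demo_mul_mat are hypothetical names invented for this illustration, and only the shape and indexing rule ($C^T = AB^T$, equivalently $C = BA^T$) is taken from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 2D tensor; NOT the real ggml_tensor.
// ne0 = number of columns (dim 0), ne1 = number of rows (dim 1), row-major.
struct demo_matrix {
    int64_t ne0;
    int64_t ne1;
    std::vector<float> data; // element (row r, column c) lives at data[r*ne0 + c]
};

// Reference semantics of C = mul_mat(A, B): C^T = A B^T, i.e. C = B A^T.
// A and B must share dim 0 (the row length). C has ne0 = A.ne1 and ne1 = B.ne1.
static demo_matrix demo_mul_mat(const demo_matrix & a, const demo_matrix & b) {
    assert(a.ne0 == b.ne0);

    demo_matrix c = { a.ne1, b.ne1, std::vector<float>(a.ne1 * b.ne1, 0.0f) };

    for (int64_t ib = 0; ib < b.ne1; ib++) {
        for (int64_t ia = 0; ia < a.ne1; ia++) {
            // dot product of row ia of A with row ib of B
            float sum = 0.0f;
            for (int64_t k = 0; k < a.ne0; k++) {
                sum += a.data[ia*a.ne0 + k] * b.data[ib*b.ne0 + k];
            }
            c.data[ib*c.ne0 + ia] = sum;
        }
    }

    return c;
}
```

In this sketch every output element is a dot product of one row of A with one row of B, which is why both inputs must agree on dim 0 rather than on the usual "columns of A, rows of B" pairing.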

Naming

  • snake_case everywhere — types, functions, variables.
  • Names optimize for longest common prefix: prefer number_small / number_big over small_number / big_number so related symbols sort together.
  • Enum values are UPPER_CASE and prefixed with the enum name: LLAMA_VOCAB_TYPE_BPE, GGML_OP_MUL_MAT.
  • Function pattern: <class>_<action>_<noun>. Examples:
    • llama_model_init: class llama_model, action init.
    • llama_sampler_chain_remove: class llama_sampler_chain, action remove.
    • llama_set_embeddings: class llama_context, action set, noun embeddings. The _context suffix is dropped because the llama_ prefix alone is unambiguous.
  • Use _t suffix only when a type is meant to be opaque to users.
  • Constructor/destructor pairs are init / free (matching the C-style API).
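The naming rules above can be seen together in a small sketch. Everything named demo_counter here is hypothetical and exists only to illustrate the conventions; none of it is part of llama.cpp.

```cpp
#include <cassert>
#include <cstdint>

// All names below are hypothetical and only illustrate the naming rules.

// Enum values: upper case, prefixed with the enum name.
enum demo_counter_mode {
    DEMO_COUNTER_MODE_UP,
    DEMO_COUNTER_MODE_DOWN,
};

// snake_case type, declared as `struct foo {}` without a typedef.
struct demo_counter {
    demo_counter_mode mode;
    int32_t value_min; // common prefix first: value_min / value_max / value_cur,
    int32_t value_max; // not min_value / max_value / cur_value
    int32_t value_cur;
};

// <class>_<action>: class "demo_counter", action "init".
static demo_counter * demo_counter_init(demo_counter_mode mode, int32_t value_min, int32_t value_max) {
    demo_counter * ctr = new demo_counter;
    ctr->mode      = mode;
    ctr->value_min = value_min;
    ctr->value_max = value_max;
    ctr->value_cur = mode == DEMO_COUNTER_MODE_UP ? value_min : value_max;
    return ctr;
}

// The matching destructor is always named "free".
static void demo_counter_free(demo_counter * ctr) {
    delete ctr;
}

// <class>_<action>_<noun>: class "demo_counter", action "get", noun "value".
static int32_t demo_counter_get_value(const demo_counter * ctr) {
    return ctr->value_cur;
}

// <class>_<action>: moves one step in the configured direction, clamped to the range.
static void demo_counter_step(demo_counter * ctr) {
    if (ctr->mode == DEMO_COUNTER_MODE_UP && ctr->value_cur < ctr->value_max) {
        ctr->value_cur++;
    } else if (ctr->mode == DEMO_COUNTER_MODE_DOWN && ctr->value_cur > ctr->value_min) {
        ctr->value_cur--;
    }
}
```

Because every symbol starts with demo_counter, an alphabetical symbol listing or an editor's autocomplete groups the whole "class" together, which is the point of the longest-common-prefix rule.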

Error handling

  • Public libllama functions return int32_t status codes or nullptr. They do not throw across the C boundary.
  • Inside the library, asserts (GGML_ASSERT, LLAMA_ASSERT) handle invariant violations. Use them liberally for "this should never happen."
  • In common/ and tool code, std::runtime_error is acceptable; the tools catch it at main.
  • errno-style global state is avoided.
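As a rough illustration of the split described above, the sketch below lets an internal helper throw while the public-style entry point converts every failure into an int32_t status code. The function names and the status values (0 and -1) are assumptions made for this example, and the standard assert stands in for GGML_ASSERT so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical names; not part of libllama.

// Internal C++ helper: free to throw inside the implementation.
static int32_t demo_parse_n_threads(const std::string & text) {
    const int value = std::stoi(text); // throws std::invalid_argument or std::out_of_range
    if (value <= 0) {
        throw std::runtime_error("n_threads must be positive");
    }
    return value;
}

// Public-style entry point: failures are reported through the return value and
// no exception is allowed to escape to the caller.
//    0 on success
//   -1 on invalid input
extern "C" int32_t demo_set_n_threads(const char * text, int32_t * n_threads_out) {
    // A null output pointer is a caller bug, not a recoverable error.
    assert(n_threads_out != nullptr);

    if (text == nullptr) {
        return -1;
    }

    try {
        *n_threads_out = demo_parse_n_threads(text);
    } catch (const std::exception &) {
        return -1;
    }

    return 0;
}
```

The output parameter is only written on success, so a caller that ignores the status code still never observes a half-updated value.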

Public API surface

The C API is the contract. New public functions are added with care.

Cross-cutting patterns

  • Memory ownership. llama_model, llama_context, llama_sampler are heap-allocated and freed by their _free counterparts. No automatic ref-counting.
  • mmap first. Models are loaded with mmap by default (src/llama-mmap.cpp); --no-mmap falls back to read-then-allocate. Quantization runs without mmap.
  • Backend agnosticism. libllama never references a specific backend type. Tensor placement goes through ggml_backend_sched. New per-architecture code in src/models/ should not need backend-specific paths.
  • Argument parsing. Every tool builds its argv parser using common/arg.cpp so flags stay consistent. New CLI flags go into common_arg definitions inside arg.cpp whenever they apply to multiple tools.
  • Logging. Always go through LOG_INF / LOG_WRN / LOG_ERR / LOG_DBG from common/log.h. Do not call printf directly in library or tool code.
  • JSON. The vendored nlohmann/json (vendor/nlohmann/json.hpp) is the single supported JSON library. Tools and the server use it extensively.
  • HTTP. The server uses the vendored cpp-httplib single-header (vendor/). No higher-level HTTP framework.
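For the memory ownership rule, one common caller-side pattern is to wrap an init / free pair in a std::unique_ptr with a custom deleter. The sketch below shows the idea with a hypothetical demo_session type; it stands in for objects such as llama_model or llama_context and does not reproduce their real API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical C-style object with an init / free pair.
struct demo_session {
    int32_t n_tokens;
};

static int32_t demo_session_n_live = 0; // counts live objects, for demonstration only

static demo_session * demo_session_init(int32_t n_tokens) {
    demo_session_n_live++;
    return new demo_session { n_tokens };
}

static void demo_session_free(demo_session * s) {
    if (s == nullptr) {
        return;
    }
    demo_session_n_live--;
    delete s;
}

// Deleter that forwards to the _free function, so the object is released
// exactly once when its single owner goes out of scope.
struct demo_session_deleter {
    void operator()(demo_session * s) const {
        demo_session_free(s);
    }
};

using demo_session_ptr = std::unique_ptr<demo_session, demo_session_deleter>;
```

This keeps a single owner per object, consistent with the absence of ref-counting: copying a demo_session_ptr does not compile, and ownership can only be moved.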

Documenting

  • Public header comments are the source of truth for API behavior. Don't move them into prose-only docs.
  • Backend-specific build instructions belong in docs/backend/<backend>.md.
  • Per-tool docs belong in tools/<tool>/README.md.
  • If you add a new model architecture, update docs/development/HOWTO-add-model.md if your path differs from what's described there.

Submodule sync

ggml/ is conceptually a sub-project but lives in this repo as a copy. The scripts under scripts/sync-ggml* mirror changes between ggml-org/ggml and llama.cpp. If your change touches ggml/, expect it to be replayed back to the standalone ggml repo eventually.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.