ggml-org/llama.cpp

Library entry point

Active contributors: Georgi Gerganov

src/llama.cpp is small and acts as glue. The bulk of the library lives in sibling files (llama-model.cpp, llama-context.cpp, llama-vocab.cpp, ...). This page describes how the entry point wires everything together so newcomers can find their way around.

Purpose

Bind the public C API declared in include/llama.h to the internal C++ subsystems, register the GGML backends, and own the global init/free machinery.

Key entry points (in include/llama.h)

| Public symbol | Bound to |
| --- | --- |
| llama_backend_init / llama_backend_free | Global GGML init via ggml_backend_load_all (src/llama.cpp) |
| llama_model_load_from_file / llama_model_load_from_splits | llama_model_loader::load_* in src/llama-model-loader.cpp |
| llama_model_free | llama_model::~llama_model in src/llama-model.cpp |
| llama_init_from_model | llama_context constructor in src/llama-context.cpp |
| llama_free | llama_context::~llama_context |
| llama_decode, llama_encode | llama_context::decode / encode (graph build → schedule → run) |
| llama_get_logits*, llama_get_embeddings* | llama_context accessors |
| llama_kv_*, llama_memory_* | Forwarded to the active llama_memory impl (src/llama-memory*.cpp) |
| llama_tokenize, llama_token_to_piece, llama_detokenize | llama_vocab |
| llama_chat_apply_template | src/llama-chat.cpp |
| llama_sampler_* | src/llama-sampler.cpp |
| llama_grammar_* | src/llama-grammar.cpp |
| llama_adapter_lora_* | src/llama-adapter.cpp |
| llama_state_* | Save/restore via src/llama-context.cpp and src/llama-model-saver.cpp |
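
Several rows above come in create/free pairs (load a model, free a model; create a context, free a context). A caller has to release them in the reverse order of creation, because the context refers to the model. The sketch below illustrates that ordering with std::unique_ptr and custom deleters. It is a self-contained illustration, not code from the repository: demo_model, demo_session, demo_model_load, demo_session_init and the other demo_* names are hypothetical stand-ins, and the log vector exists only to make the order visible.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records lifecycle events so the order can be inspected (demo only).
static std::vector<std::string> demo_log;

// Hypothetical stand-ins for a model and a context that depends on it.
struct demo_model   { int n_layers; };
struct demo_session { const demo_model * model; };

// Hypothetical C-style create/free pairs.
demo_model * demo_model_load() {
    demo_log.push_back("model_load");
    return new demo_model{2};
}
void demo_model_free(demo_model * m) {
    demo_log.push_back("model_free");
    delete m;
}
demo_session * demo_session_init(const demo_model * m) {
    assert(m != nullptr);  // a session cannot exist without a model
    demo_log.push_back("session_init");
    return new demo_session{m};
}
void demo_session_free(demo_session * s) {
    demo_log.push_back("session_free");
    delete s;
}

// Deleters that route unique_ptr destruction to the matching free function.
struct demo_model_deleter   { void operator()(demo_model * m)   const { demo_model_free(m); } };
struct demo_session_deleter { void operator()(demo_session * s) const { demo_session_free(s); } };

using demo_model_ptr   = std::unique_ptr<demo_model, demo_model_deleter>;
using demo_session_ptr = std::unique_ptr<demo_session, demo_session_deleter>;

// One load/use/free cycle. Locals are destroyed in reverse declaration
// order, so the session is freed before the model it points to.
void demo_run_once() {
    demo_model_ptr   model(demo_model_load());
    demo_session_ptr session(demo_session_init(model.get()));
}
```

Declaring the model before the session is the design choice that matters here: scope exit then frees the dependent object first without any explicit cleanup code.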

Registration of backends

llama_backend_init calls ggml_backend_load_all, which iterates the registered backends in ggml/src/ggml-backend-reg.cpp. With BUILD_SHARED_LIBS=ON, each backend ships as a separate libggml-<backend>.so/.dll and is loaded through ggml/src/ggml-backend-dl.cpp.

graph TD
    App[Tool / app] -->|llama_backend_init| Llama[src/llama.cpp]
    Llama --> Reg[ggml-backend-reg.cpp]
    Reg --> CPU[ggml-cpu]
    Reg --> CUDA[ggml-cuda]
    Reg --> Metal[ggml-metal]
    Reg --> Vulkan[ggml-vulkan]
    Reg --> Other[sycl, opencl, hip, hexagon, rpc, webgpu, ...]
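
The registry idea in the diagram can be pictured with a small sketch. This is an illustration under stated assumptions, not the code in ggml-backend-reg.cpp: demo_backend, demo_registry, demo_init_ok and demo_init_missing are hypothetical names, and dynamic library loading is left out entirely. It only shows the shape of "register descriptors, then walk the list and keep the ones that initialize".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical backend descriptor: a name plus an init callback.
struct demo_backend {
    std::string name;
    bool (*init)();  // returns true when the backend is usable on this machine
};

// Hypothetical registry that owns the list of known backends.
class demo_registry {
public:
    // Adds a backend; a second registration under the same name is ignored.
    void register_backend(const demo_backend & b) {
        assert(!b.name.empty());
        for (const demo_backend & e : backends_) {
            if (e.name == b.name) {
                return;
            }
        }
        backends_.push_back(b);
    }

    // Calls init on every registered backend and returns the usable names.
    std::vector<std::string> load_all() const {
        std::vector<std::string> usable;
        for (const demo_backend & b : backends_) {
            if (b.init != nullptr && b.init()) {
                usable.push_back(b.name);
            }
        }
        return usable;
    }

    std::size_t count() const { return backends_.size(); }

private:
    std::vector<demo_backend> backends_;
};

// Demo callbacks: one backend that initializes, one that does not.
inline bool demo_init_ok()      { return true; }
inline bool demo_init_missing() { return false; }
```

Registering "cpu" with demo_init_ok and "gpu" with demo_init_missing, then calling load_all, yields only "cpu": a backend that fails to initialize stays registered but is not reported as usable.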

Implementation file

| File | Lines (~) | Purpose |
| --- | --- | --- |
| src/llama.cpp | 19k | Public C API → internal C++ glue, backend init/free |
| src/llama-impl.h / .cpp | 6k | Logging macros, internal asserts, helper utilities |
| src/llama-ext.h | 3k | Extra symbols not in the stable public header |

Where to start when reading the code

  1. Open include/llama.h and pick the function you care about (e.g. llama_decode).
  2. Find its definition in src/llama.cpp — it's typically a one- or two-line forwarder.
  3. Follow the call into src/llama-context.cpp, src/llama-model.cpp, etc. for the actual work.
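
The "one- or two-line forwarder" shape from step 2 can be sketched as follows. This is a self-contained illustration of the pattern, not code from the repository: demo_context, demo_init, demo_free, demo_decode and demo_n_seen are hypothetical names standing in for an internal C++ class and the C functions that wrap it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical internal C++ class; the actual work lives in its methods.
struct demo_context {
    std::vector<int32_t> seen;  // tokens processed so far

    // Returns 0 on success, -1 for an empty or null batch.
    int32_t decode(const int32_t * tokens, int32_t n_tokens) {
        if (tokens == nullptr || n_tokens <= 0) {
            return -1;
        }
        seen.insert(seen.end(), tokens, tokens + n_tokens);
        return 0;
    }
};

// Hypothetical public C API: callers only see an opaque pointer, and each
// function forwards to the C++ object in a line or two.
extern "C" {

demo_context * demo_init(void) {
    return new demo_context();
}

void demo_free(demo_context * ctx) {
    delete ctx;
}

int32_t demo_decode(demo_context * ctx, const int32_t * tokens, int32_t n_tokens) {
    assert(ctx != nullptr);
    return ctx->decode(tokens, n_tokens);
}

int32_t demo_n_seen(const demo_context * ctx) {
    assert(ctx != nullptr);
    return static_cast<int32_t>(ctx->seen.size());
}

}  // extern "C"
```

When reading a codebase built this way, the C function tells you which object and method to look at next; the behavior itself (here, the empty-batch check) is in the class, not in the wrapper.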

Most "real" inference logic lives in llama-context.cpp (the largest of the per-subsystem files) and the per-architecture builders under src/models/.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
