ggml-org/llama.cpp
Model loader
Active contributors: Georgi Gerganov, Sigbjørn Skjæret
src/llama-model-loader.cpp reads a GGUF file (or a set of split GGUF files), validates it against the architecture registered in src/llama-arch.cpp, and produces a fully-populated llama_model plus llama_vocab. It is the bridge between the on-disk format and the in-memory model.
Purpose
- Open and mmap (or read) one or more GGUF files.
- Read GGUF metadata (architecture, hparams, vocab, chat template, ...) and validate it.
- Resolve every tensor name in the architecture's expected manifest to a tensor in the file.
- Allocate buffers on the correct backends and copy/quantize tensor data as needed.
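Opening a file starts with the fixed-size GGUF header. The sketch below parses that header from an in-memory buffer. It assumes the GGUF v2/v3 layout: a 4-byte "GGUF" magic, a uint32 version, a uint64 tensor count and a uint64 metadata KV count, all little-endian. The names GgufHeader and parse_gguf_header are hypothetical and are not the ggml/src/gguf.cpp API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical struct mirroring the fixed GGUF header fields (v2/v3 layout).
struct GgufHeader {
    uint32_t version;
    uint64_t n_tensors;
    uint64_t n_kv;
};

// Reads a little-endian unsigned integer of type T starting at offset `off`.
template <typename T>
static T read_le(const std::vector<uint8_t> & buf, size_t off) {
    assert(off + sizeof(T) <= buf.size());
    T v = 0;
    for (size_t i = 0; i < sizeof(T); ++i) {
        v |= static_cast<T>(buf[off + i]) << (8 * i);
    }
    return v;
}

// Returns the parsed header, or std::nullopt if the buffer is too short
// or does not start with the "GGUF" magic.
std::optional<GgufHeader> parse_gguf_header(const std::vector<uint8_t> & buf) {
    constexpr size_t kHeaderSize = 4 + 4 + 8 + 8;
    if (buf.size() < kHeaderSize || std::memcmp(buf.data(), "GGUF", 4) != 0) {
        return std::nullopt;
    }
    GgufHeader h;
    h.version   = read_le<uint32_t>(buf, 4);
    h.n_tensors = read_le<uint64_t>(buf, 8);
    h.n_kv      = read_le<uint64_t>(buf, 16);
    return h;
}
```

The tensor-info and KV sections that follow the header are variable-length, which is why a real reader walks them sequentially after validating these fixed fields.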
Directory layout
```
src/
├── llama-model-loader.h # public types: llama_model_loader, helpers for kv lookups
├── llama-model-loader.cpp # actual loading logic, ~71 KB
├── llama-mmap.cpp / .h # cross-platform mmap, prefetch, mlock
├── llama-arch.cpp / .h # LLM_ARCH_* enum + per-arch tensor manifest
├── llama-hparams.cpp / .h # struct llama_hparams (per-arch hyperparameters)
└── llama-model.cpp / .h # llama_model itself
```
Key abstractions
| Type | File | Role |
|---|---|---|
| llama_model_loader | src/llama-model-loader.h | Open file(s), expose gguf_context, manage tensor enumeration |
| llama_model_kv_override | src/llama-model-loader.h | Override a single KV pair on load (--override-kv flag) |
| LLM_ARCH_* enum | src/llama-arch.h | Architecture identifier (LLM_ARCH_LLAMA, LLM_ARCH_GEMMA3, ...) |
| LLM_TENSOR_* enum | src/llama-arch.h | Logical tensor name (LLM_TENSOR_TOKEN_EMBD, LLM_TENSOR_OUTPUT_NORM, ...) |
| llm_arch_info table | src/llama-arch.cpp | Per-arch mapping from LLM_TENSOR_* to GGUF tensor names |
| llama_hparams | src/llama-hparams.h | Layer count, head dim, RoPE settings, vocab size, ... |
| llama_mmap, llama_mlock | src/llama-mmap.h | RAII wrappers over mmap/MapViewOfFile, mlock/VirtualLock |
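To make the LLM_TENSOR_* to GGUF-name mapping concrete, here is a minimal sketch of a per-architecture manifest. It assumes the GGUF naming pattern produced by llama.cpp conversions (for example token_embd.weight, output_norm.weight, blk.N.attn_q.weight). The enum TensorId, the table kLlamaManifest and the function tensor_name are hypothetical stand-ins for LLM_TENSOR_* and llm_arch_info, not their real definitions.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical logical tensor IDs, standing in for LLM_TENSOR_*.
enum class TensorId { TokenEmbd, OutputNorm, AttnQ, FfnUp };

// Hypothetical per-arch manifest: logical ID -> GGUF base name.
// Per-layer names carry a %d placeholder for the block index.
static const std::map<TensorId, std::string> kLlamaManifest = {
    { TensorId::TokenEmbd,  "token_embd"    },
    { TensorId::OutputNorm, "output_norm"   },
    { TensorId::AttnQ,      "blk.%d.attn_q" },
    { TensorId::FfnUp,      "blk.%d.ffn_up" },
};

// Builds the full GGUF tensor name, e.g. (AttnQ, 3) -> "blk.3.attn_q.weight".
// `layer` is ignored for names without a %d placeholder.
std::string tensor_name(TensorId id, int layer, const std::string & suffix = "weight") {
    const std::string & fmt = kLlamaManifest.at(id);
    char buf[128];
    const int n = std::snprintf(buf, sizeof(buf), fmt.c_str(), layer);
    assert(n > 0 && static_cast<size_t>(n) < sizeof(buf));
    return std::string(buf) + "." + suffix;
}
```

Keeping the per-layer index as a format placeholder lets one manifest entry describe every block, so the table size does not grow with the layer count.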
How it works
```mermaid
sequenceDiagram
    participant App
    participant Loader as llama_model_loader
    participant GGUF as gguf_context (ggml/src/gguf.cpp)
    participant Arch as llm_arch_info
    participant Model as llama_model
    App->>Loader: load(path or splits, params)
    Loader->>GGUF: gguf_init_from_file(s)
    GGUF-->>Loader: kv pairs + tensor headers
    Loader->>Loader: read general.architecture
    Loader->>Arch: lookup LLM_ARCH_*
    Arch-->>Loader: expected tensor names + types
    Loader->>Loader: read llama_hparams from kv
    Loader->>Loader: build vocab via llama-vocab.cpp
    Loader->>Model: allocate llama_model with tensors
    Loader->>Model: copy/quantize each tensor into backend buffers
    Model-->>App: ready
```

GGUF reading itself lives in ggml/src/gguf.cpp. The loader is responsible for the higher-level validation: "is this file consistent with the architecture it claims to be?"
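The consistency check can be sketched as follows, assuming the file's metadata and tensor names have already been read into plain containers. It reads general.architecture, derives the layer count from the arch-prefixed {arch}.block_count key, expands the expected per-layer names and reports any that are missing. FileView and find_missing_tensors are hypothetical, and the expected list is a tiny illustrative subset; the real loader works on gguf_context and llm_arch_info instead.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified view of a parsed GGUF file.
struct FileView {
    std::map<std::string, std::string> str_kv;   // string metadata
    std::map<std::string, int64_t>     int_kv;   // integer metadata
    std::set<std::string>              tensors;  // tensor names present in the file
};

// Returns every expected tensor name that the file does not contain.
// Throws std::out_of_range (via map::at) if the architecture or
// block-count metadata is missing.
std::vector<std::string> find_missing_tensors(const FileView & f) {
    const std::string arch = f.str_kv.at("general.architecture");
    const int64_t n_layer  = f.int_kv.at(arch + ".block_count");
    assert(n_layer >= 0);

    // Global tensors first, then per-layer tensors (illustrative subset only).
    std::vector<std::string> expected = { "token_embd.weight", "output_norm.weight" };
    for (int64_t il = 0; il < n_layer; ++il) {
        const std::string prefix = "blk." + std::to_string(il) + ".";
        expected.push_back(prefix + "attn_q.weight");
        expected.push_back(prefix + "ffn_up.weight");
    }

    std::vector<std::string> missing;
    for (const auto & name : expected) {
        if (f.tensors.count(name) == 0) {
            missing.push_back(name);
        }
    }
    return missing;
}
```

Collecting every missing name, rather than stopping at the first one, gives a more useful error when a conversion script and the loader disagree about an architecture.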
Splits
llama-model-loader.cpp natively understands split GGUFs (e.g. model-00001-of-00003.gguf). The split format and naming convention are shared with tools/gguf-split (see the gguf-split tool page).
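The naming convention in the example above can be reproduced with a small formatter. This sketch assumes the pattern <prefix>-NNNNN-of-MMMMM.gguf with 1-based, zero-padded five-digit indices, inferred from the example file name; split_path and all_split_paths are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Builds the file name for split `idx` (0-based) out of `count` splits,
// e.g. ("model", 0, 3) -> "model-00001-of-00003.gguf".
std::string split_path(const std::string & prefix, int idx, int count) {
    assert(idx >= 0 && idx < count);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "-%05d-of-%05d.gguf", idx + 1, count);
    return prefix + buf;
}

// Lists all split file names in load order.
std::vector<std::string> all_split_paths(const std::string & prefix, int count) {
    std::vector<std::string> paths;
    for (int i = 0; i < count; ++i) {
        paths.push_back(split_path(prefix, i, count));
    }
    return paths;
}
```

Encoding the total count in every file name means that, given the first split, a reader can work out the names of the remaining files without a separate index.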
KV overrides
Tools accept --override-kv key=type:value to patch GGUF metadata at load time. This is implemented as a list of llama_model_kv_override entries that the loader consults before reading each metadata key from the file.
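A minimal sketch of the override mechanism follows. It assumes the key=type:value syntax above, with type being one of int, float, bool or str. Each flag is parsed into a typed entry, and a lookup then consults the overrides before falling back to the file's own metadata. KvValue, parse_override and lookup are hypothetical names, and the variant does not reproduce the actual llama_model_kv_override layout.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical typed override value (int, float, bool or string).
using KvValue = std::variant<long long, double, bool, std::string>;

// Parses "key=type:value"; returns std::nullopt on malformed input.
std::optional<std::pair<std::string, KvValue>> parse_override(const std::string & s) {
    const auto eq = s.find('=');
    if (eq == std::string::npos || eq == 0) return std::nullopt;  // no key
    const auto colon = s.find(':', eq + 1);
    if (colon == std::string::npos) return std::nullopt;          // no type tag
    assert(colon > eq);  // the type tag sits between '=' and ':'

    const std::string key  = s.substr(0, eq);
    const std::string type = s.substr(eq + 1, colon - eq - 1);
    const std::string val  = s.substr(colon + 1);

    try {
        if (type == "int")   return std::make_pair(key, KvValue{std::stoll(val)});
        if (type == "float") return std::make_pair(key, KvValue{std::stod(val)});
        if (type == "str")   return std::make_pair(key, KvValue{val});
        if (type == "bool") {
            if (val == "true")  return std::make_pair(key, KvValue{true});
            if (val == "false") return std::make_pair(key, KvValue{false});
        }
    } catch (const std::exception &) {
        // std::stoll / std::stod throw if no conversion can be performed
        // or the value is out of range.
    }
    return std::nullopt;
}

// Overrides take precedence over the file's own metadata.
std::optional<KvValue> lookup(const std::string & key,
                              const std::map<std::string, KvValue> & overrides,
                              const std::map<std::string, KvValue> & file_kv) {
    if (auto it = overrides.find(key); it != overrides.end()) return it->second;
    if (auto it = file_kv.find(key);   it != file_kv.end())   return it->second;
    return std::nullopt;
}
```

Because the override check happens inside the lookup, every metadata read benefits from it without the per-key getters needing to know overrides exist.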
Integration points
- Quantization. llama-quant.cpp reuses the loader to read a source model, then writes a quantized output via a sibling writer. See Quantization.
- State save/load. src/llama-model-saver.cpp writes a model back out, used for adapter merging.
- Adapters. src/llama-adapter.cpp uses the loader's GGUF helpers to read LoRA adapter files alongside the base model.
- CLI. Tools usually call llama_model_load_from_file (or llama_model_load_from_splits) from common/common.cpp, after argument parsing in common/arg.cpp.
Entry points for modification
- New architecture. Add an LLM_ARCH_* enum value in src/llama-arch.h, populate the llm_arch_info table in src/llama-arch.cpp with the expected tensor names, define the per-arch graph in src/models/<your-arch>.cpp, and add a Python conversion path in convert_hf_to_gguf.py. The full recipe is in docs/development/HOWTO-add-model.md.
- New metadata key. Add the constant to src/llama-arch.h (the LLM_KV_* enum) and a getter helper in llama-model-loader.
- New tensor naming. Add the canonical name to LLM_TENSOR_* and the per-arch override to llm_arch_info.
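For the new-metadata-key step, a common pattern combines three pieces. The first is an enum entry. The second is a name template expanded with the architecture name. The third is a getter that distinguishes required keys from optional ones. A minimal sketch follows; KvId, kKvNames, kv_name and get_u32 are hypothetical and only mimic the general shape of LLM_KV_* and the loader's getters, not their actual signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical key IDs, standing in for LLM_KV_* entries.
enum class KvId { BlockCount, ContextLength, ExpertCount };

// Name templates; "%s" is replaced by the architecture name.
static const std::map<KvId, std::string> kKvNames = {
    { KvId::BlockCount,    "%s.block_count"    },
    { KvId::ContextLength, "%s.context_length" },
    { KvId::ExpertCount,   "%s.expert_count"   },
};

// Expands the template, e.g. (BlockCount, "llama") -> "llama.block_count".
std::string kv_name(KvId id, const std::string & arch) {
    std::string name = kKvNames.at(id);
    const auto pos = name.find("%s");
    assert(pos != std::string::npos);
    return name.replace(pos, 2, arch);
}

// Reads an unsigned key into `out`. A missing required key throws;
// a missing optional key leaves `out` unchanged and returns false.
bool get_u32(const std::map<std::string, uint32_t> & kv, KvId id,
             const std::string & arch, uint32_t & out, bool required = true) {
    const std::string key = kv_name(id, arch);
    const auto it = kv.find(key);
    if (it == kv.end()) {
        if (required) throw std::runtime_error("key not found: " + key);
        return false;
    }
    out = it->second;
    return true;
}
```

Leaving `out` untouched for a missing optional key lets callers pre-fill a default and read the key in a single call.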
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.