ggml-org/llama.cpp
Metal backend
Active contributors: Georgi Gerganov
The Metal backend runs on Apple Silicon (M-series chips) and Intel Macs with Metal-capable GPUs. It is on by default on macOS and is one of the project's most polished accelerator paths — Apple Silicon was a first-class target from very early in the project.
Where it lives
ggml/src/ggml-metal/
├── CMakeLists.txt
├── ggml-metal.h, ggml-metal.m, ggml-metal.mm # Backend Objective-C++ entry
├── ggml-metal-impl.h # Internal declarations
├── ggml-metal.metal # All Metal compute shaders (~10k LOC)
├── ggml-metal-common.h
└── (ancillary helpers)
Most kernels live in a single large .metal file. The Objective-C side wraps the Metal API: device init, command queue, buffer management, and per-op dispatch.
Capabilities
- Full transformer op set including flash attention.
- All quant types relevant for inference (k-quants, IQ-quants, MXFP4, legacy block, FP16, BF16).
- Quantized KV cache.
- Unified memory architecture is exploited — large weights stay in shared memory rather than being copied across PCIe.
- macOS, iOS/iPadOS, and Apple TV builds (via the build-xcframework.sh script).
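To make the "legacy block" entry in the list above concrete, here is a minimal Python sketch of q8_0-style block quantization: 32 values share one scale, and each value is stored as a signed 8-bit integer. The function names are hypothetical and the scale is kept as a Python float rather than rounded to FP16, so this is an illustration of the idea, not the ggml or Metal implementation.

```python
import math

BLOCK = 32  # elements per block in the q8_0 layout


def quantize_q8_0(block):
    """Quantize 32 floats to (scale, 32 ints in [-127, 127]).

    Assumption: the scale stays a Python float; the real format stores
    it as FP16, which adds a small extra rounding error not modeled here.
    """
    assert len(block) == BLOCK
    amax = max(abs(x) for x in block)
    d = amax / 127.0                      # one shared scale per block
    inv = 1.0 / d if d != 0.0 else 0.0    # avoid dividing by zero on all-zero blocks
    qs = [int(round(x * inv)) for x in block]
    return d, qs


def dequantize_q8_0(d, qs):
    """Reconstruct approximate floats from the scale and the int8 values."""
    return [d * q for q in qs]


xs = [math.sin(i) for i in range(BLOCK)]
d, qs = quantize_q8_0(xs)
ys = dequantize_q8_0(d, qs)
max_err = max(abs(a - b) for a, b in zip(xs, ys))

# Storage cost: 2-byte scale + 32 one-byte values per 32 elements.
bits_per_element = (2 + BLOCK) * 8 / BLOCK
print(bits_per_element)  # 8.5
```

The reconstruction error per element is bounded by half the block scale, which is why blocks with one large outlier lose more precision than blocks with uniform magnitudes.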
Build
cmake -B build # Metal is auto-enabled on Apple platforms
cmake --build build --config Release -j
For iOS/iPadOS distribution, build-xcframework.sh produces an XCFramework. See docs/android.md and the SwiftUI demo at examples/llama.swiftui/ for mobile integration patterns.
Performance notes
- Apple Silicon has unified memory: load a model and the GPU sees the same bytes, so no cudaMemcpy equivalent is required.
- The M-series Neural Engine is not used; the backend runs on the GPU. Some prompt-processing paths use the BLAS backend (Apple Accelerate) when CMake finds it.
- -fa 1 enables flash attention. -ctk q8_0 -ctv q8_0 roughly halves KV-cache memory relative to an f16 cache, at a small quality cost.
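The KV-cache saving can be checked with simple arithmetic. The sketch below assumes a hypothetical model shape (32 layers, 8 KV heads, head dimension 128, 8192-token context) and the q8_0 layout of 34 bytes per 32 elements; the function name and the shape are illustrative, not taken from llama.cpp.

```python
def kv_cache_bytes(n_layer, n_ctx, n_kv_head, head_dim, bytes_per_block, block_size):
    """Bytes needed to hold K and V for every layer and every context position."""
    elems = 2 * n_layer * n_ctx * n_kv_head * head_dim  # factor 2: K and V
    assert elems % block_size == 0
    return elems // block_size * bytes_per_block


# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layer=32, n_ctx=8192, n_kv_head=8, head_dim=128)

f16 = kv_cache_bytes(**shape, bytes_per_block=2, block_size=1)     # 2 bytes per element
q8_0 = kv_cache_bytes(**shape, bytes_per_block=34, block_size=32)  # 34 bytes per 32 elements

print(f16 / 2**20)   # 1024.0 (MiB)
print(q8_0 / 2**20)  # 544.0 (MiB)
print(q8_0 / f16)    # 0.53125
```

The ratio is slightly above one half because each 32-element block also carries a 2-byte scale.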
Integration points
- Scheduler. Single-device by default; multi-Metal-device setups are uncommon but supported.
- build-xcframework.sh: produces a redistributable XCFramework for iOS/macOS apps.
- examples/llama.swiftui/, examples/batched.swift/: Swift integration references.
Entry points for modification
- New shader. Add a kernel to ggml-metal.metal, declare it in ggml-metal-impl.h, dispatch in ggml-metal.m. Test against CPU via tests/test-backend-ops.
- API change. Edit the Obj-C++ .m/.mm files; keep them ARC-clean.
- iOS-specific. build-xcframework.sh is the single source of truth for the iOS build; changes go there.
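Testing a new kernel against the CPU comes down to running the same op on both backends and comparing outputs under a tolerance. The Python sketch below shows one common metric for that comparison, normalized mean squared error. The function name, sample values, and threshold are assumptions for illustration; this is not the code of tests/test-backend-ops.

```python
def nmse(ref, out):
    """Normalized mean squared error of `out` against the reference `ref`."""
    assert len(ref) == len(out) and len(ref) > 0
    err = sum((o - r) ** 2 for r, o in zip(ref, out))
    norm = sum(r * r for r in ref)
    # Fall back to the raw squared error when the reference is all zeros.
    return err / norm if norm > 0.0 else err


# Hypothetical outputs: a CPU reference and a GPU result with small rounding noise.
cpu_ref = [0.5, -1.25, 2.0, 3.5]
gpu_out = [0.5001, -1.2499, 2.0002, 3.4998]

TOLERANCE = 1e-7  # illustrative threshold, not a value from the project

passed = nmse(cpu_ref, gpu_out) < TOLERANCE
print(passed)  # True
```

Normalizing by the reference energy keeps the threshold meaningful across ops whose outputs differ by orders of magnitude.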
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.