ggml-org/llama.cpp
Other backends
A grab bag of less-trafficked but real backends. Each has its own CODEOWNERS group; check CODEOWNERS before opening a PR against one.
Hexagon (Qualcomm DSP)
Active contributors: lhez, max-krasnyansky
ggml/src/ggml-hexagon/ runs ggml ops on the Qualcomm Hexagon DSP found on Snapdragon SoCs. It is paired with the Snapdragon NDK and the Qualcomm tooling under scripts/snapdragon/ and docs/backend/snapdragon/. It targets phone-class deployments, where running inference on the DSP leaves the CPU and GPU free for the rest of the OS.
Build: -DGGML_HEXAGON=ON plus the Qualcomm SDK on PATH.
CANN (Huawei Ascend)
Active contributors: hipudding
ggml/src/ggml-cann/ integrates with Huawei's CANN runtime to run on Ascend NPUs. Build: -DGGML_CANN=ON. Detailed instructions live under docs/backend/.
RPC
Active contributors: rgerganov
ggml/src/ggml-rpc/ is a network-transparent backend that forwards ops to a remote rpc-server. Pair the build with tools/rpc/rpc-server.cpp to run a server on a host with a real GPU and let lightweight clients offload tensors to it.
Build: -DGGML_RPC=ON. Run the server: ./build/bin/rpc-server -p 50052. Client tools then accept --rpc host:port.
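As a minimal sketch of the client side, the snippet below assembles the --rpc argument for more than one server and prints the resulting command instead of running it. The host addresses, the model path, the choice of llama-cli as the client, and the -ngl 99 offload setting are all placeholders chosen for illustration; join_endpoints is a hypothetical helper, not something shipped in the repository.

```shell
# Join "host:port" endpoints with commas, the form --rpc takes
# when several servers are listed.
# join_endpoints is a hypothetical helper for this example only.
join_endpoints() {
    list=""
    for ep in "$@"; do
        # Prepend a comma only when the list already has an entry.
        list="${list:+$list,}$ep"
    done
    printf '%s\n' "$list"
}

# Placeholder addresses; replace with hosts where rpc-server is running.
servers=$(join_endpoints 192.168.1.10:50052 192.168.1.11:50052)

# Print the client command rather than executing it.
# model.gguf and -ngl 99 are illustrative assumptions.
echo "./build/bin/llama-cli -m model.gguf --rpc $servers -ngl 99"
```

Each listed server needs to be reachable from the client on the given port before the printed command will work.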
WebGPU
Active contributors: reeselevine
ggml/src/ggml-webgpu/ runs ggml ops via WebGPU through the Dawn/wgpu-native runtime. It targets the browser (compiled with Emscripten) and any platform with a WebGPU implementation.
Build: -DGGML_WEBGPU=ON. See docs/backend/ for browser-specific deployment.
BLAS
The BLAS "backend" is a thin matmul-only implementation that delegates to a system BLAS (OpenBLAS, MKL, Apple Accelerate). It is most useful for prompt processing on CPU; small matrices stay in ggml-cpu.
Build: -DGGML_BLAS=ON, optionally with -DGGML_BLAS_VENDOR=OpenBLAS|MKL|Apple|Generic to select the BLAS vendor. See ggml/src/ggml-blas/.
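The two flags are used together: GGML_BLAS turns the backend on and GGML_BLAS_VENDOR picks the library. A small sketch that prints the configure command for a given vendor is shown here; blas_configure_cmd is a hypothetical helper, and the build directory name is an assumption.

```shell
# Print the CMake configure command for the BLAS backend.
# blas_configure_cmd is a hypothetical helper for this example only.
# $1 (optional): vendor name to pass as GGML_BLAS_VENDOR.
blas_configure_cmd() {
    vendor=${1:-}
    cmd="cmake -B build -DGGML_BLAS=ON"
    if [ -n "$vendor" ]; then
        cmd="$cmd -DGGML_BLAS_VENDOR=$vendor"
    fi
    printf '%s\n' "$cmd"
}

# Print the command for OpenBLAS; run the printed line from the repo root.
blas_configure_cmd OpenBLAS
```

Calling the helper with no argument prints the command without a vendor flag, leaving the vendor choice to the build's default.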
zDNN (IBM Z)
Active contributors: taronaeo, Andreas Krebbel, Aleksei Nikiforov
ggml/src/ggml-zdnn/ targets IBM Z mainframes via the zDNN library. Build: -DGGML_ZDNN=ON. See docs/build-s390x.md.
ZenDNN (AMD)
ggml/src/ggml-zendnn/ is the AMD ZenDNN integration for Zen-class CPUs.
virtgpu
Active contributors: kpouget
ggml/src/ggml-virtgpu/ integrates with the Linux virtio-gpu stack, allowing a guest VM to offload to a host GPU through the virtio interface.
Why so many backends
The repo's backend story is "wherever there's a real chip with documented APIs, someone in the community will eventually port ggml to it." Each backend is a self-contained directory under ggml/src/, and the registry pattern in ggml-backend-reg.cpp means adding one mostly leaves the rest of the project untouched. CODEOWNERS makes the per-backend ownership explicit.
For per-backend op coverage see docs/ops.md.
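Because each backend lives in its own directory under ggml/src/, a checkout can be inspected directly to see which backends it contains. The following is a rough sketch that assumes the ggml/src/ggml-<name>/ layout described on this page; list_backends is a hypothetical helper, not a script from the repository.

```shell
# List backend directory names found in a llama.cpp checkout.
# Assumption: backends follow the ggml/src/ggml-<name>/ layout.
# list_backends is a hypothetical helper for this example only.
# $1 (optional): path to the repository root, defaults to ".".
list_backends() {
    root=${1:-.}
    for dir in "$root"/ggml/src/ggml-*/; do
        # Skip the unexpanded glob when nothing matches.
        [ -d "$dir" ] || continue
        name=${dir%/}          # drop the trailing slash
        name=${name##*/ggml-}  # keep only the part after "ggml-"
        printf '%s\n' "$name"
    done
}

# Usage: run from the repository root.
list_backends .
```

The listing includes every ggml-* directory, so the mainstream backends covered on other pages will appear alongside the ones described here.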
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.