ggml-org/llama.cpp

Other backends

This page collects the less-trafficked backends in the tree. Each has its own CODEOWNERS group; check CODEOWNERS before opening a PR against one.

Hexagon (Qualcomm DSP)

Active contributors: lhez, max-krasnyansky

ggml/src/ggml-hexagon/ runs ggml ops on the Qualcomm Hexagon DSP found on Snapdragon SoCs. It is paired with the Snapdragon NDK and the Qualcomm tooling under scripts/snapdragon/ and docs/backend/snapdragon/. It targets phone-class deployments, where running inference on the DSP leaves the CPU and GPU free for the rest of the OS.

Build: -DGGML_HEXAGON=ON plus the Qualcomm SDK on PATH.

CANN (Huawei Ascend)

Active contributors: hipudding

ggml/src/ggml-cann/ integrates with Huawei's CANN runtime to run on Ascend NPUs. Build: -DGGML_CANN=ON. Detailed instructions live under docs/backend/.

RPC

Active contributors: rgerganov

ggml/src/ggml-rpc/ is a network-transparent backend that forwards ops to a remote rpc-server. Pair the build with tools/rpc/rpc-server.cpp to run a server on a host with a real GPU and let lightweight clients offload tensors to it.

Build: -DGGML_RPC=ON. Run the server: ./build/bin/rpc-server -p 50052. Tools then accept --rpc host:port.
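The server and client halves of this workflow can be assembled from a small launcher script. The sketch below is illustrative only: the helper names, the model path, and the host address are hypothetical. It assumes the client flag is spelled --rpc and takes a comma-separated list of host:port endpoints, and it only builds the argument lists without starting any process.

```python
from typing import List, Sequence, Tuple


def rpc_server_argv(port: int = 50052,
                    binary: str = "./build/bin/rpc-server") -> List[str]:
    """Argument list for starting rpc-server on the GPU host."""
    return [binary, "-p", str(port)]


def rpc_client_argv(tool: str, model: str,
                    servers: Sequence[Tuple[str, int]]) -> List[str]:
    """Argument list for a client tool that offloads to one or more servers."""
    # Assumption: several servers are joined with commas into a single value.
    endpoints = ",".join(f"{host}:{port}" for host, port in servers)
    return [tool, "-m", model, "--rpc", endpoints]


# Hypothetical host address and model path, for illustration only.
server_cmd = rpc_server_argv()
client_cmd = rpc_client_argv("./build/bin/llama-cli", "model.gguf",
                             [("192.168.1.10", 50052)])
print(" ".join(server_cmd))
print(" ".join(client_cmd))
```

Either list can be passed to subprocess.run on the appropriate machine; both sides need a build configured with -DGGML_RPC=ON.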

WebGPU

Active contributors: reeselevine

ggml/src/ggml-webgpu/ runs ggml ops via WebGPU through the Dawn/wgpu-native runtime. It targets the browser (compiled with Emscripten) and any platform with a WebGPU implementation.

Build: -DGGML_WEBGPU=ON. See docs/backend/ for browser-specific deployment.

BLAS

The BLAS "backend" is a thin matmul-only implementation that delegates to a system BLAS (OpenBLAS, MKL, Apple Accelerate). It is most useful for prompt-processing on CPU; small matrices stay in ggml-cpu.

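Build: -DGGML_BLAS=ON, optionally combined with -DGGML_BLAS_VENDOR=<vendor> to select a specific library (for example OpenBLAS, Apple, or Generic). The vendor value is handed to CMake's FindBLAS module, so it must be one of that module's vendor names; MKL is selected through names such as Intel10_64lp rather than the literal MKL. See ggml/src/ggml-blas/.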

zDNN (IBM Z)

Active contributors: taronaeo, Andreas Krebbel, Aleksei Nikiforov

ggml/src/ggml-zdnn/ targets IBM Z mainframes via the zDNN library. Build: -DGGML_ZDNN=ON. See docs/build-s390x.md.

ZenDNN (AMD)

ggml/src/ggml-zendnn/ is the AMD ZenDNN integration for Zen-class CPUs.

virtgpu

Active contributors: kpouget

ggml/src/ggml-virtgpu/ integrates with the Linux virtio-gpu stack, allowing a guest VM to offload to a host GPU through the virtio interface.

Why so many backends

The repo's backend story is "wherever there's a real chip with documented APIs, someone in the community will eventually port ggml to it." Each backend is a self-contained directory under ggml/src/, and the registry pattern in ggml-backend-reg.cpp means adding one mostly leaves the rest of the project untouched. CODEOWNERS makes the per-backend ownership explicit.
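To make the registry idea concrete, here is a minimal, language-neutral sketch in Python. It is not ggml's API: Backend, register_backend, and backends_for are hypothetical names, and the real registry in ggml-backend-reg.cpp is C++ with a richer device and buffer model. The point it illustrates is that a new backend only adds its own registration call, while the lookup code stays unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Backend:
    """Hypothetical stand-in for a backend: a name plus an op-support check."""
    name: str
    supports_op: Callable[[str], bool]


# Central registry; callers query it instead of naming backends directly.
_REGISTRY: Dict[str, Backend] = {}


def register_backend(backend: Backend) -> None:
    """Add a backend to the registry, rejecting duplicate names."""
    if backend.name in _REGISTRY:
        raise ValueError(f"backend {backend.name!r} is already registered")
    _REGISTRY[backend.name] = backend


def backends_for(op: str) -> List[str]:
    """Names of registered backends that claim support for `op`."""
    return [b.name for b in _REGISTRY.values() if b.supports_op(op)]


# Each backend would register itself from its own directory.
register_backend(Backend("cpu", lambda op: True))
# Mirrors the matmul-only BLAS backend described earlier on this page.
register_backend(Backend("blas", lambda op: op == "mul_mat"))

print(backends_for("mul_mat"))  # ['cpu', 'blas']
print(backends_for("softmax"))  # ['cpu']
```

Adding a third backend in this sketch means one more register_backend call and no edits to backends_for, which is the property the paragraph above attributes to the registry pattern.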

For per-backend op coverage see docs/ops.md.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
