ggml-org/llama.cpp
Vulkan backend
Active contributors: 0cc4m, Jeff Bolz
The Vulkan backend is the cross-vendor GPU path: it runs on NVIDIA, AMD, Intel, and any other Vulkan-capable hardware with a single binary. It is the most portable GPU option, especially on Linux and Windows where vendor stacks differ.
Where it lives
ggml/src/ggml-vulkan/
├── CMakeLists.txt
├── ggml-vulkan.cpp # Backend entry: device, queues, dispatch, buffers
├── ggml-vulkan-shaders.cpp / .hpp # Generated shader registry
├── vulkan-shaders/ # GLSL/SPIR-V kernel sources
└── (cmake fragments that compile shaders at build time)

Shader sources are compiled to SPIR-V at build time (glslc from the Vulkan SDK) and embedded into the binary. The C++ side handles device discovery, descriptor sets, command buffers, and pipeline creation.
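To make the "compiled and embedded" idea concrete, here is a minimal sketch of what an embedded shader registry can look like. It is not the generated code from ggml-vulkan-shaders.cpp: the struct, the shader names, and the placeholder payloads are all hypothetical. The only fixed fact it relies on is that every SPIR-V module begins with the magic word 0x07230203 followed by a five-word header.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical registry entry. The real generated files may use different
// names and a different layout.
struct EmbeddedShader {
    std::string_view name;        // lookup key (illustrative shader name)
    const uint32_t*  words;       // SPIR-V is a stream of 32-bit words
    std::size_t      word_count;  // number of 32-bit words in the module
};

// Placeholder payloads: only the magic number is meaningful here. A real
// build step would emit the full compiled module.
static const uint32_t kAddSpv[] = {0x07230203u, 0u, 0u, 0u, 0u};
static const uint32_t kMulSpv[] = {0x07230203u, 0u, 0u, 0u, 0u};

static const EmbeddedShader kRegistry[] = {
    {"add_f32", kAddSpv, sizeof(kAddSpv) / sizeof(kAddSpv[0])},
    {"mul_f32", kMulSpv, sizeof(kMulSpv) / sizeof(kMulSpv[0])},
};

// Linear lookup by name; returns nullptr when the shader is not embedded.
const EmbeddedShader* find_shader(std::string_view name) {
    for (const EmbeddedShader& s : kRegistry) {
        if (s.name == name) {
            return &s;
        }
    }
    return nullptr;
}

// Cheap sanity check before handing the words to pipeline creation:
// a SPIR-V header is five words long and starts with the magic number.
bool looks_like_spirv(const EmbeddedShader& s) {
    return s.word_count >= 5 && s.words[0] == 0x07230203u;
}
```

Because the bytes live in the binary, nothing has to be read from disk or compiled from GLSL at run time; the backend only needs to look up the module and create a pipeline from it.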
Capabilities
- Full transformer op set on most modern GPUs.
- All ggml_types used at inference time (k-quants, IQ-quants, MXFP4, legacy block, FP16, BF16).
- Multi-GPU via tensor split.
- Shader-based flash attention path on supporting GPUs.
- Cross-vendor: same binary runs on NVIDIA, AMD, Intel, Adreno (mobile), Mali, etc.
Build
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

Requires the Vulkan SDK at build time. Detailed instructions are in docs/build.md#vulkan and docs/backend/.
Performance notes
- Vulkan competes well with vendor-specific backends on flagship GPUs and is often the best option on AMD due to ROCm's narrower hardware coverage.
- On older / mobile Vulkan implementations, expect slower flash-attention paths or fallbacks.
- Pipeline cache files (pipeline_cache.bin) speed up subsequent launches; the backend writes them under the user's cache dir by default.
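As an illustration of what "under the user's cache dir" usually means in practice, the sketch below resolves a cache directory from the conventional environment variables. It is an assumption-laden example, not the backend's actual lookup: the function names, the precedence order (XDG_CACHE_HOME, then LOCALAPPDATA, then HOME/.cache), and the application subdirectory are all hypothetical. Only the file name pipeline_cache.bin comes from the text above.

```cpp
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// True when an environment value is present and non-empty.
static bool is_set(const char* value) {
    return value != nullptr && *value != '\0';
}

// Hypothetical helper: choose a per-user cache directory from already-read
// environment values. Taking them as parameters keeps the logic testable.
std::optional<fs::path> resolve_cache_dir(const char* xdg_cache_home,
                                          const char* local_app_data,
                                          const char* home) {
    if (is_set(xdg_cache_home)) {
        return fs::path(xdg_cache_home);   // common convention on Linux
    }
    if (is_set(local_app_data)) {
        return fs::path(local_app_data);   // common convention on Windows
    }
    if (is_set(home)) {
        return fs::path(home) / ".cache";  // fallback when XDG is unset
    }
    return std::nullopt;                   // no usable location: skip caching
}

// Hypothetical helper: full path of the cache file inside an app subdirectory.
std::optional<fs::path> pipeline_cache_file(const std::optional<fs::path>& cache_dir,
                                            const std::string& app_subdir) {
    if (!cache_dir) {
        return std::nullopt;
    }
    return *cache_dir / app_subdir / "pipeline_cache.bin";
}
```

A caller would pass in std::getenv("XDG_CACHE_HOME"), std::getenv("LOCALAPPDATA"), and std::getenv("HOME"). Returning an empty optional lets the caller fall back to running without a persistent cache instead of failing.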
Integration points
- Scheduler. Standard ggml_backend_sched integration.
- Server. Same flags as CUDA: --n-gpu-layers, --tensor-split, --main-gpu, -fa.
- Mobile. Vulkan is increasingly viable on Adreno/Mali; see also the Hexagon backend for Qualcomm DSP offload.
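The --tensor-split flag takes a comma-separated list of proportions, one per GPU. The sketch below shows one plausible way to turn such a string into normalized fractions and a per-device layer count. Both function names are hypothetical and the rounding scheme is an assumption for illustration; it is not a copy of the parsing or placement logic in llama.cpp.

```cpp
#include <cmath>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parse "3,1" into fractions that sum to 1 (here: 0.75 and 0.25).
// std::stof throws std::invalid_argument for empty or non-numeric items.
std::vector<float> parse_tensor_split(const std::string& arg) {
    std::vector<float> parts;
    std::stringstream stream(arg);
    std::string item;
    float total = 0.0f;
    while (std::getline(stream, item, ',')) {
        const float value = std::stof(item);
        if (value < 0.0f) {
            throw std::invalid_argument("tensor split values must be >= 0");
        }
        parts.push_back(value);
        total += value;
    }
    if (total <= 0.0f) {
        throw std::invalid_argument("tensor split must contain a positive value");
    }
    for (float& p : parts) {
        p /= total;  // normalize so the proportions sum to 1
    }
    return parts;
}

// Assign n_layers to devices by cumulative fraction. The last device takes
// whatever remains, so the counts always sum to n_layers.
std::vector<int> layers_per_device(const std::vector<float>& fractions, int n_layers) {
    std::vector<int> counts(fractions.size(), 0);
    float cumulative = 0.0f;
    int assigned = 0;
    for (std::size_t i = 0; i < fractions.size(); ++i) {
        cumulative += fractions[i];
        const bool last = (i + 1 == fractions.size());
        const int upto = last ? n_layers
                              : static_cast<int>(std::lround(cumulative * n_layers));
        counts[i] = upto - assigned;
        assigned = upto;
    }
    return counts;
}
```

With a split of "3,1" and 32 layers, this scheme places 24 layers on the first device and 8 on the second. Working from cumulative fractions rather than rounding each share independently avoids losing or duplicating a layer to rounding error.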
Entry points for modification
- New shader. Add vulkan-shaders/<op>.comp (GLSL), then wire up the dispatch in ggml-vulkan.cpp. The shader registry is generated at build time.
- Pipeline tuning. ggml-vulkan.cpp owns the pipeline-creation logic and cache.
- New device extension. Probe in ggml_backend_vulkan_init, then gate kernels accordingly.
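The "probe, then gate" pattern for device extensions can be sketched without any Vulkan headers. In the example below, the extension list stands in for the names a driver would report through vkEnumerateDeviceExtensionProperties. The extension strings are real Vulkan extension names, but the structs, the enum, and the selection policy are hypothetical and do not mirror the backend's actual feature flags.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical feature flags derived from the device's extension list.
struct DeviceFeatures {
    bool fp16    = false;  // 16-bit float arithmetic and storage
    bool coopmat = false;  // cooperative-matrix matmul support
};

// Hypothetical kernel variants, ordered from most portable to fastest.
enum class MatmulVariant { Scalar, Fp16, CoopMat };

bool has_extension(const std::vector<std::string>& extensions,
                   const std::string& name) {
    return std::find(extensions.begin(), extensions.end(), name) != extensions.end();
}

// Probe step: translate reported extension names into feature flags.
// Simplification: a real probe must also query the matching feature structs,
// since an advertised extension does not guarantee every feature bit is set.
DeviceFeatures probe_features(const std::vector<std::string>& extensions) {
    DeviceFeatures features;
    features.fp16 = has_extension(extensions, "VK_KHR_shader_float16_int8") &&
                    has_extension(extensions, "VK_KHR_16bit_storage");
    features.coopmat = features.fp16 &&
                       has_extension(extensions, "VK_KHR_cooperative_matrix");
    return features;
}

// Gate step: pick the best variant the device supports, with a scalar
// fallback so every device gets a working kernel.
MatmulVariant select_matmul(const DeviceFeatures& features) {
    if (features.coopmat) {
        return MatmulVariant::CoopMat;
    }
    if (features.fp16) {
        return MatmulVariant::Fp16;
    }
    return MatmulVariant::Scalar;
}
```

Keeping the probe result in one struct and making every kernel choice a function of that struct keeps the gating logic in a single place, which matters for a backend that has to behave sensibly across many vendors.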
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.