ggml-org/llama.cpp

SYCL, OpenCL, OpenVINO backends

Active contributors: arthw (SYCL); lhez, max-krasnyansky (OpenCL); cavusmustafa, wine99 (OpenVINO)

These three backends serve overlapping but distinct ecosystems. SYCL targets Intel GPUs and CPUs through the oneAPI DPC++ runtime; OpenCL targets Qualcomm Adreno GPUs (and historically other GPUs); OpenVINO targets Intel CPUs, integrated GPUs, and NPUs through the OpenVINO runtime.

SYCL

Where it lives:

ggml/src/ggml-sycl/
├── CMakeLists.txt
├── ggml-sycl.cpp                # Backend entry
├── (per-op .cpp and .hpp files: mmvq, mmq, rope, fattn, ...)
└── outprod.cpp, gemv.cpp, ...

Build:

source /opt/intel/oneapi/setvars.sh    # or whatever your oneAPI install is
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release -j

Detailed setup is in docs/backend/SYCL.md. SYCL-specific helper scripts and a device-listing tool live under examples/sycl/; operator tests are shared with the other backends (see Integration points).

Capabilities are similar to CUDA: most quant types, flash attention, KV cache quantization. Much of the implementation is ported from the CUDA sources rather than written from scratch.
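The configure line above names icx and icpx explicitly, so it fails early if the oneAPI environment has not been sourced. A minimal pre-flight sketch follows; check_sycl_toolchain is a hypothetical helper written for this page, not something shipped with llama.cpp.

```shell
# Hypothetical helper: confirm the two oneAPI compilers used by the
# configure line are reachable on PATH before invoking cmake.
check_sycl_toolchain() {
    local missing=0
    local tool
    for tool in icx icpx; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool (was setvars.sh sourced in this shell?)" >&2
            missing=1
        fi
    done
    # 0 when both compilers were found, 1 otherwise.
    return "$missing"
}

# Example use (not executed here):
#   check_sycl_toolchain && \
#     cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
```

The check only looks for the executables by name; it does not verify the oneAPI version or that a SYCL device is visible at run time.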

OpenCL

Where it lives:

ggml/src/ggml-opencl/
├── CMakeLists.txt
├── ggml-opencl.cpp              # Backend entry
├── kernels/                     # OpenCL C kernel sources
└── (build-time embedding helpers)

Build:

cmake -B build -DGGML_OPENCL=ON
cmake --build build --config Release -j

The OpenCL backend's primary target today is Qualcomm Adreno GPUs on Snapdragon devices; see docs/backend/. It is a separate implementation from the CLBlast-based "OpenCL backend" of the early GGML era, which it supersedes.

OpenVINO

Where it lives:

ggml/src/ggml-openvino/
├── CMakeLists.txt
├── ggml-openvino.cpp / .h
└── (translation helpers from ggml graph to OpenVINO model)

Build:

cmake -B build -DGGML_OPENVINO=ON
cmake --build build --config Release -j

The OpenVINO backend converts the ggml computation graph into an OpenVINO ov::Model and runs it on Intel CPU / iGPU / NPU through the OpenVINO runtime. This is qualitatively different from the other backends, which dispatch their own kernels for each operation rather than handing a whole graph to an external runtime. See docs/backend/OPENVINO.md, if present, and the per-file commentary in ggml-openvino.cpp.
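Across the three backend sections, the configure step differs only in the flags passed to cmake. As a convenience sketch, those flags can be collected in one place; backend_cmake_flags is a hypothetical helper, and the flags it prints are exactly the ones quoted on this page.

```shell
# Hypothetical helper: print the CMake configure flags this page lists
# for each backend. Unknown names are rejected with a non-zero status.
backend_cmake_flags() {
    case "$1" in
        sycl)
            echo "-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx"
            ;;
        opencl)
            echo "-DGGML_OPENCL=ON"
            ;;
        openvino)
            echo "-DGGML_OPENVINO=ON"
            ;;
        *)
            echo "unknown backend: $1" >&2
            return 1
            ;;
    esac
}

# Example use (not executed here; the unquoted expansion is intentional
# so that each flag becomes a separate cmake argument):
#   cmake -B build $(backend_cmake_flags opencl)
#   cmake --build build --config Release -j
```

Any additional options a given platform needs (SDK paths, cross-compilation toolchains) are outside this sketch and are covered by the per-backend docs.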

Integration points

  • Scheduler. All three are ordinary ggml_backend implementations, so the ggml backend scheduler treats them like any other backend.
  • Conformance. All three are exercised by tests/test-backend-ops.
  • CODEOWNERS. SYCL → @ggml-org/ggml-sycl; OpenCL → @ggml-org/ggml-opencl; OpenVINO → cavusmustafa, wine99.
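For the conformance point above, a run against a single backend might be assembled as follows. This is a sketch under assumptions: that the build places the binary at build/bin/test-backend-ops, and that it accepts a `test` mode and a `-b` backend filter (confirm against the binary's own usage output). backend_ops_cmd is a hypothetical helper, and device names such as SYCL0 are illustrative.

```shell
# Hypothetical helper: print (not run) a test-backend-ops command line
# restricted to one backend device.
#   $1  backend device name as reported by the binary (assumed, e.g. SYCL0)
#   $2  build directory, defaults to "build"
backend_ops_cmd() {
    local backend_name="$1"
    local build_dir="${2:-build}"
    if [ -z "$backend_name" ]; then
        echo "usage: backend_ops_cmd <backend-name> [build-dir]" >&2
        return 1
    fi
    echo "$build_dir/bin/test-backend-ops test -b $backend_name"
}

# Example use (not executed here):
#   eval "$(backend_ops_cmd SYCL0)"
```

Printing the command instead of running it keeps the helper safe to use on machines where the backend has not been built yet.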

Entry points for modification

  • SYCL kernels. Mirror the structure under ggml-sycl/; many files are direct ports of CUDA .cu files.
  • OpenCL kernels. Add .cl to kernels/ and dispatch from ggml-opencl.cpp.
  • OpenVINO ops. Translation logic in ggml-openvino.cpp maps ggml_op to OpenVINO node types.
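Before adding a new .cl file to kernels/, it can help to see which kernel names are already taken. A small sketch, assuming kernels sit directly in that directory with a .cl extension; list_cl_kernels is a hypothetical helper, and the kernel names in the example are illustrative only.

```shell
# Hypothetical helper: list the base names of OpenCL kernel sources
# found directly inside the given directory, one per line.
list_cl_kernels() {
    local dir="$1"
    local f
    for f in "$dir"/*.cl; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        basename "$f" .cl
    done
}

# Example use from the repository root (not executed here):
#   list_cl_kernels ggml/src/ggml-opencl/kernels | grep -x my_new_kernel
```

Adding the file is only half of the change: as noted above, the new kernel still has to be dispatched from ggml-opencl.cpp.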

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.