llvm/llvm-project
offload
offload/ is the LLVM/Offload subproject — the runtime, tooling, and APIs used to execute code on accelerators (GPUs, FPGAs, AI/ML accelerators, distributed resources) from a host program. It is the runtime side of OpenMP target constructs, the OpenACC equivalent, and the lower layer underneath higher-level offload APIs.
Purpose
Per offload/README.md:
The Offload subproject aims at providing tooling, runtimes, and APIs that allow users to execute code on accelerators or other "co-processors" that may or may not match the architecture of their "host". In the long run, all kinds of targets are in scope of this effort, including but not limited to: CPUs, GPUs, FPGAs, AI/ML accelerators, distributed resources, etc.
For OpenMP offload users it is described as "ready and fully usable". The final API design is still being shaped.
Directory layout
offload/
├── include/ # Public headers
├── liboffload/ # Top-level offload library (the new "OL" API)
├── plugins-nextgen/ # Per-target plugins (current design)
│ ├── amdgpu/ # ROCm/HSA-based AMDGPU
│ ├── cuda/ # NVIDIA CUDA driver API
│ ├── host/ # Host-CPU "fallback" target
│ └── ...
├── DeviceRTL/ # Device-side runtime — implementations of `__kmpc_target_*` etc.
├── tools/ # llvm-offload-device-info, etc.
├── test/ # lit and offloading tests (driven against built artifacts)
├── unittests/
├── docs/ # User documentation
├── cmake/
└── README.md
The previous OpenMP-offload runtime lived inside openmp/libomptarget/; that code has been migrated and is now hosted here under liboffload/ and plugins-nextgen/. The migration is still finishing: both directories are alive in some configurations.
What it provides
- The `liboffload` host library: the API user code links against. Includes the OpenMP `__tgt_*` ABI used by Clang's OpenMP lowering and the newer "OL" (offload) APIs.
- Per-target plugins: small shared libraries that implement device discovery, image loading, kernel launch, memory allocation/copy, and synchronization for one accelerator family.
- The DeviceRTL: a small runtime library compiled to device bitcode that supplies the device-side OpenMP runtime functions, with target-specific implementations for AMDGPU, NVPTX, etc.
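The plugin responsibilities listed above can be pictured with a toy model. This is a minimal sketch and not the real interface under `offload/plugins-nextgen/`: the names `ToyPlugin`, `ToyHostPlugin`, `ToyKernel`, and `doubleOnToyDevice` are hypothetical and exist only for illustration. It assumes a "device" that is simply host memory, in the spirit of the host fallback target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <map>
#include <new>
#include <string>
#include <utility>

// HYPOTHETICAL types: these are not part of LLVM/Offload. They only mirror
// the responsibilities named in the list above.
using ToyKernel = std::function<void(void **)>;

struct ToyPlugin {
  virtual ~ToyPlugin() = default;
  virtual int discoverDevices() = 0;                                  // device discovery
  virtual void loadImage(const std::string &kernelName,
                         ToyKernel entry) = 0;                        // image loading
  virtual void *alloc(std::size_t bytes) = 0;                         // memory allocation
  virtual void release(void *ptr) = 0;                                // memory release
  virtual void copy(void *dst, const void *src, std::size_t bytes) = 0; // memory copy
  virtual bool launch(const std::string &kernelName, void **args) = 0;  // kernel launch
  virtual void synchronize() = 0;                                     // synchronization
};

// A toy "host" plugin: the device is the host itself, kernels are callables.
struct ToyHostPlugin final : ToyPlugin {
  std::map<std::string, ToyKernel> kernels;

  int discoverDevices() override { return 1; }
  void loadImage(const std::string &kernelName, ToyKernel entry) override {
    kernels[kernelName] = std::move(entry);
  }
  void *alloc(std::size_t bytes) override { return ::operator new(bytes); }
  void release(void *ptr) override { ::operator delete(ptr); }
  void copy(void *dst, const void *src, std::size_t bytes) override {
    std::memcpy(dst, src, bytes);
  }
  bool launch(const std::string &kernelName, void **args) override {
    auto it = kernels.find(kernelName);
    if (it == kernels.end())
      return false; // unknown kernel: nothing was launched
    it->second(args);
    return true;
  }
  void synchronize() override {} // launches in this toy are synchronous
};

// Walks the usual sequence: load, allocate, copy in, launch, sync, copy out.
int doubleOnToyDevice(ToyPlugin &plugin, int value) {
  assert(plugin.discoverDevices() > 0);
  plugin.loadImage("double_it", [](void **args) {
    *static_cast<int *>(args[0]) *= 2;
  });
  void *deviceBuf = plugin.alloc(sizeof(int));
  plugin.copy(deviceBuf, &value, sizeof(int));
  void *args[] = {deviceBuf};
  const bool launched = plugin.launch("double_it", args);
  plugin.synchronize();
  int result = value;
  if (launched)
    plugin.copy(&result, deviceBuf, sizeof(int));
  plugin.release(deviceBuf);
  return result;
}
```

Calling `doubleOnToyDevice` with a `ToyHostPlugin` exercises every method once. A real plugin would replace each body with calls into the vendor driver, and its actual interface should be taken from the sources rather than from this sketch.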
How it works
graph LR
host[Host program with #pragma omp target] --> clang[Clang]
clang -->|host code| obj_h[Host object]
clang -->|device code| obj_d[Device bitcode/object]
obj_h --> hostlink[Host link]
obj_d --> devicelink[Device link]
devicelink --> image[Embedded device image]
image --> hostlink
hostlink --> bin[Fat binary]
bin -->|run| rt[liboffload]
rt --> plugin["Plugin (cuda / amdgpu / host)"]
plugin --> dev[Accelerator]
At runtime, the host program calls `__tgt_target_kernel` (or the new OL equivalents) with a kernel pointer and an argument list. liboffload picks the appropriate plugin, loads the embedded device image onto the device, marshals arguments, launches the kernel, and waits for completion (or returns immediately on asynchronous paths, where progress is tracked separately).
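From the user's side, the whole pipeline above starts from an ordinary `target` region. The following is a minimal sketch using only standard OpenMP directives; the function name `saxpy` is an illustrative choice. The `__tgt_*` calls are generated by Clang and never appear in user code. If the file is compiled without OpenMP support, the pragma is ignored and the loop simply runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i]. With an offload-enabled compiler the loop
// body becomes a device kernel; otherwise it runs as a plain host loop.
void saxpy(float a, const std::vector<float> &x, std::vector<float> &y) {
  assert(x.size() == y.size());
  const float *xp = x.data();
  float *yp = y.data();
  const std::size_t n = x.size();

  // map(to:) copies x to the device before the kernel runs;
  // map(tofrom:) copies y to the device and back afterwards.
  #pragma omp target teams distribute parallel for \
      map(to: xp[0:n]) map(tofrom: yp[0:n])
  for (std::size_t i = 0; i < n; ++i)
    yp[i] = a * xp[i] + yp[i];
}
```

Assuming a Clang build with offload support, a command along the lines of `clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda saxpy.cpp` would produce the fat binary shown in the diagram. The exact target triple depends on the installed toolchain and hardware.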
Integration points
- OpenMP (`openmp`): the host-side runtime that pairs with offload's device side. `target` constructs cross the seam.
- Clang: emits the `__tgt_*` calls and packages device images into fat binaries via `clang-linker-wrapper` and `clang-offload-bundler` (`clang/tools/`).
- Flang: emits the same offload constructs from Fortran's `!$omp target` and OpenACC equivalents.
- GPU / AMDGPU / NVPTX backends in `llvm/lib/Target/`: the codegen for device-side code.
- libc (`libc`): the GPU-libc mode is what device code calls when it needs `printf` etc.
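The seam between the OpenMP host runtime and the offload side can be observed from user code with two standard OpenMP API routines, `omp_get_num_devices` and `omp_is_initial_device`. The sketch below is a rough probe, not a diagnostic tool. The helper names `visibleDeviceCount`, `targetRegionRanOnHost`, and `describeOffload` are hypothetical, and the `_OPENMP` guard is there so the file still builds when OpenMP is disabled.

```cpp
#include <cassert>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of non-host devices the OpenMP runtime can see (0 without OpenMP).
int visibleDeviceCount() {
#ifdef _OPENMP
  return omp_get_num_devices();
#else
  return 0;
#endif
}

// Returns true when a target region executed on the host, either because
// OpenMP is disabled or because the runtime fell back to the host.
bool targetRegionRanOnHost() {
  int onHost = 1;
#ifdef _OPENMP
  #pragma omp target map(tofrom: onHost)
  { onHost = omp_is_initial_device(); }
#endif
  return onHost != 0;
}

// Builds a one-line human-readable summary of the two probes above.
std::string describeOffload() {
  const int devices = visibleDeviceCount();
  assert(devices >= 0);
  std::string msg = std::to_string(devices) + " device(s) visible, ";
  msg += "target region ran on ";
  msg += targetRegionRanOnHost() ? "the host" : "an accelerator";
  return msg;
}
```

Printing `describeOffload()` at startup is a cheap way to check whether a build actually offloads or quietly runs everything on the host.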
Entry points for modification
- Adding an accelerator target: a new plugin under `offload/plugins-nextgen/<NewTarget>/` implementing the plugin interface. AMDGPU and CUDA make decent reference implementations.
- Tuning kernel-launch path: the host-side launch code lives under `offload/liboffload/`.
- Implementing a device runtime function: `offload/DeviceRTL/`, which holds the bitcode that gets linked into device images.
Reference
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.