llvm/llvm-project

offload

offload/ is the LLVM/Offload subproject: the runtime, tooling, and APIs used to execute code on accelerators (GPUs, FPGAs, AI/ML accelerators, distributed resources) from a host program. It is the runtime side of OpenMP target constructs and their OpenACC equivalents, and the lower layer underneath higher-level offload APIs.

Purpose

Per offload/README.md:

The Offload subproject aims at providing tooling, runtimes, and APIs that allow users to execute code on accelerators or other "co-processors" that may or may not match the architecture of their "host". In the long run, all kinds of targets are in scope of this effort, including but not limited to: CPUs, GPUs, FPGAs, AI/ML accelerators, distributed resources, etc.

For OpenMP offload users it is described as "ready and fully usable". The final API design is still being shaped.

Directory layout

offload/
├── include/          # Public headers
├── liboffload/       # Top-level offload library (the new "OL" API)
├── plugins-nextgen/  # Per-target plugins (current design)
│   ├── amdgpu/       # ROCm/HSA-based AMDGPU
│   ├── cuda/         # NVIDIA CUDA driver API
│   ├── host/         # Host-CPU "fallback" target
│   └── ...
├── DeviceRTL/        # Device-side runtime: implementations of `__kmpc_target_*` etc.
├── tools/            # llvm-offload-device-info, etc.
├── test/             # lit and offloading tests (run against built artifacts)
├── unittests/
├── docs/             # User documentation
├── cmake/
└── README.md

The OpenMP offload runtime previously lived in openmp/libomptarget/. That code has been moved into offload/, where its plugins now live under plugins-nextgen/ and the newer OL API has been added under liboffload/. The migration is still being finished, so some configurations still use both the older OpenMP-specific code paths and the new ones.

What it provides

  • Host-side libraries: the APIs that user code and compiler-generated code link against. These cover the OpenMP __tgt_* ABI used by Clang's OpenMP lowering (still shipped as libomptarget) and the newer "OL" (offload) API provided by liboffload.
  • Per-target plugins — small shared libraries that implement device discovery, image loading, kernel launch, memory allocation/copy, and synchronization for one accelerator family.
  • The DeviceRTL: a small runtime library compiled to device bitcode that supplies the device-side OpenMP runtime functions. It is built separately for each device target (AMDGPU, NVPTX, etc.), with target-specific implementations where the hardware differs.
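The sketch below makes the plugin responsibilities concrete. It is a minimal C++ outline of what a per-target plugin interface could look like, together with a host-CPU implementation in the spirit of the host/ target. Every name in it is hypothetical (ToyPlugin, HostToyPlugin, ToyImage, VecAddArgs, runVecAdd). The real interface in offload/plugins-nextgen/ is much larger and is not reproduced here. The sketch also models a "device image" as a map from kernel names to host callables, so that it runs on any machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <map>
#include <string>

// A "device image" in this sketch: kernel name -> callable taking an
// opaque pointer to its argument block.
using ToyKernel = std::function<void(void *)>;
using ToyImage = std::map<std::string, ToyKernel>;

// Hypothetical plugin interface covering the responsibilities listed above.
class ToyPlugin {
public:
  virtual ~ToyPlugin() = default;
  virtual int numDevices() const = 0;                                   // discovery
  virtual bool loadImage(int Dev, const ToyImage &Image) = 0;           // image loading
  virtual void *allocate(int Dev, std::size_t Size) = 0;                // memory
  virtual void deallocate(int Dev, void *Ptr) = 0;
  virtual void copyToDevice(int Dev, void *Dst, const void *Src, std::size_t Size) = 0;
  virtual void copyFromDevice(int Dev, void *Dst, const void *Src, std::size_t Size) = 0;
  virtual bool launch(int Dev, const std::string &Kernel, void *Args) = 0; // launch
  virtual void synchronize(int Dev) = 0;                                // sync
};

// Host-CPU implementation: "device" memory is heap memory and kernels run
// inline on the calling thread, so there is never outstanding work.
class HostToyPlugin final : public ToyPlugin {
  ToyImage Loaded;

public:
  int numDevices() const override { return 1; }
  bool loadImage(int Dev, const ToyImage &Image) override {
    if (Dev != 0)
      return false;
    Loaded.insert(Image.begin(), Image.end());
    return true;
  }
  void *allocate(int, std::size_t Size) override { return std::malloc(Size); }
  void deallocate(int, void *Ptr) override { std::free(Ptr); }
  void copyToDevice(int, void *Dst, const void *Src, std::size_t Size) override {
    std::memcpy(Dst, Src, Size);
  }
  void copyFromDevice(int, void *Dst, const void *Src, std::size_t Size) override {
    std::memcpy(Dst, Src, Size);
  }
  bool launch(int Dev, const std::string &Kernel, void *Args) override {
    auto It = Loaded.find(Kernel);
    if (Dev != 0 || It == Loaded.end())
      return false;
    It->second(Args); // synchronous: returns when the "kernel" is done
    return true;
  }
  void synchronize(int) override {} // nothing is ever in flight
};

// Argument block for the example kernel: device pointers plus length.
struct VecAddArgs {
  const int *A;
  const int *B;
  int *C;
  int N;
};

// Drives one offload through any plugin: load, allocate, copy in,
// launch, synchronize, copy out, free.
bool runVecAdd(ToyPlugin &P, const int *A, const int *B, int *C, int N) {
  assert(N > 0 && "sketch assumes a non-empty vector");
  const int Dev = 0;
  ToyImage Image{{"vec_add", [](void *Raw) {
                    auto *Args = static_cast<VecAddArgs *>(Raw);
                    for (int I = 0; I < Args->N; ++I)
                      Args->C[I] = Args->A[I] + Args->B[I];
                  }}};
  if (P.numDevices() < 1 || !P.loadImage(Dev, Image))
    return false;
  const std::size_t Bytes = sizeof(int) * static_cast<std::size_t>(N);
  auto *DA = static_cast<int *>(P.allocate(Dev, Bytes));
  auto *DB = static_cast<int *>(P.allocate(Dev, Bytes));
  auto *DC = static_cast<int *>(P.allocate(Dev, Bytes));
  bool Ok = DA && DB && DC;
  if (Ok) {
    P.copyToDevice(Dev, DA, A, Bytes);
    P.copyToDevice(Dev, DB, B, Bytes);
    VecAddArgs Args{DA, DB, DC, N};
    Ok = P.launch(Dev, "vec_add", &Args);
    P.synchronize(Dev);
    if (Ok)
      P.copyFromDevice(Dev, C, DC, Bytes);
  }
  P.deallocate(Dev, DA);
  P.deallocate(Dev, DB);
  P.deallocate(Dev, DC);
  return Ok;
}
```

runVecAdd follows the same order of steps for one offload:

1. Load the image.
2. Allocate device buffers.
3. Copy inputs in.
4. Launch.
5. Synchronize.
6. Copy results out.
7. Free the buffers.

On the host plugin, synchronize() has nothing to do because launches run inline. On a GPU plugin, it is where the host would wait for queued work to finish.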

How it works

graph LR
    host["Host program with #pragma omp target"] --> clang[Clang]
    clang -->|host code| obj_h[Host object]
    clang -->|device code| obj_d[Device bitcode/object]
    obj_h --> hostlink[Host link]
    obj_d --> devicelink[Device link]
    devicelink --> image[Embedded device image]
    image --> hostlink
    hostlink --> bin[Fat binary]
    bin -->|run| rt[liboffload]
    rt --> plugin["Plugin (cuda / amdgpu / host)"]
    plugin --> dev[Accelerator]
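The "host program with #pragma omp target" at the left of the diagram can be as small as the vector add below. The function name vecAdd is illustrative. The build command depends on your toolchain; with an offloading Clang it is something like clang++ -fopenmp --offload-arch=<gpu-arch>. Compiled without OpenMP support, the pragma is ignored and the loop runs on the host, so the code stays correct either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise sum of two equally sized vectors, offloaded when possible.
std::vector<float> vecAdd(const std::vector<float> &A, const std::vector<float> &B) {
  assert(A.size() == B.size());
  const std::size_t N = A.size();
  std::vector<float> C(N);
  const float *PA = A.data();
  const float *PB = B.data();
  float *PC = C.data();
  // map(to:) copies the inputs to the device; map(from:) copies the result
  // back. The region itself becomes a device kernel launched at runtime.
  #pragma omp target teams distribute parallel for map(to: PA[0:N], PB[0:N]) map(from: PC[0:N])
  for (std::size_t I = 0; I < N; ++I)
    PC[I] = PA[I] + PB[I];
  return C;
}
```

At runtime, the map clauses turn into host-to-device and device-to-host copies. Setting the standard OMP_TARGET_OFFLOAD=MANDATORY environment variable makes the program fail instead of silently falling back to the host when no device is available.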

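At runtime, the host program calls __tgt_target_kernel (or the equivalent OL API) with an identifier for the target region and a structure describing its arguments and launch bounds. The runtime then:

  • picks the plugin for the selected device;
  • makes sure the embedded device image, registered at program startup, is loaded on that device;
  • marshals the arguments and launches the kernel;
  • either waits for completion or, for asynchronous (nowait) regions, returns so completion can be tracked later.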
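The runtime steps in the previous paragraph can be modeled with a toy dispatcher. This is a simplified sketch, not liboffload's code, and ToyRuntime and its members are hypothetical. It makes several simplifying assumptions:

  • Plugins are matched to images by a target-triple string; the real runtime checks image compatibility more carefully.
  • "Loading" an image only sets a flag.
  • A per-plugin queue stands in for a device's asynchronous streams.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of host-side dispatch: pick a plugin, load the image
// on first use, enqueue the launch, then wait now or later.
class ToyRuntime {
  struct Plugin {
    std::string Triple;                       // target this plugin serves
    bool ImageLoaded = false;                 // set on first launch
    std::vector<std::function<void()>> Queue; // launches not yet completed
  };
  std::vector<Plugin> Plugins;

  // Runs every queued launch for one plugin ("wait for completion").
  static void drain(Plugin &P) {
    std::vector<std::function<void()>> Work = std::move(P.Queue);
    P.Queue.clear();
    for (auto &Kernel : Work)
      Kernel();
  }

public:
  void addPlugin(const std::string &Triple) {
    assert(!Triple.empty() && "a plugin needs a target triple");
    Plugin P;
    P.Triple = Triple;
    Plugins.push_back(std::move(P));
  }

  // Returns false when no plugin matches the image's triple.
  bool launch(const std::string &ImageTriple, std::function<void()> Kernel,
              bool NoWait) {
    for (Plugin &P : Plugins) {
      if (P.Triple != ImageTriple)
        continue;                           // 1. pick a compatible plugin
      P.ImageLoaded = true;                 // 2. "load" the image (idempotent)
      P.Queue.push_back(std::move(Kernel)); // 3. enqueue the launch
      if (!NoWait)
        drain(P);                           // 4a. synchronous: wait now
      return true;                          // 4b. nowait: finished by synchronize()
    }
    return false;
  }

  // Completes all outstanding asynchronous launches.
  void synchronize() {
    for (Plugin &P : Plugins)
      drain(P);
  }

  bool imageLoaded(const std::string &Triple) const {
    for (const Plugin &P : Plugins)
      if (P.Triple == Triple)
        return P.ImageLoaded;
    return false;
  }

  std::size_t pending() const {
    std::size_t N = 0;
    for (const Plugin &P : Plugins)
      N += P.Queue.size();
    return N;
  }
};
```

A synchronous launch drains the queue immediately, which corresponds to waiting for completion. A nowait launch leaves the work queued until synchronize() is called, which corresponds to the asynchronous path.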

Integration points

  • OpenMP (openmp): the host-side OpenMP runtime (libomp) that pairs with offload. Target constructs are where execution crosses that seam.
  • Clang — emits the __tgt_* calls and packages device images into fat binaries via clang-linker-wrapper and clang-offload-bundler (clang/tools/).
  • Flang — emits the same offload constructs from Fortran's !$omp target and OpenACC equivalents.
  • AMDGPU and NVPTX backends in llvm/lib/Target/: code generation for the device side.
  • libc (libc): the GPU build of LLVM libc provides C library functions, such as printf, to device code.
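Two of these seams show up directly in user code:

  • the OpenMP runtime API, here through omp_get_num_devices and omp_is_initial_device (both standard OpenMP routines);
  • C library calls such as printf inside a target region.

The sketch below is illustrative. ranOnDevice and numOffloadDevices are hypothetical helper names. Whether device-side printf is served by GPU libc depends on how your toolchain was built. The #ifdef _OPENMP guards let the same file compile, with host behavior, when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns true only if the target region actually ran on an accelerator.
bool ranOnDevice() {
  int OnHost = 1;
#pragma omp target map(from: OnHost)
  {
#ifdef _OPENMP
    OnHost = omp_is_initial_device(); // nonzero when running on the host
#endif
    // On a GPU this call needs device-side C library support (for example,
    // LLVM libc built for the GPU); on the host it is the ordinary printf.
    std::printf("hello from the target region\n");
  }
  return OnHost == 0;
}

// Number of non-host devices visible to the OpenMP runtime (0 without OpenMP).
int numOffloadDevices() {
#ifdef _OPENMP
  return omp_get_num_devices();
#else
  return 0;
#endif
}
```

With no accelerator present, or without OpenMP, the target region runs on the host and ranOnDevice() returns false. This is the standard OpenMP host fallback.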

Entry points for modification

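  • Adding an accelerator target: write a new plugin under offload/plugins-nextgen/<NewTarget>/ that implements the plugin interface. The AMDGPU and CUDA plugins are good reference implementations.
  • Tuning the kernel-launch path: the OL API's host-side launch code lives under offload/liboffload/. The OpenMP path goes through the libomptarget host runtime, and the target-specific launch logic lives in each plugin under offload/plugins-nextgen/.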
  • Implementing a device runtime function: offload/DeviceRTL/ — bitcode that gets linked into device images.
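A device runtime function is typically a thin, portable entry point over target-specific code. The sketch below shows that shape, assuming a one-dimensional launch in which each team maps to one block or workgroup:

  • It selects the target with Clang's predefined __AMDGPU__ and __NVPTX__ macros.
  • It reads the block index with the corresponding Clang builtins.
  • Its host branch lets the file also compile and run as ordinary host code.

The function name toy_get_team_num and the toy_impl namespace are hypothetical. The actual DeviceRTL sources may select per-target code through a different mechanism.

```cpp
#include <cassert>
#include <cstdint>

namespace toy_impl {
#if defined(__AMDGPU__)
// AMDGPU: index of the current workgroup along x.
inline std::uint32_t blockIdX() { return __builtin_amdgcn_workgroup_id_x(); }
#elif defined(__NVPTX__)
// NVPTX: index of the current CTA (thread block) along x.
inline std::uint32_t blockIdX() {
  return static_cast<std::uint32_t>(__nvvm_read_ptx_sreg_ctaid_x());
}
#else
// Host build: there is exactly one "team", numbered 0.
inline std::uint32_t blockIdX() { return 0; }
#endif
} // namespace toy_impl

// Portable entry point (hypothetical name); one team per block/workgroup.
extern "C" int toy_get_team_num() {
  return static_cast<int>(toy_impl::blockIdX());
}
```

Keeping the public entry point target-neutral means callers and compiler-generated code never need to know which device they run on. Only the small toy_impl layer changes per target.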

Reference

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
