llvm/llvm-project

Architecture

LLVM is, at heart, a three-phase compiler with a sharp seam between front end, middle end, and back end. The seam is the LLVM intermediate representation (LLVM IR), and that abstraction is what lets dozens of language front ends share a single optimizer and codegen, and what lets a single front end target dozens of CPUs.

This page describes the umbrella architecture across the monorepo. For the per-component view, see the subprojects index.

The three-phase pipeline

graph LR
    SRC["Source code<br/>(C, C++, Fortran,<br/>OpenCL, etc.)"] -->|Front end| IR
    IR["LLVM IR<br/>(SSA, typed)"] -->|Optimizer<br/>passes| IR2[Optimized LLVM IR]
    IR2 -->|Codegen<br/>backend| OBJ[Object code]
    OBJ -->|"Linker (lld/ld)"| BIN[Executable]
    BIN -->|"BOLT (optional)"| BIN2[Layout-optimized<br/>executable]
    BIN -->|LLDB| DBG[Debugger]
    BIN -->|Runtime libs| RUN[Running program]
  • Front end. Takes language source and produces LLVM IR. Clang (clang/) handles the C family, Flang (flang/) handles Fortran, and many out-of-tree front ends (Rust, Swift, Julia, Halide, ...) reuse the same IR.
  • Middle end. A pipeline of analyses and transformations operating on LLVM IR. Lives in llvm/lib/Transforms/ and llvm/lib/Analysis/.
  • Back end. Lowers LLVM IR to MachineInstr, then runs register allocation, scheduling, and emission. Lives under llvm/lib/CodeGen/ (target-independent) and llvm/lib/Target/<TargetName>/ (per-target code, e.g. X86, AArch64, ARM, RISCV, AMDGPU, NVPTX, WebAssembly, PowerPC, SPIRV, BPF, ...).
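
The payoff of a shared IR can be sketched with a toy model. The following Python is an illustration only, not LLVM code: the tuple-based IR, both "front ends", and both "back ends" are hypothetical stand-ins. It shows that M front ends and N back ends need M + N components rather than M × N, because every combination passes through the same optimizer.

```python
# Toy model of a three-phase compiler. Nothing here is an LLVM API; the
# "IR" is a list of (dest, op, lhs, rhs) tuples invented for illustration.

OPS = {"+": "add", "*": "mul"}

def frontend_infix(src):
    """Hypothetical front end: parses 'x = 2 + 3' into toy IR."""
    dest, expr = [part.strip() for part in src.split("=")]
    lhs, op, rhs = expr.split()
    return [(dest, OPS[op], lhs, rhs)]

def frontend_prefix(src):
    """Hypothetical front end for a Lisp-like syntax: '(set x (+ 2 3))'."""
    tokens = src.replace("(", " ").replace(")", " ").split()
    _, dest, op, lhs, rhs = tokens
    return [(dest, OPS[op], lhs, rhs)]

def optimize(ir):
    """Shared middle end: constant-fold operations on two integer literals."""
    out = []
    for dest, op, lhs, rhs in ir:
        if lhs.isdigit() and rhs.isdigit():
            value = int(lhs) + int(rhs) if op == "add" else int(lhs) * int(rhs)
            out.append((dest, "const", str(value), None))
        else:
            out.append((dest, op, lhs, rhs))
    return out

def backend_reg(ir):
    """Hypothetical back end A: register-style pseudo-assembly."""
    lines = []
    for dest, op, lhs, rhs in ir:
        if op == "const":
            lines.append(f"mov {dest}, {lhs}")
        else:
            lines.append(f"{op} {dest}, {lhs}, {rhs}")
    return lines

def backend_stack(ir):
    """Hypothetical back end B: stack-machine pseudo-assembly."""
    lines = []
    for dest, op, lhs, rhs in ir:
        if op == "const":
            lines += [f"push {lhs}", f"store {dest}"]
        else:
            lines += [f"push {lhs}", f"push {rhs}", op, f"store {dest}"]
    return lines

def compile_with(frontend, backend, src):
    """Any front end composes with any back end through the shared IR."""
    return backend(optimize(frontend(src)))
```

Any of the four front-end/back-end pairings works without extra glue, for example `compile_with(frontend_prefix, backend_reg, "(set x (+ 2 3))")`.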

Monorepo layout

graph TD
    subgraph Compilers
        clang[clang]
        flang[flang]
    end
    subgraph Tools
        clang_tools[clang-tools-extra]
        lldb[lldb]
        bolt[bolt]
    end
    subgraph Linker
        lld[lld]
    end
    subgraph Frameworks
        mlir[mlir]
        polly[polly]
    end
    subgraph Runtimes
        compiler_rt[compiler-rt]
        libcxx[libcxx]
        libcxxabi[libcxxabi]
        libunwind[libunwind]
        libc[libc]
        openmp[openmp]
        offload[offload]
        flang_rt[flang-rt]
        orc_rt[orc-rt]
        libsycl[libsycl]
        libclc[libclc]
        llvm_libgcc[llvm-libgcc]
    end
    llvm[llvm core]
    clang -->|builds on| llvm
    flang -->|builds on| llvm
    flang -->|emits MLIR| mlir
    mlir -->|builds on| llvm
    polly -->|plugin| llvm
    bolt -->|builds on| llvm
    clang_tools -->|builds on| clang
    lldb -->|builds on| llvm
    lld -->|builds on| llvm
    libcxx -->|paired with| libcxxabi
    libcxxabi -->|uses| libunwind

The arrows show dependency direction: each arrow points from a consumer to the component it builds on. Everything in clang/, flang/, lld/, lldb/, mlir/, bolt/, polly/, and clang-tools-extra/ is a consumer of the LLVM core libraries. The runtime subprojects ship as separate libraries that are built alongside the toolchain.

LLVM IR — the central abstraction

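LLVM IR is a strongly typed, three-address representation in static single assignment (SSA) form. It exists in three isomorphic forms:

  • In-memory IR: the C++ object graph (Module, Function, BasicBlock, Instruction) implemented in llvm/lib/IR/.
  • Textual assembly: human-readable .ll files.
  • Bitcode: the compact binary .bc encoding.

llvm-as and llvm-dis convert between the textual and bitcode forms.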

The pass infrastructure (the "new pass manager") in llvm/lib/Passes/ and llvm/include/llvm/IR/PassManager.h drives the optimizer.
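
The idea of a pass pipeline can be sketched in a few lines. This is a toy model in Python, not the real pass manager (which is a C++ framework with analysis caching and invalidation); the class name, pass names, and tuple-based IR are hypothetical. It only shows passes running in order over one IR unit, each consuming the previous pass's output.

```python
# Toy pass pipeline. ToyPassManager and both passes are hypothetical;
# the IR is a list of (dest, op, args) tuples where each dest is defined once.

class ToyPassManager:
    def __init__(self):
        self.passes = []
        self.log = []

    def add_pass(self, name, fn):
        self.passes.append((name, fn))

    def run(self, ir):
        for name, fn in self.passes:
            new_ir = fn(ir)
            # Record whether this pass changed the IR.
            self.log.append((name, new_ir != ir))
            ir = new_ir
        return ir

def constant_fold(ir):
    """Replace an 'add' of integer literals with a constant."""
    out = []
    for dest, op, args in ir:
        if op == "add" and all(isinstance(a, int) for a in args):
            out.append((dest, "const", (sum(args),)))
        else:
            out.append((dest, op, args))
    return out

def dead_code_elim(ir):
    """Drop instructions whose result is never used; 'ret' is the live output."""
    live = {"ret"}
    kept = []
    for dest, op, args in reversed(ir):
        if dest in live:
            kept.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
    return kept[::-1]
```

A pipeline is then built by calling `add_pass` for each transformation in the desired order and invoking `run` on the IR.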

Codegen pipeline

The back end consumes LLVM IR and produces machine code. Internally it walks through several IR levels:

graph LR
    IR[LLVM IR] --> SDAG[SelectionDAG]
    SDAG --> MI[MachineInstr / MachineFunction]
    MI --> RA[Register allocation]
    RA --> SCHED[Instruction scheduling]
    SCHED --> EMIT[MC layer / object emission]
    EMIT --> ASM[.s assembly]
    EMIT --> OBJ[.o object file]
  • SelectionDAG is the older, long-established instruction selector (llvm/lib/CodeGen/SelectionDAG/) and remains the default for many targets.
  • GlobalISel is the newer instruction selector framework (llvm/lib/CodeGen/GlobalISel/) used heavily by AArch64 and AMDGPU.
  • The MC layer (llvm/lib/MC/) is target-independent assembler/object-file machinery.
  • Per-target code lives in llvm/lib/Target/<Name>/. Each target defines its instruction set in TableGen (*.td) and provides a <Name>ISelLowering, <Name>InstrInfo, <Name>RegisterInfo, <Name>Subtarget, etc.
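
Two of the stages above, instruction selection and register allocation, can be sketched as a toy. The Python below is an assumption-laden illustration: the opcode table, register names, and greedy allocator are hypothetical and do not correspond to any real target or to LLVM's allocators. Scheduling and object emission are left out.

```python
# Toy back end: instruction selection followed by register allocation.
# The opcode table and register names are hypothetical, not a real target.

ISEL_TABLE = {"add": "ADDrr", "mul": "MULrr"}  # IR opcode -> machine opcode

def select_instructions(ir):
    """Map each IR op to a machine op that still names virtual registers."""
    return [(ISEL_TABLE[op], dest, lhs, rhs) for dest, op, lhs, rhs in ir]

def allocate_registers(machine_ir, inputs, physical=("r0", "r1", "r2", "r3")):
    """Greedy allocation: a physical register is freed at its value's last use."""
    last_use = {}
    for index, (_, _, lhs, rhs) in enumerate(machine_ir):
        last_use[lhs] = last_use[rhs] = index
    free = list(physical)
    assigned = {value: free.pop(0) for value in inputs}
    out = []
    for index, (opcode, dest, lhs, rhs) in enumerate(machine_ir):
        srcs = (assigned[lhs], assigned[rhs])
        for value in dict.fromkeys((lhs, rhs)):  # de-duplicate, keep order
            if last_use[value] == index:
                free.insert(0, assigned[value])  # operand dies here
        if not free:
            raise RuntimeError("out of registers; a real allocator would spill")
        assigned[dest] = free.pop(0)
        out.append((opcode, assigned[dest], *srcs))
    return out

def emit_asm(allocated):
    """Print each allocated instruction as a line of pseudo-assembly."""
    return [f"{opcode} {dest}, {a}, {b}" for opcode, dest, a, b in allocated]
```

Because operand registers are released before the destination is assigned, a result may reuse the register of an operand that dies in the same instruction.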

TableGen

LLVM uses a domain-specific language called TableGen (library in llvm/lib/TableGen/, executable sources in llvm/utils/TableGen/) to declare repetitive data: instruction encodings, register classes, calling conventions, intrinsic signatures, scheduling models, Clang attribute lists, and more. llvm-tblgen and clang-tblgen consume .td files at build time and emit generated C++ fragments (typically .inc files) that the rest of the codebase includes.
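
The "declarative records in, generated C++ out" workflow can be illustrated with a toy generator. The Python below is not TableGen: the record fields, the enum name, and the output layout are all hypothetical, and real .td files support classes, multiclasses, and many specialized backends.

```python
# Toy stand-in for a TableGen backend. Record fields and the generated
# C++ layout are hypothetical; real .td input and .inc output are far richer.

INSTRUCTIONS = [
    {"name": "ADDrr", "opcode": 0x01, "operands": 3},
    {"name": "MULrr", "opcode": 0x02, "operands": 3},
    {"name": "RET", "opcode": 0x10, "operands": 0},
]

def emit_opcode_enum(records, enum_name="ToyOpcode"):
    """Generate a C++ enum with one enumerator per instruction record."""
    lines = [f"enum {enum_name} {{"]
    lines += [f"  {r['name']} = {r['opcode']:#04x}," for r in records]
    lines.append("};")
    return "\n".join(lines)

def emit_operand_table(records):
    """Generate a C++ array holding each instruction's operand count."""
    body = ", ".join(str(r["operands"]) for r in records)
    return f"static const unsigned NumOperands[] = {{ {body} }};"
```

Adding an instruction means adding one record; both generated fragments stay in sync automatically, which is the main reason for keeping such data declarative.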

MLIR — the multi-level IR

MLIR is a parallel IR framework that lets compilers represent code at multiple abstraction levels in a single unified IR. It powers Flang's lowering pipeline and is widely used by ML frameworks, hardware vendors, and DSL authors. MLIR sits beside LLVM IR, not on top of it: an MLIR pipeline typically lowers from high-level dialects (e.g., linalg, tensor) down to the llvm dialect, then translates to LLVM IR for codegen.
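
Progressive lowering across abstraction levels can be sketched as a toy. In the Python below, the "hi", "mid", and "low" dialects and their ops are hypothetical stand-ins, not real MLIR dialects, and the rewrite functions only approximate the idea of a conversion pass. The point is that ops from different levels coexist in one op list until each stage rewrites its own level away.

```python
# Toy progressive lowering. Ops are (dialect.name, operands) pairs; the
# "hi", "mid", and "low" dialects are hypothetical, not real MLIR dialects.

def lower_hi_to_mid(op):
    """Rewrite hi.square(x) as mid.mul(x, x); leave other ops untouched."""
    name, args = op
    if name == "hi.square":
        return [("mid.mul", (args[0], args[0]))]
    return [op]

def lower_mid_to_low(op):
    """Rewrite mid.mul as low.imul; leave other ops untouched."""
    name, args = op
    if name == "mid.mul":
        return [("low.imul", args)]
    return [op]

def run_lowering(ops, stages):
    """Apply each lowering stage to every op, in order."""
    for stage in stages:
        ops = [new_op for op in ops for new_op in stage(op)]
    return ops

def dialects(ops):
    """Return the set of dialect prefixes present in the op list."""
    return {name.split(".")[0] for name, _ in ops}
```

Starting from a mixed list such as `[("hi.square", ("x",)), ("mid.mul", ("y", "z"))]`, running both stages leaves only ops from the lowest-level toy dialect.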

Build system

The whole monorepo is built with CMake from llvm/ as the entry point. CMake variables select which subprojects, runtimes, and targets are built. See getting started for concrete commands and reference/configuration for the most-used CMake flags.
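
As a rough sketch of how those variables combine, the Python helper below assembles a configure command. The helper itself and its parameter names are hypothetical; the CMake variables it sets (CMAKE_BUILD_TYPE, LLVM_ENABLE_PROJECTS, LLVM_ENABLE_RUNTIMES, LLVM_TARGETS_TO_BUILD) are standard, but the choice of the Ninja generator and the particular selections shown in the usage note are only examples.

```python
# Hypothetical helper that assembles a CMake configure command for the
# monorepo. It only builds the argument list; it does not run anything.

def cmake_configure_command(build_dir, projects=(), runtimes=(), targets=(),
                            build_type="Release"):
    cmd = ["cmake", "-S", "llvm", "-B", build_dir, "-G", "Ninja",
           f"-DCMAKE_BUILD_TYPE={build_type}"]
    # CMake list values are semicolon-separated.
    if projects:
        cmd.append("-DLLVM_ENABLE_PROJECTS=" + ";".join(projects))
    if runtimes:
        cmd.append("-DLLVM_ENABLE_RUNTIMES=" + ";".join(runtimes))
    if targets:
        cmd.append("-DLLVM_TARGETS_TO_BUILD=" + ";".join(targets))
    return cmd
```

For instance, `cmake_configure_command("build", projects=["clang", "lld"], targets=["X86", "AArch64"])` yields an argument list that could be passed to `subprocess.run(..., check=True)` from the repository root.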

Testing

LLVM uses two complementary test layers:

  • lit-driven regression tests under each subproject's test/ directory, run via check-<subproject> targets. lit (llvm/utils/lit/) executes the RUN: lines embedded in each test file, which typically pipe tool output into FileCheck for pattern-based assertions.
  • Unit tests under each subproject's unittests/ directory, built on Google Test (third-party/unittest/).
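
The pattern-matching idea behind the regression tests can be sketched with a heavily simplified checker. The Python below is a toy, not FileCheck: it supports only plain `CHECK:` substring patterns matched in order, one per output line, and ignores regexes, variables, and the other directives the real tool provides. Function names are hypothetical.

```python
# Toy, heavily simplified model of ordered CHECK: matching. It is not
# FileCheck; only plain substring patterns are supported.

def extract_checks(test_source, prefix="CHECK"):
    """Collect the pattern text following each 'CHECK:' marker."""
    marker = prefix + ":"
    return [line.split(marker, 1)[1].strip()
            for line in test_source.splitlines() if marker in line]

def toy_filecheck(test_source, tool_output):
    """Return True if every pattern matches, in order, on successive lines."""
    lines = tool_output.splitlines()
    position = 0
    for pattern in extract_checks(test_source):
        for index in range(position, len(lines)):
            if pattern in lines[index]:
                position = index + 1  # later patterns must match further down
                break
        else:
            return False  # pattern not found after the previous match
    return True
```

The ordering requirement is what lets a test assert on the shape of the output rather than just the presence of individual strings.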

Cross-cutting concerns

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
