llvm/llvm-project
Architecture
LLVM is, at heart, a three-phase compiler with a sharp seam between front end, middle end, and back end. The seam is the LLVM intermediate representation (LLVM IR), and that abstraction is what lets dozens of language front ends share a single optimizer and codegen, and what lets a single front end target dozens of CPUs.
This page describes the umbrella architecture across the monorepo. For the per-component view, see the subprojects index.
The three-phase pipeline
graph LR
SRC["Source code<br/>(C, C++, Fortran,<br/>OpenCL, etc.)"] -->|Front end| IR
IR["LLVM IR<br/>(SSA, typed)"] -->|Optimizer<br/>passes| IR2[Optimized LLVM IR]
IR2 -->|Codegen<br/>backend| OBJ[Object code]
OBJ -->|"Linker (lld/ld)"| BIN[Executable]
BIN -->|"BOLT (optional)"| BIN2[Layout-optimized<br/>executable]
BIN -->|LLDB| DBG[Debugger]
BIN -->|Runtime libs| RUN[Running program]
- Front end. Takes language source and produces LLVM IR. Clang (clang/) handles the C family, Flang (flang/) handles Fortran, and many out-of-tree front ends (Rust, Swift, Julia, Halide, ...) reuse the same IR.
- Middle end. A pipeline of analyses and transformations operating on LLVM IR. Lives in llvm/lib/Transforms/ and llvm/lib/Analysis/.
- Back end. Lowers LLVM IR through MachineInstr, register allocation, scheduling, and emission. Lives under llvm/lib/CodeGen/ (target-independent) and llvm/lib/Target/<TargetName>/ (per-target code, e.g. X86, AArch64, ARM, RISCV, AMDGPU, NVPTX, WebAssembly, PowerPC, SPIR-V, BPF, ...).
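The value of the shared IR can be sketched with a toy model: two front ends and two back ends meet at one intermediate form, so a single optimizer serves every combination. Everything below is hypothetical (the tuple-based "IR", the function names, the output syntax) and uses no LLVM API; it only illustrates the shape of the three-phase split.

```python
# Toy model of the three-phase design. NOT real LLVM APIs.
# The "IR" here is a list of (op, dest, lhs, rhs) tuples.

def frontend_infix(src):
    """Hypothetical front end #1: parses 'a + b' style source."""
    lhs, op, rhs = src.split()
    return [({"+": "add", "*": "mul"}[op], "%r", lhs, rhs)]

def frontend_prefix(src):
    """Hypothetical front end #2: parses 'add a b' style source."""
    op, lhs, rhs = src.split()
    return [(op, "%r", lhs, rhs)]

def optimize(ir):
    """Shared middle end: fold ops whose operands are both integer literals."""
    out = []
    for op, dest, lhs, rhs in ir:
        if lhs.lstrip("-").isdigit() and rhs.lstrip("-").isdigit():
            value = int(lhs) + int(rhs) if op == "add" else int(lhs) * int(rhs)
            out.append(("const", dest, str(value), None))
        else:
            out.append((op, dest, lhs, rhs))
    return out

def backend_stack(ir):
    """Hypothetical back end #1: emits code for a stack machine."""
    lines = []
    for op, dest, lhs, rhs in ir:
        if op == "const":
            lines.append(f"push {lhs}")
        else:
            lines += [f"push {lhs}", f"push {rhs}", op]
    return lines

def backend_register(ir):
    """Hypothetical back end #2: emits code for a register machine."""
    lines = []
    for op, dest, lhs, rhs in ir:
        if op == "const":
            lines.append(f"mov r0, {lhs}")
        else:
            lines.append(f"{op} r0, {lhs}, {rhs}")
    return lines

def compile_with(frontend, backend, src):
    """Any front end composes with any back end through the shared IR."""
    return backend(optimize(frontend(src)))
```

Adding a third front end in this sketch would require no change to `optimize` or to either back end, which is the property the real IR boundary is meant to provide.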
Monorepo layout
graph TD
subgraph Compilers
clang[clang]
flang[flang]
end
subgraph Tools
clang_tools[clang-tools-extra]
lldb[lldb]
bolt[bolt]
end
subgraph Linker
lld[lld]
end
subgraph Frameworks
mlir[mlir]
polly[polly]
end
subgraph Runtimes
compiler_rt[compiler-rt]
libcxx[libcxx]
libcxxabi[libcxxabi]
libunwind[libunwind]
libc[libc]
openmp[openmp]
offload[offload]
flang_rt[flang-rt]
orc_rt[orc-rt]
libsycl[libsycl]
libclc[libclc]
llvm_libgcc[llvm-libgcc]
end
llvm[llvm core]
clang -->|builds on| llvm
flang -->|builds on| llvm
flang -->|emits MLIR| mlir
mlir -->|builds on| llvm
polly -->|plugin| llvm
bolt -->|builds on| llvm
clang_tools -->|builds on| clang
lldb -->|builds on| llvm
lld -->|builds on| llvm
libcxx -->|paired with| libcxxabi
libcxxabi -->|uses| libunwind
The arrows show dependency direction: each one points from a consumer to the component it builds on. Everything in clang/, flang/, lld/, lldb/, mlir/, bolt/, polly/, and clang-tools-extra/ is a consumer of the LLVM core libraries. The runtime subprojects ship as separate libraries that are built alongside the toolchain.
LLVM IR — the central abstraction
LLVM IR is an SSA, strongly typed, three-address representation. It exists in three isomorphic forms:
- In-memory: llvm::Module, llvm::Function, llvm::Instruction, etc., defined in llvm/include/llvm/IR/.
- Bitcode: a compact serialized form. Read/write code lives in llvm/lib/Bitcode/.
- Textual .ll: human-readable, parsed by llvm/lib/AsmParser/ and printed by llvm/lib/IR/AsmWriter.cpp.
The pass infrastructure (the "new pass manager") in llvm/lib/Passes/ and llvm/include/llvm/IR/PassManager.h drives the optimizer.
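One idea behind a pass manager of this kind is that analyses are cached and each pass reports which analyses remain valid after it runs. The sketch below is a toy model of that idea only, assuming a function is a plain list of instruction names; the class and function names are hypothetical and do not correspond to the real interfaces in PassManager.h.

```python
# Toy sketch of a pass pipeline with cached analyses. NOT LLVM's real API.

class AnalysisManager:
    """Caches analysis results until a pass reports them as not preserved."""
    def __init__(self, analyses):
        self.analyses = analyses   # name -> function computing the result
        self.cache = {}            # name -> cached result
        self.computations = 0      # how many times an analysis actually ran

    def get(self, name, func):
        if name not in self.cache:
            self.cache[name] = self.analyses[name](func)
            self.computations += 1
        return self.cache[name]

    def invalidate_all_except(self, preserved):
        self.cache = {k: v for k, v in self.cache.items() if k in preserved}

def instruction_count(func):
    """Analysis: number of instructions in the function."""
    return len(func)

def remove_nops(func, am):
    """Transform: drops 'nop' instructions; preserves nothing if it changed the IR."""
    before = am.get("count", func)
    func[:] = [inst for inst in func if inst != "nop"]
    changed = len(func) != before
    return set() if changed else {"count"}

def report_size(func, am):
    """Read-only pass: queries the analysis and preserves everything."""
    am.get("count", func)
    return {"count"}

def run_pipeline(func, passes, am):
    """Run passes in order, dropping whatever each pass did not preserve."""
    for p in passes:
        preserved = p(func, am)
        am.invalidate_all_except(preserved)
    return func
```

Running `report_size`, `remove_nops`, `report_size`, `report_size` over a function containing one `nop` computes the analysis twice in this sketch: once up front, and once more after the transform invalidates the cached result.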
Codegen pipeline
The back end consumes LLVM IR and produces machine code. Internally it walks through several IR levels:
graph LR
IR[LLVM IR] --> SDAG[SelectionDAG]
SDAG --> MI[MachineInstr / MachineFunction]
MI --> RA[Register allocation]
RA --> SCHED[Instruction scheduling]
SCHED --> EMIT[MC layer / object emission]
EMIT --> ASM[.s assembly]
EMIT --> OBJ[.o object file]
- SelectionDAG is the legacy instruction selector (llvm/lib/CodeGen/SelectionDAG/).
- GlobalISel is the newer instruction selector framework (llvm/lib/CodeGen/GlobalISel/) used heavily by AArch64 and AMDGPU.
- The MC layer (llvm/lib/MC/) is target-independent assembler/object-file machinery.
- Per-target code lives in llvm/lib/Target/<Name>/. Each target defines its instruction set in TableGen (*.td) and provides a <Name>ISelLowering, <Name>InstrInfo, <Name>RegisterInfo, <Name>Subtarget, etc.
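The stages above can be sketched as a chain of small functions: instruction selection, then register allocation, then emission. This is a deliberately simplified toy, assuming made-up opcodes (`ADDrr`, `MULrr`) and a four-register machine; it skips scheduling entirely and does not reflect the real SelectionDAG, GlobalISel, or MC data structures.

```python
# Toy back end. NOT LLVM's real data structures or algorithms.

def select_instructions(ir):
    """Instruction selection: map IR ops to hypothetical machine opcodes
    that still operate on virtual registers (%v0, %v1, ...)."""
    opcode = {"add": "ADDrr", "mul": "MULrr"}
    return [(opcode[op], dest, lhs, rhs) for op, dest, lhs, rhs in ir]

def allocate_registers(machine_instrs, physical=("r0", "r1", "r2", "r3")):
    """Register allocation: hand each virtual register the next unused
    physical register. A real allocator reuses registers and spills to
    memory; this sketch simply fails when it runs out."""
    mapping = {}

    def phys(vreg):
        if vreg not in mapping:
            if len(mapping) == len(physical):
                raise RuntimeError("out of registers (no spilling in this sketch)")
            mapping[vreg] = physical[len(mapping)]
        return mapping[vreg]

    out = []
    for opc, dest, lhs, rhs in machine_instrs:
        a, b = phys(lhs), phys(rhs)          # operands are assigned first
        out.append((opc, phys(dest), a, b))  # then the destination
    return out

def emit_assembly(allocated):
    """Emission: print each allocated instruction as one line of text."""
    return [f"{opc} {d}, {a}, {b}" for opc, d, a, b in allocated]

def codegen(ir):
    return emit_assembly(allocate_registers(select_instructions(ir)))
```

The point of the layering is that each stage consumes the previous stage's output and nothing else, so the selector in this sketch never needs to know how many physical registers exist.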
TableGen
LLVM uses a domain-specific language called TableGen (library in llvm/lib/TableGen/, llvm-tblgen sources in llvm/utils/TableGen/) to declare repetitive data: instruction encodings, register classes, calling conventions, intrinsic signatures, scheduling models, Clang attribute lists, and more. llvm-tblgen and clang-tblgen read .td files at build time and emit generated C++ include files (.inc) that the rest of the codebase compiles in.
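The underlying pattern is "declare records once, generate the repetitive C++ from them". The sketch below imitates that pattern with plain Python dictionaries; the record fields, opcode values, and emitted C++ are all invented for illustration and are not real .td syntax or real llvm-tblgen output.

```python
# Toy generator in the spirit of TableGen. NOT real .td syntax or tblgen output.

INSTRUCTIONS = [  # hypothetical records; names and encodings are made up
    {"name": "ADD", "opcode": 0x01, "operands": 3},
    {"name": "MUL", "opcode": 0x02, "operands": 3},
    {"name": "RET", "opcode": 0x10, "operands": 0},
]

def emit_enum(records):
    """Generate a C++ enum with one entry per instruction record."""
    lines = ["enum Opcode {"]
    lines += [f"  {r['name']} = {r['opcode']:#04x}," for r in records]
    lines.append("};")
    return "\n".join(lines)

def emit_operand_table(records):
    """Generate a C++ table of operand counts, in the same record order."""
    lines = ["static const unsigned NumOperands[] = {"]
    lines += [f"  {r['operands']}, // {r['name']}" for r in records]
    lines.append("};")
    return "\n".join(lines)

print(emit_enum(INSTRUCTIONS))
print(emit_operand_table(INSTRUCTIONS))
```

Because both outputs come from the same record list, adding an instruction in one place keeps the enum and the table in sync, which is the maintenance benefit a declarative description is after.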
MLIR — the multi-level IR
MLIR is a parallel IR framework that lets compilers represent code at multiple abstraction levels in a single unified IR. It powers Flang's lowering pipeline and is widely used by ML frameworks, hardware vendors, and DSL authors. MLIR sits beside LLVM IR, not on top of it: an MLIR pipeline typically lowers from high-level dialects (e.g., linalg, tensor) down to the llvm dialect, then translates to LLVM IR for codegen.
Build system
The whole monorepo is built with CMake from llvm/ as the entry point. CMake variables select which subprojects, runtimes, and targets are built. See getting started for concrete commands and reference/configuration for the most-used CMake flags.
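As a rough illustration of how those variables fit together, the helper below assembles a configure command line. LLVM_ENABLE_PROJECTS, LLVM_ENABLE_RUNTIMES, and LLVM_TARGETS_TO_BUILD are existing LLVM CMake variables; the helper itself, the Ninja generator, the Release build type, the build directory name, and the particular selection of projects are assumptions made for the example.

```python
def cmake_configure_args(projects, runtimes, targets, build_dir="build"):
    """Build the argv for configuring the monorepo with llvm/ as the source dir.
    CMake list values are separated by semicolons."""
    return [
        "cmake", "-S", "llvm", "-B", build_dir, "-G", "Ninja",
        "-DCMAKE_BUILD_TYPE=Release",
        "-DLLVM_ENABLE_PROJECTS=" + ";".join(projects),
        "-DLLVM_ENABLE_RUNTIMES=" + ";".join(runtimes),
        "-DLLVM_TARGETS_TO_BUILD=" + ";".join(targets),
    ]

# Example selection (an assumption, not a recommended configuration).
args = cmake_configure_args(
    projects=["clang", "lld"],
    runtimes=["libcxx", "libcxxabi", "libunwind"],
    targets=["X86", "AArch64"],
)
print(" ".join(args))
# To actually configure, one could pass `args` to subprocess.run(args, check=True)
# from the repository root; that step is left out so the sketch has no side effects.
```

Passing the arguments as a list rather than a shell string avoids having to quote the semicolons, which a shell would otherwise treat as command separators.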
Testing
LLVM uses two complementary test layers:
- lit-driven regression tests under each subproject's test/ directory, run via check-<subproject> targets. lit (llvm/utils/lit/) executes each test's RUN lines, which typically pipe tool output into FileCheck for pattern-based assertions.
- Unit tests under each subproject's unittests/ directory, built on Google Test (third-party/unittest/).
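The core idea of a FileCheck-style assertion is that the expected patterns live in the test file and must appear in the tool's output in order. The function below is a minimal sketch of that idea, assuming plain substring patterns and a single `CHECK:` prefix; the real FileCheck supports regular expressions, variables, and several other directives that this toy ignores.

```python
def toy_filecheck(check_text, tool_output, prefix="CHECK:"):
    """Return True if every '<prefix> <pattern>' line in check_text matches
    the output, with each match starting after the previous one ended."""
    patterns = [line.split(prefix, 1)[1].strip()
                for line in check_text.splitlines() if prefix in line]
    position = 0
    for pattern in patterns:
        found = tool_output.find(pattern, position)
        if found == -1:
            return False
        position = found + len(pattern)
    return True

# A hypothetical test file and a hypothetical tool output for it.
test_file = """\
; CHECK: define i32 @f
; CHECK: ret i32 42
"""
output = "define i32 @f() {\n  ret i32 42\n}\n"
print(toy_filecheck(test_file, output))
```

Keeping the expectations next to the input is what lets a regression test be a single self-describing file.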
Cross-cutting concerns
- Coding standards: llvm/docs/CodingStandards.rst and the patterns and conventions page.
- Developer policy: llvm/docs/DeveloperPolicy.rst.
- Maintainer model: each subproject maintains its own Maintainers.md (e.g., llvm/Maintainers.md, clang/Maintainers.md) listing the area owners. See the maintainers page for the consolidated map.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.