llvm/llvm-project
bolt
bolt/ is BOLT, the Binary Optimization and Layout Tool. Unlike every other component in this repository, BOLT operates on already-linked binaries: it takes a built ELF executable and a profile collected with Linux perf, then rewrites the binary with a more cache-friendly code layout. The original BOLT paper (CGO '19) reported speedups of up to 7.0% on data-center applications on top of profile-guided function reordering and LTO, and up to 20.4% for the GCC and Clang compilers on top of FDO and LTO.
Purpose
By the time a binary leaves the linker, the compiler's view of "hot" code is fixed. BOLT exists because real programs spend their time differently from static heuristics' guesses, and on modern CPUs code layout — which functions sit in which cache lines, which branches are taken vs not — has measurable impact. BOLT consumes a real profile and rebuilds the binary with measured information rather than predicted information.
Directory layout
bolt/
├── include/bolt/ # Public headers
│ ├── Core/ # BinaryContext, BinaryFunction, BinaryBasicBlock
│ ├── Passes/ # Optimization passes
│ ├── Profile/ # Profile reading and aggregation
│ ├── Rewrite/ # Binary rewriting
│ ├── Target/ # Per-arch helpers (X86, AArch64, RISCV)
│ ├── Utils/
│ └── ...
├── lib/ # Implementation, mirrors include/
├── tools/ # llvm-bolt, perf2bolt, llvm-boltdiff, merge-fdata
├── test/
├── unittests/
├── docs/
│ └── OptimizingClang.md # The canonical "use BOLT to optimize Clang itself" tutorial
├── runtime/ # The instrumentation runtime (used by `--instrument`)
├── utils/
├── examples/
├── README.md
└── CMakeLists.txt
Key abstractions
| Type | File | Role |
|---|---|---|
| `bolt::BinaryContext` | `bolt/include/bolt/Core/BinaryContext.h` | Per-binary global state |
| `bolt::BinaryFunction` | `bolt/include/bolt/Core/BinaryFunction.h` | A function reconstructed from the binary |
| `bolt::BinaryBasicBlock` | `bolt/include/bolt/Core/BinaryBasicBlock.h` | A basic block within a `BinaryFunction` |
| `bolt::DataReader` / `bolt::DataAggregator` | `bolt/include/bolt/Profile/` | Read fdata profiles / aggregate perf data |
| `bolt::RewriteInstance` | `bolt/include/bolt/Rewrite/RewriteInstance.h` | Top-level orchestrator |
How it works
graph LR
bin[Linked ELF binary] --> dis[Disassemble]
dis --> cfg[Reconstruct CFG]
cfg --> bf[BinaryFunctions / BasicBlocks]
perf[perf.data] --> p2b[perf2bolt]
p2b --> fdata[BOLT profile]
fdata --> attach[Attach profile to functions]
bf --> attach
attach --> opt[BOLT optimization passes]
opt --> emit[Emit relocated code]
emit --> out[Optimized binary]
The pipeline:
- Disassemble and reconstruct CFGs. Indirect jumps, jump tables, and exception-handling tables make this nontrivial — see the README's notes on input requirements.
- Apply the profile. Edge counts and call frequencies are attached to basic blocks and call sites.
- Run optimization passes. Reorder basic blocks so that hot edges become fall-throughs, split functions into hot and cold parts, reorder functions in the binary, and apply several smaller transforms (identical code folding, jump-table reordering, frame-pointer reduction, ...).
- Re-emit. New code goes into a fresh `.text.bolt` section; the old code is largely retained for safety; the binary is patched and rewritten.
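The block-reordering and splitting step can be illustrated with a toy model. The sketch below is not BOLT's algorithm (BOLT's own layout passes are considerably more elaborate); it is a simple greedy fall-through chaining heuristic followed by a hot/cold split, and every name in it (`layout_blocks`, `split_hot_cold`, the block labels, the counts) is hypothetical.

```python
def layout_blocks(entry, block_counts, edge_counts):
    """Greedy chaining: visit edges hottest first and glue two chains together
    when the edge source ends one chain and the edge target starts another."""
    chain_of = {blk: [blk] for blk in block_counts}  # block -> chain holding it

    # Hottest edges first; ties are broken by name to keep the result stable.
    for (src, dst), _ in sorted(edge_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        a, b = chain_of[src], chain_of[dst]
        if a is b or a[-1] != src or b[0] != dst or dst == entry:
            continue  # this edge cannot become a fall-through
        a.extend(b)
        for blk in b:
            chain_of[blk] = a

    # Collect each chain once, in first-seen order.
    chains = []
    for blk in block_counts:
        chain = chain_of[blk]
        if not any(chain is seen for seen in chains):
            chains.append(chain)

    # The entry chain must come first; remaining chains go hottest first.
    chains.sort(key=lambda c: (c[0] != entry, -max(block_counts[b] for b in c)))
    return [blk for chain in chains for blk in chain]


def split_hot_cold(order, block_counts):
    """Blocks that were never executed in the profile go to the cold part."""
    hot = [b for b in order if block_counts[b] > 0]
    cold = [b for b in order if block_counts[b] == 0]
    return hot, cold


# Hypothetical profile for one function: the error path never ran.
block_counts = {"entry": 100, "check": 100, "error": 0, "work": 100, "exit": 100}
edge_counts = {
    ("entry", "check"): 100,
    ("check", "error"): 0,
    ("check", "work"): 100,
    ("work", "exit"): 100,
    ("error", "exit"): 0,
}

order = layout_blocks("entry", block_counts, edge_counts)
hot, cold = split_hot_cold(order, block_counts)
print("layout:", order)
print("hot:", hot, "cold:", cold)
```

In this example the never-executed `error` block ends up after the hot path and lands in the cold part, which is the effect the pipeline description attributes to reordering and splitting.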
Input requirements
From bolt/README.md:
- ELF binaries on X86-64 or AArch64.
- Symbol table not stripped.
- Linked with relocations (`--emit-relocs` / `-q`) for maximum benefit.
- No `-freorder-blocks-and-partition` (GCC 8+ enables this by default; pass `-fno-reorder-blocks-and-partition`).
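The first requirement can be checked from the ELF header alone. The following is a minimal sketch, not part of BOLT: the function name `check_elf_header` is hypothetical, and it only inspects the magic bytes, byte order, and `e_machine` field. Checking for a symbol table or retained relocations needs the section headers, for which a tool such as `readelf -S` is a better fit.

```python
import struct

# e_machine values from the ELF specification.
EM_X86_64 = 62
EM_AARCH64 = 183


def check_elf_header(header: bytes):
    """Return a list of problems found in the first 20 bytes of a file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return ["not an ELF file"]

    ei_data = header[5]  # EI_DATA: 1 = little-endian, 2 = big-endian
    if ei_data not in (1, 2):
        return ["unknown byte order"]
    endian = "<" if ei_data == 1 else ">"

    # e_type and e_machine are two 16-bit fields right after e_ident (16 bytes).
    _e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)

    problems = []
    if e_machine not in (EM_X86_64, EM_AARCH64):
        problems.append(f"unsupported machine {e_machine}")
    return problems


# Synthetic little-endian 64-bit header for an x86-64 executable.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, EM_X86_64)
print(check_elf_header(sample))
```

For a real file, the same function can be fed the first 20 bytes read in binary mode, for example `check_elf_header(open(path, "rb").read(20))`.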
Tools
- `llvm-bolt`: the main optimizer. Takes a binary and a profile and produces an optimized binary.
- `perf2bolt`: converts `perf.data` to BOLT's `.fdata` profile format. Can also be replaced with the experimental `llvm-bolt -p perf.data` flow.
- `llvm-boltdiff`: diffs two binaries to compare BOLT's effects.
- `merge-fdata`: merges multiple `.fdata` profiles.
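To show how these tools chain together, here is a small sketch that only builds the command lines and prints them; it does not run anything. The helper name `bolt_perf_workflow` and the file names are hypothetical, while the flags follow the usage shown in `bolt/README.md`. The `-j any,u` branch sampling assumes hardware with branch-record (LBR) support.

```python
import shlex


def bolt_perf_workflow(binary, workload_args, perf_data="perf.data", fdata="perf.fdata"):
    """Return the three command lines of a perf-based BOLT run, in order."""
    # 1. Sample the workload with branch records.
    record = ["perf", "record", "-e", "cycles:u", "-j", "any,u",
              "-o", perf_data, "--", binary, *workload_args]
    # 2. Convert perf.data into BOLT's fdata profile format.
    convert = ["perf2bolt", "-p", perf_data, "-o", fdata, binary]
    # 3. Rewrite the binary using the profile.
    optimize = ["llvm-bolt", binary, "-o", binary + ".bolt", f"-data={fdata}",
                "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort",
                "-split-functions", "-split-all-cold", "-split-eh", "-dyno-stats"]
    return [record, convert, optimize]


def merge_profiles_command(profiles):
    """merge-fdata writes the merged profile to stdout; redirect it to a file."""
    return ["merge-fdata", *profiles]


for cmd in bolt_perf_workflow("./app", ["--bench"]):
    print(shlex.join(cmd))
print(shlex.join(merge_profiles_command(["a.fdata", "b.fdata"])), "> combined.fdata")
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later without shell quoting issues.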
Integration points
- LLVM core libraries (`libLLVM*`): BOLT links most of LLVM's MC layer for disassembly and re-emission.
- The `--instrument` mode uses a small runtime under `bolt/runtime/` to collect counts when `perf` isn't available.
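The instrumentation path can be sketched the same way, as a list of command lines that are printed rather than executed. The helper name `bolt_instrumentation_workflow` and the `.instrumented` suffix are hypothetical; the `-instrument` and `--instrumentation-file` options and the `/tmp/prof.fdata` default are taken from BOLT's documentation as I understand it, so treat them as assumptions to verify against your `llvm-bolt --help`.

```python
import shlex


def bolt_instrumentation_workflow(binary, workload_args, profile="/tmp/prof.fdata"):
    """Return the command lines for an instrumentation-based BOLT run."""
    instrumented = binary + ".instrumented"
    # 1. Emit a copy of the binary that counts executions at run time.
    instrument = ["llvm-bolt", binary, "-instrument", "-o", instrumented,
                  f"--instrumentation-file={profile}"]
    # 2. Run the workload; the runtime writes the profile when the process exits.
    run = [instrumented, *workload_args]
    # 3. Optimize the original binary with the collected profile.
    optimize = ["llvm-bolt", binary, "-o", binary + ".bolt", f"-data={profile}",
                "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort",
                "-split-functions", "-split-all-cold", "-split-eh", "-dyno-stats"]
    return [instrument, run, optimize]


for cmd in bolt_instrumentation_workflow("./app", ["--bench"]):
    print(shlex.join(cmd))
```

Note that the final step rewrites the original binary, not the instrumented copy, which is only used to gather counts.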
Entry points for modification
- Adding a pass: source under `bolt/lib/Passes/`, header under `bolt/include/bolt/Passes/`. Existing passes (`ReorderFunctions`, `ReorderBasicBlocks`, `Inliner`, `FrameOptimizer`) make decent templates.
- Architecture support: per-arch glue in `bolt/lib/Target/`. AArch64 and RISCV are the most actively evolving as of this snapshot.
Reference
- BOLT documentation
- Optimizing Clang with BOLT — the canonical end-to-end tutorial
- BOLT paper (CGO '19)