llvm/llvm-project

bolt

bolt/ is BOLT, the Binary Optimization and Layout Tool. Unlike most other components in this repository, BOLT operates on already-linked binaries: it takes a built ELF executable and a profile (typically collected with Linux perf), then rewrites the binary with a more cache-friendly code layout. The original BOLT paper (CGO '19) reported speedups of up to 7.0% on data-center workloads that were already built with FDO and LTO, and up to 20.4% for the GCC and Clang compilers on top of FDO and LTO.

Purpose

By the time a binary leaves the linker, the compiler's view of "hot" code is fixed. BOLT exists because real programs often spend their time differently from what static heuristics predict, and on modern CPUs code layout (which functions share cache lines, which branches fall through and which are taken) has a measurable impact on performance. BOLT consumes a real profile and rebuilds the binary using measured rather than predicted information.

Directory layout

bolt/
├── include/bolt/   # Public headers
│   ├── Core/       # BinaryContext, BinaryFunction, BinaryBasicBlock
│   ├── Passes/     # Optimization passes
│   ├── Profile/    # Profile reading and aggregation
│   ├── Rewrite/    # Binary rewriting
│   ├── Target/     # Per-arch helpers (X86, AArch64, RISCV)
│   ├── Utils/
│   └── ...
├── lib/            # Implementation, mirrors include/
├── tools/          # llvm-bolt, perf2bolt, llvm-boltdiff, merge-fdata
├── test/
├── unittests/
├── docs/
│   └── OptimizingClang.md   # The canonical "use BOLT to optimize Clang itself" tutorial
├── runtime/        # The instrumentation runtime (used by `--instrument`)
├── utils/
├── examples/
├── README.md
└── CMakeLists.txt

Key abstractions

| Type | File | Role |
| --- | --- | --- |
| bolt::BinaryContext | bolt/include/bolt/Core/BinaryContext.h | Per-binary global state |
| bolt::BinaryFunction | bolt/include/bolt/Core/BinaryFunction.h | A function reconstructed from the binary |
| bolt::BinaryBasicBlock | bolt/include/bolt/Core/BinaryBasicBlock.h | A basic block within a BinaryFunction |
| bolt::DataReader / bolt::DataAggregator | bolt/include/bolt/Profile/ | DataReader reads .fdata profiles; DataAggregator aggregates perf data |
| bolt::RewriteInstance | bolt/include/bolt/Rewrite/RewriteInstance.h | Top-level orchestrator |

How it works

graph LR
    bin[Linked ELF binary] --> dis[Disassemble]
    dis --> cfg[Reconstruct CFG]
    cfg --> bf[BinaryFunctions / BasicBlocks]
    perf[perf.data] --> p2b[perf2bolt]
    p2b --> fdata[BOLT profile]
    fdata --> attach[Attach profile to functions]
    bf --> attach
    attach --> opt[BOLT optimization passes]
    opt --> emit[Emit relocated code]
    emit --> out[Optimized binary]

The pipeline:

  1. Disassemble and reconstruct CFGs. Indirect jumps, jump tables, and exception-handling tables make this nontrivial — see the README's notes on input requirements.
  2. Apply the profile. Edge counts and call frequencies are attached to basic blocks and call sites.
  3. Run optimization passes. Reorder basic blocks so that hot edges become fall-throughs, split functions into hot and cold parts, reorder functions in the binary, and apply several smaller transforms (identical code folding (ICF), jump-table reordering, frame optimization, ...).
  4. Re-emit. New code goes into a fresh .text.bolt section; the old code is largely retained for safety; the binary is patched and rewritten.
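The block reordering and hot/cold splitting in step 3 can be illustrated with a toy model. The sketch below is a simple greedy chaining heuristic written for illustration only; it is not BOLT's actual layout algorithm, which is considerably more sophisticated. The function name `layout_blocks`, the block names, and the edge counts are all hypothetical.

```python
def layout_blocks(entry, blocks, edge_counts):
    """Toy layout: make the hottest edges fall-throughs, then split off cold blocks.

    blocks:      block names in their original order
    edge_counts: {(src, dst): count}, as a profile might provide
    Returns (hot_order, cold_blocks).
    """
    # Every block starts out as its own single-element chain.
    chain_of = {b: [b] for b in blocks}

    # Visit executed edges from hottest to coldest.
    hot_edges = sorted(
        ((cnt, src, dst) for (src, dst), cnt in edge_counts.items() if cnt > 0),
        key=lambda e: -e[0],
    )
    for _, src, dst in hot_edges:
        a, b = chain_of[src], chain_of[dst]
        # Glue two chains only when src ends one and dst starts another,
        # and never place anything in front of the entry block.
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a

    # Heat of a block: total count on all edges touching it.
    heat = {b: 0 for b in blocks}
    for (src, dst), cnt in edge_counts.items():
        heat[src] += cnt
        heat[dst] += cnt

    # Collect distinct chains; the entry chain goes first, then hotter chains.
    chains, seen = [], set()
    for b in blocks:
        c = chain_of[b]
        if id(c) not in seen:
            seen.add(id(c))
            chains.append(c)
    chains.sort(key=lambda c: (entry not in c, -sum(heat[b] for b in c)))

    order = [b for c in chains for b in c]
    hot = [b for b in order if heat[b] > 0 or b == entry]
    cold = [b for b in order if heat[b] == 0 and b != entry]
    return hot, cold


# Hypothetical function: a hot loop plus an error path that never executed.
example_edges = {
    ("entry", "error"): 0,
    ("entry", "loop"): 100,
    ("loop", "loop"): 900,
    ("loop", "exit"): 100,
    ("error", "exit"): 0,
}
hot, cold = layout_blocks("entry", ["entry", "error", "loop", "exit"], example_edges)
```

In this example the never-executed `error` block, which originally sat between `entry` and `loop`, is moved out of the hot path, so the hot blocks end up contiguous.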

Input requirements

From bolt/README.md:

  • ELF binaries on X86-64 or AArch64.
  • Symbol table not stripped.
  • Linked with relocations (--emit-relocs / -q) for maximum benefit.
  • No -freorder-blocks-and-partition (GCC 8+ enables this by default; pass -fno-reorder-blocks-and-partition).
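As a rough illustration, the requirements above could be checked against a project's build flags before attempting to run BOLT. The helper below is a hypothetical sketch, not part of BOLT: the name `audit_build_flags` is invented, it assumes flags are passed through a GCC-style compiler driver (hence the `-Wl,` prefix), and it does not model flag ordering.

```python
def audit_build_flags(compile_flags, link_flags, compiler="gcc"):
    """Return human-readable problems for flags that conflict with BOLT's input requirements."""
    problems = []

    # BOLT is not compatible with GCC's block partitioning.
    if "-freorder-blocks-and-partition" in compile_flags:
        problems.append("remove -freorder-blocks-and-partition")
    elif compiler == "gcc" and "-fno-reorder-blocks-and-partition" not in compile_flags:
        problems.append("add -fno-reorder-blocks-and-partition (on by default in GCC 8+)")

    # Relocations are needed for maximum benefit.
    if not {"-Wl,--emit-relocs", "-Wl,-q"} & set(link_flags):
        problems.append("link with -Wl,--emit-relocs to keep relocations")

    # The symbol table must survive linking.
    if {"-s", "-Wl,-s", "-Wl,--strip-all"} & set(link_flags):
        problems.append("do not strip the symbol table")

    return problems


# A configuration that satisfies all three checks.
ok = audit_build_flags(
    ["-O2", "-fno-reorder-blocks-and-partition"],
    ["-Wl,--emit-relocs"],
)
```

An empty list means none of the listed problems were detected; it does not guarantee that BOLT will accept the binary.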

Tools

  • llvm-bolt — the main optimizer. Takes a binary, a profile, and produces an optimized binary.
  • perf2bolt — converts perf.data to BOLT's .fdata profile format. This step can be skipped by passing perf.data directly to llvm-bolt with the experimental -p perf.data option.
  • llvm-boltdiff — diffs two binaries to compare BOLT's effects.
  • merge-fdata — merges multiple .fdata profiles.
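The tools above are typically chained together: sample, convert, optimize. The sketch below only builds the command lines and does not execute anything. The function name `bolt_workflow` and the binary name `./myapp` are hypothetical, the perf event and branch filter assume a CPU with branch-record (LBR) support, and the optimization flags are one commonly used combination rather than a recommendation; check `llvm-bolt --help` for the version you have.

```python
import shlex


def bolt_workflow(binary, workload_args, out=None):
    """Build argv lists for a perf -> perf2bolt -> llvm-bolt run (nothing is executed)."""
    out = out or binary + ".bolt"
    return [
        # 1. Sample taken branches while running a representative workload.
        ["perf", "record", "-e", "cycles:u", "-j", "any,u", "-o", "perf.data",
         "--", binary, *workload_args],
        # 2. Convert the perf samples into BOLT's .fdata format.
        ["perf2bolt", "-p", "perf.data", "-o", "perf.fdata", binary],
        # 3. Rewrite the binary using the profile.
        ["llvm-bolt", binary, "-o", out, "-data=perf.fdata",
         "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort",
         "-split-functions", "-dyno-stats"],
    ]


# Print the commands in a form that can be pasted into a shell.
for cmd in bolt_workflow("./myapp", ["--bench"]):
    print(shlex.join(cmd))
```

Returning argv lists rather than shell strings keeps quoting correct if the commands are later handed to `subprocess.run`.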

Integration points

  • LLVM core libraries (libLLVM*) — BOLT links most of LLVM's MC layer for disassembly and re-emission.
  • The --instrument mode uses a small runtime under bolt/runtime/ to collect counts when perf isn't available.
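When perf is not available, the instrumentation flow replaces the sampling step: BOLT first produces an instrumented copy of the binary, that copy is run on a representative workload and writes a profile, and the profile is then fed back to `llvm-bolt`. The sketch below only builds the command lines. The function name `instrumented_workflow` and the file names are hypothetical, and the exact option spellings should be checked against the installed `llvm-bolt`.

```python
def instrumented_workflow(binary, workload_args, profile="prof.fdata"):
    """Build argv lists for an instrument -> run -> optimize cycle (nothing is executed)."""
    instrumented = binary + ".inst"
    return [
        # 1. Emit an instrumented copy that records counts into `profile`.
        ["llvm-bolt", binary, "-instrument",
         f"-instrumentation-file={profile}", "-o", instrumented],
        # 2. Run the instrumented copy on a representative workload.
        [instrumented, *workload_args],
        # 3. Optimize the original binary with the collected profile.
        ["llvm-bolt", binary, "-o", binary + ".bolt", f"-data={profile}",
         "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort",
         "-split-functions"],
    ]


steps = instrumented_workflow("./myapp", ["--bench"])
```

Note that step 3 optimizes the original binary, not the instrumented copy.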

Entry points for modification

  • Adding a pass: source under bolt/lib/Passes/, header under bolt/include/bolt/Passes/. Existing passes (ReorderFunctions, ReorderBasicBlocks, Inliner, FrameOptimizer) make decent templates.
  • Architecture support: per-arch glue in bolt/lib/Target/. AArch64 and RISCV are the most actively evolving as of this snapshot.

Reference

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
