llvm/llvm-project

Fun facts

A few things you only learn after spending real time in this codebase.

The repo is older than YouTube

The first commit in the LLVM history is dated 2001-06-06. That predates the iPod (Oct 2001), Mac OS X 10.1 (Sep 2001), and YouTube (Feb 2005), though not Wikipedia, which launched in January 2001. It also predates the LLVM 1.0 release (Oct 2003) by more than two years. The first commit message reads, in its entirety:

New repository initialized by cvs2svn.

The repo has lived through CVS → SVN → SVN-monorepo → git-monorepo migrations. A few commits from 2001 still live in the history with their original timestamps.

Twenty-five thousand TODOs

A naive grep for TODO, FIXME, HACK, or XXX across all C, C++, headers, and inline files in the repository turns up roughly 25,000 hits. That's about one TODO every six source files. A surprising fraction date back to the 2000s — the IR parser, the Support library, and the early SelectionDAG code carry their original notes from when there were five contributors instead of seven thousand.
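A minimal Python sketch of such a naive scan follows. It assumes that "C, C++, headers, and inline files" means the extensions .c, .cc, .cpp, .h, .hpp, and .inc. That list is our guess, not a documented definition. The sketch counts matching lines the way plain grep does, so a line holding both TODO and FIXME counts once. Counting every occurrence instead, or using a different extension set, will shift the total. The function name `count_markers` is hypothetical.

```python
import os
import re
from collections import Counter

# Assumed file set: our reading of "C, C++, headers, and inline files".
EXTENSIONS = {".c", ".cc", ".cpp", ".h", ".hpp", ".inc"}

# Whole-word match on the four marker words, similar to `grep -wE`.
MARKER = re.compile(r"\b(TODO|FIXME|HACK|XXX)\b")


def count_markers(root):
    """Return (Counter of matching lines keyed by first marker, files scanned)."""
    counts = Counter()
    files_scanned = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip VCS metadata; everything else is fair game for a naive scan.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if os.path.splitext(name)[1] not in EXTENSIONS:
                continue
            files_scanned += 1
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    match = MARKER.search(line)
                    if match:
                        # Each line counts once, attributed to its leftmost marker.
                        counts[match.group(1)] += 1
    return counts, files_scanned
```

Calling `count_markers("llvm-project")` on a checkout returns the per-marker line counts and the number of files scanned. Dividing the file count by the total marker count gives the files-per-TODO ratio quoted above.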

41,000+ commits in a single year

2025 set a new annual record at 41,345 commits. That's roughly 113 commits per day, every day, for a year. Even allowing for time-zone smear, multiple commits land per hour around the clock.
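One way to reproduce a per-year count is to take the output of `git log --date=short --format=%cd`, which prints one committer date per line as YYYY-MM-DD, and tally it by year. The sketch below assumes committer dates. The section does not say whether its figure uses author or committer dates, and the two can differ slightly around year boundaries. The rate helper simply redoes the arithmetic from the paragraph. Both function names are hypothetical.

```python
from collections import Counter


def commits_per_year(dates):
    """Bucket commit dates (YYYY-MM-DD strings, one per commit) by year."""
    return Counter(d[:4] for d in dates if d.strip())


def daily_and_hourly_rate(commits, days=365):
    """Average commits per day and per hour over a period of `days` days."""
    per_day = commits / days
    return per_day, per_day / 24
```

For the 2025 figure, `daily_and_hourly_rate(41345)` gives about 113.3 commits per day, or about 4.7 per hour on average. To get the per-year tally, pass the `git log` output through `splitlines()` and hand it to `commits_per_year`.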

Clang is bigger than LLVM core

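By non-test source-line count, Clang has roughly 1.9 million lines vs. LLVM core's 873,000. The asymmetry comes largely from the staggering surface area of the C++ language standard, which Clang implements piecewise, and from Clang's extensive TableGen descriptions of diagnostics, attributes, builtins, and so on. The headers TableGen generates from those descriptions are build outputs rather than checked-in files, so a count of the source tree sees only the .td inputs.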

LLDB is the third-largest subproject

A surprise to most people: at ~750k lines, LLDB is slightly bigger than MLIR (~745k), and both are more than twice the size of flang (~335k). Even all three together (~1.83M) fall just short of Clang. LLDB carries per-platform ABI plugins, DWARF and PDB parsers, a Python scripting bridge, a remote debugging stub, a GDB-remote protocol implementation, and a host of platform glue.

Polly imports a math library

The polly/ subproject performs polyhedral loop optimization on top of the Integer Set Library (isl), a third-party library bundled inside polly/lib/External/. That library and its headers account for a meaningful chunk of Polly's reported 390k source lines.

llvm-bolt operates on shipped binaries

Most LLVM components live and die before the linker runs. BOLT (bolt/) is the exception: it operates on the output of the linker, rewriting code layout based on perf data after the program has been built. The original BOLT paper (CGO 2019) reported speedups of up to 7.0% on Facebook's data-center workloads and up to 20.4% on the GCC and Clang compilers, measured on top of existing profile-guided optimization and LTO.
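To make "rewriting code layout based on perf data" concrete, here is a deliberately simplified Python sketch of profile-guided function ordering. Functions with profile samples are placed first, hottest first, and functions that were never sampled are pushed to the end. This illustrates the idea only and is not BOLT's algorithm. BOLT's real reordering passes also use call-graph and control-flow information, and they reorder basic blocks as well as whole functions. The function name and the sample-count dictionary are hypothetical.

```python
def hot_cold_order(functions, samples):
    """Return function names laid out hot-first.

    functions: names in their original link order.
    samples:   dict of name -> sampled hit count (e.g. from a perf profile);
               names missing from the dict count as never sampled.
    """
    hot = [f for f in functions if samples.get(f, 0) > 0]
    cold = [f for f in functions if samples.get(f, 0) == 0]
    # Python's sort is stable even with reverse=True, so equally hot
    # functions keep their original relative order.
    hot.sort(key=lambda f: samples[f], reverse=True)
    return hot + cold
```

Packing hot code together this way means it occupies fewer cache lines and pages. Better instruction-cache and iTLB behavior is the kind of effect post-link layout tools aim for.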

TableGen is a programming language

LLVM's .td files aren't just data tables. TableGen has classes, multiclasses, inheritance, conditional expressions, and a foreach loop. It is a domain-specific language with its own parser, type system, and evaluator, all wired into the build process. The core that parses .td files and resolves them into records lives in llvm/lib/TableGen/, and the backends that turn those records into generated code live in llvm/utils/TableGen/ (and clang/utils/TableGen/, and mlir/tools/...).
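A rough way to picture what TableGen does is to model it in Python with three pieces. The first is a class with template arguments and a default field. The second is a foreach that stamps out one concrete record per iteration. The third is a def that inherits from the class and overrides a field, which is what a `let` does in TableGen. The sketch below is an analogy under those assumptions, not TableGen's implementation. The Reg class, its fields, and the R0 to R3 and SP names are hypothetical. It does capture one real property: by the time a backend runs, every record has been resolved into flat field values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reg:
    """Stands in for a TableGen class taking two template arguments."""

    name: str
    num: int
    # A field with a default value, set in the class body.
    allocatable: bool = True


def expand_foreach(count):
    """Model a foreach: one record per iteration, named by pasting the index."""
    return {f"R{i}": Reg(name=f"r{i}", num=i) for i in range(count)}


def build_records():
    """Collect every record into one flat table, as a backend would see it."""
    records = expand_foreach(4)
    # A def that inherits from Reg and overrides one field.
    records["SP"] = replace(Reg(name="sp", num=31), allocatable=False)
    return records
```

Iterating over `build_records()` yields five fully resolved records, with no trace left of the loop or the inheritance that produced them. A generated-code backend works from exactly that kind of flat view.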

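Simple IR text from 2007 still parses

LLVM IR's textual format has been remarkably stable in its basics since LLVM 2.0 (2007), which introduced the iN integer types and @-prefixed global names. A simple define i32 @main() { ret i32 0 } written against a 2.0-era build will still parse with today's opt, while 1.x-era text in the older int %main() style will not. The stability is practical rather than promised: LLVM's developer policy guarantees backward compatibility for bitcode, not for the textual .ll form. Changes such as explicit types on load and getelementptr and the move to opaque pointers mean most non-trivial old .ll files need updating.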

"lit" stands for "LLVM Integrated Tester"

The three-letter test driver name has tripped up generations of contributors. It is lit, lower-case, and it lives in llvm/utils/lit/ as a Python package. It pairs with FileCheck (llvm/utils/FileCheck/), which does the actual pattern matching against compiler output.
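In a typical test, lit runs a `RUN:` line such as `opt -S %s | FileCheck %s`, and FileCheck then scans the tool's output for the `CHECK:` lines embedded in the same file. The minimal Python sketch below models only the plain `CHECK:` directive: each pattern must appear in the output, in order, after the previous match. It is a simplification for illustration. Real FileCheck also handles `{{regex}}` blocks, `[[VAR]]` captures, CHECK-NEXT, CHECK-NOT and other directives, and custom prefixes. This sketch implements none of those and simply skips lines carrying other directives. The function name `check_in_order` is hypothetical.

```python
def check_in_order(output, check_file):
    """Return True if every `CHECK:` pattern appears in `output` in order.

    Each pattern is matched as a plain substring, and the search for the
    next pattern starts where the previous match ended.
    """
    marker = "CHECK:"
    pos = 0
    for line in check_file.splitlines():
        idx = line.find(marker)
        if idx == -1:
            # Not a plain CHECK: line (or another directive); skip it.
            continue
        pattern = line[idx + len(marker):].strip()
        found = output.find(pattern, pos)
        if found == -1:
            return False
        pos = found + len(pattern)
    return True
```

The ordering rule is why a test that checks `ret` before `define` fails even though both strings occur in the output.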

The IR header layout was reorganized in 2013

The current location of the LLVM IR headers — llvm/include/llvm/IR/ — only dates to January 2013. Before that, IR headers lived directly in llvm/include/llvm/. The reorganization commit explains it cleanly:

Move all of the header files which are involved in modelling the LLVM IR into their new header subdirectory: include/llvm/IR. This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM.

That commit alone touched thousands of files.

The repository has survived three version control systems

CVS (2001 – early 2000s) → SVN (mid-2000s – 2019) → git monorepo (2019 – present). Each migration preserved authorship and commit messages. You can git log your way back to commits whose original metadata predates GitHub itself.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
