llvm/llvm-project

By the numbers

Data collected on 2026-04-30 from the main branch at commit 2fdb09cf65e6.

This page is a quantitative snapshot of the llvm/llvm-project monorepo. It covers size, activity, and complexity. Numbers are approximate where rough scans (e.g., find | wc -l, wc -l) were used.

Size

  • Total files across all top-level subprojects: roughly 186,000 (everything tracked by git, including tests, include directories, and configs).
  • Total git commits: 578,857 as of this snapshot.
  • Tagged releases: 316 (LLVM tags from llvmorg-1.0.0 in 2003 through llvmorg-20.x in 2025).
  • First commit: 2001-06-06 ("New repository initialized by cvs2svn.").

Source lines by subproject

Approximate non-test, non-unittest source lines (.c, .cpp, .cc, .h, .hpp, .inc):

| Subproject | Approx. source lines |
|---|---|
| clang | 1,893,005 |
| llvm | 873,010 |
| lldb | 752,745 |
| mlir | 744,573 |
| polly | 390,638 |
| flang | 334,658 |
| compiler-rt | 238,030 |
| clang-tools-extra | 208,635 |
| libcxx | 207,808 |
| openmp | 120,561 |
| lld | 109,511 |
| bolt | 99,376 |
| offload | 38,586 |
| flang-rt | 38,001 |
| libc | 26,349 |
| libunwind | 20,341 |
| libclc | 18,422 |
| libcxxabi | 15,066 |
| orc-rt | 7,680 |
| libsycl | 4,550 |
[Chart: "Source lines by subproject (thousands)". Horizontal bar chart with one bar per subproject, from clang (1,893) down to libsycl (5); the values are the table entries above rounded to thousands.]

Clang is the only subproject in seven figures, at about 1.9 million lines. LLVM core, LLDB, and MLIR follow in a band between roughly 745,000 and 875,000 lines each. LLDB is the surprise: it ships a lot of platform-specific debug-info handling, plus DWARF/PDB parsers, plus a Python plugin layer.
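A line count of this kind can be approximated with a short directory walk. The following is a minimal sketch, not the script that produced the table: the extension set comes from the text above, while the function name `count_source_lines` and the choice to skip directories named `test` or `unittests` are assumptions made for illustration.

```python
import os
from pathlib import Path

# Extensions listed in the text above.
SOURCE_EXTS = {".c", ".cpp", ".cc", ".h", ".hpp", ".inc"}
# Assumption: directories with these names hold tests and are excluded.
SKIP_DIRS = {"test", "unittests"}


def count_source_lines(root):
    """Count newline characters in source files under root, like `wc -l`."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into test directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if Path(name).suffix in SOURCE_EXTS:
                with open(os.path.join(dirpath, name), "rb") as fh:
                    total += fh.read().count(b"\n")
    return total
```

Calling `count_source_lines("clang")` from a checkout would give a figure comparable to, but not necessarily identical with, the table entry, since the exact exclusion rules used for the snapshot are not stated.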

File counts by subproject

| Subproject | Files (incl. tests) |
|---|---|
| llvm | 78,731 |
| clang | 33,584 |
| libcxx | 12,296 |
| lldb | 9,241 |
| mlir | 7,247 |
| libc | 6,931 |
| flang | 5,431 |
| compiler-rt | 4,716 |
| lld | 4,137 |
| clang-tools-extra | 3,912 |
| polly | 2,925 |
| bolt | 1,190 |
| libclc | 1,018 |
| openmp | 851 |
| offload | 828 |
| third-party | 752 |
| cross-project-tests | 311 |
| flang-rt | 254 |
| libcxxabi | 162 |
| utils | 133 |
| orc-rt | 129 |
| libunwind | 76 |
| libsycl | 73 |
| runtimes | 12 |
| llvm-libgcc | 4 |

LLVM core has the most files because it carries every target backend — the llvm/lib/Target/ tree alone holds dozens of architectures with their own TableGen, instruction-info, and codegen passes.
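Per-subproject file counts like these can be derived by grouping tracked paths on their first component. The sketch below assumes the paths come from `git ls-files`; the helper names `tracked_paths` and `files_per_subproject` are hypothetical, and the page does not say which command actually produced the table.

```python
import subprocess
from collections import Counter


def tracked_paths(repo):
    """Return the paths tracked by git in the given checkout."""
    result = subprocess.run(
        ["git", "-C", repo, "ls-files"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def files_per_subproject(paths):
    """Count files by top-level directory; root-level files are skipped."""
    counts = Counter()
    for path in paths:
        head, sep, _ = path.partition("/")
        if sep:  # the path lives inside a top-level directory
            counts[head] += 1
    return counts
```

With a local checkout, `files_per_subproject(tracked_paths("llvm-project")).most_common()` lists directories from largest to smallest.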

Activity

Commits per year

| Year | Commits |
|---|---|
| 2001 | 1,442 |
| 2002 | 3,557 |
| 2003 | 4,677 |
| 2004 | 6,928 |
| 2005 | 5,027 |
| 2006 | 7,691 |
| 2007 | 9,938 |
| 2008 | 12,624 |
| 2009 | 23,163 |
| 2010 | 23,376 |
| 2011 | 20,960 |
| 2012 | 20,613 |
| 2013 | 24,501 |
| 2014 | 24,961 |
| 2015 | 29,554 |
| 2016 | 32,019 |
| 2017 | 28,846 |
| 2018 | 28,824 |
| 2019 | 33,392 |
| 2020 | 35,132 |
| 2021 | 32,864 |
| 2022 | 37,461 |
| 2023 | 37,532 |
| 2024 | 37,498 |
| 2025 | 41,345 |

The repo has been on a long, slow upward ramp since 2001, with two step changes. In 2009, commit volume rose from 12,624 to 23,163 (roughly doubling, when many tools migrated under the LLVM umbrella). In 2015, it rose from 24,961 to 29,554 (when the project's contributor base broadened significantly). 2025 set a new annual record at 41,345 commits, and the project has not shown any sign of slowing.
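The size of each step is easy to check from the yearly figures. This small sketch computes year-over-year ratios for the years discussed; the dictionary holds only values taken from the commits-per-year data above, and the name `growth_ratio` is illustrative.

```python
# Yearly commit counts copied from the commits-per-year data above.
COMMITS = {
    2008: 12624, 2009: 23163,
    2014: 24961, 2015: 29554,
    2024: 37498, 2025: 41345,
}


def growth_ratio(commits, year):
    """Ratio of a year's commit count to the previous year's."""
    return commits[year] / commits[year - 1]


for year in (2009, 2015, 2025):
    print(year, round(growth_ratio(COMMITS, year), 2))
```

The 2009 ratio is about 1.83, which supports "roughly doubled", while 2015 and 2025 are each increases of roughly 10 to 18 percent over the prior year.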

Recent activity

  • Commits in the last 90 days: 11,414.
  • Unique authors in the last 90 days: 1,412.
  • Unique authors in the last 365 days: 2,578.
  • Total unique authors all time: 7,200.
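Author counts over a trailing window can be sketched as follows. The page does not say whether authors were distinguished by name or email, or whether a mailmap was applied, so counting distinct lowercase author emails is an assumption here, and both helper names are hypothetical.

```python
import subprocess


def author_emails_since(repo, since):
    """Return one author email per commit newer than `since` (e.g. '90 days ago')."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--format=%ae"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def unique_authors(emails):
    """Deduplicate emails case-insensitively, ignoring blank lines."""
    return {e.strip().lower() for e in emails if e.strip()}
```

For example, `len(unique_authors(author_emails_since("llvm-project", "90 days ago")))` yields a 90-day author count under these assumptions. Contributors who commit under several addresses would be counted more than once.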

Churn hotspots (last 90 days)

The areas with the most touched files in the trailing 90 days:

| Directory | Files touched (90d) |
|---|---|
| llvm/test | 15,717 |
| llvm/lib | 8,150 |
| clang/test | 6,578 |
| clang/lib | 3,867 |
| lldb/source | 1,981 |
| libclc/clc | 1,912 |
| mlir/test | 1,838 |
| mlir/lib | 1,540 |
| libcxx/test | 1,504 |
| libc/src | 1,482 |
| llvm/include | 1,399 |
| clang/include | 1,384 |
| lldb/test | 1,217 |
| flang/test | 1,163 |
| libclc/opencl | 867 |

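Tests drive the top of the list because every behavior change ships with regression coverage. Test files touched outnumber lib files touched by roughly 1.9 to 1 in LLVM (15,717 vs. 8,150) and 1.7 to 1 in Clang (6,578 vs. 3,867); that ratio is one of the project's strongest cultural signals.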
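A churn table of this shape can be built by bucketing touched paths on their first two components. The sketch below assumes "files touched" means distinct files, and that the input is the path list printed by something like `git log --since="90 days ago" --name-only --pretty=format:`. The function name `churn_by_directory` is hypothetical.

```python
from collections import defaultdict


def churn_by_directory(paths, depth=2):
    """Count distinct touched files per directory prefix of the given depth."""
    buckets = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        if len(parts) > depth:  # skip files that sit above the bucket depth
            buckets["/".join(parts[:depth])].add(path)
    # Largest bucket first; ties are broken alphabetically for stable output.
    return sorted(
        ((d, len(files)) for d, files in buckets.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

Using a set per bucket means a file modified by many commits counts once. Counting every touch instead would measure commit pressure rather than breadth of change.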

Bot-attributed commits

Out of 578,857 total commits, 8,018 (≈ 1.4%) carry an email matching [bot]@, noreply.github.com, dependabot, or github-actions, which heuristically indicates bot or automation activity. In the last 90 days, 0 commits carry an explicit Co-authored-by: trailer from a bot. Both figures understate machine involvement: inline AI tools (Copilot, etc.) leave no trace in commit metadata, and the project's strict commit-message style minimizes machine-generated trailers. Treat the 1.4% figure as an indicator of automation pipelines, not of AI-assisted authorship.
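The email heuristic can be written as a single regular expression. This is a sketch of the four patterns named above, not the exact matcher used for the snapshot; the names `BOT_PATTERN`, `is_bot_email`, and `bot_share` are illustrative, and case-insensitive matching is an assumption.

```python
import re

# The four substrings named in the text; case-insensitivity is an assumption.
BOT_PATTERN = re.compile(
    r"\[bot\]@|noreply\.github\.com|dependabot|github-actions",
    re.IGNORECASE,
)


def is_bot_email(email):
    """Return True if the email matches any of the heuristic patterns."""
    return BOT_PATTERN.search(email) is not None


def bot_share(bot_commits, total_commits):
    """Percentage of commits attributed to bots."""
    return 100.0 * bot_commits / total_commits


print(round(bot_share(8018, 578857), 1))
```

One caveat with this pattern set: `noreply.github.com` also matches the privacy-preserving addresses that GitHub issues to human contributors, so the heuristic can over-match people while still missing automation that commits under an ordinary address.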

Complexity

  • Targets in llvm/lib/Target/: dozens (X86, AArch64, ARM, RISCV, AMDGPU, NVPTX, WebAssembly, PowerPC, Mips, Hexagon, SystemZ, BPF, AVR, Lanai, MSP430, Sparc, VE, XCore, M68k, LoongArch, CSKY, DirectX, SPIRV, Xtensa, ARC, plus several test-only targets).
  • Subprojects directly under the repo root: 25 (counting bolt, clang, clang-tools-extra, cross-project-tests, compiler-rt, flang, flang-rt, libc, libclc, libcxx, libcxxabi, libsycl, libunwind, lld, lldb, llvm, llvm-libgcc, mlir, offload, openmp, orc-rt, polly, runtimes, third-party, utils).
  • Top-level lib/ directories in LLVM core: 56 (Analysis, AsmParser, BinaryFormat, Bitcode, CAS, CodeGen, DebugInfo, ExecutionEngine, IR, MC, Object, Support, TableGen, Target, Transforms, plus many more — see llvm/lib/).

Reading these numbers

These are scope indicators, not quality metrics. LLDB and Polly look large mostly because they carry bulky platform-specific or imported code (LLDB ships per-platform debug-info parsers; Polly imports parts of the Integer Set Library). Clang's line count includes generated headers from TableGen. The interesting numbers are the activity ones: 1,412 unique authors and 11,414 commits in the last 90 days, 316 tagged releases, and test directories at the top of the churn list point to a wide and steady contributor base, a sustained release cadence, and a healthy test-to-code ratio.

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
