llvm/llvm-project

clang

Clang is the LLVM project's C-family front end: a production compiler for C, C++, Objective-C, Objective-C++, OpenCL C/C++, CUDA host code, HIP, SYCL host code, and several more. It produces LLVM IR (and, increasingly, MLIR via the CIR dialect), which the LLVM core then optimizes and lowers to machine code.

Purpose

Clang exists to be a fast, GCC-compatible, library-friendly C-family compiler. Its three defining design choices versus older C/C++ front ends are:

  1. A reusable library architecture: every stage (lex, parse, sema, AST, codegen) is a library other tools can link against. clangd, clang-tidy, clang-format, IDE plugins, and most modern C/C++ tooling are built on Clang's libraries, not on running the executable as a black box.
  2. A high-fidelity AST: Clang's AST preserves source locations, macro expansion information, type sugar (such as typedef names), and template structure in enough detail to map each node back to the text the user wrote. That fidelity is what makes the tooling ecosystem possible.
  3. GCC compatibility as a feature: Clang implements GCC extensions, attribute syntax, and command-line flags so that projects and Linux distributions built around GCC can switch between the two compilers with few build changes.

Directory layout

clang/
├── include/clang/        # Public headers
├── lib/                  # Implementation
│   ├── Lex/              # Preprocessor and lexer
│   ├── Parse/            # Parser
│   ├── AST/              # AST nodes, AST context, type system
│   ├── ASTMatchers/      # AST matcher DSL
│   ├── Basic/            # Source manager, diagnostics infrastructure
│   ├── Sema/             # Semantic analysis (the bulk of C++ semantics)
│   ├── CodeGen/          # AST → LLVM IR
│   ├── CIR/              # AST → MLIR (Clang IR) — newer pipeline
│   ├── Frontend/         # Top-level FrontendAction; -cc1
│   ├── Driver/           # The user-facing `clang` tool: argv → -cc1 invocations
│   ├── Tooling/          # libclangTooling — drives Clang as a library
│   ├── Analysis/         # Static analyses (CFG, dataflow)
│   ├── StaticAnalyzer/   # The clang static analyzer
│   ├── Format/           # libclangFormat — used by clang-format
│   ├── Index/            # Symbol indexing for tooling
│   ├── Modules/          # C/C++ modules
│   ├── Serialization/    # Precompiled headers and module files
│   ├── Headers/          # Compiler-supplied headers (intrinsics)
│   └── ExtractAPI/, InstallAPI/, Interpreter/, Rewrite/, ARCMigrate/, ...
├── tools/                # Executables: `driver` (builds `clang`, including -cc1), `clang-format`, `clang-import-test`, ...
├── test/                 # lit regression tests
├── unittests/            # Google-Test unit tests
├── docs/                 # Sphinx documentation source
├── www/                  # Public website source (clang.llvm.org)
├── examples/             # Plugins and tooling examples
├── utils/                # clang-tblgen and helpers
├── runtime/              # Runtime build glue
├── Maintainers.rst
└── CMakeLists.txt

Key abstractions

| Type | File | Role |
| --- | --- | --- |
| clang::ASTContext | clang/include/clang/AST/ASTContext.h | Owns all AST nodes and types |
| clang::Decl / clang::Stmt / clang::Expr | clang/include/clang/AST/ | AST node hierarchies |
| clang::QualType / clang::Type | clang/include/clang/AST/Type.h | The C/C++/Objective-C type system |
| clang::Sema | clang/include/clang/Sema/Sema.h | Semantic-analysis state machine |
| clang::Preprocessor | clang/include/clang/Lex/Preprocessor.h | Macro expansion, file inclusion |
| clang::Parser | clang/include/clang/Parse/Parser.h | Hand-written recursive-descent parser |
| clang::CodeGen::CodeGenModule | clang/lib/CodeGen/CodeGenModule.h | AST → LLVM IR |
| clang::driver::Driver | clang/include/clang/Driver/Driver.h | argv → tool invocations |
| clang::FrontendAction | clang/include/clang/Frontend/FrontendAction.h | Plug-in point for "what to do with the parsed AST" |
| clang::tooling::ClangTool | clang/include/clang/Tooling/Tooling.h | Library entry point for tools |
| clang::ast_matchers::MatchFinder | clang/include/clang/ASTMatchers/ASTMatchFinder.h | AST-matcher driver |
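
To make the FrontendAction row more concrete, here is a toy model of the plug-in pattern. It is not Clang's API: ToyDecl, ToyAST, ToyFrontendAction, parseToy, runToyAction, and FunctionLister are hypothetical names invented for this sketch, and the "kind name" input format is an assumption that stands in for real parsing. The idea it illustrates is that the front end builds the AST once and hands it to whichever action the tool supplies.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins; Clang's real AST classes are far richer.
struct ToyDecl {
    std::string kind;  // "func" or "var" in this toy format
    std::string name;
};

struct ToyAST {
    std::vector<ToyDecl> decls;
};

// The plug-in point: a tool subclasses this to decide what to do with the AST.
class ToyFrontendAction {
public:
    virtual ~ToyFrontendAction() = default;
    virtual void handleAST(const ToyAST& ast) = 0;
};

// Stand-in for lex + parse + sema: every "kind name" pair becomes one decl.
ToyAST parseToy(const std::string& source) {
    ToyAST ast;
    std::istringstream in(source);
    std::string kind, name;
    while (in >> kind >> name) {
        ast.decls.push_back({kind, name});
    }
    return ast;
}

// The "compiler" owns parsing; the action only consumes the result.
void runToyAction(const std::string& source, ToyFrontendAction& action) {
    action.handleAST(parseToy(source));
}

// Example action: collect the names of all function declarations.
class FunctionLister : public ToyFrontendAction {
public:
    std::vector<std::string> names;

    void handleAST(const ToyAST& ast) override {
        for (const ToyDecl& d : ast.decls) {
            if (d.kind == "func") names.push_back(d.name);
        }
    }
};
```

In Clang itself the corresponding roles are played by clang::FrontendAction subclasses driven through clang::tooling::ClangTool; the sketch only mirrors that division of labor.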

How it works

graph LR
    src[Source file] --> driver[clang driver]
    driver -->|spawns| cc1[clang -cc1]
    cc1 --> lex[Lex / Preprocess]
    lex --> parse[Parse]
    parse --> sema[Sema]
    sema --> ast[AST]
    ast --> codegen[CodeGen]
    codegen --> ir[LLVM IR]
    ir --> opt[opt pipeline]
    opt --> backend["LLVM CodeGen<br/>(per-target)"]
    backend --> obj[Object]
    driver --> link[Linker / Driver runs ld/lld]
    obj --> link
    link --> exe[Executable]

    ast -.->|"--emit-cir"| cir[CIR / MLIR]
    cir --> ir
  • The driver (clang/lib/Driver/) is what the user invokes. It interprets argv, selects a toolchain (ToolChain subclasses per platform under clang/lib/Driver/ToolChains/), and decides which actions to run (preprocess, compile, assemble, link). Preprocessing and compilation become clang -cc1 invocations, while linking is delegated to the platform linker.
  • clang -cc1 (clang/tools/driver/ and clang/lib/FrontendTool/) is the actual compiler. It runs lex → parse → sema → codegen → optimization → emission.
  • Sema (clang/lib/Sema/) is the largest single library in Clang. It performs name lookup, overload resolution, template instantiation, type checking, attribute handling, implicit conversions, and the remaining semantic rules of the C-family languages.
  • CodeGen (clang/lib/CodeGen/) translates AST to LLVM IR. It handles ABI lowering, exception handling, RTTI, vtables, OpenMP, OpenACC, CUDA host stubs, HIP, sanitizer instrumentation, and source-level debug info.
  • CIR (clang/lib/CIR/) is the newer MLIR-based pipeline that sits between the AST and LLVM IR. It is being upstreamed incrementally; recent commits with a [CIR] prefix show the progress.
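
The driver's planning step described above can be sketched as a toy model. This is an illustration under stated assumptions, not Clang's implementation: ToyJob, FinalPhase, finalPhase, withExtension, and planJobs are hypothetical names, the model assumes one clang -cc1 job per input plus a single linker job labeled ld, the output file names are invented, and it ignores the integrated assembler, offloading, and every flag other than -E, -S, -c, and -o.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical, not Clang's driver classes.
enum class FinalPhase { Preprocess, Compile, Assemble, Link };

struct ToyJob {
    std::string tool;                 // "clang -cc1" or "ld" in this sketch
    std::vector<std::string> inputs;
    std::string output;
};

// -E, -S and -c are real driver flags that stop the pipeline early.
// In this sketch the earliest requested stopping point wins.
FinalPhase finalPhase(const std::vector<std::string>& args) {
    bool sawS = false, sawC = false;
    for (const std::string& a : args) {
        if (a == "-E") return FinalPhase::Preprocess;
        if (a == "-S") sawS = true;
        if (a == "-c") sawC = true;
    }
    if (sawS) return FinalPhase::Compile;
    if (sawC) return FinalPhase::Assemble;
    return FinalPhase::Link;
}

// Replace the extension of `path` with `ext`, e.g. "a.c" -> "a.o".
std::string withExtension(const std::string& path, const std::string& ext) {
    const std::string::size_type dot = path.rfind('.');
    return (dot == std::string::npos ? path : path.substr(0, dot)) + ext;
}

// Turn an argument list into an ordered list of jobs.
// Simplification: -o only names the linked output here.
std::vector<ToyJob> planJobs(const std::vector<std::string>& args) {
    std::vector<std::string> inputs;
    std::string linkOutput = "a.out";
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-o" && i + 1 < args.size()) {
            linkOutput = args[++i];
        } else if (!args[i].empty() && args[i][0] != '-') {
            inputs.push_back(args[i]);
        }
    }

    const FinalPhase phase = finalPhase(args);
    const char* ext = phase == FinalPhase::Preprocess ? ".i"
                    : phase == FinalPhase::Compile    ? ".s"
                                                      : ".o";
    std::vector<ToyJob> jobs;
    std::vector<std::string> objects;
    for (const std::string& in : inputs) {
        const std::string out = withExtension(in, ext);
        jobs.push_back({"clang -cc1", {in}, out});
        objects.push_back(out);
    }
    // Only a full build ends with a link step over all produced objects.
    if (phase == FinalPhase::Link && !objects.empty()) {
        jobs.push_back({"ld", objects, linkOutput});
    }
    return jobs;
}
```

For example, planJobs({"a.c", "b.c", "-o", "app"}) yields two clang -cc1 jobs followed by one ld job that writes app. On a real installation, clang -### prints the jobs the actual driver would run without executing them.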

Diagnostics, attributes, and TableGen

Clang uses TableGen heavily. The big .td files:

The clang/utils/TableGen/ directory provides backends that consume those .td files and emit C++.
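
The generated C++ is typically consumed through macro-driven .inc includes: client code defines a macro, includes the generated file, and each entry expands into whatever the macro says. The sketch below imitates that pattern in a single file under stated assumptions. TOY_DIAGNOSTICS stands in for a generated .inc file, and the diagnostic names, the three-field DIAG shape, DiagInfo, and the render helper are all hypothetical. Clang's real generated entries carry more fields than this.

```cpp
#include <cstddef>
#include <string>

// Stand-in for a TableGen-generated .inc file; the entries are made up.
#define TOY_DIAGNOSTICS(X)                                          \
    X(err_undeclared_var, Error,   "use of undeclared identifier")  \
    X(warn_unused_var,    Warning, "unused variable")               \
    X(note_declared_here, Note,    "declared here")

enum class Severity { Note, Warning, Error };

// First expansion: one enumerator per entry.
enum DiagID {
#define DIAG(NAME, SEV, TEXT) NAME,
    TOY_DIAGNOSTICS(DIAG)
#undef DIAG
    NUM_DIAGNOSTICS
};

struct DiagInfo {
    Severity severity;
    const char* text;
};

// Second expansion of the same list: a table indexed by DiagID.
constexpr DiagInfo kDiagTable[] = {
#define DIAG(NAME, SEV, TEXT) {Severity::SEV, TEXT},
    TOY_DIAGNOSTICS(DIAG)
#undef DIAG
};

// Both expansions come from one list, so they cannot drift apart.
static_assert(sizeof(kDiagTable) / sizeof(kDiagTable[0]) ==
                  static_cast<std::size_t>(NUM_DIAGNOSTICS),
              "enum and table must have the same number of entries");

// Format a diagnostic as "<severity>: <text>".
std::string render(DiagID id) {
    const DiagInfo& info = kDiagTable[id];
    const char* prefix = info.severity == Severity::Error     ? "error: "
                         : info.severity == Severity::Warning ? "warning: "
                                                              : "note: ";
    return std::string(prefix) + info.text;
}
```

The benefit of the pattern is that a single declarative list drives every derived artifact, which is the same reason the .td files are kept as the source of truth.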

Tools

clang/tools/ holds executables and library front ends:

  • driver/: builds the clang executable, which serves as both the user-facing driver and the -cc1 entry point
  • clang-fuzzer/, clang-import-test/, clang-format/, clang-installapi/, clang-linker-wrapper/, clang-nvlink-wrapper/, clang-offload-bundler/, clang-offload-packager/, clang-refactor/, clang-rename/, clang-repl/, clang-scan-deps/, clang-shlib/, c-arcmt-test/, c-index-test/, diagtool/, libclang/

libclang (clang/tools/libclang/) is the stable C API used by language bindings (Python, Ruby, ...).

Static analyzer

The static analyzer (clang/lib/StaticAnalyzer/) is a path-sensitive symbolic execution engine for C, C++, and Objective-C. Its checkers cover, for example, null dereferences, use-after-free, malloc/free pairing, and taint analysis. Its documentation lives in clang/docs/analyzer/.
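
As a rough illustration of what "path-sensitive" means, the toy interpreter below forks its state at each branch and tracks, per path, whether a pointer is known to be null. It is a sketch under heavy assumptions, not the analyzer's design: Nullness, State, Stmt, execute, analyze, and the helper constructors are hypothetical, the only statements are two assignments, a dereference, and an if that tests non-nullness, and there are no loops, calls, or symbolic values.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types; the real analyzer's program state is far richer.
enum class Nullness { Unknown = 0, Null, NonNull };  // Unknown is the default
using State = std::map<std::string, Nullness>;

struct Stmt {
    enum Kind { AssignNull, AssignUnknown, Deref, IfNonNull } kind;
    std::string var;
    std::vector<Stmt> thenBody, elseBody;  // used by IfNonNull only
};

Stmt assignNull(std::string v)    { return {Stmt::AssignNull, std::move(v), {}, {}}; }
Stmt assignUnknown(std::string v) { return {Stmt::AssignUnknown, std::move(v), {}, {}}; }
Stmt deref(std::string v)         { return {Stmt::Deref, std::move(v), {}, {}}; }
Stmt ifNonNull(std::string v, std::vector<Stmt> t, std::vector<Stmt> e) {
    return {Stmt::IfNonNull, std::move(v), std::move(t), std::move(e)};
}

// Run `body` on every incoming path state and return the surviving states.
std::vector<State> execute(const std::vector<Stmt>& body,
                           std::vector<State> states,
                           std::vector<std::string>& warnings) {
    for (const Stmt& s : body) {
        std::vector<State> next;
        for (State& st : states) {
            switch (s.kind) {
            case Stmt::AssignNull:
                st[s.var] = Nullness::Null;
                next.push_back(st);
                break;
            case Stmt::AssignUnknown:
                st[s.var] = Nullness::Unknown;
                next.push_back(st);
                break;
            case Stmt::Deref:
                if (st[s.var] == Nullness::Null) {
                    // Definitely null on this path: report and end the path.
                    warnings.push_back("null dereference of " + s.var);
                } else {
                    st[s.var] = Nullness::NonNull;  // the deref succeeded
                    next.push_back(st);
                }
                break;
            case Stmt::IfNonNull: {
                const Nullness known = st[s.var];
                if (known != Nullness::Null) {      // then-branch is feasible
                    State t = st;
                    t[s.var] = Nullness::NonNull;
                    for (State& out : execute(s.thenBody, {t}, warnings))
                        next.push_back(std::move(out));
                }
                if (known != Nullness::NonNull) {   // else-branch is feasible
                    State e = st;
                    e[s.var] = Nullness::Null;
                    for (State& out : execute(s.elseBody, {e}, warnings))
                        next.push_back(std::move(out));
                }
                break;
            }
            }
        }
        states = std::move(next);
    }
    return states;
}

// Analyze a whole program starting from a single empty state.
std::vector<std::string> analyze(const std::vector<Stmt>& program) {
    std::vector<std::string> warnings;
    execute(program, {State{}}, warnings);
    return warnings;
}
```

In this model, a program that tests a pointer against null and then dereferences it unconditionally produces one warning, because the path through the else-branch carries the fact that the pointer is null. A dereference placed inside the then-branch produces none.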

Integration points

  • Emits LLVM IR consumed by llvm for optimization and codegen.
  • Reads pre-compiled headers and modules via clang/lib/Serialization/.
  • Powers most of clang-tools-extra (clangd, clang-tidy, clang-doc, etc.) via the clang::tooling library.
  • The clang::format library (libclangFormat) is the engine behind clang-format.

Entry points for modification

Reference

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
