llvm/llvm-project
clang
Clang is the LLVM project's C-family front end: a production compiler for C, C++, Objective-C, Objective-C++, OpenCL C/C++, CUDA host code, HIP, SYCL host code, and several more. It produces LLVM IR (and increasingly MLIR via the CIR dialect), which the LLVM core then optimizes and lowers to machine code.
Purpose
Clang exists to be a fast, GCC-compatible, library-friendly C-family compiler. Its three defining design choices versus older C/C++ front ends are:
- A reusable library architecture — every stage (lex, parse, sema, AST, codegen) is a library other tools can link against. clangd, clang-tidy, clang-format, IDE plugins, and most modern C/C++ tooling are built on Clang's libraries, not on running the executable as a black box.
- A high-fidelity AST — Clang's AST preserves source locations, macros, sugar, templates, and the original token stream to a level no other C++ front end matches. That fidelity is what makes the tooling ecosystem possible.
- GCC compatibility as a feature — Clang implements GCC extensions, attribute syntax, and command-line flags so Linux distributions can switch between the two compilers.
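
As a small illustration of the GCC-compatibility point, the sketch below uses two GNU extensions that Clang also accepts: `__attribute__((always_inline))` and `__builtin_expect`. The names `FORCE_INLINE`, `UNLIKELY`, `clamp_non_negative`, and `check_clamp` are hypothetical and exist only for this example; the fallback branches are there so the file still compiles on compilers without these extensions.

```cpp
#include <cassert>

// __has_attribute is a feature-test macro provided by Clang and recent GCC.
// Fall back to "attribute not available" on compilers that lack it.
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

// FORCE_INLINE is a hypothetical helper macro for this example.
#if __has_attribute(always_inline)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// By default Clang defines __GNUC__ for compatibility, so code guarded this
// way takes the GNU path under both compilers.
#if defined(__GNUC__)
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define UNLIKELY(cond) (cond)
#endif

// Returns x, or 0 when x is negative. The hint marks the negative case as rare.
FORCE_INLINE int clamp_non_negative(int x) {
  if (UNLIKELY(x < 0))
    return 0;
  return x;
}

// Example use: call this from main() to exercise the function.
inline void check_clamp() {
  assert(clamp_non_negative(-5) == 0);
  assert(clamp_non_negative(7) == 7);
}
```

The same source is expected to build unchanged with either `g++` or `clang++`, which is the practical meaning of compatibility here.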
Directory layout
clang/
├── include/clang/ # Public headers
├── lib/ # Implementation
│ ├── Lex/ # Preprocessor and lexer
│ ├── Parse/ # Parser
│ ├── AST/ # AST nodes, AST context, type system
│ ├── ASTMatchers/ # AST matcher DSL
│ ├── Basic/ # Source manager, diagnostics infrastructure
│ ├── Sema/ # Semantic analysis (the bulk of C++ semantics)
│ ├── CodeGen/ # AST → LLVM IR
│ ├── CIR/ # AST → MLIR (Clang IR) — newer pipeline
│ ├── Frontend/ # Top-level FrontendAction; -cc1
│ ├── Driver/ # The user-facing `clang` tool: argv → -cc1 invocations
│ ├── Tooling/ # libclangTooling — drives Clang as a library
│ ├── Analysis/ # Static analyses (CFG, dataflow)
│ ├── StaticAnalyzer/ # The clang static analyzer
│ ├── Format/ # libclangFormat — used by clang-format
│ ├── Index/ # Symbol indexing for tooling
│ ├── Modules/ # C/C++ modules
│ ├── Serialization/ # Precompiled headers and module files
│ ├── Headers/ # Compiler-supplied headers (intrinsics)
│ └── ExtractAPI/, InstallAPI/, Interpreter/, Rewrite/, ARCMigrate/, ...
├── tools/ # Drivers: `clang`, `clang-cc1`, `clang-format`, `clang-import-test`, ...
├── test/ # lit regression tests
├── unittests/ # Google-Test unit tests
├── docs/ # Sphinx documentation source
├── www/ # Public website source (clang.llvm.org)
├── examples/ # Plugins and tooling examples
├── utils/ # clang-tblgen and helpers
├── runtime/ # Runtime build glue
├── Maintainers.md
└── CMakeLists.txt
Key abstractions
| Type | File | Role |
|---|---|---|
| `clang::ASTContext` | `clang/include/clang/AST/ASTContext.h` | Owns all AST nodes and types |
| `clang::Decl` / `clang::Stmt` / `clang::Expr` | `clang/include/clang/AST/` | AST node hierarchies |
| `clang::QualType` / `clang::Type` | `clang/include/clang/AST/Type.h` | The C/C++/Objective-C type system |
| `clang::Sema` | `clang/include/clang/Sema/Sema.h` | Semantic-analysis state machine |
| `clang::Preprocessor` | `clang/include/clang/Lex/Preprocessor.h` | Macro expansion, file inclusion |
| `clang::Parser` | `clang/include/clang/Parse/Parser.h` | Hand-written recursive-descent parser |
| `clang::CodeGen::CodeGenModule` | `clang/lib/CodeGen/CodeGenModule.h` | AST → LLVM IR |
| `clang::driver::Driver` | `clang/include/clang/Driver/Driver.h` | argv → tool invocations |
| `clang::FrontendAction` | `clang/include/clang/Frontend/FrontendAction.h` | Plug-in point for "what to do with the parsed AST" |
| `clang::tooling::ClangTool` | `clang/include/clang/Tooling/Tooling.h` | Library entry point for tools |
| `clang::ast_matchers::MatchFinder` | `clang/include/clang/ASTMatchers/ASTMatchFinder.h` | AST-matcher driver |
How it works
graph LR
src[Source file] --> driver[clang driver]
driver -->|spawns| cc1[clang -cc1]
cc1 --> lex[Lex / Preprocess]
lex --> parse[Parse]
parse --> sema[Sema]
sema --> ast[AST]
ast --> codegen[CodeGen]
codegen --> ir[LLVM IR]
ir --> opt[opt pipeline]
opt --> backend["LLVM CodeGen<br/>(per-target)"]
backend --> obj[Object]
driver --> link[Linker / Driver runs ld/lld]
obj --> link
link --> exe[Executable]
ast -.->|"--emit-cir"| cir[CIR / MLIR]
cir --> ir
- The driver (`clang/lib/Driver/`) is what the user invokes. It interprets `argv`, infers a toolchain (`ToolChain.cpp`, specialized per platform), and decides which sub-actions to run (preprocess, compile, assemble, link). Compilation sub-actions become `clang -cc1` invocations; linking is handed to the linker.
- `clang -cc1` (`clang/tools/driver/` and `clang/lib/FrontendTool/`) is the actual compiler. It runs lex → parse → sema → codegen → optimization → emission.
- Sema (`clang/lib/Sema/`) is the largest single library in Clang. It performs name lookup, overload resolution, template instantiation, type checking, attribute handling, conversion sequences, and the rest of C++ semantics.
- CodeGen (`clang/lib/CodeGen/`) translates the AST to LLVM IR. It handles ABI lowering, exception handling, RTTI, vtables, OpenMP, OpenACC, CUDA host stubs, HIP, sanitizer instrumentation, and source-level debug info.
- CIR (`clang/lib/CIR/`) is the newer MLIR-based pipeline that sits between the AST and LLVM IR. It is landing incrementally; see recent commits with the `[CIR]` prefix.
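
The driver's decision about how far to run the pipeline can be pictured with a toy model. The sketch below is not Clang's code: `Phase`, `plan_phases`, and `check_plan` are hypothetical names, and the only behavior it models is the effect of the standard `-E`, `-S`, and `-c` flags on where compilation stops. In practice Clang typically folds several of these phases into a single `-cc1` job.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the driver's phase selection. Phase and plan_phases are
// hypothetical; Clang's real logic lives in clang/lib/Driver/.
enum class Phase { Preprocess, Compile, Assemble, Link };

// Decide how far the pipeline runs from the classic "stop after" flags:
//   -E  stop after preprocessing
//   -S  stop after compiling to assembly
//   -c  stop after assembling to an object file
// With none of them, the link step runs as well. If several flags are
// given, this model lets the earliest stopping point win.
std::vector<Phase> plan_phases(const std::vector<std::string> &args) {
  Phase last = Phase::Link;
  for (const std::string &arg : args) {
    Phase stop = Phase::Link;
    if (arg == "-E")
      stop = Phase::Preprocess;
    else if (arg == "-S")
      stop = Phase::Compile;
    else if (arg == "-c")
      stop = Phase::Assemble;
    if (stop < last)
      last = stop;
  }

  // Emit every phase from Preprocess up to and including the last one.
  std::vector<Phase> phases;
  for (int p = static_cast<int>(Phase::Preprocess);
       p <= static_cast<int>(last); ++p)
    phases.push_back(static_cast<Phase>(p));
  return phases;
}

// Example use: call this from main() to exercise the model.
inline void check_plan() {
  assert(plan_phases({"-c", "a.c"}).size() == 3); // preprocess, compile, assemble
  assert(plan_phases({"a.c"}).back() == Phase::Link);
}
```

To see what the real driver plans for a given command line, `clang -### file.c` prints the sub-commands it would run without executing them.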
Diagnostics, attributes, and TableGen
Clang uses TableGen heavily. The main `.td` files are:
- `clang/include/clang/Basic/Diagnostic*.td`: every diagnostic the compiler can emit
- `clang/include/clang/Basic/Attr.td`: every attribute (GNU, MS, C++ standard, Clang-specific)
- `clang/include/clang/Basic/Builtins*.td`: builtins
- `clang/include/clang/Driver/Options.td`: every command-line option
The `clang/utils/TableGen/` directory provides the backends that consume those `.td` files and emit C++.
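
Generated tables like these are commonly consumed from C++ by defining a macro and then expanding a list of records through it. The sketch below imitates that pattern with a hand-written list standing in for a generated `.inc` file. Everything in it is hypothetical: `DEMO_DIAGNOSTICS`, the two-field `DIAG` shape, and the entries are far simpler than what `clang-tblgen` actually emits.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for a TableGen-generated .inc file: one X(...) line per record.
// Hypothetical entries; a real generated file carries many more fields.
#define DEMO_DIAGNOSTICS(X)                                  \
  X(err_demo_undeclared, "use of undeclared identifier")     \
  X(warn_demo_unused, "unused variable")

// First expansion: build the enum of diagnostic IDs.
enum DemoDiagID {
#define DIAG(ID, TEXT) ID,
  DEMO_DIAGNOSTICS(DIAG)
#undef DIAG
  NUM_DEMO_DIAGNOSTICS
};

// Second expansion: build a parallel table of message strings.
static const char *const DemoDiagText[] = {
#define DIAG(ID, TEXT) TEXT,
  DEMO_DIAGNOSTICS(DIAG)
#undef DIAG
};

// Look up the message text for a diagnostic ID.
inline const char *demo_diag_text(DemoDiagID id) { return DemoDiagText[id]; }

// Example use: call this from main() to exercise the tables.
inline void check_diag_tables() {
  assert(NUM_DEMO_DIAGNOSTICS == 2);
  assert(std::strcmp(demo_diag_text(warn_demo_unused), "unused variable") == 0);
}
```

The point of the design is a single source of truth: adding one record to the list updates the enum and the message table together.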
Tools
`clang/tools/` holds drivers and library front ends:
- `driver/`: builds the user-facing `clang` executable
- `clang-fuzzer/`, `clang-import-test/`, `clang-format/`, `clang-installapi/`, `clang-linker-wrapper/`, `clang-nvlink-wrapper/`, `clang-offload-bundler/`, `clang-offload-packager/`, `clang-refactor/`, `clang-rename/`, `clang-repl/`, `clang-scan-deps/`, `clang-shlib/`, `c-arcmt-test/`, `c-index-test/`, `diagtool/`, `libclang/`
`libclang` (`clang/tools/libclang/`) is the stable C API used by language bindings (Python, Ruby, ...).
Static analyzer
The static analyzer (`clang/lib/StaticAnalyzer/`) is a path-sensitive symbolic execution engine for C, C++, and Objective-C. Its checkers detect bugs such as null dereferences, use-after-free, mismatched malloc/free pairs, and tainted data flows. Documentation lives in `clang/docs/analyzer/`.
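
To make "path-sensitive" concrete, the function below is correct on one path and dereferences a null pointer on the other. It is a hypothetical input for illustration, not code from the analyzer. A path-sensitive engine explores both branches separately and is expected to flag only the second one.

```cpp
#include <cassert>

// Hypothetical function with a bug on exactly one execution path.
int pick_value(bool use_local) {
  int local = 7;
  int *p = nullptr;
  if (use_local)
    p = &local;
  // Path use_local == true:  p points at local, so the load is fine.
  // Path use_local == false: p is still null, so this is a null dereference.
  return *p;
}

// Example use: only the safe path is exercised here.
inline void check_safe_path() {
  assert(pick_value(true) == 7);
}
```

Running `clang --analyze` on a file containing this function invokes the analyzer through the driver; the exact wording of any report depends on the Clang version and enabled checkers.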
Integration points
- Emits LLVM IR consumed by `llvm` for optimization and codegen.
- Reads precompiled headers and modules via `clang/lib/Serialization/`.
- Powers most of `clang-tools-extra` (clangd, clang-tidy, clang-doc, etc.) via the `clang::tooling` library.
- The `clang::format` library is the engine behind `clang-format`.
Entry points for modification
- Adding a diagnostic: edit the `clang/include/clang/Basic/Diagnostic*.td` file for the appropriate component, and emit it from Sema or wherever it triggers.
- Adding an attribute: extend `clang/include/clang/Basic/Attr.td` and add the handler in `clang/lib/Sema/SemaDeclAttr.cpp` or a sibling Sema file. CodeGen consumption follows in `clang/lib/CodeGen/`.
- Adding a builtin: extend the relevant `Builtins*.td` and implement codegen in `clang/lib/CodeGen/CGBuiltin.cpp` or its target-specific siblings.
- Adding a checker: implement a `Checker<>` subclass under `clang/lib/StaticAnalyzer/Checkers/`.
- Adding a driver flag: extend `clang/include/clang/Driver/Options.td` and route it through `clang::driver::Driver`.
Reference
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.