apple/swift
String processing and Regex
Active contributors: hamishknight, milseman, natecook1000
Purpose
Three Swift modules implement the regex literals (/pattern/), the RegexBuilder DSL, and the runtime match engine that powers them:
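- `_StringProcessing` -- the `Regex` type, the string matching algorithms, and the internal match engine (a backtracking bytecode interpreter). Public but underscored, and implicitly imported by default.
- `RegexBuilder` -- the result-builder DSL (e.g., `Regex { OneOrMore(.digit) }`).
- `RegexParser` -- the parser for textual regex syntax, used by both the compile-time (literal) and run-time (`Regex(_:)`) paths.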
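Together these implement SE-0350 (the `Regex` type) and its follow-on proposals, including SE-0351 (regex builder DSL), SE-0354 (regex literals), SE-0355 (regex syntax and run-time construction), and SE-0357 (regex-powered string processing algorithms).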
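The three modules correspond to three ways of building the same regex. The sketch below, assuming a Swift 5.7+ toolchain where `_StringProcessing` is implicitly imported, builds one pattern as a literal, with the DSL, and from a run-time string, then reads the same capture from each. It uses the `#/.../#` literal delimiter, which parses in every language mode.

```swift
import RegexBuilder  // the DSL; Regex and the matching APIs come from the implicitly imported _StringProcessing

let input = "order 42 shipped"

// 1. Literal: parsed by RegexParser when the program is compiled.
let literal = #/(\d+)/#

// 2. DSL: the same pattern composed with RegexBuilder.
let built = Regex {
    Capture {
        OneOrMore(.digit)
    }
}

// 3. Run-time construction: parsed when this line runs; throws on invalid syntax.
//    Captures are not known statically, so the output type is AnyRegexOutput.
let runtime = try Regex(#"(\d+)"#)

let fromLiteral = input.firstMatch(of: literal)?.1
let fromBuilder = input.firstMatch(of: built)?.1
let fromRuntime = input.firstMatch(of: runtime)?.output[1].substring

print(fromLiteral ?? "-", fromBuilder ?? "-", fromRuntime ?? "-")
```

The literal and DSL forms produce a statically typed `Regex<(Substring, Substring)>`; the run-time form trades that for flexibility and is accessed through `AnyRegexOutput`.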
Directory layout
```
stdlib/public/StringProcessing/ # match engine (Swift)
├── Regex.swift # the public Regex<Output> type
├── Match.swift, Engine.swift, ...
└── ...
stdlib/public/RegexBuilder/ # DSL surface
├── Builder.swift
├── Anchor.swift
├── Lookaround.swift, Repetition.swift, Alternation.swift, ...
└── ...
stdlib/public/RegexParser/ # textual parser
├── Parse.swift
├── Diagnostic.swift
├── PCRE.swift # PCRE-flavor extensions
└── ...
```

The actual upstream of these modules is the `swift-experimental-string-processing` repo; the in-tree copies are vendored into the toolchain build.
Key abstractions
| Type | Module | Description |
|---|---|---|
| `Regex<Output>` | `_StringProcessing` | A compiled regex with a typed output (matched substring + captures). |
| `Regex.Match` | `_StringProcessing` | The result of a successful match. |
| `RegexComponent` | `_StringProcessing` | Protocol for anything usable as a regex (`Regex`, `String`, and every DSL element); `RegexBuilder` builds on it. |
| `OneOrMore`, `ZeroOrMore`, `Capture`, `ChoiceOf`, `Optionally`, `Anchor`, `CharacterClass` | `RegexBuilder` | DSL elements. |
| `AST` | `RegexParser` | The parsed regex syntax tree. |
| `MEProgram` | `_StringProcessing` | The bytecode program produced from the AST. |
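A short sketch of the first three rows of the table, assuming Swift 5.7+: a literal typed as `Regex<Output>`, the `Regex.Match` it produces, and a generic function that accepts any `RegexComponent`. The helper `matchCount(in:of:)` is hypothetical, written only for illustration.

```swift
// Regex<Output>: Output is (whole match, capture 1, capture 2).
let keyValue: Regex<(Substring, Substring, Substring)> = #/(\w+)=(\w+)/#

let line = "mode=fast"
let match = line.wholeMatch(of: keyValue)  // Regex<...>.Match?

// Regex.Match exposes the typed output tuple and the matched range.
let key = match?.1                          // "mode"
let value = match?.2                        // "fast"
let whole = match.map { line[$0.range] }    // "mode=fast"

// RegexComponent: Regex, String, and DSL pieces all conform,
// so generic code can accept any of them.
func matchCount(in text: String, of pattern: some RegexComponent) -> Int {
    text.matches(of: pattern).count
}

let pairs = matchCount(in: "a=1 b=2", of: keyValue)    // 2
let equalsSign = "="
let signs = matchCount(in: "a=1 b=2", of: equalsSign)  // 2 (a String matches literally)
```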
How it works
```mermaid
graph LR
Literal["/pattern/ literal"] --> Parser[RegexParser]
DSL["Regex { ... } DSL"] --> Builder[RegexBuilder]
Parser --> RegexAST
Builder --> RegexAST
RegexAST --> Compiler[engine compiler]
Compiler --> Bytecode[MEProgram]
Input["String input"] --> Engine[engine interpreter]
Bytecode --> Engine
Engine --> Match["Regex.Match"]
```

Regex literal flow
When the user writes `let r = /^(\d+)/`, the lexer recognizes the `/.../` literal at parse time, and `lib/Parse/ParseRegex.cpp` invokes the in-tree RegexParser to validate it and produce an AST. The literal's capture structure determines its output type, which feeds back into Sema, so the resulting `Regex<Output>` is fully typed at compile time: here, `Regex<(Substring, Substring)>` (the whole match plus one capture).
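The inferred output type can be checked by spelling it out: the annotations below compile only if they match what the compiler infers from each literal. In Swift 5 language mode, bare `/.../` literals require the `-enable-bare-slash-regex` flag (they are on by default in Swift 6 mode), so this sketch uses the `#/.../#` form, which is always available. The `version` pattern is an illustrative example, not taken from the source.

```swift
// Compiles only because the annotation matches the inferred type:
// whole match + one unnamed capture.
let leadingDigits: Regex<(Substring, Substring)> = #/^(\d+)/#

// Named captures become tuple labels; a capture inside an optional group
// becomes an Optional.
let version: Regex<(Substring, major: Substring, minor: Substring?)> =
    #/(?<major>\d+)(?:\.(?<minor>\d+))?/#

let digits = "123abc".firstMatch(of: leadingDigits)?.1
let v1 = "5.9".wholeMatch(of: version)
let v2 = "6".wholeMatch(of: version)

print(digits ?? "-", v1?.major ?? "-", v1?.minor ?? "-", v2?.minor ?? "none")
```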
RegexBuilder DSL
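`Regex { ... }` uses a result builder (`@RegexComponentBuilder`) to compose `RegexComponent`s. Each combinator emits an AST fragment, and the builder's `buildPartialBlock` overloads concatenate components while accumulating their capture types into the `Output` tuple.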
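A small DSL example, assuming Swift 5.7+. Each line inside the builder is one component; the explicit type annotation shows the capture types the builder accumulates. `TryCapture` converts its match with a transform, and returning `nil` makes that attempt fail.

```swift
import RegexBuilder

// Matches lines like "retries = 3", capturing the name and the value as an Int.
let assignment: Regex<(Substring, Substring, Int)> = Regex {
    Capture {
        OneOrMore(.word)
    }
    ZeroOrMore(.whitespace)
    "="
    ZeroOrMore(.whitespace)
    // TryCapture runs its transform on the matched text; returning nil
    // rejects that attempt, just like a failed match.
    TryCapture {
        OneOrMore(.digit)
    } transform: { Int($0) }
}

let parsed = "retries = 3".wholeMatch(of: assignment)
// Int(...) returns nil on overflow, so this input cannot match.
let overflow = "retries = 99999999999999999999".wholeMatch(of: assignment)
```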
Engine
The engine compiles the AST to a bytecode program (`MEProgram`) with explicit instructions: match, quantify, assert, save, restore, capture, branch. The interpreter walks the program against an input `Substring`, backtracking when a path fails: greedy and reluctant quantifiers and `ChoiceOf` alternatives each leave choice points to resume from. Performance work is ongoing; some hot patterns lower to specialized fast paths.
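To make the backtracking model concrete, here is a toy backtracking VM in the same spirit. It is a hypothetical illustration: `Inst`, `ToyVM`, `group1`, and the instruction set are invented for this sketch and are not `MEProgram`'s actual instructions or encoding. A `split` instruction is a choice point: the VM tries the preferred target first and, if that path fails, restores the capture state and tries the other. Swapping the split's targets is all that separates a greedy loop from a reluctant one.

```swift
// Hypothetical instruction set for a toy backtracking VM (not MEProgram's).
enum Inst {
    case char(Character)    // consume exactly this character
    case any                // consume any one character
    case split(Int, Int)    // choice point: try the first target, backtrack to the second
    case jump(Int)          // unconditional branch
    case save(Int)          // record the current input position in a capture slot
    case match              // success
}

struct ToyVM {
    let program: [Inst]

    /// Tries to match starting at the beginning of `input` (no search, no end anchor).
    /// Returns the capture slots on success, or nil if every path fails.
    func run(_ input: [Character], slots: Int) -> [Int?]? {
        var saved = [Int?](repeating: nil, count: slots)
        return step(0, 0, input, &saved) ? saved : nil
    }

    private func step(_ pc: Int, _ pos: Int, _ input: [Character], _ saved: inout [Int?]) -> Bool {
        switch program[pc] {
        case .char(let c):
            guard pos < input.count, input[pos] == c else { return false }
            return step(pc + 1, pos + 1, input, &saved)
        case .any:
            guard pos < input.count else { return false }
            return step(pc + 1, pos + 1, input, &saved)
        case .jump(let target):
            return step(target, pos, input, &saved)
        case .split(let preferred, let fallback):
            let snapshot = saved
            if step(preferred, pos, input, &saved) { return true }
            saved = snapshot  // undo captures recorded on the failed path
            return step(fallback, pos, input, &saved)
        case .save(let slot):
            saved[slot] = pos
            return step(pc + 1, pos, input, &saved)
        case .match:
            return true
        }
    }
}

/// Extracts capture group 1 (slots 0 and 1) as a String.
func group1(_ slots: [Int?], in input: [Character]) -> String? {
    guard let start = slots[0], let end = slots[1] else { return nil }
    return String(input[start..<end])
}

// a(.*)b: the split prefers another iteration of `.` (greedy).
let greedy = ToyVM(program: [.char("a"), .save(0), .split(3, 5), .any, .jump(2),
                             .save(1), .char("b"), .match])
// a(.*?)b: identical except the split prefers leaving the loop (reluctant).
let reluctant = ToyVM(program: [.char("a"), .save(0), .split(5, 3), .any, .jump(2),
                                .save(1), .char("b"), .match])

let text = Array("axbyb")
let greedyGroup = greedy.run(text, slots: 2).flatMap { group1($0, in: text) }
let reluctantGroup = reluctant.run(text, slots: 2).flatMap { group1($0, in: text) }

// The real engine gives the same answers for the equivalent patterns.
let realGreedy = "axbyb".firstMatch(of: #/a(.*)b/#)?.1
let realReluctant = "axbyb".firstMatch(of: #/a(.*?)b/#)?.1
```

The toy recurses once per instruction, which is fine for short inputs; a production engine would typically keep an explicit stack of saved states instead.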
Diagnostics
Regex parse errors are first-class compiler diagnostics. The parser emits structured errors (RegexParser.Diagnostic); the compiler frontend translates them into Swift diagnostics with SourceLoc ranges that point inside the regex literal.
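Compile-time diagnostics can't be demonstrated in running code, but the same parser backs run-time construction: an invalid pattern passed to `Regex(_:)` throws a Swift error instead of producing a diagnostic. A minimal sketch, assuming Swift 5.7+; `compile(_:)` is a hypothetical helper, and the exact error text is not specified here.

```swift
// Hypothetical helper: build a regex from a run-time string, capturing any
// parse error instead of letting it propagate.
func compile(_ pattern: String) -> Result<Regex<AnyRegexOutput>, Error> {
    Result { try Regex(pattern) }
}

let accepted = compile(#"\d+"#)
let rejected = compile("(unclosed")  // unbalanced group: a syntax error

if case .failure(let error) = rejected {
    // The error text is whatever the parser reports; it is not checked here.
    print("rejected:", error)
}
```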
Integration points
- Lexer / parser -- `lib/Parse/ParseRegex.cpp` recognizes literals and feeds them to `RegexParser`.
- Sema -- types regex literals as `Regex<Output>` with the inferred capture tuple.
- Standard library -- adds matching APIs to `String` (`firstMatch(of:)`, `matches(of:)`, `replacing(_:with:)`), implemented as extensions in `_StringProcessing`.
- `_StringProcessing` -- depends on `Swift` and `RegexParser`; generates bytecode and runs it.
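The three `String` APIs named above, in a minimal sketch assuming Swift 5.7+ (the sample log line is invented for illustration).

```swift
let logLine = "id=7 id=12 id=300"

// firstMatch(of:) returns the first Regex.Match, or nil.
let firstID = logLine.firstMatch(of: #/id=(\d+)/#)?.1

// matches(of:) returns every non-overlapping match, in order.
let ids = logLine.matches(of: #/id=(\d+)/#).map { Int($0.1)! }

// replacing(_:with:) returns a new String; the closure form computes
// each replacement from its match.
let masked = logLine.replacing(#/\d+/#, with: "#")
let doubled = logLine.replacing(#/\d+/#) { match in String(Int(match.output)! * 2) }
```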
Entry points for modification
- A new DSL element: add it to `RegexBuilder/`, extend the AST, and teach the engine to compile and run it.
- Bug in match semantics: `_StringProcessing/Engine.swift` is the interpreter. Tests live under `test/stdlib/Regex*` and `test/stdlib/StringProcessing*`.
- Upstream first: changes typically land in `swift-experimental-string-processing` and are then vendored into this repo.
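Adding an element to `RegexBuilder` itself requires AST and engine support, as the first bullet describes. For reusable pieces that need no new engine behavior, conforming a type to `RegexComponent` is usually enough, since the protocol only requires a `regex` property. A sketch assuming Swift 5.7+, with `UnsignedInt` as a hypothetical example type:

```swift
import RegexBuilder

// Hypothetical reusable component: an unsigned decimal integer captured as Int.
// RegexComponent only requires a `regex` property; its Output becomes RegexOutput.
struct UnsignedInt: RegexComponent {
    var regex: Regex<(Substring, Int)> {
        Regex {
            TryCapture {
                OneOrMore(.digit)
            } transform: { Int($0) }
        }
    }
}

// The custom component composes like a built-in DSL element,
// and its capture flows into the surrounding Output tuple.
let numericRange: Regex<(Substring, Int, Int)> = Regex {
    UnsignedInt()
    "-"
    UnsignedInt()
}

let bounds = "10-250".wholeMatch(of: numericRange)
```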
Related pages
- Standard library -- `String` and the matching extensions live here.
- Parser -- where regex literals are recognized.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.