apple/swift

String processing and Regex

Active contributors: hamishknight, milseman, natecook1000

Purpose

Three Swift modules implement regex literals (/pattern/), the RegexBuilder DSL, and the runtime match engine behind both:

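  • _StringProcessing -- the match engine (a backtracking bytecode interpreter), together with the public Regex type and the regex-powered String algorithms. The module name is underscored, but the module is imported implicitly, so its public API is available without an import.
  • RegexBuilder -- the result-builder DSL (e.g., Regex { OneOrMore(.digit) }).
  • RegexParser -- the parser for textual regex syntax (module _RegexParser), used both when the compiler checks a literal and when Regex(_:) parses a pattern string at runtime.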

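Together these implement SE-0350 (Regex type and overview) and its follow-on proposals, including SE-0351 (RegexBuilder DSL), SE-0354 (regex literals), SE-0355 (regex syntax and run-time construction), and SE-0357 (regex-powered string processing algorithms).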
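The three entry points can be compared side by side in a short sketch, assuming Swift 5.7 or later (macOS 13 or later on Apple platforms). It builds one pattern as a literal, with the runtime initializer Regex(_:), and with the RegexBuilder DSL. It uses the extended #/.../# delimiter, which, unlike bare /.../, needs no opt-in in Swift 5 language mode.

```swift
import RegexBuilder  // only the DSL needs this; Regex itself comes from _StringProcessing

let input = "order 42"

// 1. Literal: parsed and type-checked at compile time.
let literal = #/(\d+)/#

// 2. Runtime construction: the pattern string is parsed when this line runs.
//    Its shape is unknown statically, so Output is AnyRegexOutput.
let runtime: Regex<AnyRegexOutput> = try Regex(#"(\d+)"#)

// 3. DSL: the same pattern composed from RegexBuilder components.
let dsl = Regex {
    Capture {
        OneOrMore(.digit)
    }
}

// All three capture the same digits.
let a = input.firstMatch(of: literal)?.output.1
let b = input.firstMatch(of: runtime)?.output[1].substring
let c = input.firstMatch(of: dsl)?.output.1
print(a ?? "none", b ?? "none", c ?? "none")
```

The literal and DSL versions are fully typed as Regex<(Substring, Substring)>; the runtime version trades that static typing for a pattern chosen at runtime.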

Directory layout

stdlib/public/StringProcessing/         # match engine (Swift)
├── Regex.swift                          # the public Regex<Output> type
├── Match.swift, Engine.swift, ...
└── ...

stdlib/public/RegexBuilder/             # DSL surface
├── Builder.swift
├── Anchor.swift
├── Lookaround.swift, Repetition.swift, Alternation.swift, ...
└── ...

stdlib/public/RegexParser/              # textual parser
├── Parse.swift
├── Diagnostic.swift
├── PCRE.swift                           # PCRE-flavor extensions
└── ...

These modules are developed upstream in the swift-experimental-string-processing repository; the copies in this tree are vendored from there into the toolchain build.

Key abstractions

Type | Module | Description
--- | --- | ---
Regex<Output> | _StringProcessing | A compiled regex with a typed output (whole match plus captures).
Regex.Match | _StringProcessing | The result of a successful match.
RegexComponent | _StringProcessing | Protocol for anything usable as a regex (Regex, String, Character, and the DSL components); the DSL composes values of this type.
OneOrMore, ZeroOrMore, Capture, ChoiceOf, Optionally, Anchor, CharacterClass | RegexBuilder | DSL elements.
AST | RegexParser | The parsed regex syntax tree.
MEProgram | _StringProcessing | The bytecode program the engine executes.
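A minimal sketch of how these types meet at a call site (Swift 5.7+ assumed): a literal has type Regex<Output>, a successful match is a Regex<Output>.Match carrying the typed output and the matched range, and a function over some RegexComponent accepts a Regex and a plain String alike. countMatches is a hypothetical helper, not library API.

```swift
// Regex and RegexComponent come from the implicitly imported _StringProcessing.

// One capture, so Output is (Substring, Substring).
let word: Regex<(Substring, Substring)> = #/(\w+)@/#

// Regex<Output>.Match exposes the typed output and the matched range.
let text = "mail alice@example.com"
if let m = text.firstMatch(of: word) {
    let (whole, name) = m.output      // whole match "alice@", capture "alice"
    print(whole, name, text[m.range])
}

// Hypothetical helper: any RegexComponent works, including Regex and String.
func countMatches(in s: String, of component: some RegexComponent) -> Int {
    s.matches(of: component).count
}

print(countMatches(in: "a-b-c", of: "-"))     // a String used as a component
print(countMatches(in: "a-b-c", of: #/\w/#))  // a Regex used as a component
```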

How it works

graph LR
    Literal["/pattern/ literal"] --> Parser[RegexParser]
    DSL["Regex { ... } DSL"] --> Builder[RegexBuilder]
    Parser --> AST[RegexParser AST]
    AST --> Tree[DSLTree]
    Builder --> Tree
    Tree --> Compiler[engine compiler]
    Compiler --> Bytecode[MEProgram]
    Input["String input"] --> Engine[engine interpreter]
    Bytecode --> Engine
    Engine --> Match["Regex.Match"]

Regex literal flow

When the user writes let r = /^(\d+)/, the frontend recognizes the /.../ literal at parse time (lib/Parse/ParseRegex.cpp). It invokes the in-tree RegexParser to validate the pattern and produce an AST. The capture structure the parser reports then feeds into Sema, so the literal gets a fully typed Regex<Output> at compile time: for /^(\d+)/ that is Regex<(Substring, Substring)>, the whole match plus one capture.
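The capture-driven typing is visible from user code: the annotations below compile only because they match the Output types Sema infers from each literal. An unnamed group adds a Substring element, a named group adds a labeled element, and a group inside an optional part becomes optional. A sketch assuming Swift 5.7+, written with the #/.../# delimiter so it does not depend on the bare-slash opt-in in Swift 5 mode.

```swift
// One unnamed capture: the output is (whole match, group 1).
let number: Regex<(Substring, Substring)> = #/^(\d+)/#

// A named group becomes a tuple label; a group inside an optional
// part of the pattern becomes an optional element.
let version: Regex<(Substring, major: Substring, minor: Substring?)> =
    #/(?<major>\d+)(?:\.(?<minor>\d+))?/#

if let m = "12 apples".firstMatch(of: number) {
    print(m.output.1)
}

if let v = "5.9".wholeMatch(of: version) {
    print(v.output.major, v.output.minor ?? "-")
}
```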

RegexBuilder DSL

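Regex { ... } uses a result builder (@RegexComponentBuilder) to compose RegexComponents. Each component contributes a node to the regex's internal tree representation (DSLTree in _StringProcessing); parsed literals are converted from their AST into the same representation, so both paths share one compiler.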
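A small DSL sketch (Swift 5.7+ assumed): TryCapture converts each field to Int during matching, and returning nil from its transform rejects that match attempt, so the output is already typed as Regex<(Substring, Int, Int)>. The HH:MM pattern and the field(_:below:) helper are illustrative only.

```swift
import RegexBuilder

// Hypothetical helper: parse a two-digit field, accepting it only below `limit`.
func field(_ text: Substring, below limit: Int) -> Int? {
    guard let value = Int(text), value < limit else { return nil }
    return value
}

// HH:MM. Each TryCapture converts its digits to Int while matching;
// a nil result from the transform makes that match attempt fail.
let time = Regex {
    TryCapture {
        Repeat(.digit, count: 2)
    } transform: { field($0, below: 24) }
    ":"
    TryCapture {
        Repeat(.digit, count: 2)
    } transform: { field($0, below: 60) }
}

if let m = "23:59".wholeMatch(of: time) {
    let (_, hours, minutes) = m.output
    print(hours * 60 + minutes)   // 1439: the fields are already Ints
}
```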

Engine

The engine compiles each regex into a bytecode program (MEProgram) with explicit instructions for matching, quantification, assertions, save points, captures, and branches. The interpreter runs the program against the input, recording a save point at each choice and backtracking when a path fails; this covers quantifiers (greedy and reluctant alike) as well as alternation (| and ChoiceOf). Performance work is ongoing; some hot patterns lower to specialized fast paths.
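Backtracking makes alternation ordered (the first alternative that leads to an overall match wins) and lets reluctant quantifiers take as little input as possible. The sketch below observes both through the public String APIs only, not the internal MEProgram; Swift 5.7+ assumed.

```swift
let s = "aaa"

// Greedy `+` takes as much as it can; reluctant `+?` takes as little as it can.
let greedy = s.firstMatch(of: #/a+/#)?.output
let reluctant = s.firstMatch(of: #/a+?/#)?.output

// Ordered alternation: `ab` is tried first and succeeds, so firstMatch stops there.
let first = "abc".firstMatch(of: #/ab|abc/#)?.output

// wholeMatch must reach the end of the input, so the engine backtracks into `abc`.
let whole = "abc".wholeMatch(of: #/ab|abc/#)?.output

print(greedy ?? "", reluctant ?? "", first ?? "", whole ?? "")
```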

Diagnostics

Regex parse errors are first-class compiler diagnostics. The parser emits structured errors (RegexParser.Diagnostic); the compiler frontend translates them into Swift diagnostics with SourceLoc ranges that point inside the regex literal.
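A malformed literal is rejected at compile time, so it cannot appear in a program that builds. When a pattern string is parsed at runtime with Regex(_:) (SE-0355), a parse error is thrown instead. A minimal sketch, assuming Swift 5.7+; parses(_:) is a hypothetical helper, and the exact error text is not shown because it may vary between toolchains.

```swift
// Returns true if `pattern` parses, otherwise prints the thrown error.
// Written as a literal, the malformed pattern would be a compile-time error.
func parses(_ pattern: String) -> Bool {
    do {
        let _: Regex<AnyRegexOutput> = try Regex(pattern)
        return true
    } catch {
        print("rejected \(pattern): \(error)")
        return false
    }
}

print(parses(#"(\d+)"#))   // well-formed
print(parses(#"(\d+"#))    // missing ")"
```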

Integration points

  • Lexer / parser -- lib/Parse/ParseRegex.cpp recognizes literals and feeds them to RegexParser.
  • Sema -- types regex literals as Regex<Output> with the inferred capture tuple.
  • Standard library -- adds matching APIs to String (firstMatch(of:), matches(of:), replacing(_:with:)), implemented as extensions in _StringProcessing.
  • _StringProcessing -- depends on the core standard library (the Swift module) and on RegexParser; it compiles regexes to bytecode and runs them.
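A short sketch of the String-side matching APIs listed above, assuming Swift 5.7+. Each accepts any RegexComponent; the closure form of replacing(_:with:) computes each replacement from the match.

```swift
let log = "a1 b22 c333"

// firstMatch(of:) returns the leftmost match, or nil.
let firstNumber = log.firstMatch(of: #/\d+/#)?.output

// matches(of:) returns every non-overlapping match, in order.
let numbers = log.matches(of: #/\d+/#).map { String($0.output) }

// replacing(_:with:) substitutes each match with a fixed replacement...
let masked = log.replacing(#/\d+/#, with: "#")

// ...or with one computed from the match (here, the digit count).
let lengths = log.replacing(#/\d+/#) { match in
    String(match.output.count)
}

print(firstNumber ?? "", numbers, masked, lengths)
```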

Entry points for modification

  • A new DSL element: add it to RegexBuilder/, represent it in the internal tree the engine compiles from (DSLTree), and teach the engine to compile and run it.
  • Bug in match semantics: _StringProcessing/Engine.swift is the interpreter. Tests under test/stdlib/Regex* and test/stdlib/StringProcessing*.
  • Upstream first: changes typically land in swift-experimental-string-processing and are vendored into this repo.
  • Standard library -- String itself lives in stdlib/public/core; the regex-powered matching extensions on it live in _StringProcessing.
  • Parser -- lib/Parse/ParseRegex.cpp is where regex literals are recognized.
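Before touching the engine, a candidate element can often be prototyped in user code as a type that conforms to RegexComponent by composing existing components. The sketch below does that for a hypothetical KeyValue element (not an existing RegexBuilder type), assuming Swift 5.7+; an element that needs genuinely new matching behavior would still need the changes listed above.

```swift
import RegexBuilder

// Hypothetical element: `key=digits`, capturing the key and the digits.
struct KeyValue: RegexComponent {
    var regex: Regex<(Substring, Substring, Substring)> {
        Regex {
            Capture { OneOrMore(.word) }
            "="
            Capture { OneOrMore(.digit) }
        }
    }
}

// The new element composes like a built-in one; its captures are
// spliced into the enclosing Output tuple.
let command = Regex {
    "set "
    KeyValue()
}

if let m = "set port=8080".wholeMatch(of: command) {
    print(m.output.1, m.output.2)
}
```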

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
