AUTOMATIC1111/stable-diffusion-webui

SD hijack

Active contributors: AUTOMATIC1111, brkirch, Kohaku-Blueleaf, Aarni Koskela

Purpose

Monkey-patches the upstream ldm (Latent Diffusion Model) and sgm (generative-models) packages at runtime so the webui's prompt syntax, attention optimisations, embeddings, and clip-skip work without forking those repositories. Every loaded model passes through model_hijack.hijack(sd_model) before it's used.

Directory layout

modules/
├── sd_hijack.py                       # the master hijack: applies/undoes patches; central StableDiffusionModelHijack
├── sd_hijack_clip.py                  # FrozenCLIPEmbedderWithCustomWords + attention parsing
├── sd_hijack_clip_old.py              # legacy emphasis behaviour for compatibility
├── sd_hijack_open_clip.py             # OpenCLIP variant (used by SDXL/SD2)
├── sd_hijack_xlmr.py                  # XLM-R encoder (Alt-Diffusion m9)
├── sd_hijack_unet.py                  # UNet upcasting and dtype shenanigans
├── sd_hijack_optimizations.py         # cross-attention optimisation dispatcher
├── sd_hijack_ip2p.py                  # InstructPix2Pix-specific patches
├── sd_hijack_utils.py                 # CondFunc — conditional patcher utility
├── sd_hijack_checkpoint.py            # nn.Module checkpoint hijack for memory savings
├── sd_emphasis.py                     # emphasis-as-multiplier helpers
├── sd_disable_initialization.py       # skips default-weight init when loading
└── xlmr.py / xlmr_m18.py              # XLM-R embedding pieces for Alt-Diffusion

Key abstractions

Name	File	Description
`StableDiffusionModelHijack`	`modules/sd_hijack.py`	Holds embedding database, clip wrappers, optimization state. Public methods `hijack()` / `undo_hijack()` / `apply_optimizations()`.
`model_hijack`	`modules/sd_hijack.py`	Module-level singleton instance of `StableDiffusionModelHijack`.
`FrozenCLIPEmbedderWithCustomWords`	`modules/sd_hijack_clip.py`	Replaces ldm's CLIP wrapper; parses `(emphasis)`, `[neg]`, `(word:1.2)`, BREAK and chunked prompts.
`EmbeddingDatabase`	`modules/textual_inversion/textual_inversion.py`	Owned by `model_hijack`; maps trained embedding names to vectors.
`SdOptimization` (subclasses)	`modules/sd_hijack_optimizations.py`	Per-strategy class: `xformers`, `sdp`, `sdp_no_mem`, `sub_quad`, `v1`, `InvokeAI`, `doggettx`, `none`.
`CondFunc`	`modules/sd_hijack_utils.py`	Helper to swap a function for one that calls a different impl when a runtime predicate is true.
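
The conditional-patching idea behind CondFunc can be sketched in a few lines. This is a simplified illustration, not the real CondFunc signature: conditional_patch, Layer, and the predicate are hypothetical names invented for the example.

```python
import functools


def conditional_patch(owner, name, replacement, predicate):
    """Swap owner.<name> for a wrapper that calls `replacement` only when
    `predicate` is true; otherwise the original runs unchanged.

    Hypothetical helper for illustration, not the webui's CondFunc API.
    """
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        if predicate(original, *args, **kwargs):
            # The replacement receives the original so it can delegate to it.
            return replacement(original, *args, **kwargs)
        return original(*args, **kwargs)

    setattr(owner, name, wrapper)
    return original  # returned so the caller can restore it later


class Layer:
    """Stand-in for an upstream module with a forward method."""

    def forward(self, x):
        return x + 1


# Take the patched path only for float inputs (a stand-in for a dtype check).
orig = conditional_patch(
    Layer,
    "forward",
    replacement=lambda orig_fn, self, x: orig_fn(self, x) * 2.0,
    predicate=lambda orig_fn, self, x: isinstance(x, float),
)

layer = Layer()
int_result = layer.forward(1)        # predicate false: original path
float_result = layer.forward(1.0)    # predicate true: replacement path
Layer.forward = orig                 # undo the patch
restored_result = layer.forward(1.0)
```

Passing the original function into both the predicate and the replacement lets the patched path fall back to upstream behaviour instead of re-implementing it.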

What the hijack does

When a checkpoint is loaded, sd_models.load_model() calls model_hijack.hijack(sd_model) once load_model_weights() has filled in the weights:

Replace CLIP / OpenCLIP — swaps the model's text-encoder wrapper with one of the *WithCustomWords classes. These understand the prompt syntax and integrate the embedding database.
Apply attention optimization — picks the best SdOptimization based on flags (--xformers, --opt-sdp-attention, …) and patches the relevant forward methods.
UNet dtype patches — for fp16 + xformers + certain GPUs, modules/sd_hijack_unet.py upcasts a few problem layers to fp32.
Embedding database refresh — scans embeddings/ and models/embeddings/ and registers each .pt/.safetensors file. Their tokens become recognisable in prompts.
Disable default init — wraps nn.Linear/nn.Conv2d constructors to skip Kaiming init since the weights will be overwritten anyway. This is what --disable-model-loading-ram-optimization turns off.

model_hijack.undo_hijack(sd_model) reverses all of the above when the model is unloaded or replaced.
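
The wrap-and-restore pattern for the text encoder might look like the sketch below. Every class here is a toy stand-in; the sketch assumes the encoder sits on a cond_stage_model attribute and that the wrapper keeps the original on a wrapped attribute so the reversal is trivial.

```python
class ToyTextEncoder:
    """Stand-in for the upstream text-encoder wrapper."""

    def encode(self, prompt):
        return f"encoded({prompt})"


class ToyEncoderWithCustomWords:
    """Stand-in for a *WithCustomWords class; keeps the original as .wrapped."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def encode(self, prompt):
        # A real wrapper would parse emphasis and substitute embeddings here;
        # this toy version only strips the outer parentheses.
        return self.wrapped.encode(prompt.strip("()"))


class ToyModel:
    def __init__(self):
        # Assumed attribute name for the text encoder.
        self.cond_stage_model = ToyTextEncoder()


class ToyHijack:
    def hijack(self, model):
        model.cond_stage_model = ToyEncoderWithCustomWords(model.cond_stage_model)

    def undo_hijack(self, model):
        encoder = model.cond_stage_model
        if isinstance(encoder, ToyEncoderWithCustomWords):
            model.cond_stage_model = encoder.wrapped


model = ToyModel()
original_encoder = model.cond_stage_model
hijack = ToyHijack()

hijack.hijack(model)
patched_output = model.cond_stage_model.encode("(cat)")

hijack.undo_hijack(model)
```

Because the wrapper holds a reference to the object it replaced, undoing the hijack needs no separate bookkeeping.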

CLIP wrapper details

graph LR
    Prompt --> Parse[prompt_parser:<br/>(word:1.2), [a:b:0.5], …]
    Parse --> Tokens[token list + per-token weights]
    Tokens --> Chunks[75-token chunks with BOS/EOS]
    Chunks --> EmbDB[EmbeddingDatabase<br/>substitute trained tokens]
    EmbDB --> CLIP[CLIP encode_with_transformer]
    CLIP --> Multiply[emphasis multiplier per token]
    Multiply --> Mean[adjust mean to keep distribution]
    Mean --> Out[conditioning tensor]

Three pieces are non-obvious:

Chunking: prompts longer than 75 tokens are split into chunks; each chunk gets BOS/EOS tokens. The encoder runs once per chunk; the results are concatenated. The BREAK keyword forces a chunk boundary.
Emphasis: (word:1.2) produces a per-token weight. After encoding, that weight is multiplied into the embedding. To keep the distribution from drifting, the mean is then re-aligned. The exact algorithm has changed once: sd_hijack_clip_old.py preserves the v1.0-era behaviour for the "Use old emphasis implementation" compatibility option (see modules/shared_options.py).
Clip skip: opts.CLIP_stop_at_last_layers > 1 returns the n-th-from-last hidden state instead of the final one. Useful for anime models trained on the second-to-last layer.
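
The chunking and emphasis steps above can be sketched in plain Python. This is an approximation under stated assumptions, not the wrapper's actual code: tokens are strings rather than ids, short chunks are assumed to be padded with the EOS token, and the mean re-alignment is assumed to be a single rescale of the whole tensor.

```python
BOS, EOS, CHUNK = "<bos>", "<eos>", 75


def chunk_tokens(tokens, break_marker="BREAK"):
    """Split tokens into chunks of at most CHUNK tokens, each framed by BOS/EOS.

    Short chunks are padded with EOS (an assumption of this sketch).
    """
    chunks, current = [], []

    def flush():
        padded = current + [EOS] * (CHUNK - len(current))
        chunks.append([BOS] + padded + [EOS])
        current.clear()

    for tok in tokens:
        if tok == break_marker:
            if current:          # BREAK forces a chunk boundary
                flush()
            continue
        current.append(tok)
        if len(current) == CHUNK:
            flush()
    if current or not chunks:    # trailing partial chunk, or an empty prompt
        flush()
    return chunks


def apply_emphasis(vectors, weights):
    """Scale each token vector by its weight, then rescale everything so the
    overall mean matches the mean before weighting."""
    flat = [v for vec in vectors for v in vec]
    original_mean = sum(flat) / len(flat)
    scaled = [[v * w for v in vec] for vec, w in zip(vectors, weights)]
    flat_scaled = [v for vec in scaled for v in vec]
    new_mean = sum(flat_scaled) / len(flat_scaled)
    factor = original_mean / new_mean  # assumes a non-zero mean
    return [[v * factor for v in vec] for vec in scaled]


long_prompt_chunks = chunk_tokens(["a"] * 80)
break_chunks = chunk_tokens(["a", "BREAK", "b"])
weighted = apply_emphasis([[1.0, 3.0], [2.0, 2.0]], [1.0, 1.5])
```

Each chunk is 77 entries long (75 tokens plus BOS and EOS), so an 80-token prompt yields two chunks and the encoder would run twice.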

Attention optimisations

modules/sd_hijack_optimizations.py is the dispatcher. Each strategy is an SdOptimization subclass:

Class	Triggered by	Notes
`SdOptimizationXformers`	`--xformers` and `xformers` package available	Memory-efficient attention via FAIR's xFormers
`SdOptimizationSdpNoMem`	`--opt-sdp-no-mem-attention`	PyTorch 2.x SDPA without memory-efficient flag (deterministic)
`SdOptimizationSdp`	`--opt-sdp-attention`	PyTorch 2.x SDPA
`SdOptimizationSubQuad`	`--opt-sub-quad-attention`	Birch-san / AminRezaei0x443's chunked attention
`SdOptimizationV1`	`--opt-split-attention-v1`	Older split attention path
`SdOptimizationInvokeAI`	`--opt-split-attention-invokeai`	InvokeAI/lstein's split-attention
`SdOptimizationDoggettx`	`--opt-split-attention`	The original Doggettx implementation

If no flag is passed, each optimiser's is_available() is consulted in priority order; the typical order is xFormers, then SDP, then Doggettx. Extensions can add more by registering an on_list_optimizers callback.
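
A minimal sketch of that selection logic follows. The classes, the priority numbers, and pick_optimizer are all hypothetical; only the idea of filtering by is_available() and then choosing by priority comes from the description above.

```python
class ToyOptimization:
    """Stand-in base class; name and priority values are illustrative."""
    name = "base"
    priority = 0

    def is_available(self):
        return True


class ToyXformers(ToyOptimization):
    name = "xformers"
    priority = 100

    def is_available(self):
        return False  # pretend the xformers package is not installed


class ToySdp(ToyOptimization):
    name = "sdp"
    priority = 80


class ToyDoggettx(ToyOptimization):
    name = "doggettx"
    priority = 10


def pick_optimizer(optimizers, requested=None):
    """Honour an explicit request if it is usable, else take the
    highest-priority optimizer that reports itself available."""
    if requested is not None:
        for opt in optimizers:
            if opt.name == requested and opt.is_available():
                return opt
    available = [opt for opt in optimizers if opt.is_available()]
    return max(available, key=lambda opt: opt.priority, default=None)


candidates = [ToyDoggettx(), ToyXformers(), ToySdp()]
automatic_choice = pick_optimizer(candidates)
explicit_choice = pick_optimizer(candidates, requested="doggettx")
fallback_choice = pick_optimizer(candidates, requested="xformers")
```

With xFormers unavailable in this toy setup, the automatic choice falls through to SDP, and an explicit request for an unavailable optimizer falls back the same way.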

sd_hijack_optimizations.py is the single largest "math" file in the repo (~677 lines) — most of it is the attention math implementations themselves.

Where embeddings live

The EmbeddingDatabase is on model_hijack because it depends on the active text encoder's embedding shape. Loading flow:

model_hijack.embedding_db.add_embedding_dir(path) registers a directory.
After hijack(), model_hijack.embedding_db.load_textual_inversion_embeddings() scans those dirs.
Each embedding becomes an Embedding object with vec (the actual tensor), name (the token), step, sd_checkpoint, etc.
The CLIP wrapper's tokenizer recognises the name and substitutes the vector at encode time.
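
The loading flow above ends with a name-to-vector substitution, which can be illustrated with a toy database. ToyEmbedding, ToyEmbeddingDatabase, register, and embed_prompt are hypothetical stand-ins, and plain lists take the place of tensors.

```python
from dataclasses import dataclass


@dataclass
class ToyEmbedding:
    name: str      # the word that triggers the embedding in a prompt
    vec: list      # one or more vectors; a trained token may span several
    step: int = 0


class ToyEmbeddingDatabase:
    """Stand-in for the embedding database; not the real class."""

    def __init__(self):
        self.dirs = []
        self.by_name = {}

    def add_embedding_dir(self, path):
        self.dirs.append(path)

    def register(self, embedding):
        self.by_name[embedding.name] = embedding


def embed_prompt(words, db, base_lookup):
    """Replace known embedding names with their trained vectors and send
    every other word through the ordinary lookup."""
    out = []
    for word in words:
        emb = db.by_name.get(word)
        if emb is not None:
            out.extend(emb.vec)
        else:
            out.append(base_lookup(word))
    return out


db = ToyEmbeddingDatabase()
db.add_embedding_dir("embeddings")
db.register(ToyEmbedding(name="my-style", vec=[[1.0, 1.0], [2.0, 2.0]]))

vectors = embed_prompt(["a", "my-style"], db, base_lookup=lambda w: [0.0, 0.0])
```

A single embedding name can expand into several vectors, which is why the substitution extends the output rather than appending one entry.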

/sdapi/v1/embeddings, /sdapi/v1/refresh-embeddings, and the Train tab are the user-facing entry points.

Integration points

script_callbacks.on_model_loaded — fires after hijack() is done, so callbacks see a fully patched model.
script_callbacks.on_list_optimizers — register custom attention optimisers.
script_callbacks.on_list_unets — return alternative UNet implementations the user can pick. Tied to modules/sd_unet.py.
The --disable-opt-split-attention flag forces the SdOptimizationNone path (no optimisation) — useful for debugging numeric issues.

Entry points for modification

Add a new attention optimisation — subclass SdOptimization, implement apply() and undo(), and register it via on_list_optimizers.
Add a new prompt syntax — extend the Lark grammar in modules/prompt_parser.py and update FrozenCLIPEmbedderWithCustomWords.process_text().
Hijack a different upstream class — add a function inside StableDiffusionModelHijack.hijack() and a matching reversal in undo_hijack(). Always store originals so the reversal is correct.
Avoid hijacking when possible — script callbacks (cfg_denoiser, extra_noise, …) cover most "I want to insert behaviour at X step" cases without needing to monkey-patch ldm.
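
For the first entry point (a new attention optimisation), the apply()/undo() contract might be sketched as follows. The base class and attention module are toy stand-ins so the example runs on its own, and the callback is assumed to receive a list that it appends to; in the webui the subclass would derive from the real SdOptimization and the callback would be passed to script_callbacks.on_list_optimizers.

```python
class ToyAttention:
    """Stand-in for an upstream attention module."""

    def forward(self, x):
        return ("upstream", x)


class ToySdOptimization:
    """Stand-in base class mirroring the apply()/undo() contract."""
    name = None

    def is_available(self):
        return True

    def apply(self):
        raise NotImplementedError

    def undo(self):
        raise NotImplementedError


class MyOptimization(ToySdOptimization):
    name = "my-attention"  # hypothetical optimizer name

    def __init__(self):
        self._original = None

    def apply(self):
        # Store the original first so undo() can restore it exactly.
        self._original = ToyAttention.forward
        ToyAttention.forward = lambda module, x: ("optimised", x)

    def undo(self):
        if self._original is not None:
            ToyAttention.forward = self._original
            self._original = None


def list_optimizers(res):
    """Callback shape assumed here: receives a list and appends to it."""
    res.append(MyOptimization())


registered = []
list_optimizers(registered)   # the webui would invoke the callback itself
opt = registered[0]

opt.apply()
patched = ToyAttention().forward(1)
opt.undo()
restored = ToyAttention().forward(1)
```

Keeping the saved original on the optimizer instance makes undo() safe to call more than once.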

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.