AUTOMATIC1111/stable-diffusion-webui

SD hijack

Active contributors: AUTOMATIC1111, brkirch, Kohaku-Blueleaf, Aarni Koskela

Purpose

Monkey-patches the upstream ldm (Latent Diffusion Model) and sgm (generative-models) packages at runtime so the webui's prompt syntax, attention optimisations, embeddings, and clip-skip work without forking those repositories. Every loaded model passes through model_hijack.hijack(sd_model) before it's used.

Directory layout

modules/
├── sd_hijack.py                       # the master hijack: applies/undoes patches; central StableDiffusionModelHijack
├── sd_hijack_clip.py                  # FrozenCLIPEmbedderWithCustomWords + attention parsing
├── sd_hijack_clip_old.py              # legacy emphasis behaviour for compatibility
├── sd_hijack_open_clip.py             # OpenCLIP variant (used by SDXL/SD2)
├── sd_hijack_xlmr.py                  # XLM-R encoder (Alt-Diffusion m9)
├── sd_hijack_unet.py                  # UNet upcasting and dtype shenanigans
├── sd_hijack_optimizations.py         # cross-attention optimisation dispatcher
├── sd_hijack_ip2p.py                  # InstructPix2Pix-specific patches
├── sd_hijack_utils.py                 # CondFunc — conditional patcher utility
├── sd_hijack_checkpoint.py            # nn.Module checkpoint hijack for memory savings
├── sd_emphasis.py                     # emphasis-as-multiplier helpers
├── sd_disable_initialization.py       # skips default-weight init when loading
└── xlmr.py / xlmr_m18.py              # XLM-R embedding pieces for Alt-Diffusion

Key abstractions

| Type | File | Description |
| --- | --- | --- |
| StableDiffusionModelHijack | modules/sd_hijack.py | Holds the embedding database, CLIP wrappers, and optimization state. Public methods: hijack(), undo_hijack(), apply_optimizations(). |
| model_hijack | modules/sd_hijack.py | Module-level singleton instance of StableDiffusionModelHijack. |
| FrozenCLIPEmbedderWithCustomWords | modules/sd_hijack_clip.py | Replaces ldm's CLIP wrapper; parses (emphasis), [de-emphasis], (word:1.2), BREAK, and chunked prompts. |
| EmbeddingDatabase | modules/textual_inversion/textual_inversion.py | Owned by model_hijack; maps trained embedding names to vectors. |
| SdOptimization (subclasses) | modules/sd_hijack_optimizations.py | One class per strategy: xformers, sdp, sdp_no_mem, sub_quad, v1, InvokeAI, doggettx, none. |
| CondFunc | modules/sd_hijack_utils.py | Helper that swaps a function for one that calls a different implementation when a runtime predicate is true. |
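
The conditional-patching idea behind CondFunc can be illustrated with a minimal sketch. The helper name conditional_patch, its signature, and the toy upstream namespace are assumptions made for this example; they are not the real CondFunc API.

```python
from types import SimpleNamespace

# Stand-in for an upstream module we do not want to fork (hypothetical).
upstream = SimpleNamespace(scale=lambda x: x * 2)


def conditional_patch(owner, name, sub_func, cond_func):
    """Wrap owner.<name>; return the original so the patch can be undone.

    Hypothetical helper for illustration only.
    """
    original = getattr(owner, name)

    def wrapper(*args, **kwargs):
        # Call the substitute only when the predicate holds for this call.
        if cond_func(original, *args, **kwargs):
            return sub_func(original, *args, **kwargs)
        return original(*args, **kwargs)

    setattr(owner, name, wrapper)
    return original


# Substitute a different implementation only for negative inputs.
saved = conditional_patch(
    upstream,
    "scale",
    sub_func=lambda orig, x: 0,
    cond_func=lambda orig, x: x < 0,
)

patched_positive = upstream.scale(3)   # predicate false: original runs
patched_negative = upstream.scale(-3)  # predicate true: substitute runs

# Restore the original so the patch is reversible.
upstream.scale = saved
restored_negative = upstream.scale(-3)
```

Passing the original function to both callbacks lets the substitute delegate to it when it only needs to adjust arguments or results.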

What the hijack does

When a checkpoint is loaded, model_hijack.hijack(sd_model) runs in sd_models.load_model(), after the weights have been loaded:

  1. Replace CLIP / OpenCLIP — swaps the model's text-encoder wrapper with one of the *WithCustomWords classes. These understand the prompt syntax and integrate the embedding database.
  2. Apply attention optimization — picks the best SdOptimization based on flags (--xformers, --opt-sdp-attention, …) and patches the relevant forward methods.
  3. UNet dtype patches — for fp16 + xformers + certain GPUs, modules/sd_hijack_unet.py upcasts a few problem layers to fp32.
  4. Embedding database refresh — scans embeddings/ and models/embeddings/ and registers each .pt/.safetensors file. Their tokens become recognisable in prompts.
  5. Disable default init — wraps nn.Linear/nn.Conv2d constructors to skip Kaiming init since the weights will be overwritten anyway. This is what --disable-model-loading-ram-optimization turns off.

model_hijack.undo_hijack(sd_model) reverses all of the above when the model is unloaded or replaced.
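
The wrap-and-unwrap pattern behind hijack() and undo_hijack() can be sketched as follows. All class names here (ToyEncoder, ToyModel, EncoderWithCustomWords, ToyHijack) are hypothetical stand-ins, and the sketch covers only the text-encoder swap from step 1, assuming the wrapper keeps a reference to the object it replaced.

```python
class ToyEncoder:
    """Stand-in for the upstream text-encoder wrapper."""

    def encode(self, text):
        return text.lower()


class ToyModel:
    """Stand-in for a loaded model with a text-encoder attribute."""

    def __init__(self):
        self.cond_stage_model = ToyEncoder()


class EncoderWithCustomWords:
    """Wrapper that remembers the wrapped encoder so it can be removed again."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def encode(self, text):
        # Extra behaviour would go here; the toy just tags the output.
        return "[custom] " + self.wrapped.encode(text)


class ToyHijack:
    def hijack(self, model):
        # Swap the encoder for the wrapper, keeping the original inside it.
        model.cond_stage_model = EncoderWithCustomWords(model.cond_stage_model)

    def undo_hijack(self, model):
        # Only unwrap if the model is actually hijacked.
        if isinstance(model.cond_stage_model, EncoderWithCustomWords):
            model.cond_stage_model = model.cond_stage_model.wrapped


model = ToyModel()
original_encoder = model.cond_stage_model
hijack = ToyHijack()

hijack.hijack(model)
hijacked_output = model.cond_stage_model.encode("A Cat")

hijack.undo_hijack(model)
restored_output = model.cond_stage_model.encode("A Cat")
```

Because the wrapper holds the original object, the reversal restores the exact same instance rather than constructing a new one.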

CLIP wrapper details

graph LR
    Prompt --> Parse["prompt_parser:<br/>(word:1.2), [a:b:0.5], …"]
    Parse --> Tokens[token list + per-token weights]
    Tokens --> Chunks[75-token chunks with BOS/EOS]
    Chunks --> EmbDB[EmbeddingDatabase<br/>substitute trained tokens]
    EmbDB --> CLIP[CLIP encode_with_transformer]
    CLIP --> Multiply[emphasis multiplier per token]
    Multiply --> Mean[adjust mean to keep distribution]
    Mean --> Out[conditioning tensor]
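
The first stage of the graph, turning emphasis syntax into weighted text, can be approximated with a short sketch. This is a deliberately simplified parser written for illustration: it handles only non-nested (text:weight), (text), and [text] groups, assumes a factor of 1.1 for round brackets and 1/1.1 for square brackets, and ignores escapes, nesting, and prompt scheduling. The function name parse_emphasis is hypothetical.

```python
import re

# Matches "(text:1.2)", "(text)" or "[text]" without nesting.
_GROUP = re.compile(
    r"\(([^()\[\]:]+):([0-9]*\.?[0-9]+)\)"  # explicit weight
    r"|\(([^()\[\]]+)\)"                    # round brackets: emphasise
    r"|\[([^()\[\]]+)\]"                    # square brackets: de-emphasise
)


def parse_emphasis(prompt):
    """Return a list of (text, weight) pairs for a flat prompt."""
    result = []
    pos = 0
    for match in _GROUP.finditer(prompt):
        if match.start() > pos:
            # Plain text between groups keeps the neutral weight.
            result.append((prompt[pos:match.start()], 1.0))
        if match.group(1) is not None:
            result.append((match.group(1), float(match.group(2))))
        elif match.group(3) is not None:
            result.append((match.group(3), 1.1))
        else:
            result.append((match.group(4), 1 / 1.1))
        pos = match.end()
    if pos < len(prompt):
        result.append((prompt[pos:], 1.0))
    return result


parsed = parse_emphasis("a (cat:1.2) on a [busy] street")
```

Each pair would then be tokenized, with every resulting token inheriting the weight of the text span it came from.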

Three pieces are non-obvious:

  • Chunking: prompts longer than 75 tokens are split into chunks; each chunk gets BOS/EOS tokens. The encoder runs once per chunk; the results are concatenated. The BREAK keyword forces a chunk boundary.
  • Emphasis: (word:1.2) produces a per-token weight. After encoding, that weight is multiplied into the token's embedding. To avoid shifting the distribution, the result is then rescaled so its mean matches the pre-emphasis mean. The algorithm has changed once: sd_hijack_clip_old.py preserves the v1.0-era behaviour for the "Use old emphasis implementation" option in modules/shared_options.py.
  • Clip skip: opts.CLIP_stop_at_last_layers > 1 returns the n-th-from-last hidden state instead of the final one. Useful for anime models trained on the second-to-last layer.
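
The three points above can be sketched in plain Python. This is an illustration under stated assumptions, not the webui's code: the function names are hypothetical, token ids 49406 and 49407 are assumed to be CLIP's start and end tokens, short chunks are assumed to be padded with the end token, and embeddings are modelled as nested lists instead of tensors.

```python
CHUNK = 75
BOS, EOS = 49406, 49407  # assumed CLIP start/end token ids


def chunk_tokens(token_groups):
    """Split token ids into 77-long chunks (BOS + 75 tokens + EOS).

    token_groups holds one list of token ids per BREAK-separated segment,
    so a BREAK always starts a new chunk.
    """
    chunks = []
    for tokens in token_groups:
        for start in range(0, max(len(tokens), 1), CHUNK):
            body = tokens[start:start + CHUNK]
            body = body + [EOS] * (CHUNK - len(body))  # pad short chunks
            chunks.append([BOS] + body + [EOS])
    return chunks


def mean_of(rows):
    flat = [value for row in rows for value in row]
    return sum(flat) / len(flat)


def apply_emphasis(vectors, weights):
    """Scale each token's vector by its weight, then restore the overall mean.

    Assumes the weighted mean is non-zero.
    """
    original_mean = mean_of(vectors)
    scaled = [[v * w for v in row] for row, w in zip(vectors, weights)]
    factor = original_mean / mean_of(scaled)
    return [[v * factor for v in row] for row in scaled]


def pick_hidden_state(hidden_states, clip_skip):
    """clip_skip=1 returns the last layer's output, 2 the one before, etc."""
    return hidden_states[-clip_skip]


chunks = chunk_tokens([list(range(100))])
emphasised = apply_emphasis([[1.0, 1.0], [1.0, 1.0]], [1.0, 2.0])
skipped = pick_hidden_state(["layer10", "layer11", "layer12"], clip_skip=2)
```

In the toy emphasis call, the second token ends up twice as strong as the first while the overall mean stays at its original value.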

Attention optimisations

modules/sd_hijack_optimizations.py is the dispatcher. Each strategy is an SdOptimization subclass:

| Class | Triggered by | Notes |
| --- | --- | --- |
| SdOptimizationXformers | --xformers and xformers package available | Memory-efficient attention via FAIR's xFormers |
| SdOptimizationSdpNoMem | --opt-sdp-no-mem-attention | PyTorch 2.x SDPA without memory-efficient flag (deterministic) |
| SdOptimizationSdp | --opt-sdp-attention | PyTorch 2.x SDPA |
| SdOptimizationSubQuad | --opt-sub-quad-attention | Birch-san / AminRezaei0x443's chunked attention |
| SdOptimizationV1 | --opt-split-attention-v1 | Older split attention path |
| SdOptimizationInvokeAI | --opt-split-attention-invokeai | InvokeAI/lstein's split-attention |
| SdOptimizationDoggettx | --opt-split-attention | The original Doggettx implementation |

If no flag is passed, SdOptimization.is_available() is consulted in priority order; xFormers > SDP > Doggettx is the typical fallback. Extensions can add more by registering an on_list_optimizers callback.
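
The selection logic described above can be sketched as a small ranking function. ToyOptimizer, pick_optimizer, and the priority numbers are assumptions chosen for illustration; the real classes and values live in modules/sd_hijack_optimizations.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyOptimizer:
    """Hypothetical stand-in for an SdOptimization subclass."""

    name: str
    priority: int
    available: bool
    cmd_opt: Optional[str] = None  # command-line flag that requests it

    def is_available(self):
        return self.available


def pick_optimizer(optimizers, enabled_flags):
    """Prefer an explicitly requested optimizer, else the highest priority."""
    candidates = [opt for opt in optimizers if opt.is_available()]
    flagged = [opt for opt in candidates if opt.cmd_opt in enabled_flags]
    pool = flagged or candidates
    return max(pool, key=lambda opt: opt.priority, default=None)


# Illustrative priorities only; xformers is treated as not installed.
registry = [
    ToyOptimizer("xformers", 100, available=False, cmd_opt="xformers"),
    ToyOptimizer("sdp", 80, available=True, cmd_opt="opt_sdp_attention"),
    ToyOptimizer("doggettx", 10, available=True, cmd_opt="opt_split_attention"),
]

automatic = pick_optimizer(registry, enabled_flags=set())
requested = pick_optimizer(registry, enabled_flags={"opt_split_attention"})
nothing = pick_optimizer([], enabled_flags=set())
```

With xformers unavailable, the automatic choice falls through to sdp, while an explicit flag overrides the ranking.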

sd_hijack_optimizations.py is the single largest "math" file in the repo (~677 lines) — most of it is the attention math implementations themselves.

Where embeddings live

The EmbeddingDatabase is on model_hijack because it depends on the active text encoder's embedding shape. Loading flow:

  1. model_hijack.embedding_db.add_embedding_dir(path) registers a directory.
  2. After hijack(), model_hijack.embedding_db.load_textual_inversion_embeddings() scans those dirs.
  3. Each embedding becomes an Embedding object with vec (the actual tensor), name (the token), step, sd_checkpoint, etc.
  4. The CLIP wrapper's tokenizer recognises the name and substitutes the vector at encode time.
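
The loading flow above, and why the database depends on the text encoder's embedding shape, can be illustrated with a toy model. ToyEmbedding and ToyEmbeddingDatabase are hypothetical; the sketch uses nested lists instead of tensors, skips file loading entirely, and assumes that embeddings with a mismatched vector width are set aside, not registered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEmbedding:
    name: str                    # the token users type in a prompt
    vec: list                    # one vector per token slot it occupies
    step: Optional[int] = None


class ToyEmbeddingDatabase:
    def __init__(self, expected_dim):
        self.expected_dim = expected_dim  # width of the active text encoder
        self.dirs = []
        self.word_embeddings = {}
        self.skipped = {}

    def add_embedding_dir(self, path):
        self.dirs.append(path)

    def register(self, embedding):
        # An embedding trained for another encoder width cannot be used.
        if len(embedding.vec[0]) != self.expected_dim:
            self.skipped[embedding.name] = embedding
            return False
        self.word_embeddings[embedding.name] = embedding
        return True

    def substitute(self, words, lookup):
        """Replace known embedding names by their vectors at encode time."""
        output = []
        for word in words:
            embedding = self.word_embeddings.get(word)
            if embedding is not None:
                output.extend(embedding.vec)
            else:
                output.append(lookup(word))
        return output


db = ToyEmbeddingDatabase(expected_dim=2)
db.add_embedding_dir("embeddings")
ok = db.register(ToyEmbedding("my-style", [[1.0, 2.0], [3.0, 4.0]]))
bad = db.register(ToyEmbedding("other-model", [[1.0, 2.0, 3.0]]))
encoded = db.substitute(["a", "my-style"], lookup=lambda word: [0.0, 0.0])
```

Note that a single embedding name can expand to several vectors, which is why it occupies more than one token slot in the prompt.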

/sdapi/v1/embeddings, /sdapi/v1/refresh-embeddings, and the Train tab are the user-facing entry points.
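
A minimal client for the two endpoints might look like the sketch below. It assumes the webui is running locally with its API enabled, that the base URL is http://127.0.0.1:7860, that the embeddings listing is a GET, and that the refresh is a POST with an empty body; the helper names are hypothetical and the response shape is not modelled.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # assumption: local instance with the API enabled


def build_request(path, method="GET"):
    """Create the request object without sending it."""
    data = b"" if method == "POST" else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method)


def call(path, method="GET"):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(path, method), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


list_request = build_request("/sdapi/v1/embeddings")
refresh_request = build_request("/sdapi/v1/refresh-embeddings", method="POST")
```

With a server running, call("/sdapi/v1/refresh-embeddings", method="POST") followed by call("/sdapi/v1/embeddings") would rescan the directories and then list what was found.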

Integration points

  • script_callbacks.on_model_loaded — fires after hijack() is done, so callbacks see a fully patched model.
  • script_callbacks.on_list_optimizers — register custom attention optimisers.
  • script_callbacks.on_list_unets — return alternative UNet implementations the user can pick. Tied to modules/sd_unet.py.
  • The --disable-opt-split-attention flag forces the "none" path (no attention optimisation is applied), which is useful for debugging numeric issues.

Entry points for modification

  • Add a new attention optimisation — subclass SdOptimization, implement apply() and undo(), and register it via on_list_optimizers.
  • Add a new prompt syntax — extend the Lark grammar in modules/prompt_parser.py and update FrozenCLIPEmbedderWithCustomWords.process_text().
  • Hijack a different upstream class — add a function inside StableDiffusionModelHijack.hijack() and a matching reversal in undo_hijack(). Always store originals so the reversal is correct.
  • Avoid hijacking when possible — script callbacks (cfg_denoiser, extra_noise, …) cover most "I want to insert behaviour at X step" cases without needing to monkey-patch ldm.
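
The first entry point, adding an attention optimisation, can be sketched end to end with stand-ins. Everything below is self-contained and hypothetical: the SdOptimization base class, the on_list_optimizers registry, and ToyAttention only imitate the assumed shape of the real objects, and a real extension would import them from the webui modules instead.

```python
class SdOptimization:
    """Stand-in for the base class in modules/sd_hijack_optimizations.py."""

    name = None
    cmd_opt = None
    priority = 0

    def is_available(self):
        return True

    def apply(self):
        pass

    def undo(self):
        pass


class ToyAttention:
    """Stand-in for an upstream attention module."""

    def forward(self, x):
        return x


_callbacks = []


def on_list_optimizers(callback):
    """Stand-in registry: callbacks receive a list and append to it."""
    _callbacks.append(callback)


def list_optimizers():
    found = []
    for callback in _callbacks:
        callback(found)
    return found


class SdOptimizationDoubling(SdOptimization):
    name = "doubling"
    priority = 5

    def __init__(self):
        self._original = None

    def apply(self):
        # Store the original exactly once so undo() restores the right thing.
        if self._original is None:
            self._original = ToyAttention.forward
            ToyAttention.forward = lambda self_, x: x * 2

    def undo(self):
        if self._original is not None:
            ToyAttention.forward = self._original
            self._original = None


on_list_optimizers(lambda found: found.append(SdOptimizationDoubling()))

optimizer = list_optimizers()[0]
optimizer.apply()
patched_result = ToyAttention().forward(3)
optimizer.undo()
restored_result = ToyAttention().forward(3)
```

Guarding apply() against a second call keeps the stored original from being overwritten by the already patched method.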

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
