AUTOMATIC1111/stable-diffusion-webui

Processing pipeline

Active contributors: AUTOMATIC1111, w-e-w, catboxanon, Kohaku-Blueleaf, light-and-ray

Purpose

The processing pipeline is the core of the application: it takes a fully populated StableDiffusionProcessing instance, runs the diffusion sampler, applies any post-processing, saves the resulting images, and returns a Processed object. Both the UI handlers (txt2img.py, img2img.py) and the API endpoints (/sdapi/v1/txt2img, /sdapi/v1/img2img) funnel into the same code in modules/processing.py.
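
To make the API entry path concrete, here is a minimal sketch that builds (but does not send) a request for the /sdapi/v1/txt2img endpoint named above. The base URL is an assumption (a local server started with the API enabled), and the payload keys are a small, commonly used subset of the generation parameters rather than a complete list.

```python
import json
import urllib.request


def build_txt2img_request(base_url: str, prompt: str, **overrides) -> urllib.request.Request:
    """Build (but do not send) a POST request for /sdapi/v1/txt2img."""
    payload = {
        "prompt": prompt,
        "negative_prompt": "",
        "steps": 20,
        "width": 512,
        "height": 512,
        "cfg_scale": 7.0,
        "seed": -1,  # -1 asks the server to pick a random seed
    }
    payload.update(overrides)  # caller-supplied values win over the defaults
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local address; adjust to wherever the server is listening.
request = build_txt2img_request("http://127.0.0.1:7860", "a watercolor fox", steps=30)
print(request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any GPU time is spent.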

Directory layout

```
modules/
├── processing.py                   # ~1,800 lines: pipeline + dataclasses
├── processing_scripts/             # alwayson built-ins
│   ├── seed.py                     # the seed/subseed UI and infotext registration
│   ├── sampler.py                  # the sampler/scheduler UI
│   ├── refiner.py                  # SDXL refiner mid-generation switch
│   └── comments.py                 # extracts # comments from prompts
├── prompt_parser.py                # Lark grammar for attention/scheduling/AND
├── sd_samplers*.py                 # individual sampler implementations
├── rng.py / rng_philox.py          # CPU + Philox seedable noise
├── lowvram.py                      # weight-shuffling for low-VRAM mode
├── extra_networks.py               # parses <lora:foo:1.0> tokens out of the prompt
└── images.py                       # save_image, grid building, infotext writing
```

Key abstractions

| Type | File | Description |
| --- | --- | --- |
| `StableDiffusionProcessing` | `modules/processing.py` | Dataclass with every parameter for a generation job. ~50 fields. |
| `StableDiffusionProcessingTxt2Img` | same | Adds hires-fix fields (`enable_hr`, `hr_scale`, `hr_upscaler`, …). |
| `StableDiffusionProcessingImg2Img` | same | Adds img2img fields (`init_images`, `mask`, `denoising_strength`, `inpaint_full_res`, …). |
| `Processed` | same | Result; carries `images`, `infotexts`, `all_seeds`, `all_subseeds`, `comments`. |
| `process_images(p)` | same | Public entry point; wraps `process_images_inner(p)` with model swap and lock. |
| `process_images_inner(p)` | same | The actual loop; ~290 lines covering script hooks, sampling, post-processing, saving. |
| `create_infotext(...)` | same | Builds the metadata string saved into PNG/EXIF. |
| `decode_latent_batch(...)` | same | VAE decode with NaN handling and lowvram-aware batching. |
| `apply_overlay(...)`, `apply_color_correction(...)` | same | Post-sample hooks for img2img masking and tonal matching. |
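
As an illustration of what the metadata string built by create_infotext(...) looks like, the sketch below joins a prompt, a negative prompt, and a set of key/value parameters. The function name build_infotext and its signature are hypothetical; the real function takes many more inputs and handles quoting, so treat this only as an approximation of the layout.

```python
def build_infotext(prompt: str, negative_prompt: str, params: dict) -> str:
    """Join prompt, negative prompt, and "key: value" pairs into one string."""
    # Parameters whose value is None are left out of the final line.
    param_text = ", ".join(
        f"{key}: {value}" for key, value in params.items() if value is not None
    )
    negative_text = f"\nNegative prompt: {negative_prompt}" if negative_prompt else ""
    return f"{prompt}{negative_text}\n{param_text}"


infotext = build_infotext(
    "a watercolor fox",
    "blurry",
    {"Steps": 20, "Sampler": "Euler a", "CFG scale": 7, "Seed": 1234, "Size": "512x512"},
)
print(infotext)
```

One string per image in this shape is what ends up in Processed.infotexts and in the saved file's metadata.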

How it works

```mermaid
graph TD
    Caller[txt2img.py / api.py] -->|p| PI[process_images]
    PI -->|swap model if needed| SW[sd_models.reload_model_weights]
    PI --> PII[process_images_inner]

    PII -->|fix_seed| Seed
    PII -->|extra_networks.parse_prompts| EN[extra_networks.activate]
    EN --> Loras[Lora / hypernet / TI patches]
    PII -->|setup_conds| Cond[prompt_parser + CLIP]
    PII -->|p.scripts.process| Scripts1[alwayson scripts: process]

    PII --> Loop{for batch in batches}
    Loop -->|create noise| RNG
    Loop -->|p.sample| Sampler
    Sampler --> CFGDenoiser
    CFGDenoiser --> UNet
    Loop -->|p.scripts.process_batch / before_hr| Scripts2[scripts: per-batch hooks]
    Loop -->|decode_first_stage| VAE
    Loop -->|p.scripts.postprocess_image| Scripts3[scripts: postprocess_image]
    Loop -->|face restoration / color correction / overlay| Post
    Loop -->|images.save_image| Save
    Loop --> Loop

    PII -->|p.scripts.postprocess| Scripts4[scripts: postprocess]
    PII -->|extra_networks.deactivate| END
    PII --> R[Processed]
    R --> Caller
```

The full loop, with line references in modules/processing.py:

Setup — process_images() (line ~819): saves the current sd_model_checkpoint and sd_vae settings, and ensures the right model is loaded if the request specifies an override.
Inner pipeline — process_images_inner() (line ~863):
1. Set seed (fix_seed), build comments dict, set state.job_count.
2. Parse extra-network tokens out of the prompt with extra_networks.parse_prompts(). This rewrites the prompt to remove tokens such as <lora:foo:1.0> and produces an activation list that is later passed to extra_networks.activate(p, ...).
3. Encode the prompt and negative prompt into conditioning with setup_conds(). This handles attention syntax, prompt scheduling ([a:b:0.5]), and AND-composable diffusion via modules/prompt_parser.py.
4. Call p.scripts.process(p) — alwayson scripts get a chance to mutate p before sampling.
5. For each batch (for n in range(p.n_iter)):
  - Generate noise via modules/rng.py.
  - Call p.sample() (subclass-specific). This in turn calls create_sampler() from modules/sd_samplers.py and runs the chosen sampler.
  - Hires fix (Txt2Img only): if enable_hr is set, upscale the latents directly (or decode, upscale in pixel space, and re-encode when an image upscaler such as ESRGAN is chosen), then run a second pass with the hires sampler.
  - Refiner (SDXL): if a refiner model is set, switch checkpoints at the configured step. See modules/processing_scripts/refiner.py.
  - VAE decode → clamp → convert to PIL.
  - Run face restoration if enabled.
  - Apply color correction (img2img with apply_color_correction).
  - Apply mask overlay for inpainting (apply_overlay).
  - p.scripts.postprocess_image(p, image) — alwayson scripts can edit the final image.
  - images.save_image(image, ...) — write the file with infotext metadata.
6. Build Processed with infotexts (one per image) and return.
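
The ordering of the steps above can be sketched with toy stand-ins. Everything here (ToyJob, ToyProcessed, toy_process_images_inner, the trace list) is hypothetical and only mirrors the sequence described in the list; it performs no sampling or decoding. The consecutive seed numbering is an assumption about the simple case where no subseed blending is in use.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyJob:
    """Hypothetical, heavily reduced stand-in for StableDiffusionProcessing."""
    prompt: str
    seed: int = -1
    n_iter: int = 1
    batch_size: int = 1
    trace: list = field(default_factory=list)  # records the call order

    def sample(self, seeds):
        self.trace.append("sample")
        return [f"latent(seed={s})" for s in seeds]


@dataclass
class ToyProcessed:
    """Hypothetical stand-in for Processed."""
    images: list
    all_seeds: list
    infotexts: list


def fix_seed(job: ToyJob) -> None:
    # A seed of -1 means "pick one at random".
    if job.seed == -1:
        job.seed = random.randrange(4294967294)


def toy_process_images_inner(job: ToyJob) -> ToyProcessed:
    fix_seed(job)
    total = job.n_iter * job.batch_size
    all_seeds = [job.seed + i for i in range(total)]  # assumed numbering
    job.trace.append("scripts.process")

    images, infotexts = [], []
    for n in range(job.n_iter):
        batch_seeds = all_seeds[n * job.batch_size:(n + 1) * job.batch_size]
        latents = job.sample(batch_seeds)
        job.trace.append("vae_decode")
        for seed, latent in zip(batch_seeds, latents):
            job.trace.append("scripts.postprocess_image")
            job.trace.append("save_image")
            images.append(f"image_from({latent})")
            infotexts.append(f"{job.prompt}\nSeed: {seed}")

    job.trace.append("scripts.postprocess")
    return ToyProcessed(images, all_seeds, infotexts)


job = ToyJob(prompt="a watercolor fox", seed=100, n_iter=2, batch_size=2)
result = toy_process_images_inner(job)
print(result.all_seeds)   # [100, 101, 102, 103]
print(job.trace[:3])      # ['scripts.process', 'sample', 'vae_decode']
```

Recording the call order in a list is a cheap way to check where a new hook would fire before touching the real loop.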

Hires fix

The hires-fix path is internal to StableDiffusionProcessingTxt2Img.sample(). It runs the regular sampler at low res, then either:

- Latent upscale: upscale the latent directly by interpolation (bicubic is one of the available modes) and run the sampler again at the new size (cheap, can blur).
- Pixel upscale: VAE-decode, run the chosen upscaler (ESRGAN, SwinIR, …), VAE-encode, and run the sampler again (slower, sharper).

Hires hooks: alwayson scripts receive a before_hr call between the two passes.
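
The two hires-fix paths can be summarized with a small planning helper. This is a sketch under stated assumptions: the function hires_plan is hypothetical, the factor of 8 between pixel and latent size is assumed, and the rule that upscaler names starting with "Latent" select the latent path is an illustrative convention. The upscaler name "ESRGAN_4x" is used only as an example value.

```python
LATENT_DOWNSCALE = 8  # assumption: the VAE maps 8x8 pixels to one latent cell


def hires_plan(width: int, height: int, hr_scale: float, hr_upscaler: str) -> dict:
    """Describe the second pass: target size, latent size, and which path runs."""
    target_w = int(width * hr_scale)
    target_h = int(height * hr_scale)
    # Assumption: upscaler names starting with "Latent" select the latent path.
    path = "latent" if hr_upscaler.startswith("Latent") else "pixel"
    steps = {
        "latent": ["sample low-res", "interpolate latent", "sample high-res"],
        "pixel": ["sample low-res", "VAE decode", "upscale image",
                  "VAE encode", "sample high-res"],
    }[path]
    return {
        "target_size": (target_w, target_h),
        "latent_size": (target_w // LATENT_DOWNSCALE, target_h // LATENT_DOWNSCALE),
        "path": path,
        "steps": steps,
    }


plan = hires_plan(512, 768, hr_scale=2.0, hr_upscaler="Latent")
print(plan["target_size"], plan["latent_size"])  # (1024, 1536) (128, 192)
print(hires_plan(512, 512, 1.5, "ESRGAN_4x")["path"])  # pixel
```

The pixel path has two extra VAE round trips, which is where its additional cost comes from.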

Refiner

The SDXL refiner is implemented as an "alwayson script" in modules/processing_scripts/refiner.py. It registers before_sampling and process_before_every_sampling hooks; when the configured refiner_switch_at is reached, the script swaps shared.sd_model to the refiner checkpoint and lets the rest of the loop continue. Switching between two SDXL checkpoints can be expensive in memory; the --medvram-sdxl flag, which applies the medvram optimization only to SDXL models, can help in this case.

Prompt parsing and AND-composable diffusion

The Lark grammar in modules/prompt_parser.py handles three syntaxes:

- Attention: (word) → 1.1×, [word] → 1/1.1×, (word:1.5) → explicit weight. Implemented as token-weight pairs that the CLIP hijack uses to scale embeddings.
- Prompt scheduling: [from:to:0.5] switches from→to halfway through sampling. Implemented as a list of (end_step, prompt) pairs; the conditioning for each scheduled prompt is selected according to the current sampling step.
- AND-composable diffusion: prompt1 AND prompt2 :1.2 runs the model on both prompts and blends the predicted noise. Implemented in sd_samplers_cfg_denoiser.py.
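
The attention and scheduling syntaxes can be illustrated with a toy parser. This is a simplified sketch, not the grammar in modules/prompt_parser.py: parse_attention and schedule are hypothetical names, nesting and escaping are handled only in the simplest way, and the rule for interpreting the switch point (a value below 1 is a fraction of the step count) is an assumption.

```python
import re

NUMBER = r"[0-9]*\.?[0-9]+"
TOKEN = re.compile(r"\(|\[|:\s*(" + NUMBER + r")\s*\)|\)|\]|[^()\[\]:]+|:")
SCHEDULE = re.compile(r"\[([^\[\]:]*):([^\[\]:]*):(" + NUMBER + r")\]")


def parse_attention(prompt: str) -> list:
    """Return (text, weight) pairs for ( ), [ ] and (text:1.5) syntax."""
    pieces = []                          # [text, weight] entries, in order
    round_stack, square_stack = [], []   # piece index where each bracket opened

    def scale(start: int, factor: float) -> None:
        for piece in pieces[start:]:
            piece[1] *= factor

    for match in TOKEN.finditer(prompt):
        token, explicit = match.group(0), match.group(1)
        if token == "(":
            round_stack.append(len(pieces))
        elif token == "[":
            square_stack.append(len(pieces))
        elif explicit is not None and round_stack:
            scale(round_stack.pop(), float(explicit))  # (text:1.5)
        elif token == ")" and round_stack:
            scale(round_stack.pop(), 1.1)              # (text)
        elif token == "]" and square_stack:
            scale(square_stack.pop(), 1 / 1.1)         # [text]
        else:
            pieces.append([token, 1.0])                # plain text
    return [(text, round(weight, 4)) for text, weight in pieces]


def schedule(prompt: str, steps: int) -> list:
    """Expand one [from:to:when] group into (end_step, prompt) pairs."""
    match = SCHEDULE.search(prompt)
    if match is None:
        return [(steps, prompt)]
    before, after, when = match.group(1), match.group(2), float(match.group(3))
    # Assumption: values below 1 are a fraction of the total step count.
    switch_step = int(when * steps) if when < 1 else int(when)
    head, tail = prompt[:match.start()], prompt[match.end():]
    return [(switch_step, head + before + tail), (steps, head + after + tail)]


print(parse_attention("a (cat:1.5) and [dog]"))
print(parse_attention("((cat))"))               # nested parentheses multiply
print(schedule("a [fox:cat:0.5] sitting", 20))
```

Nested brackets multiply their factors, which is why doubling the parentheses gives 1.1 × 1.1 = 1.21 rather than 1.2.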

These are independent and can be combined. Negative prompts share the same syntax.
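
The blending step of AND-composable diffusion can be written as a weighted sum of differences from the unconditional prediction. The sketch below uses plain Python lists in place of tensors; combine_predictions is a hypothetical name, and the formula is a common way to express the blend rather than a copy of the code in sd_samplers_cfg_denoiser.py.

```python
def combine_predictions(uncond: list, conds: list, weights: list, cfg_scale: float) -> list:
    """Blend per-prompt noise predictions around the unconditional one.

    result = uncond + cfg_scale * sum_i weight_i * (cond_i - uncond)
    """
    blended = list(uncond)
    for cond, weight in zip(conds, weights):
        for i, (c, u) in enumerate(zip(cond, uncond)):
            blended[i] += (c - u) * weight * cfg_scale
    return blended


uncond = [0.0, 1.0]
cond_a = [1.0, 1.0]   # toy prediction for "prompt1" (default weight 1.0)
cond_b = [0.0, 2.0]   # toy prediction for "prompt2 :1.2"
blended = combine_predictions(uncond, [cond_a, cond_b], [1.0, 1.2], cfg_scale=7.0)
print(blended)
```

With a single prompt and weight 1.0 this reduces to ordinary classifier-free guidance, so AND is a generalization rather than a separate code path in this formulation.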

Integration points

- Scripts hook nearly every step via the ScriptRunner; see scripts-and-extensions.md for the full list.
- The cfg_denoiser and cfg_denoised callbacks let extensions modify the noise prediction at every step; see script-callbacks.md.
- The extra_noise callback lets extensions inject custom noise before sampling.
- process_before_every_sampling (added in v1.10) lets refiner-style switches run on each sub-call (hires fix counts as a separate sampling).
- The image_saved and before_image_saved callbacks let extensions write sidecar files or change the PNG-info dict.

Entry points for modification

- A new generation parameter: add a field to StableDiffusionProcessing, append a (component, key) to the relevant tab's paste_fields, and update create_infotext() to include it. The API will pick it up automatically because modules/api/models.py is generated from StableDiffusionProcessing introspection.
- A new sampler: see samplers-and-schedulers.md.
- A new pipeline stage (e.g., a new mid-sample transformation): register a script callback (cfg_denoiser, process_before_every_sampling, etc.) rather than editing process_images_inner. The built-in refiner already uses these hooks, so the pattern has an in-tree precedent.
- Image saving: extend images.save_image() (modules/images.py). Filename patterns are documented in images.FilenameGenerator.
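
To show why a new field can surface in the API without extra wiring, the sketch below derives a request schema by introspecting a class. ToyProcessing, my_new_option, and request_schema are hypothetical, and dataclasses.fields is used here as a stand-in for whatever introspection modules/api/models.py actually performs.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyProcessing:
    """Hypothetical, reduced stand-in for StableDiffusionProcessing."""
    prompt: str = ""
    steps: int = 20
    cfg_scale: float = 7.0
    my_new_option: bool = False  # the hypothetical newly added parameter


def request_schema(cls) -> dict:
    """Derive {field name: (type name, default)} by introspecting a dataclass."""
    return {
        f.name: (getattr(f.type, "__name__", str(f.type)), f.default)
        for f in fields(cls)
    }


schema = request_schema(ToyProcessing)
for name, (type_name, default) in schema.items():
    print(f"{name}: {type_name} = {default!r}")
```

Because the schema is computed from the class, adding my_new_option to the dataclass is the only change needed for it to appear in the derived request model.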

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.