AUTOMATIC1111/stable-diffusion-webui
Processing pipeline
Active contributors: AUTOMATIC1111, w-e-w, catboxanon, Kohaku-Blueleaf, light-and-ray
Purpose
The processing pipeline is the core of the application: it takes a fully-populated StableDiffusionProcessing instance, runs the diffusion sampler, applies any post-processing, saves the resulting images, and returns a Processed object. Both the UI handlers (txt2img.py, img2img.py) and the API endpoints (/sdapi/v1/txt2img, /sdapi/v1/img2img) funnel into the same code in modules/processing.py.
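As an illustration of the API path into this pipeline, here is a minimal client-side sketch. It only builds the HTTP request; sending it needs a running server with the API enabled. The base address http://127.0.0.1:7860 and the helper name build_txt2img_request are assumptions made for this example, while the endpoint path is the one named above.

```python
import json
import urllib.request


def build_txt2img_request(base_url: str, prompt: str, steps: int = 20) -> urllib.request.Request:
    """Build (but do not send) a POST request for /sdapi/v1/txt2img.

    `base_url` is an assumption about where the server is listening.
    """
    payload = {"prompt": prompt, "steps": steps}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_txt2img_request("http://127.0.0.1:7860", "a lighthouse at dusk")
    # Sending requires a running server; uncomment to try it.
    # with urllib.request.urlopen(req) as resp:
    #     result = json.load(resp)
    print(req.full_url)
```

Whatever the caller, the request body ends up as fields on a StableDiffusionProcessing instance before process_images runs.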
Directory layout
modules/
├── processing.py # ~1,800 lines: pipeline + dataclasses
├── processing_scripts/ # alwayson built-ins
│ ├── seed.py # the seed/subseed UI and infotext registration
│ ├── sampler.py # the sampler/scheduler UI
│ ├── refiner.py # SDXL refiner mid-generation switch
│ └── comments.py # extracts # comments from prompts
├── prompt_parser.py # Lark grammar for attention/scheduling/AND
├── sd_samplers*.py # individual sampler implementations
├── rng.py / rng_philox.py # CPU + Philox seedable noise
├── lowvram.py # weight-shuffling for low-VRAM mode
├── extra_networks.py # parses <lora:foo:1.0> tokens out of the prompt
└── images.py # save_image, grid building, infotext writing
Key abstractions
| Type | File | Description |
|---|---|---|
| StableDiffusionProcessing | modules/processing.py | Dataclass with every parameter for a generation job. ~50 fields. |
| StableDiffusionProcessingTxt2Img | same | Adds hires-fix fields (enable_hr, hr_scale, hr_upscaler, …). |
| StableDiffusionProcessingImg2Img | same | Adds img2img fields (init_images, mask, denoising_strength, inpaint_full_res, …). |
| Processed | same | Result; carries images, infotexts, all_seeds, all_subseeds, comments. |
| process_images(p) | same | Public entry point; wraps process_images_inner(p) with model swap and lock. |
| process_images_inner(p) | same | The actual loop; ~290 lines covering script hooks, sampling, post-processing, saving. |
| create_infotext(...) | same | Builds the metadata string saved into PNG/EXIF. |
| decode_latent_batch(...) | same | VAE decode with NaN handling and lowvram-aware batching. |
| apply_overlay(...), apply_color_correction(...) | same | Post-sample hooks for img2img masking and tonal matching. |
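The relationship between the job object and the result object can be sketched with two reduced dataclasses. This is not the real class layout: JobSketch, ResultSketch, and run_sketch are hypothetical names, the field set is cut down to a handful, and the assumption that each image gets a consecutive seed is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class JobSketch:
    """Hypothetical, heavily reduced stand-in for StableDiffusionProcessing."""
    prompt: str
    seed: int = -1
    batch_size: int = 1
    n_iter: int = 1


@dataclass
class ResultSketch:
    """Hypothetical stand-in for Processed: one entry per generated image."""
    images: list = field(default_factory=list)
    infotexts: list = field(default_factory=list)
    all_seeds: list = field(default_factory=list)


def run_sketch(job: JobSketch) -> ResultSketch:
    """Fill the result lists in lockstep, one entry per image."""
    total = job.batch_size * job.n_iter
    result = ResultSketch()
    for i in range(total):
        seed = job.seed + i  # assumption: consecutive seeds, one per image
        result.all_seeds.append(seed)
        result.images.append(f"<image {i}>")  # placeholder for a PIL image
        result.infotexts.append(f"{job.prompt}\nSeed: {seed}")
    return result
```

The point of the sketch is that the per-image lists stay index-aligned, so entry i of the seeds and infotexts describes image i.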
How it works
graph TD
Caller[txt2img.py / api.py] -->|p| PI[process_images]
PI -->|swap model if needed| SW[sd_models.reload_model_weights]
PI --> PII[process_images_inner]
PII -->|fix_seed| Seed
PII -->|extra_networks.parse_prompts| EN[extra_networks.activate]
EN --> Loras[Lora / hypernet / TI patches]
PII -->|setup_conds| Cond[prompt_parser + CLIP]
PII -->|p.scripts.process| Scripts1[alwayson scripts: process]
PII --> Loop{for batch in batches}
Loop -->|create noise| RNG
Loop -->|p.sample| Sampler
Sampler --> CFGDenoiser
CFGDenoiser --> UNet
Loop -->|p.scripts.process_batch / before_hr| Scripts2[scripts: per-batch hooks]
Loop -->|decode_first_stage| VAE
Loop -->|p.scripts.postprocess_image| Scripts3[scripts: postprocess_image]
Loop -->|face restoration / color correction / overlay| Post
Loop -->|images.save_image| Save
Loop --> Loop
PII -->|p.scripts.postprocess| Scripts4[scripts: postprocess]
PII -->|extra_networks.deactivate| END
PII --> R[Processed]
R --> Caller
The full loop, with line references in modules/processing.py:
- Setup: process_images() (line ~819) saves the current sd_model_checkpoint and sd_vae settings, and ensures the right model is loaded if the request specifies an override.
- Inner pipeline: process_images_inner() (line ~863) runs the following steps:
  - Set seed (fix_seed), build comments dict, set state.job_count.
  - Parse extra-network tokens out of the prompt with extra_networks.parse_prompts(). This rewrites prompt to remove <lora:foo:1.0> and produces an activation list passed to extra_networks.activate(p, ...) later.
  - Encode the prompt and negative prompt into conditioning with setup_conds(). This handles attention syntax, prompt scheduling ([a:b:0.5]), and AND-composable diffusion via modules/prompt_parser.py.
  - Call p.scripts.process(p): alwayson scripts get a chance to mutate p before sampling.
  - For each batch (for n in range(p.n_iter)):
    - Generate noise via modules/rng.py.
    - Call p.sample() (subclass-specific). This in turn calls create_sampler() from modules/sd_samplers.py and runs the chosen sampler.
    - Hires fix (Txt2Img only): if enable_hr, upscale latents (or decode/re-encode for tile-based upscalers), then run a second pass with the hires sampler.
    - Refiner (SDXL): if a refiner model is set, switch checkpoints at the configured step. See modules/processing_scripts/refiner.py.
    - VAE decode → clamp → convert to PIL.
    - Run face restoration if enabled.
    - Apply color correction (img2img with apply_color_correction).
    - Apply mask overlay for inpainting (apply_overlay).
    - p.scripts.postprocess_image(p, image): alwayson scripts can edit the final image.
    - images.save_image(image, ...): write the file with infotext metadata.
  - Build Processed with infotexts (one per image) and return.
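The ordering of script hooks in the loop above can be sketched with a toy runner that records each call. RecordingScripts and run_pipeline are hypothetical names for this example; the hook names mirror the ones listed above, but sampling, VAE decoding, and saving are replaced by placeholders.

```python
class RecordingScripts:
    """Hypothetical stand-in for the script runner; records hook order."""

    def __init__(self):
        self.calls = []

    def process(self, p):
        self.calls.append("process")

    def postprocess_image(self, p, image):
        self.calls.append(f"postprocess_image:{image}")

    def postprocess(self, p, processed):
        self.calls.append("postprocess")


def run_pipeline(n_iter, batch_size, scripts):
    """Toy version of the inner loop: hooks fire once, per image, then once."""
    p = {"n_iter": n_iter, "batch_size": batch_size}
    images = []
    scripts.process(p)  # before any sampling
    for n in range(n_iter):
        # Placeholder for noise generation, p.sample(), and VAE decode.
        batch = [f"img{n}_{i}" for i in range(batch_size)]
        for image in batch:
            scripts.postprocess_image(p, image)  # per decoded image
            images.append(image)  # placeholder for images.save_image
    scripts.postprocess(p, images)  # after all batches
    return images
```

Running it with two batches of one image each records process first, then one postprocess_image call per image, then postprocess last.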
Hires fix
The hires-fix path is internal to StableDiffusionProcessingTxt2Img.sample(). It runs the regular sampler at low res, then either:
- Latent upscale: interpolate the latent directly (for example bicubic, depending on the selected latent upscaler) and run the sampler again at the new size (cheap, can blur).
- Pixel upscale: VAE-decode, run the chosen upscaler (ESRGAN, SwinIR, …), VAE-encode, and run the sampler again (slower, sharper).
Hires hooks: before_hr (alwayson scripts) is called between the two passes.
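The size bookkeeping for the two passes can be worked through with a little arithmetic. This sketch assumes the VAE maps each 8×8 pixel block to one latent cell and that the target size is simply the base size times hr_scale; the real code also handles explicit resize targets and rounding, which are omitted here. The function name hires_sizes is hypothetical.

```python
LATENT_DOWNSCALE = 8  # assumption: the VAE maps 8x8 pixels to one latent cell


def hires_sizes(width, height, hr_scale):
    """Return (first-pass latent size, second-pass pixel size, second-pass latent size)."""
    target_w = int(width * hr_scale)
    target_h = int(height * hr_scale)
    return (
        (width // LATENT_DOWNSCALE, height // LATENT_DOWNSCALE),
        (target_w, target_h),
        (target_w // LATENT_DOWNSCALE, target_h // LATENT_DOWNSCALE),
    )


if __name__ == "__main__":
    # 512x768 at 2x: latents grow from 64x96 to 128x192.
    print(hires_sizes(512, 768, 2.0))
```

Under these assumptions a 2× hires pass quadruples the number of latent cells the second sampler run has to process, which is why it dominates generation time.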
Refiner
The SDXL refiner is implemented as an "alwayson script" in modules/processing_scripts/refiner.py. It registers before_sampling and process_before_every_sampling hooks; when the configured refiner_switch_at is reached, the script swaps shared.sd_model to the refiner checkpoint and lets the rest of the loop continue. Memory-wise this can be expensive — --medvram-sdxl exists specifically for this case.
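The switch decision itself reduces to comparing progress against a fraction. The sketch below assumes refiner_switch_at is a fraction of total steps, that steps are counted from zero, and that the switch happens once the completed ratio reaches the threshold; the exact comparison in the real code may differ. The function names are hypothetical.

```python
def active_model(step: int, total_steps: int, refiner_switch_at: float) -> str:
    """Return which model handles `step` (0-based) under a fractional switch point."""
    completed_ratio = step / total_steps
    return "refiner" if completed_ratio >= refiner_switch_at else "base"


def model_schedule(total_steps: int, refiner_switch_at: float) -> list:
    """List the model used at every step of one sampling run."""
    return [active_model(s, total_steps, refiner_switch_at) for s in range(total_steps)]


if __name__ == "__main__":
    print(model_schedule(4, 0.5))
```

With 4 steps and a switch point of 0.5, the first two steps use the base model and the last two use the refiner.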
Prompt parsing and AND-composable diffusion
The Lark grammar in modules/prompt_parser.py handles three syntaxes:
- Attention: (word) → 1.1×, [word] → 1/1.1×, (word:1.5) → explicit weight. Implemented as token-weight pairs that the CLIP hijack uses to scale embeddings.
- Prompt scheduling: [from:to:0.5] switches from → to halfway through sampling. Implemented as a list of (end_step, prompt) pairs; setup_conds re-encodes when steps cross a boundary.
- AND-composable diffusion: prompt1 AND prompt2 :1.2 runs the model on both prompts and blends the predicted noise. Implemented in sd_samplers_cfg_denoiser.py.
These are independent and can be combined. Negative prompts share the same syntax.
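To make the (end_step, prompt) representation concrete, here is a simplified sketch of prompt scheduling. It uses a regular expression rather than the Lark grammar, handles only a single non-nested [from:to:when] group, and treats when as a fraction of the total steps. The function name prompt_schedule is hypothetical.

```python
import re

# Simplified pattern: one non-nested [from:to:when] group, `when` a decimal fraction.
_SCHEDULE = re.compile(r"\[([^\[\]:]*):([^\[\]:]*):([0-9]*\.?[0-9]+)\]")


def prompt_schedule(prompt: str, steps: int):
    """Expand a single [from:to:when] group into (end_step, prompt) pairs."""
    match = _SCHEDULE.search(prompt)
    if match is None:
        # No scheduling syntax: one prompt covers every step.
        return [(steps, prompt)]
    before, after, when = match.group(1), match.group(2), float(match.group(3))
    switch_step = int(when * steps)
    head, tail = prompt[:match.start()], prompt[match.end():]
    return [
        (switch_step, head + before + tail),
        (steps, head + after + tail),
    ]


if __name__ == "__main__":
    print(prompt_schedule("a [cat:dog:0.5] in a hat", 20))
```

For a 20-step run, the example yields the cat prompt up to step 10 and the dog prompt up to step 20.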
Integration points
- Scripts hook nearly every step via the ScriptRunner; see scripts-and-extensions.md for the full list.
- cfg_denoiser and cfg_denoised callbacks let extensions modify the noise prediction at every step; see script-callbacks.md.
- extra_noise callback lets extensions inject custom noise before sampling.
- process_before_every_sampling (added v1.10) lets refiner-style switches run on each sub-call (hires fix counts as a separate sampling).
- image_saved and before_image_saved callbacks let extensions write sidecar files or change the PNG-info dict.
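The per-step callback pattern behind hooks such as cfg_denoiser can be sketched with a toy registry. Everything here is a stand-in: DenoiserParamsSketch, on_denoiser_step, and run_sampler are hypothetical names, and the real callback parameters carry tensors rather than a single scale value.

```python
from dataclasses import dataclass


@dataclass
class DenoiserParamsSketch:
    """Hypothetical per-step payload; the real callback parameters differ."""
    sampling_step: int
    total_sampling_steps: int
    noise_scale: float = 1.0


_callbacks = []


def on_denoiser_step(callback):
    """Register a callback; hypothetical stand-in for a script-callback registration."""
    _callbacks.append(callback)


def run_sampler(total_steps):
    """Call every registered callback once per step and collect the resulting scale."""
    scales = []
    for step in range(total_steps):
        params = DenoiserParamsSketch(step, total_steps)
        for callback in _callbacks:
            callback(params)  # callbacks mutate params in place
        scales.append(params.noise_scale)
    return scales


def damp_second_half(params):
    """Example extension: halve the scale during the second half of sampling."""
    if params.sampling_step >= params.total_sampling_steps // 2:
        params.noise_scale *= 0.5


on_denoiser_step(damp_second_half)
```

The design choice worth noting is that callbacks receive a mutable parameter object, so an extension changes behaviour by editing fields rather than by returning a value.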
Entry points for modification
- A new generation parameter: add a field to StableDiffusionProcessing, append a (component, key) to the relevant tab's paste_fields, and update create_infotext() to include it. The API will pick it up automatically because modules/api/models.py is generated from StableDiffusionProcessing introspection.
- A new sampler: see samplers-and-schedulers.md.
- A new pipeline stage (e.g., a new mid-sample transformation): register a script callback (cfg_denoiser, process_before_every_sampling, etc.) rather than editing process_images_inner. Internal use of these hooks is established (the refiner already does this).
- Image saving: extend images.save_image() (modules/images.py). Filename patterns are documented in images.FilenameGenerator.
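For the first entry point, it helps to see how a new parameter would surface in the infotext. The sketch below is a simplified, hypothetical builder, not create_infotext itself: it joins the prompt, an optional negative prompt line, and comma-separated "Key: value" pairs, skipping values that are None. The parameter name "Example strength" is invented for illustration.

```python
def build_infotext(prompt: str, negative_prompt: str, params: dict) -> str:
    """Join prompt, negative prompt, and 'Key: value' pairs; None values are skipped."""
    pairs = ", ".join(f"{key}: {value}" for key, value in params.items() if value is not None)
    lines = [prompt]
    if negative_prompt:
        lines.append(f"Negative prompt: {negative_prompt}")
    lines.append(pairs)
    return "\n".join(lines)


if __name__ == "__main__":
    text = build_infotext(
        "a cat",
        "blurry",
        # "Example strength" is a hypothetical new generation parameter.
        {"Steps": 20, "Seed": 1, "Example strength": 0.3, "Unused option": None},
    )
    print(text)
```

Skipping None values keeps the metadata short: a parameter only appears in the saved image when the job actually set it.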
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.