AUTOMATIC1111/stable-diffusion-webui
Hires fix and upscaling workflow
Active contributors: AUTOMATIC1111, Kohaku-Blueleaf, w-e-w
Purpose
Hires fix is a two-pass txt2img workflow: generate at a smaller "native" resolution, then upscale and run a second sampling pass to add detail. It is the primary trick for getting good 2K+ images out of a 512×512-trained model.
The Extras tab and /sdapi/v1/extra-*-image provide the standalone upscale workflow — pure post-processing on existing images. They share the upscaler implementations but otherwise have no overlap.
Two passes
graph LR
Prompt --> Pass1[Sampler at native res<br/>e.g. 832×512]
Pass1 -->|latent or pixel upscale| Up[Upscale to target res<br/>e.g. 1664×1024]
Up --> Encode[(if pixel upscaler) VAE encode]
Encode --> Pass2[Sampler at target res<br/>denoising_strength steps]
Pass2 --> Decode[VAE decode]
Decode --> Image

The implementation is in StableDiffusionProcessingTxt2Img.sample() (modules/processing.py). Latent upscale skips the VAE round-trip; pixel upscale (ESRGAN, SwinIR, etc.) goes through it.
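To make the diagram's example numbers concrete, the arithmetic can be sketched as below. The helper names `upscaled_size` and `latent_shape` are hypothetical, and the sketch assumes the usual Stable Diffusion VAE geometry (4 latent channels, 8× spatial downscale); other model families may differ.

```python
from __future__ import annotations


def upscaled_size(width: int, height: int, scale: float) -> tuple[int, int]:
    """Pixel size for pass two, given a plain multiplier (hypothetical helper)."""
    return int(width * scale), int(height * scale)


def latent_shape(width: int, height: int,
                 channels: int = 4, factor: int = 8) -> tuple[int, int, int]:
    """Latent tensor shape (C, H/factor, W/factor) under the assumed VAE geometry."""
    return channels, height // factor, width // factor


native = (832, 512)
target = upscaled_size(*native, scale=2.0)

print(target)                 # (1664, 1024)
print(latent_shape(*native))  # (4, 64, 104)
print(latent_shape(*target))  # (4, 128, 208)
```

At a 2× scale the second pass works on four times as many latent positions as the first, which is why it tends to dominate runtime and VRAM use.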
Hires-fix parameters
| Field | Meaning |
|---|---|
| enable_hr | Master toggle. |
| hr_scale | Multiplier on width/height for the second pass (default 2.0). |
| hr_resize_x, hr_resize_y | Absolute target dimensions; if non-zero, override hr_scale. |
| hr_upscaler | Which upscaler to use between passes. The "Latent" entry uses bilinear interpolation on latents; non-latent entries upscale in pixel space with a VAE round-trip. |
| hr_second_pass_steps | Steps for the second sampling pass; default 0 means "use main steps". |
| denoising_strength | How much the second pass changes the image. Lower = more faithful to first pass; higher = more detail / more drift. 0.4–0.6 is the typical range. |
| hr_sampler_name, hr_scheduler | Override the sampler/scheduler for pass two; default "Use same sampler/scheduler". |
| hr_prompt, hr_negative_prompt | Optional override prompts for pass two. |
| hr_cfg, hr_distilled_cfg | CFG scale override for pass two (defaults to the main cfg_scale). |
In the API, these are top-level fields on StableDiffusionTxt2ImgProcessingAPI.
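As an illustration of how the fields above fit into a request, here is a minimal sketch that builds a txt2img payload with hires fix enabled. `build_hires_payload` is a hypothetical helper, the prompt and default values are placeholders, and the commented-out request assumes a local server started with `--api` on the default port.

```python
import json


def build_hires_payload(prompt, width=832, height=512, steps=20,
                        hr_scale=2.0, hr_upscaler="Latent",
                        denoising_strength=0.5, hr_second_pass_steps=0):
    """Assemble a txt2img request body with the hires-fix fields set."""
    return {
        "prompt": prompt,
        "width": width,
        "height": height,
        "steps": steps,
        "enable_hr": True,                             # master toggle
        "hr_scale": hr_scale,                          # multiplier for pass two
        "hr_resize_x": 0,                              # 0 = defer to hr_scale
        "hr_resize_y": 0,
        "hr_upscaler": hr_upscaler,
        "hr_second_pass_steps": hr_second_pass_steps,  # 0 = use main steps
        "denoising_strength": denoising_strength,
    }


payload = build_hires_payload("a lighthouse at dusk")
body = json.dumps(payload).encode("utf-8")

# Sending it (assumption: webui running locally with --api on port 7860):
# import urllib.request
# req = urllib.request.Request(
#     "http://127.0.0.1:7860/sdapi/v1/txt2img",
#     data=body, headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Leaving hr_resize_x and hr_resize_y at 0 keeps hr_scale in control of the target size, matching the override rule in the table.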
Hooks
Two callbacks are specific to hires fix:
- before_hr: Script.before_hr(p, *args) is called between the two passes. Used by extensions that want to alter p before the second sample (changing the prompt, switching ControlNet weights).
- process_before_every_sampling: added in v1.10. Called each time a sampler is created, including pass two. The refiner uses this to swap models.
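A minimal sketch of an extension script using both hooks might look like the following. The class name `HiresPassLogger` and what it records are illustrative assumptions; the stand-in base class exists only so the snippet runs outside the webui process, where `modules.scripts` is not importable.

```python
from types import SimpleNamespace

try:
    from modules import scripts          # only importable inside the webui
    ScriptBase = scripts.Script
    ALWAYS_VISIBLE = scripts.AlwaysVisible
except ImportError:
    class ScriptBase:                    # stand-in for running the sketch alone
        pass
    ALWAYS_VISIBLE = object()


class HiresPassLogger(ScriptBase):
    """Illustrative script: tags the run and counts sampler set-ups."""

    def title(self):
        return "Hires pass logger (example)"

    def show(self, is_img2img):
        return ALWAYS_VISIBLE

    def before_hr(self, p, *args):
        # Runs between pass one and pass two; p is the processing object.
        p.extra_generation_params["Hires hook"] = "before_hr ran"

    def process_before_every_sampling(self, p, *args, **kwargs):
        # Runs for every sampling pass, including the hires pass.
        self.sampling_calls = getattr(self, "sampling_calls", 0) + 1


# Stand-alone demonstration with a fake processing object.
fake_p = SimpleNamespace(extra_generation_params={})
script = HiresPassLogger()
script.process_before_every_sampling(fake_p)   # pass one
script.before_hr(fake_p)                       # between passes
script.process_before_every_sampling(fake_p)   # pass two
print(script.sampling_calls, fake_p.extra_generation_params)
```

Inside the webui the same class would be placed in an extension's scripts directory; the fake object above is used only to show the call order.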
Latent vs pixel upscalers
| Upscaler entry | Behaviour |
|---|---|
| Latent | Bilinear interpolation of latents. Cheapest but blurry. Increase denoising_strength to compensate. |
| Latent (antialiased), Latent (bicubic), Latent (bicubic antialiased), Latent (nearest), Latent (nearest-exact) | Variants on the latent interpolation. |
| ESRGAN, RealESRGAN, SwinIR, ScuNET, LDSR, DAT, HAT | Pixel-domain. Decode latents → upscale image → re-encode. Sharper but ~30–50% slower per pass. |
| 4x-UltraSharp, etc. | These are individual ESRGAN model files; the entries are populated from models/ESRGAN/. |
The list is generated from shared.latent_upscale_modes + [x.name for x in shared.sd_upscalers]. Extensions can register more by adding to shared.sd_upscalers (see systems/postprocessing.md).
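The way the dropdown is assembled can be sketched with stand-in data. Everything below is illustrative: `latent_upscale_modes` is assumed to be a mapping keyed by entry name, the `sd_upscalers` entries are placeholders with only a `name` attribute, and `hires_upscaler_choices` and `is_latent_upscaler` are hypothetical helpers.

```python
from collections import namedtuple

# Stand-ins for shared.latent_upscale_modes and shared.sd_upscalers.
latent_upscale_modes = {
    "Latent": None,
    "Latent (antialiased)": None,
    "Latent (bicubic)": None,
}
UpscalerEntry = namedtuple("UpscalerEntry", ["name"])
sd_upscalers = [UpscalerEntry("None"), UpscalerEntry("ESRGAN_4x")]


def hires_upscaler_choices(latent_modes, upscalers):
    """Latent entries first, then every registered pixel upscaler by name."""
    return [*latent_modes, *[x.name for x in upscalers]]


def is_latent_upscaler(name, latent_modes):
    """True if the chosen entry skips the VAE round-trip."""
    return name in latent_modes


choices = hires_upscaler_choices(latent_upscale_modes, sd_upscalers)
print(choices)

# Registering one more upscaler makes it show up in the list.
sd_upscalers.append(UpscalerEntry("4x-UltraSharp"))
print(hires_upscaler_choices(latent_upscale_modes, sd_upscalers)[-1])
```

Because the list is rebuilt from the two sources, anything appended to the upscaler registry appears without changes to the hires-fix code.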
VAE round-trip cost
For pixel upscalers, the second pass does:
1. VAE decode pass-one latents (~one second for SDXL).
2. Upscaler forward pass (varies by upscaler; LDSR is the slowest at tens of seconds).
3. VAE encode upscaled image.
4. Re-noise using denoising_strength.
5. Sampler at target resolution.
Step 5 is the dominant cost on most GPUs. --medvram will move the upscaler off-GPU between phases to keep VRAM available for the second sampler.
Standalone upscale (Extras tab)
The Extras tab is independent of hires fix. It loads an image (no diffusion model required), runs one or two upscalers in sequence, optionally chains face restoration and other postprocessors, and saves. See systems/postprocessing.md.
POST /sdapi/v1/extra-single-image and /extra-batch-images are the API equivalents. Their request shape is ExtrasSingleImageRequest / ExtrasBatchImagesRequest in modules/api/models.py.
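A request body for the single-image endpoint could be assembled roughly as follows. This is a sketch: `build_extras_payload` is a hypothetical helper, the upscaler name is a placeholder, and the field names are written from memory of ExtrasSingleImageRequest, so check them against modules/api/models.py in your checkout.

```python
import base64
import json


def build_extras_payload(image_bytes, upscaler_1="ESRGAN_4x",
                         upscaling_resize=2.0, upscaler_2="None",
                         upscaler_2_visibility=0.0):
    """Assemble a body for POST /sdapi/v1/extra-single-image."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "resize_mode": 0,                  # assumed: 0 = scale by multiplier
        "upscaling_resize": upscaling_resize,
        "upscaler_1": upscaler_1,
        "upscaler_2": upscaler_2,          # optional second upscaler
        "extras_upscaler_2_visibility": upscaler_2_visibility,
    }


# Placeholder bytes; in practice read them from a PNG or JPEG file.
extras_payload = build_extras_payload(b"not-a-real-image")
extras_body = json.dumps(extras_payload)
print(sorted(extras_payload))
```

No diffusion fields appear in the body, which reflects the point above that the Extras workflow is pure post-processing.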
Common pitfalls
- Low denoising_strength + Latent upscale = blurry slop. Latent upscaling needs aggressive denoising to look sharp; pixel upscalers need less.
- Hires fix is incompatible with some samplers (LCM, DPM Adaptive). The UI just runs them anyway; behaviour is undefined.
- hr_resize_x/hr_resize_y values that were not multiples of 64 were rejected in older versions; v1.4 relaxed this to multiples of 8.
- OOM on hires: switch from latent to a tile-based upscaler (ESRGAN/SwinIR), reduce hr_scale, or pass --medvram.
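When computing absolute target dimensions in client code, one way to stay on the multiple-of-8 grid is to snap values before sending them. `snap_down` is a hypothetical client-side helper, not something the webui provides.

```python
def snap_down(value: int, multiple: int = 8) -> int:
    """Round down to the nearest multiple, never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


print(snap_down(1000))  # 1000
print(snap_down(1025))  # 1024
print(snap_down(1663))  # 1656
```

Rounding down rather than up keeps the request from exceeding the size that was budgeted for VRAM.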
Entry points for modification
- Add a hires-pass behaviour: register before_hr and process_before_every_sampling callbacks; these run at the right point.
- Add a custom upscaler: see systems/postprocessing.md. The new upscaler will appear in the hires-fix dropdown automatically.
- Change the noise mix between passes: the relevant code is around setup_img2img_steps in processing.py; bear in mind every change here affects both regular img2img and hires-pass-two.
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.