AUTOMATIC1111/stable-diffusion-webui
Image-to-image, inpainting, outpainting
Active contributors: AUTOMATIC1111, w-e-w, light-and-ray, catboxanon
Purpose
img2img takes an existing image plus a prompt and produces a new image that respects both. It also serves as the entry point for inpainting (mask + replacement region), outpainting (extending the canvas), and a handful of related variations exposed through the Script dropdown.
Code layout
| Concern | File |
|---|---|
| Generate button handler | modules/img2img.py |
| Tab UI | the with gr.Tab("img2img") block in modules/ui.py |
| Pipeline class | StableDiffusionProcessingImg2Img in modules/processing.py |
| Mask construction | create_binary_mask, apply_overlay in modules/processing.py; soft inpainting in extensions-builtin/soft-inpainting/ |
| Color matching | setup_color_correction, apply_color_correction in modules/processing.py |
| API route | img2imgapi in modules/api/api.py |
| Outpainting scripts | scripts/outpainting_mk_2.py, scripts/poor_mans_outpainting.py |
| Loopback | scripts/loopback.py |
| img2img alternative | scripts/img2imgalt.py |
| SD upscale | scripts/sd_upscale.py |
How img2img differs from txt2img
StableDiffusionProcessingImg2Img (line ~1557 in processing.py) overrides init() and sample():
- init(): encodes the input image to latents with the VAE. If a mask is provided, it builds binary and feathered masks, optionally crops to the "only masked" region, and prepares an inpainting overlay.
- sample(): adds noise to the encoded latents according to denoising_strength, then runs the sampler. The lower the denoising strength, the closer the result stays to the input.
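A simplified sketch of what denoising_strength does in sample(), assuming a plain list of decreasing noise levels (sigmas) and the rule "run int(min(strength, 0.999) * steps) steps, starting partway down the schedule". The helper names are hypothetical; the real samplers work on torch tensors and their exact step accounting may differ.

```python
def truncated_schedule(sigmas: list[float], denoising_strength: float) -> list[float]:
    """Keep only the tail of a noise schedule (hypothetical helper).

    sigmas holds steps + 1 decreasing noise levels ending at 0.
    A strength of 1.0 (clamped to 0.999) keeps almost the whole schedule;
    a strength of 0.0 keeps only the final sigma, i.e. no denoising steps.
    """
    steps = len(sigmas) - 1
    t_enc = int(min(denoising_strength, 0.999) * steps)  # steps actually run
    return sigmas[steps - t_enc:]


def add_start_noise(latent: list[float], noise: list[float], sigma_start: float) -> list[float]:
    """Noise the encoded input up to the first sigma of the truncated schedule."""
    return [x + n * sigma_start for x, n in zip(latent, noise)]


if __name__ == "__main__":
    sigmas = [float(s) for s in range(10, -1, -1)]  # toy schedule: 10, 9, ..., 0
    sched = truncated_schedule(sigmas, 0.5)
    print(sched)                                              # [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
    print(add_start_noise([1.0, 2.0], [0.5, -0.5], sched[0]))  # [3.5, -0.5]
```

With strength 0.5 only the second half of the schedule runs, which is why low strengths stay close to the input: the starting latent is only lightly noised.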
The other img2img-specific knobs:
- init_images (list[PIL.Image]): one or more input images. In batch mode, each input image goes through its own generation.
- mask (PIL.Image | None): alpha mask used for inpainting.
- mask_blur: feather radius around the mask.
- inpainting_fill: what to put in the masked area before sampling: original, latent noise, latent nothing, or fill with colour.
- inpaint_full_res / inpaint_full_res_padding: "Only masked" mode crops around the mask (plus padding), generates at full resolution, then composites the result back.
- inpainting_mask_invert: flips the mask, so the unmasked area is regenerated instead.
- resize_mode: how the input image is resized to the target dimensions: just resize, crop and resize, resize and fill, or latent upscale.
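In the UI, inpainting_fill and resize_mode come from radio buttons that pass the index of the selected choice rather than its label. A sketch of how the integers might line up, assuming the choice order fill, original, latent noise, latent nothing and just resize, crop and resize, resize and fill, latent upscale; check ui.py in your checkout before relying on the numbers. The enum names and invert_mask helper are hypothetical.

```python
from enum import IntEnum


class InpaintingFill(IntEnum):
    # Assumed order of the "Masked content" radio choices in ui.py.
    FILL = 0            # fill with colour taken from the surrounding image
    ORIGINAL = 1        # keep the original content under the mask
    LATENT_NOISE = 2    # replace the masked latent with noise
    LATENT_NOTHING = 3  # replace the masked latent with zeros


class ResizeMode(IntEnum):
    # Assumed order of the "Resize mode" radio choices in ui.py.
    JUST_RESIZE = 0
    CROP_AND_RESIZE = 1
    RESIZE_AND_FILL = 2
    LATENT_UPSCALE = 3


def invert_mask(mask: list[list[int]]) -> list[list[int]]:
    """What inpainting_mask_invert amounts to for an 8-bit mask: v -> 255 - v."""
    return [[255 - v for v in row] for row in mask]


print(InpaintingFill(1).name)             # ORIGINAL
print(invert_mask([[0, 255], [128, 0]]))  # [[255, 0], [127, 255]]
```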
The six sub-tabs
The img2img tab has six input sources, all driven by Gradio components in ui.py:
| Sub-tab | What it accepts |
|---|---|
| img2img | A single image; no mask. |
| Sketch | A canvas where the user can draw colour as a hint. |
| Inpaint | Image + brush mask drawn in-tab. |
| Inpaint sketch | Image + colour mask drawn in-tab. |
| Inpaint upload | Image + separate alpha-mask file. |
| Batch | A directory of images plus optional mask directory. |
The handler img2img.img2img() switches on a mode parameter to pick the right combination of the above.
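A minimal sketch of that dispatch, assuming mode is the index of the selected sub-tab in the order of the table (0 for img2img through 5 for Batch). Images are plain 2-D lists of RGB tuples, and for Inpaint sketch the mask is assumed to be every pixel the user painted over, i.e. where the sketch differs from the original. pick_inputs and mask_from_sketch are hypothetical names, not the real img2img.img2img() signature.

```python
from __future__ import annotations

Pixel = tuple[int, int, int]
Image2D = list[list[Pixel]]

# Sub-tabs in table order; assumed to match the integer mode value.
SUB_TABS = ["img2img", "sketch", "inpaint", "inpaint sketch", "inpaint upload", "batch"]


def mask_from_sketch(original: Image2D, sketched: Image2D) -> list[list[int]]:
    """255 wherever the user painted (pixel changed), 0 elsewhere."""
    return [
        [255 if a != b else 0 for a, b in zip(row_o, row_s)]
        for row_o, row_s in zip(original, sketched)
    ]


def pick_inputs(mode: int, image: Image2D, sketch: Image2D | None = None,
                mask: list[list[int]] | None = None):
    """Return (init_image, mask) for one generation (hypothetical helper)."""
    tab = SUB_TABS[mode]
    if tab in ("img2img", "batch"):
        return image, None                  # batch repeats this for every file
    if tab == "sketch":
        return sketch, None                 # drawn colours act as a hint; no mask
    if tab == "inpaint sketch":
        return sketch, mask_from_sketch(image, sketch)
    return image, mask                      # inpaint / inpaint upload: explicit mask
```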
Inpainting
Two flavours of inpainting coexist:
- Standard inpainting: the model is asked to denoise the masked region only, with the unmasked region kept fixed. The mask is used to blend latents at every step (mask, nmask in the sampler args).
- Inpainting-model inpainting: when the loaded checkpoint is an "inpainting model" (e.g., sd-v1-5-inpainting), its UNet expects 9-channel input. The init step concatenates the mask and the VAE-encoded masked image onto the 4-channel latent. Detection logic is in processing.py; search for is_inpaint.
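Both mechanisms can be sketched on toy data, assuming mask is 1 where the original is kept and nmask = 1 - mask is 1 where new content is generated, and a 4-channel SD latent. Latents are flat lists of floats and blend_step / inpainting_model_input are hypothetical names.

```python
def blend_step(init_latent: list[float], denoised: list[float],
               mask: list[float]) -> list[float]:
    """Standard inpainting: after each step, pin the kept region to the input.

    mask is 1.0 where the original is kept; nmask = 1 - mask marks the
    region the model is allowed to change.
    """
    return [i * m + (1.0 - m) * d for i, d, m in zip(init_latent, denoised, mask)]


def inpainting_model_input(noisy_latent: list[list[float]],
                           mask: list[list[float]],
                           masked_image_latent: list[list[float]]) -> list[list[float]]:
    """Inpainting models: stack channels 4 (latent) + 1 (mask) + 4 (masked image) = 9.

    Each argument is a list of channels; each channel is a flat list of values.
    """
    return noisy_latent + mask + masked_image_latent
```

The first function runs once per sampler step; the second describes the shape of the model input, which is why a standard 4-channel checkpoint cannot be used as an inpainting model.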
Soft inpainting is implemented as a built-in extension under extensions-builtin/soft-inpainting/. It hooks the mask-blending step in cfg_denoiser and, instead of a hard binary mask, blends the denoised latent with the original latent at every step using a smooth mask falloff.
Outpainting
outpainting_mk_2.py is the more sophisticated outpainting script:
- Pads the input image with the chosen colour or noise.
- Builds a feathered mask covering the new pixels.
- Generates a mirror-extended noise pattern in the new area (the "mk_2" trick from parlance-zz/g-diffuser-bot).
- Runs img2img inpainting with that as the starting latent.
poor_mans_outpainting.py is the older, simpler version: just inpainting with no clever noise.
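The first two outpainting steps (pad the canvas, then mask the new pixels) can be sketched on a toy grayscale image stored as a 2-D list. pad_for_outpainting is a hypothetical helper that pads with a flat fill value; the noise option and mask feathering described above are omitted.

```python
def pad_for_outpainting(image: list[list[int]], pixels: int, directions: set[str],
                        fill: int = 0) -> tuple[list[list[int]], list[list[int]]]:
    """Extend the canvas and return (padded_image, mask).

    mask is 255 on newly added pixels (to be generated) and 0 on the original.
    directions is any subset of {"left", "right", "up", "down"}.
    """
    h, w = len(image), len(image[0])
    left = pixels if "left" in directions else 0
    right = pixels if "right" in directions else 0
    up = pixels if "up" in directions else 0
    down = pixels if "down" in directions else 0
    new_w, new_h = w + left + right, h + up + down

    padded = [[fill] * new_w for _ in range(new_h)]
    mask = [[255] * new_w for _ in range(new_h)]
    for y in range(h):
        for x in range(w):
            padded[y + up][x + left] = image[y][x]  # copy original into place
            mask[y + up][x + left] = 0              # original pixels are kept
    return padded, mask
```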
Loopback
scripts/loopback.py runs img2img N times, feeding each output back as the next input. The denoising strength can be ramped up or down across iterations (the "Final denoising strength" + "Denoising strength curve" controls). Useful for slow style drift / animation frames.
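A sketch of the loop and its strength ramp. The curve shapes (sine for "Aggressive", 1 - cosine for "Lazy", linear otherwise) are our reading of loopback.py and should be checked against the script; generate stands in for one img2img run and is hypothetical.

```python
import math
from typing import Callable, TypeVar

T = TypeVar("T")


def denoising_strength_for(loop: int, loops: int, initial: float, final: float,
                           curve: str = "Linear") -> float:
    """Interpolate from initial to final strength across iterations."""
    if loops == 1:
        return initial
    progress = loop / (loops - 1)  # 0.0 on the first pass, 1.0 on the last
    if curve == "Aggressive":
        shaped = math.sin(progress * math.pi / 2)      # changes early
    elif curve == "Lazy":
        shaped = 1 - math.cos(progress * math.pi / 2)  # changes late
    else:
        shaped = progress
    return initial + (final - initial) * shaped


def loopback(image: T, loops: int, generate: Callable[[T, float], T],
             initial: float, final: float, curve: str = "Linear") -> list[T]:
    """Feed each output back in as the next input; return every frame."""
    frames = []
    for i in range(loops):
        strength = denoising_strength_for(i, loops, initial, final, curve)
        image = generate(image, strength)
        frames.append(image)
    return frames
```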
SD upscale
scripts/sd_upscale.py upscales a large image with tiled img2img: the image is first enlarged with a chosen upscaler, then split into overlapping tiles; img2img runs on each tile at low denoising strength, and the tiles are reassembled. It is sometimes confused with the upscalers in the Extras tab; this one is conceptually closer to "re-render at higher resolution".
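The tiling step can be sketched as computing evenly spaced tile origins along one axis so that neighbouring tiles overlap by at least overlap pixels; the same function would be applied to width and height. tile_origins is a hypothetical helper, not the script's own splitting code.

```python
import math


def tile_origins(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles covering [0, length) with at least `overlap` overlap."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if length <= tile:
        return [0]                            # one tile covers the whole axis
    stride = tile - overlap                   # new pixels each extra tile adds
    count = math.ceil((length - overlap) / stride)
    # Spread tiles evenly; integer math keeps the last tile flush with the edge.
    return [i * (length - tile) // (count - 1) for i in range(count)]
```

For example, a 1024-pixel axis with 512-pixel tiles and 64 pixels of overlap yields origins 0, 256 and 512.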
Color correction
When opts.img2img_color_correction is on, setup_color_correction (in processing.py) computes a histogram match target from the input image and apply_color_correction is called after sampling so the output preserves the input's colour distribution. Useful for loopback to avoid colour drift.
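Histogram matching can be sketched on a single 8-bit channel: each source value is mapped to the smallest reference value whose cumulative frequency is at least as large. This is a pure-Python illustration of the technique only; as far as we can tell, the webui uses a library matcher (scikit-image's match_histograms) per channel in LAB space rather than code like this.

```python
from collections import Counter


def match_histogram(source: list[int], reference: list[int]) -> list[int]:
    """Remap `source` values so their distribution follows `reference`."""
    n_src, n_ref = len(source), len(reference)
    src_counts, ref_counts = Counter(source), Counter(reference)
    ref_values = sorted(ref_counts)

    # Cumulative counts of the reference, in ascending value order.
    ref_cdf = []
    running = 0
    for v in ref_values:
        running += ref_counts[v]
        ref_cdf.append(running)

    mapping = {}
    running = 0
    j = 0
    for v in sorted(src_counts):
        running += src_counts[v]
        # Advance until ref_cdf / n_ref >= running / n_src (cross-multiplied, exact).
        while ref_cdf[j] * n_src < running * n_ref:
            j += 1
        mapping[v] = ref_values[j]
    return [mapping[v] for v in source]
```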
Mask overlay and blending
After sampling, if a mask was used, apply_overlay() composites the freshly generated region into the original image using the mask. The mask_blur setting controls the feather; inpaint_full_res re-uses uncrop to put an "only masked" tile back into the original resolution.
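A one-dimensional sketch of the feather-and-composite step, assuming an 8-bit mask where 255 means "use the generated pixel". A box blur stands in for the Gaussian blur used for mask_blur, and feather / composite are hypothetical helper names.

```python
def feather(mask: list[int], radius: int) -> list[int]:
    """Box-blur a row of mask values, clamping indices at the edges."""
    n = len(mask)
    out = []
    for i in range(n):
        window = [mask[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) // len(window))
    return out


def composite(original: list[int], generated: list[int], mask: list[int]) -> list[int]:
    """Per-pixel alpha blend: mask 0 keeps the original, 255 takes the generated pixel."""
    return [(o * (255 - m) + g * m) // 255 for o, g, m in zip(original, generated, mask)]


soft = feather([0, 0, 255, 255], 1)
print(soft)                                   # [0, 85, 170, 255]
print(composite([100] * 4, [200] * 4, soft))  # [100, 133, 166, 200]
```

The feathered mask turns a hard seam into a short gradient, which is what hides the boundary between kept and regenerated pixels.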
Integration points
- The same Script and callback system as txt2img: most extensions work in both tabs, since Script.show() receives is_img2img and can decide per tab.
- ControlNet and similar tools generally hook here via cfg_denoiser and don't need img2img-specific code.
- Scripts that only make sense for img2img return is_img2img from Script.show(); txt2img-only scripts return not is_img2img.
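A sketch of the show() pattern. To keep it runnable outside the webui, a minimal stand-in replaces modules.scripts.Script; a real extension would subclass the webui's class instead. The two script classes and visible_titles are hypothetical.

```python
class Script:
    """Stand-in for modules.scripts.Script (only what this sketch needs)."""

    def title(self) -> str:
        raise NotImplementedError

    def show(self, is_img2img: bool):
        return True                # default: visible in both tabs


class MaskTweaks(Script):
    def title(self) -> str:
        return "Mask tweaks (img2img only)"

    def show(self, is_img2img: bool):
        return is_img2img          # hidden in txt2img


class TextOnlyExample(Script):
    def title(self) -> str:
        return "txt2img-only example"

    def show(self, is_img2img: bool):
        return not is_img2img      # hidden in img2img


def visible_titles(scripts: list[Script], is_img2img: bool) -> list[str]:
    """Mimic the runner asking each script whether it belongs in this tab."""
    return [s.title() for s in scripts if s.show(is_img2img)]
```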
Entry points for modification
- Custom mask logic: edit init() of StableDiffusionProcessingImg2Img, but consider an alwayson script instead.
- Per-image preprocessing: the before_process hook of an alwayson script can mutate p.init_images before the encoder runs.
- A new img2img mode: add a new sub-tab in ui.py and a new mode value handled by img2img.img2img().
- Mask compositing: apply_overlay is the choke point. Soft inpainting overrides this through the cfg_denoiser path; copy that pattern.
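A sketch of per-image preprocessing in before_process. The processing object is faked with types.SimpleNamespace so the example runs on its own, images are 2-D lists, and the left-to-right flip is a hypothetical example. In the webui, the class would subclass modules.scripts.Script, p would be a StableDiffusionProcessingImg2Img, and init_images would hold PIL images.

```python
from types import SimpleNamespace


class MirrorInputs:
    """Hypothetical alwayson script: flip every init image left-to-right."""

    def before_process(self, p, *args):
        # Runs before init() encodes p.init_images with the VAE, so
        # replacing the list here changes what the model actually sees.
        p.init_images = [[list(reversed(row)) for row in img] for img in p.init_images]


p = SimpleNamespace(init_images=[[[1, 2, 3]], [[4, 5]]])
MirrorInputs().before_process(p)
print(p.init_images)  # [[[3, 2, 1]], [[5, 4]]]
```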
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.