AUTOMATIC1111/stable-diffusion-webui

Image-to-image, inpainting, outpainting

Active contributors: AUTOMATIC1111, w-e-w, light-and-ray, catboxanon

Purpose

img2img takes an existing image plus a prompt and produces a new image that respects both. It also serves as the entry point for inpainting (mask + replacement region), outpainting (extending the canvas), and a handful of related variations exposed through the Script dropdown.

Code layout

Concern	File
Generate button handler	`modules/img2img.py`
Tab UI	the `with gr.Tab("img2img")` block in `modules/ui.py`
Pipeline class	`StableDiffusionProcessingImg2Img` in `modules/processing.py`
Mask construction	`create_binary_mask`, `apply_overlay` in `modules/processing.py`; soft inpainting in `extensions-builtin/soft-inpainting/`
Color matching	`apply_color_correction` (same file)
API route	`img2imgapi` in `modules/api/api.py`
Outpainting scripts	`scripts/outpainting_mk_2.py`, `scripts/poor_mans_outpainting.py`
Loopback	`scripts/loopback.py`
img2img alternative	`scripts/img2imgalt.py`
SD upscale	`scripts/sd_upscale.py`

How img2img differs from txt2img

StableDiffusionProcessingImg2Img (line ~1557 in processing.py) overrides init() and sample():

init() — encodes the input image to latents using the VAE. If a mask is provided, it builds binary and feathered masks, optionally crops to the "only masked" region, and prepares an inpainting overlay.
sample() — adds noise to the encoded latents according to denoising_strength, then runs the sampler. The lower the denoising strength, the closer the result stays to the input.
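
The effect of denoising_strength on how much of the noise schedule is applied can be sketched as follows. This is a simplified illustration rather than the code in processing.py: the function name `steps_to_run` is hypothetical, and the clamp to 0.999 is an assumption about how a strength of 1.0 is handled.

```python
def steps_to_run(total_steps: int, denoising_strength: float) -> int:
    """Return how many sampler steps would run for a given strength (illustrative)."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be in [0, 1]")
    # Assumption: clamp slightly below 1.0 before scaling the step count.
    strength = min(denoising_strength, 0.999)
    # Lower strength -> fewer steps of noise added and removed -> output closer to input.
    return int(strength * total_steps)


if __name__ == "__main__":
    for s in (0.0, 0.25, 0.5, 1.0):
        print(s, steps_to_run(20, s))
```

Under these assumptions, a strength of 0.5 with 20 steps noises the latents halfway along the schedule and denoises from there, which is why the result keeps the broad structure of the input.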

The other img2img-specific knobs:

init_images: list[PIL.Image] — one or more input images. With batch input, each image goes through its own generation.
mask: PIL.Image | None — alpha mask used for inpainting.
mask_blur — feather radius around the mask.
inpainting_fill — what to put in the masked area before sampling: original, latent noise, latent nothing, fill with colour.
inpaint_full_res / inpaint_full_res_padding — "Only masked" mode crops around the mask, generates at full resolution, then composites back.
inpainting_mask_invert — flip the mask.
resize_mode — how to resize input image to target dimensions: just resize, crop and resize, resize and fill, latent upscale.
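
As a rough illustration of how these knobs travel together, the sketch below builds a JSON payload for the `/sdapi/v1/img2img` route. The helper names are hypothetical, and the integer codes for `inpainting_fill` and `resize_mode` are assumed to index the option lists in the order given above; the `/docs` page of a running instance is the place to confirm the schema.

```python
import base64
import json
from typing import Optional


def encode_image_bytes(raw: bytes) -> str:
    """Base64-encode raw image file bytes for transport in JSON."""
    return base64.b64encode(raw).decode("ascii")


def build_img2img_payload(
    image_bytes: bytes,
    mask_bytes: Optional[bytes] = None,
    prompt: str = "",
    denoising_strength: float = 0.5,
) -> dict:
    """Assemble a request body; inpainting fields are added only when a mask is given."""
    payload = {
        "prompt": prompt,
        "init_images": [encode_image_bytes(image_bytes)],
        "denoising_strength": denoising_strength,
        "resize_mode": 0,  # assumed: 0 = just resize
    }
    if mask_bytes is not None:
        payload.update(
            {
                "mask": encode_image_bytes(mask_bytes),
                "mask_blur": 4,
                "inpainting_fill": 1,  # assumed: 1 = original
                "inpaint_full_res": True,  # "Only masked" mode
                "inpaint_full_res_padding": 32,
                "inpainting_mask_invert": 0,
            }
        )
    return payload


if __name__ == "__main__":
    body = build_img2img_payload(b"fake-image", b"fake-mask", prompt="a castle")
    print(json.dumps(body, indent=2)[:200])
```

The resulting dictionary would be sent as the JSON body of a POST request to an instance started with `--api`.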

The six sub-tabs

The img2img tab has six input sources, all driven by Gradio components in ui.py:

Sub-tab	What it accepts
img2img	A single image; no mask.
Sketch	A canvas where the user can draw colour as a hint.
Inpaint	Image + brush mask drawn in-tab.
Inpaint sketch	Image + colour mask drawn in-tab.
Inpaint upload	Image + separate alpha-mask file.
Batch	A directory of images plus optional mask directory.

The handler img2img.img2img() switches on a mode parameter to pick the right combination of the above.
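
The dispatch can be pictured as a function that maps a mode value to an (image, mask) pair. The sketch below is illustrative only: the mode numbering, the `select_inputs` name, and the dictionary keys are all hypothetical and are not taken from modules/img2img.py.

```python
from typing import Any, Optional, Tuple

# Hypothetical numbering, following the order of the sub-tab table.
MODES = {
    0: "img2img",
    1: "sketch",
    2: "inpaint",
    3: "inpaint sketch",
    4: "inpaint upload",
    5: "batch",
}


def select_inputs(mode: int, inputs: dict) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (image, mask) for the chosen sub-tab; mask is None when no inpainting applies."""
    name = MODES.get(mode)
    if name is None:
        raise ValueError(f"unknown img2img mode: {mode}")
    if name == "img2img":
        return inputs["image"], None
    if name == "sketch":
        return inputs["sketch"], None
    if name == "inpaint":
        return inputs["inpaint_image"], inputs["inpaint_mask"]
    if name == "inpaint sketch":
        # The mask would be derived from where the colour sketch differs from the original.
        return inputs["colour_sketch"], inputs["derived_mask"]
    if name == "inpaint upload":
        return inputs["upload_image"], inputs["upload_mask"]
    # Batch mode reads images from a directory instead of a single component.
    return None, None
```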

Inpainting

Two flavours of inpainting coexist:

Standard inpainting — the model is asked to denoise the masked region only, with the unmasked region kept fixed. The mask is used to blend latents at every step (mask, nmask in the sampler args).
Inpainting model inpainting — when the loaded checkpoint is an "inpainting model" (e.g., Stability's sd-v1-5-inpainting), it expects 9-channel input. The init step concatenates the masked image and the mask onto the latent. Detection logic is in processing.py — search for is_inpaint.
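
The per-step blend used by standard inpainting can be sketched on flat lists of numbers. This is a minimal illustration, not the sampler code: the `blend_latents` name is hypothetical, and it assumes nmask is 1 where the model may repaint and that mask is its complement.

```python
from typing import List


def blend_latents(init_latent: List[float], x: List[float], nmask: List[float]) -> List[float]:
    """Keep init_latent where nmask is 0 and the sampler's current x where nmask is 1."""
    if not (len(init_latent) == len(x) == len(nmask)):
        raise ValueError("all inputs must have the same length")
    # Assumption: mask = 1 - nmask, so the two weights always sum to one.
    return [
        original * (1.0 - m) + current * m
        for original, current, m in zip(init_latent, x, nmask)
    ]


if __name__ == "__main__":
    print(blend_latents([1.0, 1.0, 1.0], [5.0, 5.0, 5.0], [0.0, 1.0, 0.5]))
```

Applying this after every step keeps the unmasked region pinned to the encoded input while the masked region follows the sampler.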

Soft inpainting is implemented as a builtin extension under extensions-builtin/soft-inpainting/. It hooks cfg_denoiser and blends predicted noise with the original noise at every step using a smooth mask falloff.

Outpainting

outpainting_mk_2.py is the more sophisticated outpainting script:

Pads the input image with the chosen colour or noise.
Builds a feathered mask covering the new pixels.
Generates a mirror-extended noise pattern in the new area (the "mk_2" trick from parlance-zz/g-diffuser-bot).
Runs img2img inpainting with that as the starting latent.
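
The first two steps above (padding and mask construction) can be sketched on a small grid of pixel values. This is a toy illustration with a hypothetical `pad_right_with_mask` helper and a linear feather; it does not reproduce the noise generation in outpainting_mk_2.py.

```python
from typing import List, Tuple

Grid = List[List[float]]


def pad_right_with_mask(image: Grid, extra: int, fill: float = 0, feather: int = 0) -> Tuple[Grid, Grid]:
    """Extend every row by `extra` pixels and return (padded_image, mask).

    The mask is 1.0 over new pixels, ramps down linearly across the last
    `feather` original columns, and is 0.0 elsewhere.
    """
    width = len(image[0])
    padded = [list(row) + [fill] * extra for row in image]
    ramp = []
    for col in range(width):
        dist = width - col  # distance in pixels from the seam
        ramp.append(1.0 - dist / (feather + 1) if dist <= feather else 0.0)
    mask_row = ramp + [1.0] * extra
    mask = [list(mask_row) for _ in image]
    return padded, mask


if __name__ == "__main__":
    padded, mask = pad_right_with_mask([[1, 2], [3, 4]], extra=2, feather=1)
    print(padded)
    print(mask)
```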

poor_mans_outpainting.py is the older, simpler version: just inpainting with no clever noise.

Loopback

scripts/loopback.py runs img2img N times, feeding each output back as the next input. The denoising strength can be ramped up or down across iterations (the "Final denoising strength" + "Denoising strength curve" controls). Useful for slow style drift / animation frames.
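
A minimal sketch of the feedback loop with a linear strength ramp is shown below. The function names are hypothetical, `generate` stands in for a single img2img call, and only a linear curve is modelled; the curve options in the actual script may differ.

```python
from typing import Any, Callable, List


def denoising_schedule(initial: float, final: float, loops: int) -> List[float]:
    """Linearly interpolate denoising strength from `initial` to `final` over `loops` passes."""
    if loops < 1:
        raise ValueError("loops must be at least 1")
    if loops == 1:
        return [initial]
    return [initial + (final - initial) * (i / (loops - 1)) for i in range(loops)]


def run_loopback(image: Any, generate: Callable[[Any, float], Any], schedule: List[float]) -> List[Any]:
    """Feed each output back in as the next input and collect every intermediate result."""
    outputs = []
    for strength in schedule:
        image = generate(image, strength)
        outputs.append(image)
    return outputs


if __name__ == "__main__":
    print(denoising_schedule(0.6, 0.2, 5))
```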

SD upscale

scripts/sd_upscale.py upscales a large image with tiled img2img: it splits the image into overlapping tiles, runs img2img on each with low denoising, and reassembles the result. It is sometimes confused with the upscaler in the Extras tab; this one is conceptually closer to "re-render at higher resolution".
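
The tiling step can be illustrated by computing where overlapping tiles start along each axis. The helpers below are hypothetical and are not the grid-splitting code the script uses; they only show one reasonable way to lay tiles out so the last one sits flush with the edge.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of overlapping tiles covering `length` pixels along one axis."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # final tile is aligned with the far edge
    return origins


def tile_boxes(width: int, height: int, tile: int, overlap: int) -> List[Tuple[int, int, int, int]]:
    """(left, top, right, bottom) boxes for every tile, in row-major order."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


if __name__ == "__main__":
    print(tile_origins(1024, 512, 64))
    print(len(tile_boxes(1024, 1024, 512, 64)))
```

Each box would be cropped, run through img2img at low denoising strength, and pasted back, with the overlap giving room to blend seams.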

Color correction

When opts.img2img_color_correction is on, setup_color_correction (in processing.py) computes a histogram-match target from the input image, and apply_color_correction is called after sampling so that the output preserves the input's colour distribution. This is useful for loopback, where it helps avoid colour drift.
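
Histogram matching itself can be sketched on a single channel of 8-bit values. This is a generic textbook version written for illustration, not the routine in processing.py, and the function names are hypothetical. Here `source` plays the role of the generated output and `reference` the input image.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List


def cumulative_distribution(values: List[int], levels: int = 256) -> List[float]:
    """Fraction of pixels with intensity <= each level."""
    counts = [0] * levels
    for v in values:
        counts[v] += 1
    total = len(values)
    return [c / total for c in accumulate(counts)]


def match_histogram(source: List[int], reference: List[int], levels: int = 256) -> List[int]:
    """Remap `source` so its intensity distribution follows that of `reference`."""
    src_cdf = cumulative_distribution(source, levels)
    ref_cdf = cumulative_distribution(reference, levels)
    # For each source level, pick the lowest reference level whose CDF reaches the same quantile.
    lookup = [min(bisect_left(ref_cdf, src_cdf[v]), levels - 1) for v in range(levels)]
    return [lookup[v] for v in source]


if __name__ == "__main__":
    print(match_histogram([10, 10, 20, 20], [100, 100, 200, 200]))
```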

Mask overlay and blending

After sampling, if a mask was used, apply_overlay() composites the freshly generated region into the original image using the mask. The mask_blur setting controls the feather; inpaint_full_res re-uses uncrop to put an "only masked" tile back into the original resolution.
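
The two ideas in this section, finding the padded crop region for "only masked" mode and compositing through the mask, can be sketched on small grids. Both helpers are hypothetical illustrations and do not mirror the signatures of apply_overlay or uncrop.

```python
from typing import List, Tuple

Grid = List[List[float]]


def mask_bounding_box(mask: Grid, padding: int = 0) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) around nonzero mask pixels, grown by `padding` and clamped."""
    height, width = len(mask), len(mask[0])
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows:
        return (0, 0, width, height)  # empty mask: fall back to the whole image
    return (
        max(cols[0] - padding, 0),
        max(rows[0] - padding, 0),
        min(cols[-1] + 1 + padding, width),
        min(rows[-1] + 1 + padding, height),
    )


def composite(original: Grid, generated: Grid, mask: Grid) -> Grid:
    """Per-pixel blend: generated where mask is 1, original where mask is 0."""
    return [
        [g * m + o * (1 - m) for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


if __name__ == "__main__":
    m = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(mask_bounding_box(m, padding=1))
    print(composite([[1, 1]], [[5, 5]], [[0, 1]]))
```

A feathered mask with fractional values gives a gradual transition at the seam under the same formula.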

Integration points

The same Script and callback system as txt2img — most extensions work in both tabs by checking is_img2img in Script.show().
ControlNet and similar tools generally hook here via cfg_denoiser and don't need img2img-specific code.
Scripts that only make sense for img2img return is_img2img from Script.show(); scripts meant only for txt2img return not is_img2img.
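
A script's tab visibility can be sketched as below. The `Script` base class here is a stand-in so the example runs outside the webui; in a real script it would come from `modules.scripts`, and the two subclasses are hypothetical.

```python
class Script:
    """Stand-in for modules.scripts.Script (assumption: only title() and show() matter here)."""

    def title(self) -> str:
        raise NotImplementedError

    def show(self, is_img2img: bool):
        return True


class Img2ImgOnlyScript(Script):
    """Appears in the Script dropdown of the img2img tab only."""

    def title(self) -> str:
        return "Example img2img-only script"

    def show(self, is_img2img: bool):
        return is_img2img


class Txt2ImgOnlyScript(Script):
    """Appears in the Script dropdown of the txt2img tab only."""

    def title(self) -> str:
        return "Example txt2img-only script"

    def show(self, is_img2img: bool):
        return not is_img2img


if __name__ == "__main__":
    for script in (Img2ImgOnlyScript(), Txt2ImgOnlyScript()):
        print(script.title(), script.show(True), script.show(False))
```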

Entry points for modification

Custom mask logic — edit init() of StableDiffusionProcessingImg2Img, but consider an alwayson script instead.
Per-image preprocessing — the before_process hook of an alwayson script can mutate p.init_images before the encoder runs.
A new img2img mode — add a new sub-tab in ui.py and a new mode value handled by img2img.img2img().
Mask compositing — apply_overlay is the choke point. Soft inpainting overrides this through the cfg_denoiser path; copy that pattern.
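
The per-image preprocessing route might look like the sketch below. `Script` and `AlwaysVisible` are stand-ins for the objects in `modules.scripts` so the example runs on its own, the `preprocess` method is a hypothetical extension point, and `p` is modelled as a plain namespace rather than a real processing object.

```python
from types import SimpleNamespace

AlwaysVisible = object()  # stand-in for modules.scripts.AlwaysVisible


class Script:
    """Stand-in base class; the real one lives in modules/scripts.py."""

    def before_process(self, p, *args):
        pass


class PreprocessInitImages(Script):
    """Alwayson script that rewrites p.init_images before the encoder runs."""

    def title(self) -> str:
        return "Example init image preprocessor"

    def show(self, is_img2img: bool):
        # Only attach to the img2img tab.
        return AlwaysVisible if is_img2img else False

    def preprocess(self, image):
        """Hypothetical hook: override to transform one input image."""
        return image

    def before_process(self, p, *args):
        images = getattr(p, "init_images", None)
        if not images:
            return  # txt2img-style object or empty input: nothing to do
        p.init_images = [self.preprocess(image) for image in images]


if __name__ == "__main__":
    p = SimpleNamespace(init_images=["img-a", "img-b"])
    PreprocessInitImages().before_process(p)
    print(p.init_images)
```

Keeping this logic in a script rather than in init() leaves processing.py untouched, which makes upgrades easier.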

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.