AUTOMATIC1111/stable-diffusion-webui

Text-to-image (txt2img)

Active contributors: AUTOMATIC1111, w-e-w, catboxanon, missionfloyd

Purpose

The flagship workflow: user supplies a prompt, model generates an image from pure noise. txt2img is the simplest path through the pipeline; everything else is a variation on top.

Where the code lives

| Concern | File |
|---|---|
| The Generate button handler | modules/txt2img.py |
| The tab UI construction | modules/ui.py (the with gr.Tab("txt2img") block) |
| The processing dataclass | StableDiffusionProcessingTxt2Img in modules/processing.py |
| The API route | text2imgapi in modules/api/api.py |
| Pydantic request/response | StableDiffusionTxt2ImgProcessingAPI, TextToImageResponse in modules/api/models.py |

Request lifecycle

```mermaid
sequenceDiagram
    participant User
    participant Tab as txt2img tab
    participant Handler as txt2img.txt2img()
    participant Lock as queue_lock
    participant P as StableDiffusionProcessingTxt2Img
    participant Pipe as process_images
    participant Save as images.save_image

    User->>Tab: click Generate
    Tab->>Handler: wrap_gradio_gpu_call wrapper
    Handler->>Lock: acquire
    Handler->>P: build with all controls
    Handler->>Pipe: process_images(p)
    Pipe->>Save: per image
    Save-->>Pipe: file path
    Pipe-->>Handler: Processed
    Handler->>Lock: release
    Handler-->>Tab: gallery + infotexts + JSON
```

modules/txt2img.py is small (~120 lines). It receives the long *args tuple from Gradio, slices off the script-args portion, builds a StableDiffusionProcessingTxt2Img and calls process_images(p).

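The API endpoint does not go through this Gradio handler. Instead, text2imgapi constructs the same StableDiffusionProcessingTxt2Img dataclass directly from the Pydantic request body and calls process_images(p) itself, so the two paths share the pipeline rather than the handler.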
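
The handler flow described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the code in modules/txt2img.py: Txt2ImgJob, gpu_call, N_FIXED and the five-control layout are hypothetical stand-ins, and process_images here only returns a placeholder result.

```python
import threading
from dataclasses import dataclass

# Stand-in for the real queue lock; only the name mirrors the diagram.
queue_lock = threading.Lock()


@dataclass
class Txt2ImgJob:
    """Hypothetical stand-in for StableDiffusionProcessingTxt2Img."""
    prompt: str
    negative_prompt: str
    steps: int
    width: int
    height: int
    script_args: tuple = ()


def process_images(p):
    """Stand-in pipeline: describes the image instead of rendering it."""
    return {
        "images": [f"<{p.width}x{p.height} image>"],
        "info": f"{p.prompt}\nSteps: {p.steps}",
    }


def gpu_call(func):
    """Serialise GPU work, similar in spirit to wrap_gradio_gpu_call."""
    def wrapped(*args):
        with queue_lock:  # acquired before the handler, released after it
            return func(*args)
    return wrapped


N_FIXED = 5  # assumed number of fixed controls that precede the script args


@gpu_call
def txt2img(*args):
    # Slice the flat Gradio tuple into fixed controls and script arguments.
    fixed, script_args = args[:N_FIXED], args[N_FIXED:]
    prompt, negative_prompt, steps, width, height = fixed
    p = Txt2ImgJob(prompt, negative_prompt, steps, width, height, script_args)
    return process_images(p)


result = txt2img("a cat", "blurry", 20, 512, 512, "script-value", 0.5)
print(result["images"])
```

Keeping the lock in a wrapper rather than in the handler body means every GPU-bound handler can share the same serialisation logic.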

What txt2img-specific fields exist

StableDiffusionProcessingTxt2Img (in modules/processing.py around line 1166) adds these fields on top of the base StableDiffusionProcessing:

  • enable_hr, denoising_strength, hr_scale, hr_upscaler, hr_second_pass_steps, hr_resize_x, hr_resize_y, hr_sampler_name, hr_scheduler, hr_prompt, hr_negative_prompt, hr_cfg, hr_distilled_cfg — the hires-fix knobs. See hires-fix.md.
  • firstphase_width / firstphase_height — legacy aliases for the low-res pass dimensions.
  • The sample() method override, which runs the sampler once at low res and (if enabled) again at high res.
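
As a rough illustration of the two-pass behaviour of sample(), the sketch below uses a small subset of the field names listed above. It is a simplification under stated assumptions: run_sampler is a hypothetical stand-in, the upscaler and the hr_resize_x / hr_resize_y overrides are ignored, and treating hr_second_pass_steps == 0 as "reuse steps" is an assumption rather than something confirmed here.

```python
from dataclasses import dataclass


@dataclass
class HiresSettings:
    """Subset of the txt2img fields, with illustrative defaults."""
    width: int = 512
    height: int = 512
    steps: int = 20
    enable_hr: bool = False
    hr_scale: float = 2.0
    hr_second_pass_steps: int = 0  # assumed: 0 means "reuse steps"
    denoising_strength: float = 0.7


def run_sampler(width, height, steps, denoise):
    """Hypothetical sampler call; returns a description of the pass."""
    return {"size": (width, height), "steps": steps, "denoise": denoise}


def sample(s):
    # First pass: always runs at the low resolution, from pure noise.
    passes = [run_sampler(s.width, s.height, s.steps, 1.0)]
    if s.enable_hr:
        # Second pass: upscaled size, partial denoise over the first result.
        hr_w = int(s.width * s.hr_scale)
        hr_h = int(s.height * s.hr_scale)
        hr_steps = s.hr_second_pass_steps or s.steps
        passes.append(run_sampler(hr_w, hr_h, hr_steps, s.denoising_strength))
    return passes


print(sample(HiresSettings(enable_hr=True)))
```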

The base class fields (prompt, sampler_name, steps, cfg_scale, seed, batch_size, n_iter, width, height, scripts, …) are documented inline in processing.py.

What the user sees

The txt2img tab layout is built in modules/ui.py:

| Section | Source |
|---|---|
| Prompt + negative prompt + Generate button | Toprow from modules/ui_toprow.py |
| Sampler + scheduler dropdowns | rendered by the sampler alwayson script in modules/processing_scripts/sampler.py |
| Steps, CFG scale, width, height, batch | direct Gradio sliders in ui.py |
| Seed and subseed controls | rendered by processing_scripts/seed.py |
| Refiner accordion | processing_scripts/refiner.py |
| Hires fix accordion | inline in ui.py, controls listed above |
| Script dropdown | the ScriptRunner for txt2img |
| Always-on script panels | each script's ui() is rendered below |
| Output gallery + Save / Send to img2img / etc. | create_output_panel in modules/ui_common.py |

Infotext round-trip

When the user pastes generation parameters (a string like prompt\nNegative prompt: ...\nSteps: 20, Sampler: Euler a, ...), modules/infotext_utils.py parses each Key: value pair and assigns it to the matching (component, key) pair from the txt2img tab's paste_fields. The paste_fields list is registered alongside the controls in ui.py. Extensions can append to this list to participate in the round-trip — most do via Script.infotext_fields = [(component, "My Setting"), ...].

script_callbacks.on_infotext_pasted(callback) is the catch-all hook for "infotext arrived; do something with it before the components are filled".
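
To make the round-trip concrete, here is a simplified parser for the infotext shape shown above. It is a sketch, not the code in modules/infotext_utils.py: the regular expression does not handle quoted values that contain commas, the component names in paste_fields are placeholder strings, and on_pasted only imitates the shape of an on_infotext_pasted callback (raw text plus a mutable params dict).

```python
import re

# "Key: value" pairs separated by commas (simplified; no quoted values).
PAIR = re.compile(r"\s*([\w \-/]+):\s*([^,]*)(?:,|$)")


def parse_infotext(text):
    # The last line holds the settings; earlier lines hold the prompts.
    *body, settings = text.strip().split("\n")
    prompt_lines, negative_lines, in_negative = [], [], False
    for line in body:
        if line.startswith("Negative prompt:"):
            in_negative = True
            line = line[len("Negative prompt:"):]
        (negative_lines if in_negative else prompt_lines).append(line.strip())
    params = {k.strip(): v.strip() for k, v in PAIR.findall(settings)}
    params["Prompt"] = "\n".join(prompt_lines)
    params["Negative prompt"] = "\n".join(negative_lines)
    return params


def on_pasted(infotext, params):
    """Hypothetical callback: adjust params before components are filled."""
    params.setdefault("Clip skip", "1")


text = "a cat\nNegative prompt: blurry\nSteps: 20, Sampler: Euler a, CFG scale: 7"
params = parse_infotext(text)
on_pasted(text, params)

# (component, key) pairs; plain strings stand in for Gradio components.
paste_fields = [("steps_slider", "Steps"), ("sampler_dropdown", "Sampler")]
updates = {comp: params[key] for comp, key in paste_fields if key in params}
print(updates)
```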

Special operations

  • Skip / Interrupt — the toprow buttons set shared.state.skipped / shared.state.interrupted. The sampler checks these every step and exits early.
  • Live preview — every N steps, the latents are decoded with the approximate VAE (modules/sd_vae_approx.py) or TAESD (modules/sd_vae_taesd.py) and pushed to the UI through progress.py.
  • Extra noise: opts.eta_noise_seed_delta and the corresponding extra_noise callback let extensions inject custom noise (used by some adversarial tools).
  • Generate forever — a UI-only loop driven by JavaScript that re-clicks Generate until the user clicks Cancel.
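
The Skip / Interrupt mechanism is a cooperative flag check. The sketch below shows the pattern with a hypothetical State class standing in for shared.state and a loop standing in for the sampler; the on_step hook exists only to simulate a button press mid-run.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Stand-in for shared.state, limited to the two flags named above."""
    skipped: bool = False
    interrupted: bool = False


def run_steps(state, steps, on_step=None):
    completed = 0
    for step in range(steps):
        # Checked once per step, so a button press takes effect quickly.
        if state.skipped or state.interrupted:
            break
        completed += 1  # a real sampler would run one denoising step here
        if on_step:
            on_step(step, state)
    return completed


def press_interrupt(step, state):
    # Simulates the user clicking Interrupt during the fifth step.
    if step == 4:
        state.interrupted = True


state = State()
done = run_steps(state, 20, press_interrupt)
print(done)
```

Because the flag is only read between steps, the step that is already running always finishes before the loop exits.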

API parity

POST /sdapi/v1/txt2img accepts the same parameters as the UI plus a few extras (script_args, script_name, alwayson_scripts). The default response contains the generated images as base64-encoded strings, the parameters that were used, and an info string for round-tripping. The API and the UI both end in the same process_images(p) call, so the same parameters should produce the same generation behaviour.

The Pydantic models in modules/api/models.py are partly auto-derived from StableDiffusionProcessingTxt2Img field annotations — adding a new field to the dataclass usually exposes it through the API automatically.
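
A client for this endpoint needs only the standard library. The sketch below assumes a locally running instance with the API enabled; the base URL in the commented usage line and the default values in build_payload are illustrative assumptions, while the field names follow the base class fields listed earlier.

```python
import base64
import json
import urllib.request


def build_payload(prompt, **overrides):
    """Assemble a request body; the defaults here are illustrative only."""
    payload = {
        "prompt": prompt,
        "negative_prompt": "",
        "steps": 20,
        "cfg_scale": 7,
        "width": 512,
        "height": 512,
        "sampler_name": "Euler a",
        "seed": -1,
    }
    payload.update(overrides)
    return payload


def txt2img(base_url, payload, timeout=300):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def decode_images(response):
    """Turn the base64 strings in the response into raw image bytes."""
    return [base64.b64decode(item) for item in response["images"]]


# Usage against a running server (address is an assumption):
# response = txt2img("http://127.0.0.1:7860", build_payload("a cat"))
# for index, data in enumerate(decode_images(response)):
#     open(f"out-{index}.png", "wb").write(data)
```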

Entry points for modification

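  • Tweak the default sampler/steps: use Settings → Sampler parameters / Defaults, or edit the saved UI defaults in ui-config.json. webui-user.sh only sets environment variables and command-line arguments, so it is not the place for per-control defaults.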
  • Add a control to the tab — register an alwayson script (see scripts-and-extensions.md) rather than editing ui.py.
  • Validate the input differently — extend the setup hook on a script.
  • Change the underlying loop — the loop is in process_images_inner (see systems/processing.md). Most "I want to do X every step" features should use cfg_denoiser callbacks instead.
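
As an example of the script-based entry points above, the sketch below shows an always-on script that validates input in its setup hook. ClampSteps and MAX_STEPS are hypothetical, and the try/except fallback exists only so the file can be read and run outside the webui; inside the webui the real modules.scripts base class would be used.

```python
from types import SimpleNamespace

try:
    from modules import scripts  # importable only inside a running webui
except ImportError:
    # Minimal stand-in so this sketch runs on its own.
    scripts = SimpleNamespace(Script=object, AlwaysVisible=object())

MAX_STEPS = 50  # hypothetical limit enforced by this script


class ClampSteps(scripts.Script):
    def title(self):
        return "Clamp steps"

    def show(self, is_img2img):
        # Always-on: the script runs without being picked in the dropdown.
        return scripts.AlwaysVisible

    def ui(self, is_img2img):
        return []  # this script adds no controls of its own

    def setup(self, p, *args):
        # Called with the processing object before generation starts.
        p.steps = min(p.steps, MAX_STEPS)


p = SimpleNamespace(steps=200)
ClampSteps().setup(p)
print(p.steps)
```

Putting the check in a script keeps ui.py untouched, which makes the change easier to carry across upstream updates.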

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.
