AUTOMATIC1111/stable-diffusion-webui
Training (Textual Inversion and Hypernetworks)
Active contributors: AUTOMATIC1111, DepFA, AngelBottomless, Kohaku-Blueleaf
Purpose
Two in-tree training pipelines: textual inversion embeddings and hypernetworks. Both let users adapt a frozen Stable Diffusion checkpoint without fine-tuning the full model. Lora training is not in this codebase; users typically use kohya_ss/sd-scripts for that and load the result through the Lora extension.
The training UI is a sub-tab of the Train top-level tab.
Code layout
modules/
├── textual_inversion/
│ ├── textual_inversion.py # the trainer; ~770 lines
│ ├── dataset.py # PIL-based image+caption dataset
│ ├── learn_schedule.py # learning rate scheduling DSL
│ ├── image_embedding.py # encode/decode embeddings as PNG sidecars
│ ├── autocrop.py # face-aware preprocessing
│ ├── saving_settings.py # per-embedding save metadata
│ └── ui.py # the Train tab create_embedding helper
├── hypernetworks/
│ ├── hypernetwork.py # ~36 KB; trainer + module + UI registration
│ └── ui.py # create_hypernetwork helper
├── api/api.py # /sdapi/v1/train/* and /create/* endpoints
└── ui.py # the Train tab assembly
textual_inversion_templates/ # prompt templates used during training (e.g. "subject_filewords.txt")
embeddings/ # output destination for trained embeddings
models/hypernetworks/ # output destination for hypernets
What can be trained
| Concept | What it actually is | File extension | Inference path |
|---|---|---|---|
| Textual inversion embedding | A single learned token vector (or a list of them, one per "vector per token") that maps to a specific concept. | .pt, .safetensors, .bin | Substituted into CLIP at encode time by EmbeddingDatabase (see systems/sd-hijack.md). |
| Hypernetwork | A small MLP whose output is added to the K and V projections in the UNet's cross-attention layers. | .pt | Patched in by Hypernetwork.attach() and applied via extra_networks_hypernet. |
Lifecycle: textual inversion
sequenceDiagram
participant User
participant CreateUI as Train tab Create
participant Trainer as textual_inversion.train_embedding
participant Dataset
participant CLIP
participant Optim as torch.optim
User->>CreateUI: name, init text, vectors per token
CreateUI->>Trainer: create_embedding(...)
Trainer-->>CreateUI: empty .pt in embeddings/
User->>CreateUI: select images dir, training settings, click Train
CreateUI->>Trainer: train_embedding(args)
Trainer->>Dataset: PersonalizedBase
loop steps
Dataset->>Trainer: image + caption template
Trainer->>CLIP: encode prompt (with embedding token)
Trainer->>UNet: noise prediction
Trainer->>Optim: backprop only the embedding tensor
Trainer->>Trainer: write_image_embedding (preview every N)
Trainer->>Trainer: save .pt every N
end
Key implementation points in modules/textual_inversion/textual_inversion.py:
- `Embedding` is the in-memory record (`vec`, `name`, `step`, `sd_checkpoint`).
- `EmbeddingDatabase` walks the configured directories at boot and on `/sdapi/v1/refresh-embeddings`.
- `train_embedding(...)` is the trainer; ~330 lines. It supports gradient accumulation, EMA, learning rate scheduling, and cross-attention masking.
- `create_embedding(name, num_vectors_per_token, ...)` initialises a new `.pt` from an init text or random vectors.
- Image previews during training are saved into `textual_inversion/<embedding>/`; they share the embedded-as-PNG format implemented in `image_embedding.py`.
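The core idea of the loop, updating only the embedding while everything else stays frozen, can be illustrated with a toy example. This is a minimal sketch in plain Python, not code from `textual_inversion.py`: the function name `train_embedding_toy`, the dot-product "model", and the squared-error loss are all stand-ins chosen for illustration.

```python
# Toy illustration only: a frozen "model" (a weight vector) and one trainable
# embedding vector. Nothing here is taken from textual_inversion.py.

def train_embedding_toy(frozen_weights, embedding, target, lr=0.1, steps=200):
    """Minimise (w . e - target)^2 by updating e only; w is never modified."""
    w = tuple(frozen_weights)  # frozen parameters, stored immutably
    e = list(embedding)        # trainable copy of the embedding
    for _ in range(steps):
        pred = sum(wi * ei for wi, ei in zip(w, e))
        err = pred - target
        # Gradient of the loss with respect to each embedding component:
        # d/de_i (w . e - target)^2 = 2 * err * w_i
        grads = [2.0 * err * wi for wi in w]
        # Gradient step on the embedding; the frozen weights receive no update.
        e = [ei - lr * gi for ei, gi in zip(e, grads)]
    return w, e


w, e = train_embedding_toy([0.5, -1.0, 2.0], [0.0, 0.0, 0.0], target=3.0)
loss = (sum(wi * ei for wi, ei in zip(w, e)) - 3.0) ** 2
```

After the call, `w` is unchanged and `e` has moved so that the loss is close to zero, which mirrors the "backprop only the embedding tensor" step in the diagram above.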
Lifecycle: hypernetworks
modules/hypernetworks/hypernetwork.py is one of the largest single files in the repo (~36 KB) — it contains both the model definition (HypernetworkModule, Hypernetwork) and the training loop (train_hypernetwork()).
The HypernetworkModule is a small linear stack inserted into each cross-attention K/V projection. It can use tanh activation, dropout, and layer normalization; the user picks the architecture string at creation time. Training ratio of 1:1 between the hypernet and CLIP/Lora is unusual: it produces relatively heavy weights for fairly subtle effects.
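A toy version of such a stack may help make the "output is added" idea concrete. The sketch below is plain Python with hypothetical names (`linear`, `hypernet_module`, `apply_hypernetwork`); it assumes a two-layer stack with tanh whose result is added back onto its input, with separate modules for K and V. The real `HypernetworkModule` is a PyTorch module, and its exact layer layout and the exact point where the addition happens are not shown here.

```python
import math


def linear(x, weight, bias):
    """y = W x + b for a single vector x (weight is a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def hypernet_module(x, w1, b1, w2, b2):
    """Two linear layers with tanh in between, added back onto the input."""
    hidden = [math.tanh(h) for h in linear(x, w1, b1)]
    delta = linear(hidden, w2, b2)
    return [xi + di for xi, di in zip(x, delta)]


def apply_hypernetwork(context, module_k, module_v):
    """Return separately adjusted vectors for the K and V paths (assumed layout)."""
    return hypernet_module(context, *module_k), hypernet_module(context, *module_v)


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]
context = [0.5, -1.0]

zero_output = (identity, no_bias, zeros, no_bias)      # second layer is all zeros
nonzero_output = (identity, no_bias, identity, no_bias)
context_k, context_v = apply_hypernetwork(context, zero_output, nonzero_output)
```

In this sketch, a module whose last layer is all zeros leaves the vector untouched, so `context_k` equals `context`, while `context_v` is shifted by `tanh` of each component.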
Hypernetworks are deprecated in favour of Lora, but the code still works. The last meaningful change in this file was a v1.7-era stability fix.
Dataset and templates
- `PersonalizedBase` (`modules/textual_inversion/dataset.py`) reads images and (optional) per-image caption files from a directory. It supports cropping, mirroring, and on-the-fly tagging via deepdanbooru/BLIP if the image has no caption.
- `textual_inversion_templates/` holds prompt templates with placeholders: `[name]` is replaced with the embedding name, `[filewords]` with the per-image caption.
- `autocrop.py` is a focal-point detector that can pick a crop centred on a face/feature. Used during preprocessing.
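The placeholder substitution described above can be sketched in a few lines. This is an illustrative sketch, not the dataset code: `fill_template` is a hypothetical helper, the template lines are made up, and picking a random line per sample is an assumption about how templates are consumed.

```python
import random


def fill_template(template_lines, embedding_name, caption, rng=random):
    """Pick one template line and substitute the two placeholders."""
    line = rng.choice(template_lines)  # assumption: one line is chosen per sample
    return line.replace("[name]", embedding_name).replace("[filewords]", caption)


# Hypothetical template lines in the style of a "subject_filewords.txt" file.
templates = [
    "a photo of [name], [filewords]",
    "a rendering of [name], [filewords]",
]

# Passing a single line keeps this example deterministic.
prompt = fill_template(templates[:1], "my-style", "1girl, outdoors")
```

Here `prompt` becomes `a photo of my-style, 1girl, outdoors`.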
The Preprocess tab
A separate sub-tab under Train preprocesses an image directory before training: resize/crop, mirror, autotag with BLIP (caption) or deepdanbooru (anime tags), split into training and validation. Implemented inline in modules/ui.py and modules/textual_inversion/preprocess.py (lazy-loaded).
API
| Endpoint | Action |
|---|---|
| POST /sdapi/v1/create/embedding | Create a new empty embedding file |
| POST /sdapi/v1/create/hypernetwork | Create a new empty hypernetwork |
| POST /sdapi/v1/train/embedding | Train an embedding |
| POST /sdapi/v1/train/hypernetwork | Train a hypernetwork |
| GET /sdapi/v1/embeddings | List embeddings (loaded + skipped) |
| POST /sdapi/v1/refresh-embeddings | Rescan the embeddings directories |
| GET /sdapi/v1/hypernetworks | List hypernetworks |
| POST /sdapi/v1/preprocess | Preprocess a folder (legacy; resizes/captions images) |
The training endpoints block until training finishes, so a client should issue the request from a background task or thread and poll /sdapi/v1/progress for status in the meantime.
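A client might wrap these endpoints as follows. This is a sketch using only the Python standard library; the base URL, the helper names, and the payload fields are assumptions (the fields mirror the `create_embedding(name, num_vectors_per_token, ...)` parameters named earlier), so check the running server's API schema before relying on them.

```python
import json
import threading
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # assumption: a local instance with the API enabled


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for an API path."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request, timeout=None):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def train_in_background(payload):
    """Run the blocking train call on a worker thread so the caller can poll."""
    result = {}

    def worker():
        result["response"] = call(build_post("/sdapi/v1/train/embedding", payload))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, result


def get_progress():
    """Fetch the current progress report (assumed to be a GET endpoint)."""
    return call(urllib.request.Request(BASE_URL + "/sdapi/v1/progress"), timeout=10)


# Building the request has no side effects; nothing is sent until call() runs.
create_request = build_post(
    "/sdapi/v1/create/embedding",
    {"name": "my-style", "num_vectors_per_token": 4},  # assumed field names
)
```

A caller would then start `train_in_background(...)` and call `get_progress()` in a loop until the worker thread finishes.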
What this code does not do
- No DreamBooth / full fine-tuning. This codebase only trains embeddings or hypernetworks. Use a separate trainer like `kohya_ss/sd-scripts` for Lora or DreamBooth.
- No SDXL textual inversion training. The trainer was written for SD 1.x's CLIP encoder. SDXL has two text encoders, and the in-tree trainer doesn't address that; embeddings can still be loaded, just not trained here.
- No multi-GPU. The training loop is single-process, single-device.
Integration points
- `script_callbacks.on_ui_train_tabs(callback)` lets extensions add their own train sub-tabs. Used by some Lora-trainer extensions.
- The `EmbeddingDatabase` is on `model_hijack`, so `on_model_loaded` is the right callback for "react to embeddings being available".
Entry points for modification
- Embeddings as `.safetensors`: already supported on the load side; saving uses `safetensors.torch.save_file`. To change save format, edit `Embedding.save()` in `modules/textual_inversion/textual_inversion.py`.
- Better LR schedules: `learn_schedule.py` parses an Automatic1111-specific DSL like `0.005:100, 0.001:500`. Grammar extensions belong there.
- Logging: TensorBoard logging is available behind an opt-in setting; the integration is minimal and routed through `tensorboard_setup` in the trainer.
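To make the schedule DSL concrete, here is a minimal parser sketch. It is not the code in `learn_schedule.py`: the function names are hypothetical, and it assumes each `rate:step` pair means "use this rate until that step", with a bare rate applying through the final step. Holding the last rate after the schedule ends is this sketch's own choice.

```python
def parse_schedule(spec, max_steps):
    """Parse "rate:step, rate:step, rate" into a list of (rate, end_step)."""
    schedule = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            rate_text, step_text = part.split(":", 1)
            end_step = min(int(step_text), max_steps)
        else:
            # A bare rate is assumed to apply until the end of training.
            rate_text, end_step = part, max_steps
        schedule.append((float(rate_text), end_step))
        if end_step >= max_steps:
            break  # later entries could never take effect
    if not schedule:
        raise ValueError(f"empty learning rate schedule: {spec!r}")
    return schedule


def rate_at(schedule, step):
    """Return the rate for a step; past the last entry, keep the last rate."""
    for rate, end_step in schedule:
        if step < end_step:
            return rate
    return schedule[-1][0]


schedule = parse_schedule("0.005:100, 0.001:500", max_steps=1000)
```

Under these assumptions, `schedule` is `[(0.005, 100), (0.001, 500)]`, so `rate_at(schedule, 0)` gives `0.005` and `rate_at(schedule, 100)` gives `0.001`.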
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.