AUTOMATIC1111/stable-diffusion-webui

Deployment

How the project is intended to be run, and what to consider when running it more seriously than "double-click webui.bat".

The project does not ship Dockerfiles, Helm charts, systemd units, or cloud templates. Deployment is essentially "set up Python, run webui.sh". This page documents the supported topologies and the knobs that matter for non-trivial setups.

Supported runtimes

Backend	Trigger	Notes
NVIDIA CUDA	Default; `--xformers` for memory-efficient attention	Tested most heavily
AMD ROCm (Linux)	`TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7"`	Set in `webui-user.sh`
AMD DirectML (Windows)	Not supported in the master branch; users typically run the `lshqqytiger/stable-diffusion-webui-directml` fork instead	–
Apple Silicon (MPS)	`webui-macos-env.sh` sets `PYTORCH_ENABLE_MPS_FALLBACK=1` and pinned torch	Some operations fall back to CPU
Intel IPEX (Arc, Iris)	`--use-ipex`	Code in `modules/xpu_specific.py`
Ascend NPU	`requirements_npu.txt` swaps in the NPU torch	Code in `modules/npu_specific.py`
CPU only	`--use-cpu all --no-half --skip-torch-cuda-test`	Slow; only useful for tests and tiny models

The launcher is responsible for installing the right torch based on environment variables (TORCH_COMMAND, XFORMERS_PACKAGE, CLIP_PACKAGE). See modules/launch_utils.py prepare_environment().
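The override pattern can be sketched as follows. This is a simplified illustration, not the code in prepare_environment(): the function name resolve_torch_command and the default string are assumptions made for the example, and the real default pins specific versions.

```python
import os

# Illustrative default only (assumption); the real launcher pins versions.
DEFAULT_TORCH_COMMAND = "pip install torch torchvision"


def resolve_torch_command(env=None):
    """Return the torch install command, preferring a TORCH_COMMAND override."""
    env = os.environ if env is None else env
    return env.get("TORCH_COMMAND", DEFAULT_TORCH_COMMAND)


# The ROCm override from the table above, as it would appear in the environment.
rocm_env = {
    "TORCH_COMMAND": "pip install torch torchvision "
                     "--index-url https://download.pytorch.org/whl/rocm5.7"
}
print(resolve_torch_command(rocm_env))  # override wins
print(resolve_torch_command({}))        # falls back to the default
```

The same pattern would apply to XFORMERS_PACKAGE and CLIP_PACKAGE: the environment variable wins when it is set, otherwise a built-in default is used.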

Running headless

For pure-API deployments:

python launch.py --nowebui --api --listen --port 7861 --api-auth user:secret

--nowebui skips Gradio entirely: webui.py:api_only() runs the FastAPI app directly with uvicorn. Always add --api-auth, because the API is otherwise unauthenticated and has no built-in rate limiting.
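A client for such a headless instance might look like the sketch below. It assumes that --api-auth credentials are sent as HTTP Basic auth and that the txt2img endpoint lives at /sdapi/v1/txt2img; the helper name build_txt2img_request, the host, and the prompt are made up for the example. The request is only built, not sent, so the snippet runs without a server.

```python
import base64
import json
import urllib.request


def build_txt2img_request(base_url, user, password, prompt, steps=20):
    """Build (but do not send) an authenticated POST to the txt2img endpoint."""
    payload = json.dumps({"prompt": prompt, "steps": steps}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sdapi/v1/txt2img",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_txt2img_request(
    "http://127.0.0.1:7861", "user", "secret", "a lighthouse at dusk"
)
print(req.full_url)

# To send it against a running instance (generation can take minutes):
# with urllib.request.urlopen(req, timeout=600) as resp:
#     images = json.loads(resp.read())["images"]  # base64-encoded images
```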

For shared UI deployments, --listen --port 7860 --gradio-auth user:pass is the minimum setup before exposing the UI externally. Leave --enable-insecure-extension-access off; it is a switch that takes no value and is off by default. Don't expose the UI to the internet without auth and CORS scoping.

What runs where

Single process. Long-running. The model lives in GPU/CPU memory the whole time.
Gradio thread pool serves HTTP. Generation work is serialised on queue_lock (see modules/call_queue.py) so only one batch at a time uses the GPU.
Disk usage: models/ (checkpoints, ~2–10 GB each), outputs/ (generated images), embeddings/, models/Lora/, models/VAE/, models/ControlNet/ (extension), and repositories/ (vendored upstream repos).
Logs go to stdout / stderr. If you redirect to a file, set --loglevel and consider --log-startup for boot diagnostics.
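The serialisation on queue_lock can be illustrated with a small stand-alone sketch. Only the name queue_lock comes from the text above; the decorator serialised and the job fake_generate are hypothetical stand-ins, and the real wrapper in modules/call_queue.py does more than this.

```python
import threading
import time
from functools import wraps

queue_lock = threading.Lock()  # one lock shared by all generation calls


def serialised(func):
    """Run func under queue_lock so that calls never overlap."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        with queue_lock:
            return func(*args, **kwargs)
    return wrapper


active = 0        # jobs currently inside fake_generate
max_active = 0    # highest concurrency observed
counter_lock = threading.Lock()


@serialised
def fake_generate(job_id):
    """Stand-in for a GPU job; records how many jobs run at once."""
    global active, max_active
    with counter_lock:
        active += 1
        max_active = max(max_active, active)
    time.sleep(0.01)  # pretend to use the GPU
    with counter_lock:
        active -= 1
    return job_id


threads = [threading.Thread(target=fake_generate, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("max concurrent jobs:", max_active)
```

Five threads submit work at once, yet the observed concurrency stays at one. This is why extra HTTP workers do not increase GPU throughput in a single process.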

Reverse proxies

For external exposure put the webui behind nginx / Caddy / Traefik. A common configuration:

HTTPS termination at the proxy.
Long timeouts: image generation can take minutes per request. proxy_read_timeout 600; and proxy_send_timeout 600; on nginx.
WebSocket support: Gradio uses WebSockets for the queue + live preview. nginx needs proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "Upgrade";.
File upload size: image uploads can hit 50 MB+. client_max_body_size 100m;.
The --subpath flag is required if hosting under a path like /webui/. It rewrites Gradio's static asset URLs.
In API mode (--nowebui) the same --subpath value is handed to uvicorn as the application root path, so no separate option is needed.
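The nginx directives listed above can be collected into a single location block. The sketch below renders one from Python so the values are easy to parameterise; the function name render_nginx_location, the upstream address, and the proxy_pass line are assumptions for illustration, and TLS and --subpath handling are left out.

```python
def render_nginx_location(upstream="http://127.0.0.1:7860", path="/",
                          timeout_s=600, max_body="100m"):
    """Render an nginx location block with the proxy settings discussed above."""
    lines = [
        f"location {path} {{",
        f"    proxy_pass {upstream};",
        "    proxy_http_version 1.1;",                    # required for WebSockets
        "    proxy_set_header Upgrade $http_upgrade;",
        '    proxy_set_header Connection "Upgrade";',
        f"    proxy_read_timeout {timeout_s};",           # long generations
        f"    proxy_send_timeout {timeout_s};",
        f"    client_max_body_size {max_body};",          # large image uploads
        "}",
    ]
    return "\n".join(lines)


print(render_nginx_location())
```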

--cors-allow-origins=https://my.domain (or --cors-allow-origins-regex for a pattern) lets browser clients on a different origin call the API.

Persistence

Almost everything is configurable but defaults to relative paths in the repo:

--data-dir — points config.json, ui-config.json, outputs/, log/, etc., elsewhere. Useful for separating volatile user data from the read-only code.
--models-dir — root for Stable-diffusion/, VAE/, Lora/, etc. Symlinks work fine if you have many extensions sharing model directories.
--ckpt-dir, --vae-dir, --embeddings-dir, --lora-dir (the last comes from the Lora extension's preload.py) override individual subdirectories.

For containerised deployment, mount the directories passed to --data-dir and --models-dir as volumes, and bind-mount the repo (for example at /opt/webui) read-only.
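A container entrypoint could assemble the launch command along these lines. This is a minimal sketch: the helper build_launch_args and the /data/... paths are hypothetical, and only the --data-dir and --models-dir flags come from the text above.

```python
def build_launch_args(data_dir, models_dir, extra=()):
    """Return the argv for launch.py with user state kept outside the repo."""
    return [
        "python", "launch.py",
        "--data-dir", str(data_dir),      # config.json, ui-config.json, outputs/, ...
        "--models-dir", str(models_dir),  # Stable-diffusion/, VAE/, Lora/, ...
        *extra,
    ]


args = build_launch_args(
    "/data/webui", "/data/models", extra=["--listen", "--port", "7860"]
)
print(" ".join(args))
```

Keeping both directories on mounted volumes means the container image can be rebuilt or replaced without touching settings, outputs, or checkpoints.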

Resource sizing

Rough VRAM minimums for a single image at default settings:

Model	Native	With `--medvram`	With `--lowvram`
SD 1.5 (512²)	4 GB	3 GB	2 GB
SD 1.5 + Lora	5 GB	4 GB	3 GB
SD 2.x (768²)	6 GB	5 GB	3 GB
SDXL (1024²)	10 GB	8 GB (`--medvram-sdxl`)	6 GB
SDXL + refiner	12 GB	8 GB	6 GB
SDXL + Lora + hires fix	14 GB+	10 GB	7 GB
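
The table can be turned into a small lookup that suggests the least aggressive flag for a given card. The numbers are copied from the table; the function name suggest_vram_flag and the dictionary are illustrative only, and real requirements vary with resolution, batch size, and extensions.

```python
# (native, medvram, lowvram) rough minimums in GB, copied from the table above.
VRAM_TABLE = {
    "SD 1.5": (4, 3, 2),
    "SD 1.5 + Lora": (5, 4, 3),
    "SD 2.x": (6, 5, 3),
    "SDXL": (10, 8, 6),
    "SDXL + refiner": (12, 8, 6),
    "SDXL + Lora + hires fix": (14, 10, 7),
}


def suggest_vram_flag(model, vram_gb):
    """Return '' if no flag is needed, a flag that should fit, or None if nothing fits."""
    native, med, low = VRAM_TABLE[model]
    if vram_gb >= native:
        return ""
    if vram_gb >= med:
        # The table names --medvram-sdxl only for the plain SDXL row.
        return "--medvram-sdxl" if model == "SDXL" else "--medvram"
    if vram_gb >= low:
        return "--lowvram"
    return None


for model, vram in [("SD 1.5", 4), ("SDXL", 8), ("SD 2.x", 3), ("SDXL", 4)]:
    print(model, vram, repr(suggest_vram_flag(model, vram)))
```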

Disk: ~20 GB for the base install + first checkpoint. Each additional checkpoint ~2–10 GB.

CPU: not a bottleneck during generation; model loading is the main CPU/IO load.

Updates

The intended update flow is git pull && python launch.py; the launcher reinstalls dependencies if requirements_versions.txt changed. There is no migration tooling — settings/config formats are kept compatible by hand.

Disabling extensions in shared environments

For multi-user or hosted deployments, lock down the extensions surface:

--disable-all-extensions          # no extensions run at all, including built-ins
--disable-extra-extensions        # built-ins only; no user-installed extensions run
# leave --enable-insecure-extension-access unset (the default) so users can't install via the Extensions tab

Combined with --api-auth and --gradio-auth, this is the minimum hardening for a public webui. See security.md.
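
A pre-flight check for these flags could look like the sketch below. The function hardening_warnings and its rules are a hypothetical illustration of the checklist above, not part of the webui. It only inspects a command line, and it treats --listen as the sole sign of external exposure.

```python
def hardening_warnings(argv):
    """Return warnings for a launch command line that is about to be exposed."""
    flags = {arg.split("=", 1)[0] for arg in argv if arg.startswith("--")}
    warnings = []
    exposed = "--listen" in flags
    api_only = "--nowebui" in flags
    if exposed and not api_only and "--gradio-auth" not in flags:
        warnings.append("UI is exposed without --gradio-auth")
    if exposed and (api_only or "--api" in flags) and "--api-auth" not in flags:
        warnings.append("API is exposed without --api-auth")
    if "--enable-insecure-extension-access" in flags:
        warnings.append("extension installation is enabled")
    return warnings


print(hardening_warnings(["--listen", "--api"]))
print(hardening_warnings(
    ["--nowebui", "--api", "--listen", "--api-auth", "user:secret"]
))
```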

Backups

Worth saving:

config.json and ui-config.json — settings and UI defaults.
cache.json — pre-computed model hashes (regenerated, but slow).
embeddings/ and models/ — user content.
outputs/ — generated images. The PNG-info infotext means images are self-describing; archives can be restored to any newer version.

The repository itself does not need backing up; it can be re-cloned from upstream.
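
The small state files can be archived with a few lines of standard-library code. This is a sketch under the assumption that config.json, ui-config.json, and cache.json sit directly in the data directory; the function name backup_settings and the archive name are made up for the example, and large directories such as models/ and outputs/ are better handled by a dedicated sync tool.

```python
import tarfile
import tempfile
from pathlib import Path

SMALL_STATE_FILES = ("config.json", "ui-config.json", "cache.json")


def backup_settings(data_dir, archive_path):
    """Archive whichever small state files exist; return the names added."""
    data_dir = Path(data_dir)
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in SMALL_STATE_FILES:
            src = data_dir / name
            if src.is_file():
                tar.add(src, arcname=name)
                added.append(name)
    return added


# Demo against a throwaway directory containing only config.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    added = backup_settings(tmp, Path(tmp) / "settings-backup.tar.gz")
print(added)
```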

Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.