AUTOMATIC1111/stable-diffusion-webui
Deployment
How the project is intended to be run, and what to consider when running it more seriously than "double-click webui.bat".
The project does not ship Dockerfiles, Helm charts, systemd units, or cloud templates. Deployment is essentially "set up Python, run webui.sh". This page documents the supported topologies and the knobs that matter for non-trivial setups.
Supported runtimes
| Backend | Trigger | Notes |
|---|---|---|
| NVIDIA CUDA | Default; --xformers for memory-efficient attention | Tested most heavily |
| AMD ROCm (Linux) | TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.7" | Set in webui-user.sh |
| AMD DirectML (Windows) | Not supported in the master branch; users typically switch to the lshqqytiger/stable-diffusion-webui-directml fork | n/a |
| Apple Silicon (MPS) | webui-macos-env.sh sets PYTORCH_ENABLE_MPS_FALLBACK=1 and a pinned torch | Some operations fall back to CPU |
| Intel IPEX (Arc, Iris) | --use-ipex | Code in modules/xpu_specific.py |
| Ascend NPU | requirements_npu.txt swaps in the NPU torch | Code in modules/npu_specific.py |
| CPU only | --use-cpu all --no-half --skip-torch-cuda-test | Slow; only useful for tests and tiny models |
The launcher is responsible for installing the right torch based on environment variables (TORCH_COMMAND, XFORMERS_PACKAGE, CLIP_PACKAGE). See modules/launch_utils.py prepare_environment().
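The override rule amounts to "use the environment variable if it is set, otherwise a built-in default". A minimal sketch of that pattern follows; resolve_install_commands is a hypothetical name, and the default values are placeholders rather than the versions prepare_environment() actually pins.

```python
import os
from typing import Dict, Mapping, Optional

# Placeholder defaults for illustration only; the real launcher pins its own
# versions and index URLs in modules/launch_utils.py.
DEFAULTS: Dict[str, str] = {
    "TORCH_COMMAND": "pip install torch torchvision",
    "XFORMERS_PACKAGE": "xformers",
    "CLIP_PACKAGE": "<default CLIP package>",
}


def resolve_install_commands(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the value for each key, preferring the environment over the default."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}
```

On ROCm, exporting TORCH_COMMAND in webui-user.sh (as in the table) is what makes the first entry point at the ROCm wheel index; every key left unset keeps its default.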
Running headless
For pure-API deployments:

```bash
python launch.py --nowebui --api --listen --port 7861 --api-auth user:secret
```

--nowebui skips Gradio entirely: webui.py:api_only() runs the FastAPI app directly with uvicorn. Always add --api-auth. Without it, anyone who can reach the port can call the API, and the API has no built-in rate limiting to fall back on.
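A client for the headless API has to send HTTP Basic credentials that match --api-auth. The sketch below only builds the request, using the standard library. The /sdapi/v1/txt2img route and the prompt/steps fields belong to the webui's txt2img endpoint. The helper name build_txt2img_request and the host/port are illustrative assumptions.

```python
import base64
import json
import urllib.request


def build_txt2img_request(base_url: str, user: str, password: str,
                          prompt: str, steps: int = 20) -> urllib.request.Request:
    """Build (but do not send) a POST to /sdapi/v1/txt2img with HTTP Basic auth."""
    # HTTP Basic auth: base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"prompt": prompt, "steps": steps}).encode("utf-8")
    req = urllib.request.Request(
        url=base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=body,
        method="POST",
    )
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/json")
    return req
```

Sending it with urllib.request.urlopen(req, timeout=600) should return JSON whose images field holds base64-encoded PNGs. The generous timeout reflects that a single generation can take minutes.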
For shared UI deployments, --listen --port 7860 --gradio-auth user:pass is the minimum setup before exposing the UI externally. Do not pass --enable-insecure-extension-access: when --listen is set, installing extensions from the UI stays disabled unless that flag is given. Don't expose the UI to the internet without auth and CORS scoping.
What runs where
- Single, long-running process. The model stays resident in GPU/CPU memory the whole time.
- Gradio's thread pool serves HTTP. Generation work is serialised on queue_lock (see modules/call_queue.py), so only one batch at a time uses the GPU.
- Disk usage: models/ (checkpoints, ~2–10 GB each), outputs/ (generated images), embeddings/, models/Lora/, models/VAE/, models/ControlNet/ (extension), and repositories/ (upstream repos cloned by the launcher).
- Logs go to stdout/stderr. If you redirect them to a file, set --loglevel and consider --log-startup for boot diagnostics.
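The serialisation described above boils down to a process-wide lock. The following minimal sketch uses hypothetical names (gpu_lock, run_queued) rather than the actual helpers in modules/call_queue.py. The demo starts several threads and records how many "generations" are ever in flight at once.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# One lock for the whole process, similar in spirit to queue_lock.
gpu_lock = threading.Lock()


def run_queued(fn: Callable[[], T]) -> T:
    """Run fn while holding gpu_lock so only one job touches the GPU at a time."""
    with gpu_lock:
        return fn()


def peak_concurrency(n_jobs: int = 8, work_s: float = 0.01) -> int:
    """Start n_jobs threads that all go through run_queued; return the max in flight."""
    state = {"active": 0, "peak": 0}
    state_lock = threading.Lock()  # protects the counters, separate from gpu_lock

    def fake_generation() -> None:
        with state_lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(work_s)  # stand-in for GPU work
        with state_lock:
            state["active"] -= 1

    threads = [threading.Thread(target=run_queued, args=(fake_generation,))
               for _ in range(n_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

peak_concurrency() returns 1 however many threads are started. Extra requests to a single process do not run in parallel; they wait on the lock.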
Reverse proxies
For external exposure, put the webui behind nginx, Caddy, or Traefik. A common configuration:
- HTTPS termination at the proxy.
- Long timeouts: image generation can take minutes per request. On nginx, set proxy_read_timeout 600; and proxy_send_timeout 600;.
- WebSocket support: Gradio uses WebSockets for the queue and live preview, so nginx needs proxy_http_version 1.1;, proxy_set_header Upgrade $http_upgrade;, and proxy_set_header Connection "Upgrade";.
- File upload size: image uploads can exceed 50 MB, so raise the limit, e.g. client_max_body_size 100m;.
- The --subpath flag is required if hosting under a path like /webui/. It rewrites Gradio's static asset URLs.
- The --root-path env handling in FastAPI is automatic for the API mode.
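Put together, the nginx directives listed above form a single location block. The helper below renders that block as text so all the pieces are visible in one place. The upstream address 127.0.0.1:7860, the extra Host header line, and the function name are assumptions; the project does not ship this.

```python
def render_nginx_location(upstream: str = "http://127.0.0.1:7860",
                          location: str = "/",
                          timeout_s: int = 600,
                          max_body: str = "100m") -> str:
    """Render an nginx location block with the timeout, WebSocket and upload settings."""
    lines = [
        f"location {location} {{",
        f"    proxy_pass {upstream};",
        # WebSocket upgrade for Gradio's queue and live preview.
        "    proxy_http_version 1.1;",
        "    proxy_set_header Upgrade $http_upgrade;",
        '    proxy_set_header Connection "Upgrade";',
        "    proxy_set_header Host $host;",
        # Generation can take minutes per request.
        f"    proxy_read_timeout {timeout_s};",
        f"    proxy_send_timeout {timeout_s};",
        # Large image uploads.
        f"    client_max_body_size {max_body};",
        "}",
    ]
    return "\n".join(lines)
```

Because proxy_pass is given without a URI part, nginx forwards the request path unchanged rather than stripping the location prefix.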
--cors-allow-origins=https://my.domain (or the regex variant, --cors-allow-origins-regex) lets browser clients on a different origin call the API.
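This sketch assumes these flags are passed to Starlette's CORSMiddleware, the standard CORS middleware for FastAPI. Under that assumption, an Origin header is accepted if it appears in the comma-separated allow list or fully matches the regex. The code below is a rough re-creation of that check, and parse_origins and origin_allowed are hypothetical names.

```python
import re
from typing import List, Optional, Sequence


def parse_origins(flag_value: str) -> List[str]:
    """Split a comma-separated --cors-allow-origins value into a clean list."""
    return [o.strip() for o in flag_value.split(",") if o.strip()]


def origin_allowed(origin: str, allow_origins: Sequence[str],
                   allow_origin_regex: Optional[str] = None) -> bool:
    """Exact match against the list, or a full (anchored) match against the regex."""
    if "*" in allow_origins or origin in allow_origins:
        return True
    if allow_origin_regex is not None and re.fullmatch(allow_origin_regex, origin):
        return True
    return False
```

Because the regex must match the whole origin, a pattern like https://[a-z]+\.my\.domain does not accidentally admit https://app.my.domain.evil.example.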
Persistence
Almost everything is configurable but defaults to relative paths in the repo:
- --data-dir: relocates config.json, ui-config.json, outputs/, log/, and similar user data. Useful for separating volatile user data from the read-only code.
- --models-dir: root for Stable-diffusion/, VAE/, Lora/, etc. Symlinks work fine if you have many extensions sharing model directories.
- --ckpt-dir, --vae-dir, --embeddings-dir, and --lora-dir (the last comes from the Lora extension's preload.py) override individual subdirectories.
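The precedence implied by these flags can be sketched as a small resolver. A per-directory flag wins, then --models-dir, then a models/ folder under the data directory. resolve_model_dir is hypothetical, and using data_dir/models as the fallback is an assumption about the default layout.

```python
from pathlib import Path
from typing import Optional


def resolve_model_dir(kind: str, data_dir: Path,
                      models_dir: Optional[Path] = None,
                      override: Optional[Path] = None) -> Path:
    """Pick the directory for one model kind (e.g. "Lora", "VAE")."""
    if override is not None:  # e.g. --lora-dir or --vae-dir
        return override
    # --models-dir, else the assumed default <data-dir>/models.
    root = models_dir if models_dir is not None else data_dir / "models"
    return root / kind
```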
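For containerised deployment, mount the --data-dir and --models-dir paths as volumes and bind-mount the repo (for example at /opt/webui) read-only. A read-only repo only works if the image already contains what the launcher would otherwise create there on first start, such as the Python environment and repositories/.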
Resource sizing
Rough VRAM minimums for a single image at default settings:
| Model | Default (no flags) | With --medvram | With --lowvram |
|---|---|---|---|
| SD 1.5 (512²) | 4 GB | 3 GB | 2 GB |
| SD 1.5 + Lora | 5 GB | 4 GB | 3 GB |
| SD 2.x (768²) | 6 GB | 5 GB | 3 GB |
| SDXL (1024²) | 10 GB | 8 GB (--medvram-sdxl) | 6 GB |
| SDXL + refiner | 12 GB | 8 GB | 6 GB |
| SDXL + Lora + hires fix | 14 GB+ | 10 GB | 7 GB |
Disk: ~20 GB for the base install + first checkpoint. Each additional checkpoint ~2–10 GB.
CPU: not a bottleneck during generation; model loading is the main CPU/IO load.
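The VRAM table above can be read as a rough flag picker. This sketch hard-codes those rows under hypothetical model labels and returns the least aggressive flag whose minimum fits. Using --medvram-sdxl for every SDXL row is an assumption. Treat the result as a starting point, since real usage varies with resolution, batch size, and extensions.

```python
from typing import Dict, Tuple

# Rough minimums in GB from the table: (no flags, --medvram, --lowvram).
VRAM_TABLE: Dict[str, Tuple[int, int, int]] = {
    "sd15": (4, 3, 2),
    "sd15+lora": (5, 4, 3),
    "sd2": (6, 5, 3),
    "sdxl": (10, 8, 6),
    "sdxl+refiner": (12, 8, 6),
    "sdxl+lora+hires": (14, 10, 7),
}


def pick_vram_flag(model: str, vram_gb: float) -> str:
    """Return "" if defaults should fit, otherwise the mildest VRAM flag that fits."""
    native, med, low = VRAM_TABLE[model]
    if vram_gb >= native:
        return ""
    if vram_gb >= med:
        return "--medvram-sdxl" if model.startswith("sdxl") else "--medvram"
    if vram_gb >= low:
        return "--lowvram"
    raise ValueError(f"{model} needs at least {low} GB even with --lowvram")
```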
Updates
The intended update flow is git pull && python launch.py; the launcher reinstalls dependencies if requirements_versions.txt changed. There is no migration tooling — settings/config formats are kept compatible by hand.
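One way to decide whether a reinstall is needed is to compare each name==version pin against what is actually installed. The sketch below uses hypothetical helpers, parse_pins and unmet_pins, and is not the launcher's own code. It relies on importlib.metadata from the standard library and only understands plain == pins.

```python
from importlib import metadata
from typing import Dict, List, Optional


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines, ignoring blank lines and # comments."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def unmet_pins(pins: Dict[str, str],
               installed: Optional[Dict[str, str]] = None) -> List[str]:
    """Return names whose installed version is missing or differs from the pin."""
    if installed is None:
        # Look up the versions actually installed in this environment.
        installed = {}
        for name in pins:
            try:
                installed[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                pass
    return [name for name, want in pins.items() if installed.get(name) != want]
```

If unmet_pins(parse_pins(text)) is non-empty, a pip install -r of the file would be the next step.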
Disabling extensions in shared environments
For multi-user or hosted deployments, lock down the extensions surface:
```bash
--disable-all-extensions     # no extensions run at all, including built-ins
--disable-extra-extensions   # built-ins only; user-installed extensions don't run
```

Use one of these two flags. Installing extensions from the Extensions tab is already disabled by default whenever --listen is used, so do not pass --enable-insecure-extension-access, which turns it back on. Combined with --api-auth and --gradio-auth, this is the minimum hardening for a public webui. See security.md.
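For an automated pre-flight check, the hardening rules in this section can be expressed as a lint over the launch arguments. hardening_problems is a hypothetical helper that is not part of the webui. It only inspects flag names; it does not check whether the credentials are strong.

```python
from typing import List, Sequence


def hardening_problems(argv: Sequence[str]) -> List[str]:
    """Report gaps against the minimum public-deployment hardening."""
    # Accept both "--flag value" and "--flag=value" forms.
    flags = {arg.split("=", 1)[0] for arg in argv if arg.startswith("--")}
    problems: List[str] = []
    if "--gradio-auth" not in flags and "--nowebui" not in flags:
        problems.append("UI exposed without --gradio-auth")
    if ("--api" in flags or "--nowebui" in flags) and "--api-auth" not in flags:
        problems.append("API exposed without --api-auth")
    if "--enable-insecure-extension-access" in flags:
        problems.append("--enable-insecure-extension-access lets remote users install extensions")
    if not flags & {"--disable-all-extensions", "--disable-extra-extensions"}:
        problems.append("user-installed extensions still run")
    return problems
```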
Backups
Worth saving:
- config.json and ui-config.json: settings and UI defaults.
- cache.json: pre-computed model hashes (regenerated on demand if lost, but slowly).
- embeddings/ and models/: user content.
- outputs/: generated images. Each PNG embeds its generation parameters (the infotext), so images are self-describing and archived outputs remain readable by newer versions.
The repository itself does not need backing up; it's regenerated from upstream.
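The following minimal backup script follows the list above. It assumes the default layout, in which these files and folders sit together under the data directory (the repo root unless --data-dir is set). backup_data_dir is hypothetical. If models/ lives elsewhere via --models-dir, or is too large to re-archive each time, back it up separately.

```python
import tarfile
from pathlib import Path
from typing import List

# Items named in the backup list, relative to the data directory.
BACKUP_ITEMS = ["config.json", "ui-config.json", "cache.json", "embeddings", "models", "outputs"]


def backup_data_dir(data_dir: Path, archive: Path) -> List[str]:
    """Write a .tar.gz of whichever BACKUP_ITEMS exist; return the items added."""
    added: List[str] = []
    with tarfile.open(archive, "w:gz") as tar:
        for item in BACKUP_ITEMS:
            path = data_dir / item
            if path.exists():
                tar.add(path, arcname=item)  # directories are added recursively
                added.append(item)
    return added
```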
Built by Factory AutoWiki from public repository content. It is a generated preview for codebase exploration, not source-maintained documentation.