Image models play by slightly different rules than LLMs: instead of a giant KV cache, the VRAM hit comes from the model itself plus the resolution you're generating at. The good news is SDXL is friendlier than its reputation. The bad news is FLUX earns its reputation.
The quick reference
| Model | Comfortable VRAM | Minimum (with tricks) |
|---|---|---|
| Stable Diffusion 1.5 | 4–6 GB | Runs on almost anything |
| SDXL (3.5B base, 6.6B with refiner) | 8–12 GB | ~6 GB with offload |
| SD 3.5 Large (8B) | 12–16 GB | ~8 GB quantized |
| FLUX.1 Schnell / Dev (12B) | 16–24 GB | ~8–12 GB quantized + offload |
| FLUX.2 Dev (32B) | 24 GB+ | Heavy — offload or quantize hard |
Why FLUX is so much hungrier
FLUX.1 is a 12B-parameter transformer, more than ten times the size of SD 1.5's roughly 0.9B-parameter UNet, and its weights ship in 16-bit (bf16) precision. At that precision the transformer alone is about 24 GB (12B parameters × 2 bytes each), so at full fidelity the model plus its text encoders can ask for well over 20 GB. The community has done heroic work shrinking it: 8-bit and GGUF-quantized FLUX builds, plus CPU offload of the text encoder, bring it down to 12 GB and even 8 GB cards, at the cost of speed and a little fidelity.
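To see why quantization makes such a difference, here is a back-of-the-envelope sketch of weight memory: parameters times bits per weight. The helper name `weight_gb` is made up for illustration, and the numbers cover weights only. They assume nothing about activations, text encoders, or the VAE, and real quantized files (GGUF in particular) carry some per-block overhead, so treat the output as a floor rather than a measurement.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Weights only: activations, text encoders, and the VAE are not counted.
    """
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("params_billion and bits_per_weight must be positive")
    # billions of params * bits per param / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


# Illustrative precisions; actual quantized formats add some overhead.
PRECISIONS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}

if __name__ == "__main__":
    for label, bits in PRECISIONS.items():
        size = weight_gb(12, bits)
        print(f"FLUX.1 (12B) at {label}: ~{size:.0f} GB of weights")
```

The same function applied to other rows in the table gives a rough feel for where each model lands before offload tricks come into play.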
Resolution is the other dial
For SDXL and FLUX, 1024×1024 is the baseline (SD 1.5 was trained at 512×512). Push to 1536², which is 2.25× the pixels, or run a hi-res upscale pass and VRAM climbs sharply: an upscale can add several gigabytes on its own. If you're on a tight card, generate at standard resolution and upscale as a separate, lighter step rather than in one giant pass.
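A rough way to reason about the resolution dial is to count how many latent tokens the model has to process. The sketch below assumes an 8× VAE downsample followed by 2×2 latent patches, which is how FLUX-style transformers are commonly described; UNet-based models like SDXL work differently, so the token counts are only a proxy there. The function names are hypothetical, and the ratio shows how the workload grows, not exact VRAM.

```python
def latent_tokens(width: int, height: int, vae_factor: int = 8, patch: int = 2) -> int:
    """Number of latent tokens for an image, under the assumed layout.

    Assumption: the VAE shrinks each side by `vae_factor`, then the latent
    is split into `patch` x `patch` tiles, one token per tile.
    """
    step = vae_factor * patch
    if width % step or height % step:
        raise ValueError(f"width and height must be multiples of {step}")
    return (width // step) * (height // step)


def relative_tokens(width: int, height: int, base: int = 1024) -> float:
    """Token count relative to a square base resolution."""
    return latent_tokens(width, height) / latent_tokens(base, base)


if __name__ == "__main__":
    for side in (1024, 1536, 2048):
        ratio = relative_tokens(side, side)
        print(f"{side}x{side}: {latent_tokens(side, side)} tokens ({ratio:.2f}x baseline)")
```

Activation memory tends to grow at least in proportion to that token count, which is part of why a separate, lighter upscale step is easier on a small card than one large generation pass.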
Image-model VRAM varies widely with pipeline, precision, offload settings, and resolution; figures are practical estimates. Data current as of June 2026.