Qwen 3 Coder (30B-A3B MoE) has 30.5 billion parameters. At standard 4-bit quantization (Q4_K_M) with an 8K context, it needs roughly 22.4 GB of VRAM: 16.8 GB of weights, 3.7 GB of cache and buffers, and about 1.9 GB of runtime overhead.
VRAM by quantization (8K context)
| Precision | Weights | Cache/Buffer | Total VRAM (incl. ~2 GB overhead) |
|---|---|---|---|
| 2-bit (IQ2_XXS) | 9.8 GB | 3.7 GB | 15.4 GB |
| 4-bit (Q4_K_M) | 16.8 GB | 3.7 GB | 22.4 GB |
| 8-bit (Q8_0) | 32.0 GB | 3.7 GB | 37.7 GB |
| 16-bit (FP16) | 61.0 GB | 3.7 GB | 66.7 GB |
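The Total column is larger than Weights plus Cache/Buffer because it also includes runtime overhead. The sketch below recovers that implied overhead from the table rows and converts each weight size into an average bits-per-weight figure. It is a rough check, not the calculator's actual method: the function names are hypothetical, and it assumes 1 GB = 10^9 bytes.

```python
# Rows copied from the table above: (label, weights_gb, cache_gb, total_gb).
ROWS = [
    ("2-bit (IQ2_XXS)", 9.8, 3.7, 15.4),
    ("4-bit (Q4_K_M)", 16.8, 3.7, 22.4),
    ("8-bit (Q8_0)", 32.0, 3.7, 37.7),
    ("16-bit (FP16)", 61.0, 3.7, 66.7),
]

PARAMS_BILLION = 30.5  # total parameter count stated in the text


def implied_overhead_gb(weights_gb, cache_gb, total_gb):
    """Part of the total that weights and cache do not account for."""
    return round(total_gb - weights_gb - cache_gb, 1)


def effective_bits_per_weight(weights_gb, params_billion=PARAMS_BILLION):
    """Average bits per parameter implied by a weight size (assumes 1 GB = 1e9 bytes)."""
    return weights_gb * 8 / params_billion


for label, weights, cache, total in ROWS:
    overhead = implied_overhead_gb(weights, cache, total)
    bpw = effective_bits_per_weight(weights)
    print(f"{label}: overhead ~{overhead} GB, ~{bpw:.2f} bits/weight")
```

Across all four rows the implied overhead comes out between 1.9 and 2.0 GB. The FP16 row works out to exactly 16 bits per weight, while the quantized rows land slightly above their nominal bit width (about 4.4 for Q4_K_M).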
Which GPUs can run Qwen 3 Coder (30B-A3B MoE) at 4-bit?
| GPU class | VRAM | Fits 22.4 GB (4-bit, 8K context)? |
|---|---|---|
| 8 GB · RTX 5060 / 4060 | 8 GB | Won’t fit |
| 12 GB · RTX 5070 / 3060 | 12 GB | Won’t fit |
| 16 GB · RTX 5070 Ti / 4080 | 16 GB | Won’t fit |
| 24 GB · RTX 4090 / 3090 | 24 GB | Fits |
| 32 GB · RTX 5090 | 32 GB | Fits |
| 48 GB · 2×24 / RTX 6000 Ada | 48 GB | Fits |
| 128 GB · Apple M-series / DGX Spark | 128 GB | Fits |
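The verdicts in the table above appear to follow from a plain comparison of each VRAM class against the 22.4 GB estimate. Below is a minimal sketch of that check, assuming no extra safety margin is applied. The helper names are hypothetical.

```python
REQUIRED_GB = 22.4  # 4-bit (Q4_K_M) total at 8K context, from the table above

# VRAM classes listed in the table, in GB.
VRAM_CLASSES_GB = [8, 12, 16, 24, 32, 48, 128]


def fits(vram_gb, required_gb=REQUIRED_GB):
    """True when the card has at least as much VRAM as the estimate requires."""
    return vram_gb >= required_gb


def headroom_gb(vram_gb, required_gb=REQUIRED_GB):
    """Spare (positive) or missing (negative) VRAM in GB, rounded to 0.1."""
    return round(vram_gb - required_gb, 1)


for vram in VRAM_CLASSES_GB:
    verdict = "Fits" if fits(vram) else "Won't fit"
    print(f"{vram:>3} GB: {verdict} ({headroom_gb(vram):+.1f} GB)")
```

A 24 GB card clears the estimate by only about 1.6 GB, so contexts longer than 8K or a heavier runtime may not fit on that class.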
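Qwen 3 Coder is a sparse mixture-of-experts (MoE) coding model: only about 3B of its 30.5B parameters are active per token, which makes local generation very fast. The full set of weights still has to fit in memory, so the VRAM figures above are based on the total parameter count.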
Get the exact number for your setup
Pick your model, quantization, and context length — the calculator shows the full VRAM math and tells you precisely which hardware fits.
VRAM figures are reproducible estimates (weights + KV cache + overhead) and vary by runtime and quant format. Data current as of 2026-06-15.