Qwen 3.6 (35B-A3B MoE) has 35 billion parameters. At standard 4-bit quantization with 8K context, it needs roughly 25.4 GB of VRAM — weights plus cache and runtime overhead.

VRAM by quantization

PrecisionWeightsCache/BufferTotal VRAM
2-bit (IQ2_XXS)11.2 GB4.2 GB17.4 GB
4-bit (Q4_K_M)19.3 GB4.2 GB25.4 GB
8-bit (Q8_0)36.8 GB4.2 GB43.0 GB
16-bit (FP16)70.0 GB4.2 GB76.2 GB

Which GPU can run Qwen 3.6 (35B-A3B MoE) (at 4-bit)?

GPU classVRAMQwen 3.6 (35B-A3B MoE) (25.4 GB)
8 GB · RTX 5060 / 40608 GBWon’t fit
12 GB · RTX 5070 / 306012 GBWon’t fit
16 GB · RTX 5070 Ti / 408016 GBWon’t fit
24 GB · RTX 4090 / 309024 GBTight
32 GB · RTX 509032 GBFits
48 GB · 2×24 / RTX 6000 Ada48 GBFits
128 GB · M-series / RTX Spark128 GBFits

Alibaba's mid-2026 sparse MoE with 262K native context.

Get the exact number for your setup
Pick your model, quantization, and context length — the calculator shows the full VRAM math and tells you precisely which hardware fits.
Open the Local AI Calculator
Related guides
Best GPU for Llama 3 70B How Much VRAM for DeepSeek-R1 Q4 vs Q8 Quantization Explained Apple Silicon for Local AI RTX Spark: 128GB Unified Memory

VRAM figures are reproducible estimates (weights + KV cache + overhead) and vary by runtime and quant format. Data current as of 2026-06-15.