"DeepSeek-R1" is not one model — it's a family that spans from something that runs on a phone to something that needs a server rack. The single most common mistake is downloading the wrong size. Here's the VRAM each one actually needs, and which size you probably want.
Distill vs. full: know which you're downloading
The lightweight R1 Distill models (1.5B, 7B, 8B, 14B, 32B, 70B) are existing open models fine-tuned on reasoning traces generated by R1: the 1.5B, 7B, 14B, and 32B are Qwen-based, and the 8B and 70B are Llama-based. They're what most people run locally. The full DeepSeek-R1 is a 671B-parameter Mixture-of-Experts model with about 37B parameters active per token. It is frontier-class, but firmly server territory. If a guide says "R1 runs on a 3060," they mean a distill.
VRAM by size (4-bit, 8K context)
| Model | Weights (Q4) | Total VRAM | Fits on |
|---|---|---|---|
| R1 Distill 1.5B | ~0.8 GB | ~3 GB | Any GPU / phone |
| R1 Distill 7B / 8B | ~4.4 GB | ~7–8 GB | 8 GB GPU |
| R1 Distill 14B | ~7.7 GB | ~12 GB | 12 GB GPU |
| R1 Distill 32B | ~17.6 GB | ~24 GB | 24 GB GPU (tight) |
| R1 Distill 70B | ~38.8 GB | ~48 GB | 2× 24 GB or 1× 48 GB GPU |
| R1 Full 671B (MoE) | ~370 GB | Server | Multi-GPU node |
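The weights column is mostly arithmetic: parameter count times bits per weight. The sketch below back-solves an effective rate of about 4.4 bits per weight from the table's own figures. That rate is an inference from the numbers above, not a documented property of any particular quant format, and `weights_gb` is an illustrative helper name.

```python
# Effective bits per weight inferred from the table above (an assumption;
# real 4-bit formats differ by format and by which tensors stay at higher precision).
ASSUMED_BITS_PER_WEIGHT = 4.4


def weights_gb(params_billion: float,
               bits_per_weight: float = ASSUMED_BITS_PER_WEIGHT) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    # (params_billion * 1e9 params) * (bits / 8 bytes per param) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


for name, size_b in [("14B", 14), ("32B", 32), ("671B", 671)]:
    print(f"R1 {name}: ~{weights_gb(size_b):.1f} GB of weights")
```

The results land close to the table's weight figures. The gap between weights and total VRAM is the KV cache plus runtime overhead, which this helper deliberately leaves out.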
Context length is the hidden cost
R1 models "think" out loud, so they burn through context fast. The KV cache grows linearly with context length, so doubling your context window from 8K to 16K roughly doubles the KV-cache portion of the budget. On a 32B distill, which is already tight on a 24 GB card at 8K, that extra cache can be the difference between fitting and not fitting. If you plan to feed it long documents, size up your VRAM or step down the model.
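A rough estimate shows why the cache scales this way. The usual formula is 2 (keys and values) × layers × KV heads × head dimension × context tokens × bytes per element. The architecture numbers below (64 layers, 8 KV heads, head dimension 128, FP16 cache) are assumptions meant to approximate the 32B distill's Qwen base; check the model's `config.json` before relying on them. `kv_cache_gib` is an illustrative helper name.

```python
def kv_cache_gib(context_tokens: int,
                 n_layers: int = 64,       # assumed for the 32B distill
                 n_kv_heads: int = 8,      # assumed (grouped-query attention)
                 head_dim: int = 128,      # assumed
                 bytes_per_elem: int = 2   # FP16 cache, not quantized
                 ) -> float:
    """Approximate KV-cache size in GiB for a single sequence."""
    # Factor of 2: each layer stores one tensor for keys and one for values.
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_elem)
    return total_bytes / 2**30


for ctx in (8_192, 16_384, 32_768):
    print(f"{ctx:>6} tokens: ~{kv_cache_gib(ctx):.1f} GiB of KV cache")
```

Under these assumptions the cache costs 256 KiB per token, so 8K of context is about 2 GiB and 16K about 4 GiB. A runtime that quantizes the KV cache would lower these figures.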
Want the exact number for the size and context you have in mind? The calculator lets you pick any R1 distill, set the quant and context, and tells you whether it fits, gets tight, or needs offload.
VRAM figures are reproducible estimates (weights + KV cache + overhead) and vary by runtime and quant format. Data current as of June 2026.