"DeepSeek-R1" is not one model — it's a family that spans from something that runs on a phone to something that needs a server rack. The single most common mistake is downloading the wrong size. Here's the VRAM each one actually needs, and which size you probably want.

Distill vs. full: know which you're downloading

The lightweight R1 Distill models (1.5B, 7B, 8B, 14B, 32B, 70B) are existing Qwen and Llama models fine-tuned on reasoning traces generated by R1. They're what most people run locally. The full DeepSeek-R1 is a 671B-parameter Mixture-of-Experts model with ~37B active parameters per token: frontier-class, but firmly server territory. If a guide says "R1 runs on a 3060," it means a distill.
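
A quick way to tell the two apart before you download is to look at the repository name. The sketch below lists the Hugging Face repo IDs as they appear under the deepseek-ai organization; the `R1_REPOS` mapping and the `is_distill` helper are illustrative names, not part of any library.

```python
# Size label -> Hugging Face repo ID under the deepseek-ai organization.
# The dict and helper below are illustrative, not an official API.
R1_REPOS = {
    "1.5B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "7B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    "8B": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "14B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    "32B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    "70B": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "671B": "deepseek-ai/DeepSeek-R1",  # the full MoE model
}


def is_distill(repo_id: str) -> bool:
    """Return True when the repo name marks a distilled model."""
    return "distill" in repo_id.lower()


if __name__ == "__main__":
    for size, repo in R1_REPOS.items():
        kind = "distill" if is_distill(repo) else "FULL model (server-class)"
        print(f"{size:>5}  {repo}  ->  {kind}")
```

If the name has no "Distill" in it, you are looking at the 671B model.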

VRAM by size (4-bit, 8K context)

| Model | Weights (Q4) | Total VRAM | Fits on |
| --- | --- | --- | --- |
| R1 Distill 1.5B | ~0.8 GB | ~3 GB | Any GPU / phone |
| R1 Distill 7B / 8B | ~4.4 GB | ~7–8 GB | 8 GB GPU |
| R1 Distill 14B | ~7.7 GB | ~12 GB | 12 GB GPU |
| R1 Distill 32B | ~17.6 GB | ~24 GB | 24 GB GPU (tight) |
| R1 Distill 70B | ~38.8 GB | ~48 GB | 2× 24 GB / 48 GB |
| R1 Full 671B (MoE) | ~370 GB | Server | Multi-GPU node |

Sweet spot: For most people on a single consumer GPU, the 14B distill (12 GB) or 32B distill (24 GB) is the best blend of reasoning quality and "fits on hardware you can actually buy."
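
The weights figures in the table follow a simple rule of thumb: parameters times bits per weight, divided by 8. The sketch below assumes about 4.4 bits per weight for a Q4 file, a value inferred from the table (4.4 GB for 8B parameters) rather than taken from any runtime's documentation. Real quant formats vary, so treat the result as a rough estimate.

```python
def q4_weights_gb(params_billion: float, bits_per_weight: float = 4.4) -> float:
    """Estimate the size of quantized weights in GB.

    bits_per_weight=4.4 is an assumption inferred from the table above;
    actual 4-bit formats land somewhat above or below it.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for size in (1.5, 8, 14, 32, 70, 671):
        print(f"{size:>6}B -> ~{q4_weights_gb(size):.1f} GB of weights")
```

The same function gives a Q8 estimate if you pass a `bits_per_weight` closer to 8.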

Context length is the hidden cost

R1 models "think" out loud, so they burn through context fast. Doubling your context window from 8K to 16K roughly doubles the KV-cache portion of the budget. On a 32B distill that's the difference between a comfortable 24 GB fit and an uncomfortable one. If you plan to feed it long documents, size up your VRAM or step down the model.
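
To see why the cost scales linearly, here is a minimal sketch of the standard KV-cache size formula: one key and one value vector per layer, per KV head, per token. The architecture numbers used for the 32B distill (64 layers, 8 KV heads, head dimension 128) and the 16-bit cache are assumptions; check them against the model's `config.json` before relying on the result.

```python
def kv_cache_gib(
    context_tokens: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # 16-bit cache; an assumption, runtimes can quantize it
) -> float:
    """Estimate KV-cache size in GiB for a single sequence."""
    # Leading 2 = one key tensor plus one value tensor in every layer.
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens
    )
    return total_bytes / 2**30


if __name__ == "__main__":
    # Assumed architecture values for the 32B distill (not from this article).
    arch = dict(n_layers=64, n_kv_heads=8, head_dim=128)
    for ctx in (8192, 16384):
        print(f"{ctx} tokens -> {kv_cache_gib(ctx, **arch):.1f} GiB of KV cache")
```

Under these assumptions the cache grows from 2 GiB at 8K to 4 GiB at 16K, which is enough to turn a tight 24 GB fit into one that needs offload.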

Want the exact number for the size and context you have in mind? The calculator lets you pick any R1 distill, set the quant and context, and tells you whether it fits, gets tight, or needs offload.
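
If you would rather reason about it by hand, the verdict comes down to adding three numbers and comparing the sum with your VRAM. The sketch below is a hypothetical illustration of that logic, not the calculator's actual code: the function name, the 90% "tight" threshold, and the example inputs are all assumptions.

```python
def fit_verdict(
    weights_gb: float,
    kv_cache_gb: float,
    overhead_gb: float,
    vram_gb: float,
    tight_fraction: float = 0.9,  # hypothetical threshold for "tight"
) -> str:
    """Classify a model + context combination against available VRAM."""
    needed = weights_gb + kv_cache_gb + overhead_gb
    if needed > vram_gb:
        return "needs offload"
    if needed > tight_fraction * vram_gb:
        return "tight"
    return "fits"


if __name__ == "__main__":
    # Placeholder inputs for illustration only, on a 24 GB card.
    print(fit_verdict(weights_gb=17.6, kv_cache_gb=2.0, overhead_gb=1.5, vram_gb=24))
    print(fit_verdict(weights_gb=17.6, kv_cache_gb=4.0, overhead_gb=1.5, vram_gb=24))
    print(fit_verdict(weights_gb=38.8, kv_cache_gb=2.0, overhead_gb=1.5, vram_gb=24))
```

Overhead depends heavily on the runtime, so measure it on your own setup rather than trusting a fixed value.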


FAQ

Which DeepSeek-R1 should I run on a 12 GB GPU?
The 14B distill at Q4 needs about 12 GB at 8K context, so it fits but leaves little room to spare. If you want headroom for longer context, the 8B distill is a safer choice.
Does R1 need more VRAM than a normal model its size?
Not for the weights: they are the same size as the base model's. But R1's long reasoning chains mean you'll often want a larger context window, which increases the KV-cache cost.
Is the 70B distill worth it over the 32B?
Only if you already have ~48 GB of VRAM. The 32B distill is remarkably close in everyday reasoning and fits a single 24 GB card.

We may partner with companies or groups to recommend hardware products based on user needs, and we may earn a commission from qualifying purchases. VRAM figures are reproducible estimates (weights + KV cache + overhead) and vary by runtime and quant format. Data current as of June 2026.