What this Local AI calculator does

Running models locally is bottlenecked by one thing more than any other: VRAM. This tool estimates, for a specific model and quantization, how much memory you need and which GPU or Mac actually fits it. No guesswork, no "it depends." Pick an LLM (Llama, Qwen, DeepSeek-R1, gpt-oss), an image model (Stable Diffusion, SDXL, FLUX), or set a custom parameter size, and it returns a clear fits / tight / won't-fit verdict.

How the VRAM math works

The estimate is transparent and reproducible: model weights + KV cache + runtime overhead. Weights scale with parameter count and quantization (roughly 0.55 GB per billion parameters at 4-bit, 1.05 GB per billion at 8-bit); the KV cache grows with context length; and a fixed ~2 GB covers the runtime and OS. We show every term so you can see why a number is what it is.
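As a rough illustration of the three terms, here is a minimal Python sketch. The weight multipliers and the 2 GB overhead come from the text above. The KV-cache formula (two tensors per layer, sized by KV heads × head dimension × context length, stored at 16-bit) is the commonly used transformer sizing rule and is an assumption here, not necessarily what the calculator implements. The function names and the 8B-class model shape in the usage lines are hypothetical, and 1 GB is taken as 10^9 bytes.

```python
# Sketch of: VRAM = weights + KV cache + overhead (all values in GB).

# Multipliers quoted in the text: GB of weights per billion parameters.
WEIGHT_GB_PER_B_PARAMS = {4: 0.55, 8: 1.05}
OVERHEAD_GB = 2.0  # fixed runtime + OS allowance quoted in the text


def weights_gb(params_billion: float, bits: int) -> float:
    """Weight memory for a model of the given size and quantization."""
    return params_billion * WEIGHT_GB_PER_B_PARAMS[bits]


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_value: int = 2) -> float:
    """ASSUMPTION: standard KV-cache sizing, one K and one V tensor per layer."""
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_len * bytes_per_value)
    return total_bytes / 1e9  # 1 GB = 10^9 bytes (assumption)


def estimate_vram_gb(params_billion: float, bits: int, kv_gb: float) -> float:
    """Sum the three terms: weights + KV cache + fixed overhead."""
    return weights_gb(params_billion, bits) + kv_gb + OVERHEAD_GB


# Hypothetical 8B-class shape: 32 layers, 8 KV heads, head dim 128, 8K context.
kv = kv_cache_gb(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
total = estimate_vram_gb(params_billion=8, bits=4, kv_gb=kv)
print(f"weights={weights_gb(8, 4):.2f} GB, kv={kv:.2f} GB, total={total:.2f} GB")
```

With these illustrative inputs the total comes to about 7.5 GB, consistent with the rule of thumb that a 7-8B model at 4-bit fits in 8 GB. Longer contexts increase only the KV-cache term.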

Frequently asked questions

How much VRAM do I need to run a local LLM?
At 4-bit, a rough rule is VRAM (GB) ≈ parameters (billions) × 0.55, plus KV cache and ~2 GB overhead. A 7–8B model fits in 8 GB at modest context lengths; a 70B model comes to about 40.5 GB before KV cache, so plan for ~48 GB. The calculator gives a specific estimate for your model, quant, and context length.
How accurate are these VRAM estimates?
They're reproducible estimates of weights + KV cache + overhead. Real usage varies a little by inference engine (llama.cpp, vLLM) and quant format (GGUF, GPTQ, AWQ), but they're accurate enough to buy hardware confidently.
Does it cover image and audio models too?
Yes. Stable Diffusion, SDXL, and FLUX image-generation models are included, as are audio models, each with its own memory profile based on resolution or clip length.

Related reading: Q4 vs Q8 quantization, the best GPU for Llama 3 70B, and VRAM for DeepSeek-R1.