What this Local AI calculator does
Running models locally is bottlenecked by one thing more than any other: VRAM. This tool estimates, for a specific model and quantization, how much memory you need and which GPU or Mac actually fits it, so you are not left guessing. Pick an LLM (Llama, Qwen, DeepSeek-R1, gpt-oss), an image model (Stable Diffusion, SDXL, FLUX), or set a custom parameter size, and it returns a clear fits / tight / won't-fit verdict.
How the VRAM math works
The estimate is transparent and reproducible: model weights + KV cache + runtime overhead. Weights scale with parameter count and quantization (roughly 0.55 GB per billion parameters at 4-bit, 1.05 GB per billion at 8-bit); the KV cache grows with context length; and a fixed ~2 GB covers the runtime and OS. We show every term so you can see why a number is what it is.
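As a rough illustration of that sum, here is a minimal Python sketch. It is not the calculator's actual code. The weight factors and the 2 GB overhead come from the text above, with parameters counted in billions. The KV cache term uses the standard transformer formula (keys plus values, per layer, per KV head, per token), which is an assumption about how the tool computes it. The function names, the 10% "tight" margin, and the example architecture values are hypothetical.

```python
# Sketch of: VRAM = model weights + KV cache + runtime overhead.
# Factors below are from the text; everything else is an illustrative assumption.

WEIGHT_GB_PER_BILLION = {4: 0.55, 8: 1.05}  # GB per billion parameters, by bit width
OVERHEAD_GB = 2.0                           # fixed runtime + OS allowance


def weights_gb(params_billion: float, bits: int) -> float:
    """Weight memory in GB for a model of `params_billion` parameters."""
    return params_billion * WEIGHT_GB_PER_BILLION[bits]


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache in GB (assumed formula): 2 tensors (K and V) per layer,
    each holding n_kv_heads * head_dim values per token."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 1e9


def estimate_vram_gb(params_billion: float, bits: int, n_layers: int,
                     n_kv_heads: int, head_dim: int, context_len: int) -> float:
    """Sum the three terms described in the text."""
    return (weights_gb(params_billion, bits)
            + kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len)
            + OVERHEAD_GB)


def verdict(required_gb: float, available_gb: float, margin: float = 0.10) -> str:
    """Hypothetical thresholds: 'tight' means inside the last 10% of VRAM."""
    if required_gb <= available_gb * (1 - margin):
        return "fits"
    if required_gb <= available_gb:
        return "tight"
    return "won't fit"


if __name__ == "__main__":
    # Illustrative 8B-class model: 32 layers, 8 KV heads, head_dim 128,
    # 8192-token context, 16-bit cache entries, 4-bit weights.
    need = estimate_vram_gb(8, 4, n_layers=32, n_kv_heads=8,
                            head_dim=128, context_len=8192)
    for card_gb in (6, 8, 12):
        print(f"{card_gb} GB card: need {need:.2f} GB -> {verdict(need, card_gb)}")
```

Keeping each term in its own function mirrors the "show every term" idea: you can print the weights, cache, and overhead separately to see which one dominates for a given context length.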
Frequently asked questions
Related reading: Q4 vs Q8 quantization, the best GPU for Llama 3 70B, and VRAM for DeepSeek-R1.
Sources
Hardware specifications and the benchmark and pricing references this tool uses: