What is the biggest LLM an Nvidia RTX 5060 can run?

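At 4-bit quantization with an 8K context, it comfortably runs models that need up to about 8 GB of VRAM. In practice that means 8B-class models such as Qwen 3 (8B) and Llama 3.1 (8B). See the table for specific models.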
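Answering this from the table comes down to filtering rows by the VRAM budget and keeping the model with the most parameters. The minimal sketch below uses a few rows copied from the table below and assumes a hard 8 GB cutoff. The table's own Fits/Tight labels may use a different rule. `ROWS` and `largest_within` are illustrative names, not part of any tool.

```python
# (model, parameters in billions, estimated Q4 VRAM in GB), copied from the table below.
ROWS = [
    ("Llama 3.2 (3B)", 3.0, 4.0),
    ("Llama 3.1 (8B)", 8.0, 7.4),
    ("Qwen 3 (8B)", 8.2, 7.5),
    ("Gemma 2 (9B)", 9.2, 8.2),
    ("Gemma 3 (12B)", 12.0, 10.0),
]


def largest_within(rows, budget_gb):
    """Return the name of the model with the most parameters whose VRAM estimate is within budget, or None."""
    candidates = [row for row in rows if row[2] <= budget_gb]
    if not candidates:
        return None
    # Rank by parameter count (index 1), not by VRAM, to find the "biggest" model.
    return max(candidates, key=lambda row: row[1])[0]


print(largest_within(ROWS, 8.0))  # Qwen 3 (8B)
```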

What AI models run on an Nvidia RTX 5060? (8 GB, 2026)

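The Nvidia RTX 5060 has 8 GB of VRAM. At 4-bit quantization with an 8K context, 38 of the 83 popular local models listed (LLMs plus embedding and reranker models) fit comfortably, 2 more fit only tightly, and the remaining 43 won't fit. The full list below is sorted by parameter count, smallest first.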

Model	Params	Est. VRAM (Q4, 8K ctx)	On 8 GB
nomic-embed-text v1.5 (137M)	0.137B	2.1 GB	Fits
jina-reranker-v2 (300M)	0.3B	2.2 GB	Fits
EmbeddingGemma (308M)	0.308B	2.2 GB	Fits
bge-large-en-v1.5 (335M)	0.335B	2.2 GB	Fits
bge-reranker-large (335M)	0.335B	2.2 GB	Fits
stella-en-400M (435M)	0.435B	2.3 GB	Fits
nomic-embed-text v2 MoE (475M)	0.475B	2.3 GB	Fits
Qwen 2.5 Coder (0.5B)	0.5B	2.3 GB	Fits
bge-m3 (567M)	0.567B	2.4 GB	Fits
jina-embeddings-v3 (570M)	0.57B	2.4 GB	Fits
Qwen 3 (0.6B)	0.6B	2.4 GB	Fits
Qwen 3 Embedding (0.6B)	0.6B	2.4 GB	Fits
Qwen 3 Reranker (0.6B)	0.6B	2.4 GB	Fits
Gemma 3 (1B)	1B	2.7 GB	Fits
DeepSeek-R1 Distill (1.5B)	1.5B	3.0 GB	Fits
Qwen 2.5 Coder (1.5B)	1.5B	3.0 GB	Fits
Qwen 3 (1.7B)	1.7B	3.1 GB	Fits
Gemma 3n (E2B)	2B	3.3 GB	Fits
SmolLM3 (3B)	3B	4.0 GB	Fits
Llama 3.2 (3B)	3B	4.0 GB	Fits
Qwen 2.5 Coder (3B)	3B	4.0 GB	Fits
StarCoder 2 (3B)	3B	4.0 GB	Fits
Qwen 2.5 VL (3B)	3B	4.0 GB	Fits
Phi-4 Mini (3.8B)	3.8B	4.5 GB	Fits
Gemma 3 (4B)	4B	4.7 GB	Fits
Qwen 3 (4B)	4B	4.7 GB	Fits
Qwen 3 VL (4B)	4B	4.7 GB	Fits
Phi-4 Multimodal (5.6B)	5.6B	5.8 GB	Fits
DeepSeek-R1 Distill (7B)	7B	6.7 GB	Fits
StarCoder 2 (7B)	7B	6.7 GB	Fits
Qwen 2.5 Coder (7B)	7.2B	6.8 GB	Fits
Qwen 2.5 VL (7B)	7.2B	6.8 GB	Fits
Llama 3.1 (8B)	8B	7.4 GB	Fits
DeepSeek-R1 Distill (8B)	8B	7.4 GB	Fits
Qwen 3 VL (8B)	8B	7.4 GB	Fits
InternVL3 (8B)	8B	7.4 GB	Fits
LLaVA 1.6 (8B)	8B	7.4 GB	Fits
Qwen 3 (8B)	8.2B	7.5 GB	Fits
Gemma 2 (9B)	9.2B	8.2 GB	Tight
Llama 3.2 Vision (11B)	11B	9.4 GB	Tight
Gemma 3 (12B)	12B	10.0 GB	Won’t fit
Pixtral (12B)	12B	10.0 GB	Won’t fit
DeepSeek-R1 Distill (14B)	14B	11.4 GB	Won’t fit
Phi-4 (14B)	14B	11.4 GB	Won’t fit
Qwen 3 (14B)	14.8B	11.9 GB	Won’t fit
Qwen 2.5 Coder (14B)	14.8B	11.9 GB	Won’t fit
StarCoder 2 (15B)	15B	12.1 GB	Won’t fit
DeepSeek-Coder-V2 Lite (16B)	16B	12.7 GB	Won’t fit
gpt-oss (20B MoE)	21B	16.1 GB	Won’t fit
Devstral Small (24B)	24B	18.1 GB	Won’t fit
Codestral 25.01 (24B)	24B	18.1 GB	Won’t fit
Gemma 4 (26B MoE)	26B	19.4 GB	Won’t fit
Gemma 3 (27B)	27B	20.1 GB	Won’t fit
Qwen 3.6 (27B)	27B	20.1 GB	Won’t fit
Qwen 3 (30B-A3B MoE)	30.5B	22.4 GB	Won’t fit
Qwen 3 Coder (30B-A3B MoE)	30.5B	22.4 GB	Won’t fit
DeepSeek-R1 Distill (32B)	32B	23.4 GB	Won’t fit
Qwen 3 VL (32B)	32B	23.4 GB	Won’t fit
Qwen 2.5 (32B)	32.5B	23.8 GB	Won’t fit
Qwen 2.5 Coder (32B)	32.5B	23.8 GB	Won’t fit
Qwen 3 (32B)	32.8B	24.0 GB	Won’t fit
Qwen 3.6 (35B-A3B MoE)	35B	25.4 GB	Won’t fit
Llama 3.3 (70B)	70.6B	49.3 GB	Won’t fit
DeepSeek-R1 Distill (70B)	70.6B	49.3 GB	Won’t fit
Qwen 2.5 VL (72B)	72B	50.2 GB	Won’t fit
InternVL3 (78B)	78B	54.3 GB	Won’t fit
Llama 3.2 Vision (90B)	90B	62.3 GB	Won’t fit
GLM-4.5 Air (106B-A12B)	106B	73.0 GB	Won’t fit
Llama 4 Scout (109B MoE)	109B	75.0 GB	Won’t fit
gpt-oss (120B MoE)	117B	80.4 GB	Won’t fit
Mistral Large 2 (123B)	123B	84.4 GB	Won’t fit
DeepSeek-Coder-V2 (236B)	236B	160.1 GB	Won’t fit
DeepSeek-V4-Flash (284B-A13B)	284B	192.3 GB	Won’t fit
GLM-4.6 (355B-A32B)	355B	239.9 GB	Won’t fit
Llama 4 Maverick (400B MoE)	400B	270.0 GB	Won’t fit
Llama 3.1 (405B)	405B	273.4 GB	Won’t fit
MiniMax M3 (428B-A23B MoE)	428B	288.8 GB	Won’t fit
Qwen 3 Coder (480B-A35B)	480B	323.6 GB	Won’t fit
DeepSeek-V3.1 (671B)	671B	451.6 GB	Won’t fit
DeepSeek-R1 (671B Full)	671B	451.6 GB	Won’t fit
GLM-5.2 (744B-A40B MoE)	744B	500.5 GB	Won’t fit
Kimi K2 (1T MoE)	1000B	672.0 GB	Won’t fit
DeepSeek-V4-Pro (1.6T MoE)	1600B	1074.0 GB	Won’t fit

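"Tight" means the estimate is at or slightly above the card's 8 GB, so the model may run only with other GPU apps closed, and some of it may spill into system RAM, which is much slower. For models that won't fit, choose a smaller model, drop to a 2-bit quant (at a cost in output quality), or move to a GPU with more VRAM.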
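The three labels can be read as two cutoffs on the VRAM estimate relative to the card's capacity. The sketch below assumes hypothetical cutoffs: Fits at or below 95% of capacity, and Tight up to 120%, with the excess offloaded to system RAM. These values agree with the boundary rows in the table, but this page does not state the calculator's actual rule. `classify` and its parameters are illustrative names.

```python
def classify(vram_gb, capacity_gb=8.0, fits_ratio=0.95, tight_ratio=1.2):
    """Label a VRAM estimate against a card's capacity.

    Hypothetical cutoffs: "Fits" leaves at least 5% headroom; "Tight" allows
    up to 20% over capacity, with the excess assumed to spill into system RAM.
    """
    if vram_gb <= capacity_gb * fits_ratio:
        return "Fits"
    if vram_gb <= capacity_gb * tight_ratio:
        return "Tight"
    return "Won't fit"


# Boundary rows from the table, checked against an 8 GB card.
for name, gb in [("Qwen 3 (8B)", 7.5), ("Gemma 2 (9B)", 8.2),
                 ("Llama 3.2 Vision (11B)", 9.4), ("Gemma 3 (12B)", 10.0)]:
    print(f"{name}: {gb} GB -> {classify(gb)}")
```

Stepping up VRAM is just a larger `capacity_gb`. For example, `classify(10.0, capacity_gb=16.0)` labels Gemma 3 (12B) as "Fits" under these same assumed cutoffs.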

Get the exact number for your setup

Pick your model, quantization, and context length, and the calculator shows the full VRAM math and which hardware fits.

Open the Local AI Calculator →

VRAM figures are reproducible estimates (model weights + KV cache + runtime overhead) and vary by runtime and quantization format. Data current as of 2026-07-05.
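The weights + KV cache + overhead breakdown can be sketched as below. This is a rough model under stated assumptions, not the calculator's method. Weights take `params × bits / 8` bytes. The KV cache stores one K and one V vector per layer, per KV head, per token. Overhead is a flat, hypothetical 1 GB. The example uses Llama 3.1 8B's attention shape (32 layers, 8 KV heads, head dimension 128) with an fp16 KV cache at an 8K context. Real 4-bit formats such as llama.cpp's Q4_K_M store somewhat more than 4 bits per weight, and the table's exact inputs aren't given here, so this sketch will not reproduce the table's 7.4 GB figure.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size: K and V (factor 2) per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def estimate_vram_gb(params_b, bits_per_weight, n_layers, n_kv_heads, head_dim,
                     context_len, overhead_gb=1.0, kv_bytes_per_elem=2):
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes): weights + KV cache + flat overhead.

    overhead_gb is an assumed constant for runtime buffers, not a measured value.
    """
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    kv_bytes = kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, kv_bytes_per_elem)
    return (weight_bytes + kv_bytes) / 1e9 + overhead_gb


# Llama 3.1 8B attention shape: 32 layers, 8 KV heads (grouped-query attention), head_dim 128.
LLAMA31_8B = dict(n_layers=32, n_kv_heads=8, head_dim=128)

q4 = estimate_vram_gb(8.0, 4.0, context_len=8192, **LLAMA31_8B)
q2 = estimate_vram_gb(8.0, 2.0, context_len=8192, **LLAMA31_8B)
print(f"4-bit: {q4:.2f} GB, 2-bit: {q2:.2f} GB")  # 4-bit: 6.07 GB, 2-bit: 4.07 GB
```

At this shape the 8K KV cache is exactly 1 GiB (2^30 bytes). Dropping to 2-bit halves only the weight term and leaves the KV cache and overhead unchanged, so the total shrinks by far less than half. Halving the context length halves the KV term.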