The AMD Radeon RX 9060 XT (16GB) has 16 GB of VRAM. At 4-bit quantization with an 8K context, 48 of the 78 popular local LLMs listed here fit comfortably, one is a tight fit, and 29 do not fit. Full list below, smallest first.

| Model | Params | VRAM (Q4) | On 16 GB |
|---|---|---|---|
| nomic-embed-text v1.5 (137M) | 0.137B | 2.1 GB | Fits |
| jina-reranker-v2 (300M) | 0.3B | 2.2 GB | Fits |
| EmbeddingGemma (308M) | 0.308B | 2.2 GB | Fits |
| bge-large-en-v1.5 (335M) | 0.335B | 2.2 GB | Fits |
| bge-reranker-large (335M) | 0.335B | 2.2 GB | Fits |
| stella-en-400M (435M) | 0.435B | 2.3 GB | Fits |
| nomic-embed-text v2 MoE (475M) | 0.475B | 2.3 GB | Fits |
| Qwen 2.5 Coder (0.5B) | 0.5B | 2.3 GB | Fits |
| bge-m3 (567M) | 0.567B | 2.4 GB | Fits |
| jina-embeddings-v3 (570M) | 0.57B | 2.4 GB | Fits |
| Qwen 3 (0.6B) | 0.6B | 2.4 GB | Fits |
| Qwen 3 Embedding (0.6B) | 0.6B | 2.4 GB | Fits |
| Qwen 3 Reranker (0.6B) | 0.6B | 2.4 GB | Fits |
| Gemma 3 (1B) | 1B | 2.7 GB | Fits |
| DeepSeek-R1 Distill (1.5B) | 1.5B | 3.0 GB | Fits |
| Qwen 2.5 Coder (1.5B) | 1.5B | 3.0 GB | Fits |
| Qwen 3 (1.7B) | 1.7B | 3.1 GB | Fits |
| Gemma 3n (E2B) | 2B | 3.3 GB | Fits |
| SmolLM3 (3B) | 3B | 4.0 GB | Fits |
| Llama 3.2 (3B) | 3B | 4.0 GB | Fits |
| Qwen 2.5 Coder (3B) | 3B | 4.0 GB | Fits |
| StarCoder 2 (3B) | 3B | 4.0 GB | Fits |
| Qwen 2.5 VL (3B) | 3B | 4.0 GB | Fits |
| Phi-4 Mini (3.8B) | 3.8B | 4.5 GB | Fits |
| Gemma 3 (4B) | 4B | 4.7 GB | Fits |
| Qwen 3 (4B) | 4B | 4.7 GB | Fits |
| Qwen 3 VL (4B) | 4B | 4.7 GB | Fits |
| Phi-4 Multimodal (5.6B) | 5.6B | 5.8 GB | Fits |
| DeepSeek-R1 Distill (7B) | 7B | 6.7 GB | Fits |
| StarCoder 2 (7B) | 7B | 6.7 GB | Fits |
| Qwen 2.5 Coder (7B) | 7.2B | 6.8 GB | Fits |
| Qwen 2.5 VL (7B) | 7.2B | 6.8 GB | Fits |
| Llama 3.1 (8B) | 8B | 7.4 GB | Fits |
| DeepSeek-R1 Distill (8B) | 8B | 7.4 GB | Fits |
| Qwen 3 VL (8B) | 8B | 7.4 GB | Fits |
| InternVL3 (8B) | 8B | 7.4 GB | Fits |
| LLaVA 1.6 (8B) | 8B | 7.4 GB | Fits |
| Qwen 3 (8B) | 8.2B | 7.5 GB | Fits |
| Gemma 2 (9B) | 9.2B | 8.2 GB | Fits |
| Llama 3.2 Vision (11B) | 11B | 9.4 GB | Fits |
| Gemma 3 (12B) | 12B | 10.0 GB | Fits |
| Pixtral (12B) | 12B | 10.0 GB | Fits |
| DeepSeek-R1 Distill (14B) | 14B | 11.4 GB | Fits |
| Phi-4 (14B) | 14B | 11.4 GB | Fits |
| Qwen 3 (14B) | 14.8B | 11.9 GB | Fits |
| Qwen 2.5 Coder (14B) | 14.8B | 11.9 GB | Fits |
| StarCoder 2 (15B) | 15B | 12.1 GB | Fits |
| DeepSeek-Coder-V2 Lite (16B) | 16B | 12.7 GB | Fits |
| gpt-oss (20B MoE) | 21B | 16.1 GB | Tight |
| Devstral Small (24B) | 24B | 18.1 GB | Won’t fit |
| Codestral 25.01 (24B) | 24B | 18.1 GB | Won’t fit |
| Gemma 4 (26B MoE) | 26B | 19.4 GB | Won’t fit |
| Gemma 3 (27B) | 27B | 20.1 GB | Won’t fit |
| Qwen 3 (30B-A3B MoE) | 30.5B | 22.4 GB | Won’t fit |
| Qwen 3 Coder (30B-A3B MoE) | 30.5B | 22.4 GB | Won’t fit |
| DeepSeek-R1 Distill (32B) | 32B | 23.4 GB | Won’t fit |
| Qwen 3 VL (32B) | 32B | 23.4 GB | Won’t fit |
| Qwen 2.5 (32B) | 32.5B | 23.8 GB | Won’t fit |
| Qwen 2.5 Coder (32B) | 32.5B | 23.8 GB | Won’t fit |
| Qwen 3 (32B) | 32.8B | 24.0 GB | Won’t fit |
| Qwen 3.6 (35B-A3B MoE) | 35B | 25.4 GB | Won’t fit |
| Llama 3.3 (70B) | 70.6B | 49.3 GB | Won’t fit |
| DeepSeek-R1 (70B Distill) | 70.6B | 49.3 GB | Won’t fit |
| Qwen 2.5 VL (72B) | 72B | 50.2 GB | Won’t fit |
| InternVL3 (78B) | 78B | 54.3 GB | Won’t fit |
| Llama 3.2 Vision (90B) | 90B | 62.3 GB | Won’t fit |
| GLM-4.5 Air (106B-A12B) | 106B | 73.0 GB | Won’t fit |
| Llama 4 Scout (109B MoE) | 109B | 75.0 GB | Won’t fit |
| gpt-oss (120B MoE) | 117B | 80.4 GB | Won’t fit |
| Mistral Large 2 (123B) | 123B | 84.4 GB | Won’t fit |
| DeepSeek-Coder-V2 (236B) | 236B | 160.1 GB | Won’t fit |
| GLM-4.6 (355B-A32B) | 355B | 239.9 GB | Won’t fit |
| Llama 4 Maverick (400B MoE) | 400B | 270.0 GB | Won’t fit |
| Llama 3.1 (405B) | 405B | 273.4 GB | Won’t fit |
| Qwen 3 Coder (480B-A35B) | 480B | 323.6 GB | Won’t fit |
| DeepSeek-V3.1 (671B) | 671B | 451.6 GB | Won’t fit |
| DeepSeek-R1 (671B Full) | 671B | 451.6 GB | Won’t fit |
| Kimi K2 (1T MoE) | 1000B | 672.0 GB | Won’t fit |

"Tight" means the model fits only with little headroom: close other GPU apps or expect some offload to system RAM. For models that won't fit, drop to a smaller model, use 2-bit quantization, or move to a GPU with more VRAM.
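The three labels can be sketched as a simple threshold check on the ratio of required VRAM to available VRAM. The page does not state its cutoffs, so the two ratios below (`comfortable_ratio`, `tight_ratio`) and the function name `classify_fit` are hypothetical values chosen only so that the rows above land in the same buckets.

```python
def classify_fit(required_gb: float, vram_gb: float = 16.0,
                 comfortable_ratio: float = 0.90,
                 tight_ratio: float = 1.05) -> str:
    """Label a model 'Fits', 'Tight', or "Won't fit" for a given card.

    Both ratio thresholds are assumptions; the source does not publish
    the cutoffs it uses.
    """
    ratio = required_gb / vram_gb
    if ratio <= comfortable_ratio:
        return "Fits"          # leaves headroom for other GPU apps
    if ratio <= tight_ratio:
        return "Tight"         # little or no headroom, may offload to system RAM
    return "Won't fit"


# Three rows from the table, using the page's own VRAM figures.
for name, required in [("DeepSeek-Coder-V2 Lite (16B)", 12.7),
                       ("gpt-oss (20B MoE)", 16.1),
                       ("Devstral Small (24B)", 18.1)]:
    print(f"{name}: {classify_fit(required)}")
```

Passing a different `vram_gb` reuses the same check for another card, though the thresholds would need tuning against real measurements.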

Get the exact number for your setup
Pick your model, quantization, and context length. The Local AI Calculator shows the full VRAM math and tells you which hardware fits.

VRAM figures are reproducible estimates (weights + KV cache + overhead) and vary by runtime and quant format. Data current as of 2026-06-15.
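As a rough illustration of the "weights + KV cache + overhead" breakdown, here is a minimal sketch. The constants are assumptions, not the page's own: 4.5 bits per weight (4-bit formats also store scales; GGUF Q4_0, for instance, works out to about 4.5), a 16-bit KV cache, and a flat 1 GB of overhead. The layer and head counts in the example are those of Llama 3.1 8B as I understand its published config. The result will not match the table exactly, since the page does not state its constants. Separately, the listed figures appear consistent with about 0.67 GB per billion parameters plus a fixed 2.0 GB; `table_fit_gb` encodes that observation, which is a fit to the table rather than a formula the page gives.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Quantized weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_len: int = 8192, bytes_per_value: int = 2) -> float:
    """KV cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value / 1e9


def estimate_vram_gb(params_billion: float, n_layers: int, n_kv_heads: int,
                     head_dim: int, context_len: int = 8192,
                     bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.0) -> float:
    """Weights + KV cache + overhead; overhead_gb is an assumed flat allowance."""
    return (weights_gb(params_billion, bits_per_weight)
            + kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len)
            + overhead_gb)


def table_fit_gb(params_billion: float) -> float:
    """Linear fit to this page's table (an observation, not a stated formula)."""
    return 0.67 * params_billion + 2.0


# Llama 3.1 8B at 8K context: 32 layers, 8 KV heads, head dimension 128.
llama_8b = estimate_vram_gb(8, n_layers=32, n_kv_heads=8, head_dim=128)
print(f"component estimate: {llama_8b:.2f} GB")
print(f"table fit:          {table_fit_gb(8):.1f} GB")
```

Mixture-of-experts models still need every expert resident in memory, so the total parameter count, not the active count, is the figure to pass in.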