RTX 5080 — LLM benchmarks
No benchmarks on RTX 5080 yet.
Run on your hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
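llm-speed's internals are not shown on this page, but the tok/s numbers it (and the community claims below) report come down to the same measurement: time a run of decode steps and divide. A minimal sketch, assuming a generic per-token callable; `tokens_per_second` is a hypothetical helper, not llm-speed's actual API:

```python
import time

def tokens_per_second(generate_token, n_tokens):
    """Return decode throughput in tokens/second.

    `generate_token` is any zero-argument callable that produces
    one token (e.g. one decode step of a local LLM backend).
    """
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in "model" that sleeps ~1 ms per token, so throughput
# should land a little under 1000 tok/s on most machines.
if __name__ == "__main__":
    fake_step = lambda: time.sleep(0.001)
    print(f"{tokens_per_second(fake_step, 200):.1f} tok/s")
```

Note that real harnesses separate prompt processing (prefill) from generation (decode); the figures quoted on this page are decode throughput.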
Community folklore on RTX 5080
39 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.
- community confidence: 85%
66.00 tok/s — qwen3-coder on RTX 5080 via llama.cpp Q4_K_M
“I get around ~66 t/s (16k/32k context, Q4_K_M) with very similar but Notebook hardware: AMD Ryzen 9955HX3D 64GB DDR5-5600 Nvidia RTX 5080 Mobile 16”
- community confidence: 70%
25.00 tok/s — Qwen3-Coder on RTX 5080 via lm-studio
“Running Qwen3-Coder 30B at Full 256K Context: 25 tok/s with 96GB RAM + RTX 5080 Hello, I come to share with you my happiness running Qwen3-Coder 30B at its maximum unstretched context (256K).”
- community confidence: 65%
47.00 tok/s — Qwen3-Coder-Next on RTX 5080 IQ3_XXS
“del is the biggest one I can run on my 5080 Edit: actually just tried it and I could run Qwen3-Coder-Next UD-IQ3_XXS with 262k context at ~47 t/s which isn't bad!”
Common questions about RTX 5080
Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.