RTX 5080 — LLM benchmarks
No benchmarks on RTX 5080 yet.
Run on your hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
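llm-speed's internals are not shown on this page, but the tok/s numbers it (and the community claims below) report come down to the same measurement: time a run of decode steps and divide. A minimal sketch, assuming a generic per-token callable; `tokens_per_second` is a hypothetical helper, not llm-speed's actual API:

```python
import time

def tokens_per_second(generate_token, n_tokens):
    """Return decode throughput in tokens/second.

    `generate_token` is any zero-argument callable that produces
    one token (e.g. one decode step of a local LLM backend).
    """
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in "model" that sleeps ~1 ms per token, so throughput
# should land a little under 1000 tok/s on most machines.
if __name__ == "__main__":
    fake_step = lambda: time.sleep(0.001)
    print(f"{tokens_per_second(fake_step, 200):.1f} tok/s")
```

Note that real harnesses separate prompt processing (prefill) from generation (decode); the figures quoted on this page are decode throughput.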
Community folklore on RTX 5080
39 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.
- community confidence: 85%
66.00 tok/s — qwen3-coder on RTX 5080 via llama.cpp Q4_K_M
“I get around ~66 t/s (16k/32k context, Q4_K_M) with very similar but Notebook hardware: AMD Ryzen 9955HX3D 64GB DDR5-5600 Nvidia RTX 5080 Mobile 16”
- community confidence: 70%
25.00 tok/s — Qwen3-Coder on RTX 5080 via lm-studio
“Running Qwen3-Coder 30B at Full 256K Context: 25 tok/s with 96GB RAM + RTX 5080 Hello, I come to share with you my happiness running Qwen3-Coder 30B at its maximum unstretched context (256K).”
- community confidence: 65%
47.00 tok/s — Qwen3-Coder-Next on RTX 5080 IQ3_XXS
“del is the biggest one I can run on my 5080 Edit: actually just tried it and I could run Qwen3-Coder-Next UD-IQ3_XXS with 262k context at ~47 t/s which isn't bad!”
Common questions about RTX 5080
Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.