RTX 3090 — LLM benchmarks
No RTX 3090 benchmarks yet.
Run on YOUR hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
Community folklore on RTX 3090
401 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed llm-speed runs; every row links to its source.
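Rows like the ones below can be recovered from raw comment text with a simple pattern match. A minimal sketch, assuming the claims live in a plain-text dump (comments.txt is a hypothetical file, and this regex is illustrative, not the site's actual extraction pipeline):
$ grep -Eo '[0-9]+(\.[0-9]+)?[[:space:]]*tok(ens)?/s(ec)?' comments.txt
This only pulls the throughput figures; tying each number to a model, GPU, and backend takes more context than a single regex.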
- community confidence 85%
10.00 tok/s — qwen-2.5-32B on a single Tesla P40 (same rig as the RTX 3090 run below) via ollama, Q8
“a beast! 28 tok/sec at 32K context is more than usable for a lot of coding situations. * The P40s continue to surprise. A single P40 can do 10 tok/sec, which is perfectly usable. * 3xP40 fits 120K context at Q8 comfortably. * performance doesn't scale with more P40s. Using `-sm r…”
- community confidence 85%
28.00 tok/s — qwen-2.5-32B on RTX 3090 via ollama, Q8
“ase of qwen-2.5-32B today. I bench marked the Q4 and Q8 quants on my local rig (3xP40, 1x3090). Some observations: * the 3090 is a beast! 28 tok/sec at 32K context is more than usable for a lot of coding situations. * The P40s continue to surprise. A single P40 can do 10 tok/se…”
- community confidence 85%
65.00 tok/s — Qwen3-Coder-Next on 4x RTX 3090 via vllm, FP8 (at ~100k context; see the launch sketch after this list)
“Next-FP8-Dynamic quant with vLLM v0.16.0 over 4 RTX 3090s with a 200k context window. I get around 71tps at low context, dropping to around 65tps at the 100k context point. Something to note, I'm seeing pretty bad issues with vLLM nightly, affecting this model especially, that c…”
- community confidence 85%
71.00 tok/s — Qwen3-Coder-Next on 4x RTX 3090 via vllm, FP8 (at low context)
“sted. I'm running Unsloth's Qwen3-Coder-Next-FP8-Dynamic quant with vLLM v0.16.0 over 4 RTX 3090s with a 200k context window. I get around 71tps at low context, dropping to around 65tps at the 100k context point. Something to note, I'm seeing pretty bad issues with vLLM nightly…”
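The Qwen3-Coder-Next rows above describe Unsloth's FP8-Dynamic quant served by vLLM v0.16.0 across four 3090s with a 200k context window. A minimal launch sketch under those stated assumptions; the Hugging Face model ID is inferred from the quote, not confirmed:
$ vllm serve unsloth/Qwen3-Coder-Next-FP8-Dynamic \
    --tensor-parallel-size 4 \
    --max-model-len 200000
--tensor-parallel-size 4 matches the four GPUs in the quote, and --max-model-len 200000 matches the reported 200k context window.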
Common questions about RTX 3090
Direct Q&A drawn from the claims above: the fastest LLM, supported model classes, backend rankings, and quantization guidance.