RTX 3090 — LLM benchmarks
No RTX 3090 benchmarks yet.
Run on YOUR hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
Community folklore on RTX 3090
401 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed llm-speed runs; every row links to its source.
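Rows like the ones below can be recovered from raw comment text with a simple pattern match. A minimal sketch, assuming the claims live in a plain-text dump (comments.txt is a hypothetical file, and this regex is illustrative, not the site's actual extraction pipeline):
$ grep -Eo '[0-9]+(\.[0-9]+)?[[:space:]]*tok(ens)?/s(ec)?' comments.txt
This only pulls the throughput figures; tying each number to a model, GPU, and backend takes more context than a single regex.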
- community confidence 85%
10.00 tok/s — qwen-2.5-32B on a single Tesla P40 (same rig as the RTX 3090 run below) via ollama, Q8
“a beast! 28 tok/sec at 32K context is more than usable for a lot of coding situations. * The P40s continue to surprise. A single P40 can do 10 tok/sec, which is perfectly usable. * 3xP40 fits 120K context at Q8 comfortably. * performance doesn't scale with more P40s. Using `-sm r…”
- community confidence 85%
28.00 tok/s — qwen-2.5-32B on RTX 3090 via ollama, Q8
“ase of qwen-2.5-32B today. I bench marked the Q4 and Q8 quants on my local rig (3xP40, 1x3090). Some observations: * the 3090 is a beast! 28 tok/sec at 32K context is more than usable for a lot of coding situations. * The P40s continue to surprise. A single P40 can do 10 tok/se…”
- community confidence 85%
65.00 tok/s — Qwen3-Coder-Next on 4x RTX 3090 via vllm, FP8 (at ~100k context; see the launch sketch after this list)
“Next-FP8-Dynamic quant with vLLM v0.16.0 over 4 RTX 3090s with a 200k context window. I get around 71tps at low context, dropping to around 65tps at the 100k context point. Something to note, I'm seeing pretty bad issues with vLLM nightly, affecting this model especially, that c…”
- community confidence 85%
71.00 tok/s — Qwen3-Coder-Next on 4x RTX 3090 via vllm, FP8 (at low context)
“sted. I'm running Unsloth's Qwen3-Coder-Next-FP8-Dynamic quant with vLLM v0.16.0 over 4 RTX 3090s with a 200k context window. I get around 71tps at low context, dropping to around 65tps at the 100k context point. Something to note, I'm seeing pretty bad issues with vLLM nightly…”
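The Qwen3-Coder-Next rows above describe Unsloth's FP8-Dynamic quant served by vLLM v0.16.0 across four 3090s with a 200k context window. A minimal launch sketch under those stated assumptions; the Hugging Face model ID is inferred from the quote, not confirmed:
$ vllm serve unsloth/Qwen3-Coder-Next-FP8-Dynamic \
    --tensor-parallel-size 4 \
    --max-model-len 200000
--tensor-parallel-size 4 matches the four GPUs in the quote, and --max-model-len 200000 matches the reported 200k context window.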
Common questions about RTX 3090
Direct Q&A drawn from the claims above: the fastest LLM, supported model classes, backend rankings, and quantization guidance.