RTX 5090 (32GB) — LLM benchmarks
6 workload results across 1 model.
Fastest known config on RTX 5090 (32GB)
69.9 decode tok/s
Qwen3.6-27B-Q4_K_M.gguf via llama.cpp — see full run
Qwen3.6-27B-Q4_K_M.gguf
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp | — | 67.28tok/s | no data | 553ms | r_1pww-w7p8sd |
| chat-short | llama.cpp | — | 69.89tok/s | no data | 3,995ms | r_bqsunbd6xa8 |
| chat-short | llama.cpp | — | 47.75tok/s | no data | 2,833ms | r_kj4fh_mmzj9 |
| chat-short | llama.cpp | — | 45.92tok/s | no data | 3,089ms | r_4u7250hj28o |
| chat-short | llama.cpp | — | 39.61tok/s | no data | 227ms | r__b89kg2iica |
| chat-short | llama.cpp | — | 66.28tok/s | no data | 353ms | r_79bwm4mq_4l |
Models measured on RTX 5090 (32GB)
Common questions about RTX 5090 (32GB)
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.