Skip to content
llm-speed
Leaderboard/hardware/rtx-5090-32gb

RTX 5090 (32GB) — LLM benchmarks

6 workload results across 1 model.

Fastest known config on RTX 5090 (32GB)

69.9 decode tok/s

Qwen3.6-27B-Q4_K_M.gguf via llama.cpp see full run

Qwen3.6-27B-Q4_K_M.gguf

WorkloadBackendQuantdecode tok/sprefill tok/sTTFTRun
chat-shortllama.cpp67.28tok/sno data553msr_1pww-w7p8sd
chat-shortllama.cpp69.89tok/sno data3,995msr_bqsunbd6xa8
chat-shortllama.cpp47.75tok/sno data2,833msr_kj4fh_mmzj9
chat-shortllama.cpp45.92tok/sno data3,089msr_4u7250hj28o
chat-shortllama.cpp39.61tok/sno data227msr__b89kg2iica
chat-shortllama.cpp66.28tok/sno data353msr_79bwm4mq_4l

Models measured on RTX 5090 (32GB)

Common questions about RTX 5090 (32GB)

Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the RTX 5090 (32GB) FAQ →