Qwen3.6-27B-Q4_K_M.gguf
6 workload results across 1 hardware configuration.
Fastest local config
69.9 decode tok/s
on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via llama.cpp — see full run
Local runs (6 runs)
Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.
RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp | — | 67.28tok/s | no data | 553ms | r_1pww-w7p8sd |
| chat-short | llama.cpp | — | 69.89tok/s | no data | 3,995ms | r_bqsunbd6xa8 |
| chat-short | llama.cpp | — | 47.75tok/s | no data | 2,833ms | r_kj4fh_mmzj9 |
| chat-short | llama.cpp | — | 45.92tok/s | no data | 3,089ms | r_4u7250hj28o |
| chat-short | llama.cpp | — | 39.61tok/s | no data | 227ms | r__b89kg2iica |
| chat-short | llama.cpp | — | 66.28tok/s | no data | 353ms | r_79bwm4mq_4l |