llm-speed

Qwen3.6-27B-Q4_K_M.gguf

6 workload results across 1 hardware configuration.

Fastest local config

69.9 decode tok/s

on RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB via llama.cpp

Local runs (6 runs)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.

RTX 5090 (32GB) + AMD Ryzen 7 9850X3D 8-Core Processor (8c) + 30GB

| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | llama.cpp | | 67.28 tok/s | no data | 553 ms | r_1pww-w7p8sd |
| chat-short | llama.cpp | | 69.89 tok/s | no data | 3,995 ms | r_bqsunbd6xa8 |
| chat-short | llama.cpp | | 47.75 tok/s | no data | 2,833 ms | r_kj4fh_mmzj9 |
| chat-short | llama.cpp | | 45.92 tok/s | no data | 3,089 ms | r_4u7250hj28o |
| chat-short | llama.cpp | | 39.61 tok/s | no data | 227 ms | r__b89kg2iica |
| chat-short | llama.cpp | | 66.28 tok/s | no data | 353 ms | r_79bwm4mq_4l |
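
The headline "69.9 decode tok/s" matches the highest decode rate among the six runs listed above, rounded to one decimal place. The following Python sketch reproduces that figure from the listed values. It assumes the headline is simply the maximum over runs, since the page does not state its aggregation method. The variable names and the added median are illustrative only.

```python
from statistics import median

# (decode tok/s, TTFT in ms) per run ID, copied from the results listed above.
runs = {
    "r_1pww-w7p8sd": (67.28, 553),
    "r_bqsunbd6xa8": (69.89, 3995),
    "r_kj4fh_mmzj9": (47.75, 2833),
    "r_4u7250hj28o": (45.92, 3089),
    "r__b89kg2iica": (39.61, 227),
    "r_79bwm4mq_4l": (66.28, 353),
}

# Assumption: "fastest" means the run with the highest decode rate.
fastest_run = max(runs, key=lambda run_id: runs[run_id][0])
fastest_decode, fastest_ttft_ms = runs[fastest_run]

# The median is not shown on the page; it is included here only to
# illustrate the spread between runs on the same hardware.
median_decode = median(decode for decode, _ in runs.values())

print(f"fastest run: {fastest_run}")
print(f"fastest decode: {round(fastest_decode, 1)} tok/s (TTFT {fastest_ttft_ms} ms)")
print(f"median decode: {median_decode:.2f} tok/s")
```

The run with the highest decode rate also has the longest TTFT in the list, so a single headline number does not capture both dimensions.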

Qwen3.6-27B-Q4_K_M.gguf on hardware