Qwen3-Next-80B-A3B-Instruct-MLX-4bit
4 workload results across 1 hardware configuration.
Fastest local config: 80.3 decode tok/s on M3 Ultra (60-core GPU) + 96GB unified via mlx (4bit)
Local runs (4 runs)
Runs submitted from contributors' own machines using MLX, llama.cpp, vLLM, exllamav2, or ollama. Each run is signed on the submitter's hardware.
M3 Ultra (60-core GPU) + 96GB unified
| Workload | Backend | Quant | Decode (tok/s) | Prefill (tok/s) | TTFT (ms) | Run |
|---|---|---|---|---|---|---|
| chat-short | mlx@0.31.3 | 4bit | 80.34 | 24.49 | 4,493 | r_1pl79r50ofy |
| chat-long | mlx@0.31.3 | 4bit | 77.63 | 1,608.6 | 1,956 | r_1pl79r50ofy |
| concurrent-decode | mlx@0.31.3 | 4bit | 78.60 | no data | no data | r_1pl79r50ofy |
| agent-trace | mlx@0.31.3 | 4bit | 78.41 | 1,586.3 | 1,318 | r_1pl79r50ofy |
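
The headline figure of 80.3 decode tok/s appears to be the highest decode rate among the four workloads in the table above. The following is a minimal sketch of that check, assuming the rows are copied by hand into a list; the names `RUNS`, `fastest_decode`, and `decode_spread_pct` are hypothetical and not part of any benchmark tooling.

```python
# Rows copied by hand from the table above:
# (workload, decode tok/s, prefill tok/s or None, TTFT in ms or None)
RUNS = [
    ("chat-short", 80.34, 24.49, 4493),
    ("chat-long", 77.63, 1608.6, 1956),
    ("concurrent-decode", 78.60, None, None),  # "no data" cells stored as None
    ("agent-trace", 78.41, 1586.3, 1318),
]


def fastest_decode(runs):
    """Return the (workload, decode tok/s) pair with the highest decode rate."""
    workload, decode, _prefill, _ttft = max(runs, key=lambda r: r[1])
    return workload, decode


def decode_spread_pct(runs):
    """Gap between fastest and slowest decode rate, as a percent of the fastest."""
    rates = [r[1] for r in runs]
    return (max(rates) - min(rates)) / max(rates) * 100


workload, decode = fastest_decode(RUNS)
print(f"{workload}: {decode:.1f} decode tok/s")
print(f"decode spread: {decode_spread_pct(RUNS):.1f}%")
```

Under these assumptions the sketch selects chat-short, and the decode rates of the four workloads differ by only a few percent, whereas the prefill and TTFT columns vary far more between workloads.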