Qwen3-Next-80B-A3B-Instruct-MLX-4bit

4 workload results across 1 hardware configuration.

Fastest local config

80.3 decode tok/s

Local runs (4 runs)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.

M3 Ultra (60-core GPU) + 96 GB unified memory

Workload	Backend	Quant	Decode (tok/s)	Prefill (tok/s)	TTFT (ms)	Run
chat-short	mlx@0.31.3	4bit	80.34	24.49	4,493	r_1pl79r50ofy
chat-long	mlx@0.31.3	4bit	77.63	1,608.6	1,956	r_1pl79r50ofy
concurrent-decode	mlx@0.31.3	4bit	78.60	no data	no data	r_1pl79r50ofy
agent-trace	mlx@0.31.3	4bit	78.41	1,586.3	1,318	r_1pl79r50ofy
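
The "Fastest local config" figure appears to be the highest decode rate in the table, rounded to one decimal place. The page does not state how the headline is derived, so that rule is an assumption. A minimal Python sketch under that assumption, with the decode values copied from the table and a hypothetical helper named `fastest`:

```python
# Decode rates copied from the table above: (workload, decode tok/s).
runs = [
    ("chat-short", 80.34),
    ("chat-long", 77.63),
    ("concurrent-decode", 78.60),
    ("agent-trace", 78.41),
]


def fastest(rows):
    """Return the (workload, decode_rate) pair with the highest decode rate."""
    return max(rows, key=lambda row: row[1])


workload, rate = fastest(runs)

# Assumption: the headline rounds the best decode rate to one decimal place.
headline = f"{rate:.1f} decode tok/s"
print(workload, headline)  # chat-short 80.3 decode tok/s
```

All four workloads come from the same run on the same hardware, so the maximum here picks the best workload rather than the best configuration.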