
M3 Pro (18-core GPU) — LLM benchmarks

11 workload results across 9 models. In the tables below, decode tok/s is generation throughput, prefill tok/s is prompt-processing throughput, and TTFT is time to first token.

Fastest known config on M3 Pro (18-core GPU)

286.5 decode tok/s

Qwen2.5-0.5B-Instruct-4bit via mlx (4bit); full run: r_akcbpx5vcqa (table below).
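For context on where figures like TTFT and decode tok/s come from, here is a minimal measurement sketch using mlx-lm's streaming API. The checkpoint path, prompt, and token budget are illustrative assumptions, not details taken from this page, and the site's actual harness may differ.

```python
# Minimal sketch: time one short-chat generation with mlx-lm.
# Assumed (not from this page): checkpoint name, prompt, max_tokens.
import time

from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Qwen2.5-0.5B-Instruct-4bit")

# Build a chat-formatted prompt string for the instruct model.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize what a KV cache does."}],
    tokenize=False,
    add_generation_prompt=True,
)

start = time.perf_counter()
first = None
n_tokens = 0
# Recent mlx-lm versions yield once per generated token.
for _ in stream_generate(model, tokenizer, prompt, max_tokens=256):
    if first is None:
        first = time.perf_counter()  # first token back marks TTFT
    n_tokens += 1
end = time.perf_counter()

print(f"TTFT:   {(first - start) * 1000:.0f} ms")
print(f"decode: {(n_tokens - 1) / (end - first):.1f} tok/s")  # generation throughput
```

Prefill throughput can be estimated the same way: prompt tokens divided by the time to the first token.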

Qwen3-32B-4bit

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | 4bit | 7.16 | 25.65 | 4,288 | r_pnrrpcdqfo4 |

Qwen2.5-32B-Instruct-4bit

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | 4bit | 7.05 | 33.41 | 3,921 | r_f4x3xaan2ay |

gemma-2-9b-it-4bit

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | 4bit | 21.97 | 102.5 | 1,151 | r_q7t7b7dcuz5 |

Qwen2.5-14B-Instruct-4bit

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | 4bit | 14.41 | 117.9 | 1,111 | r_ia73dzeue0b |

phi-4-4bit

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | 4bit | 15.61 | 120.7 | 936 | r_-w8hnn61va_ |

Llama-3.1-8B-Instruct-4bit

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | 4bit | 29.20 | 203.3 | 669 | r_h0-use1ypnb |

Qwen2.5-7B-Instruct-4bit

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | 4bit | 30.52 | 161.9 | 809 | r_llzv_g-ymaf |
| chat-short | mlx@0.31.3 | 4bit | 15.67 | 54.34 | 2,411 | r_e3t93rscswq |

stable-code-instruct-3b-4bit

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | 4bit | 19.37 | 131.3 | 967 | r_pqjsvd-cub4 |

Qwen2.5-0.5B-Instruct-4bit

| Workload | Backend | Quant | decode (tok/s) | prefill (tok/s) | TTFT (ms) | Run |
| --- | --- | --- | --- | --- | --- | --- |
| chat-short | mlx@0.31.3 | 4bit | 286.5 | 656.2 | 200 | r_akcbpx5vcqa |
| chat-short | mlx@0.31.3 | 4bit | 282.6 | 668.8 | 196 | r_bftqtkilvoe |

Models measured on M3 Pro (18-core GPU)

Qwen3-32B-4bit, Qwen2.5-32B-Instruct-4bit, gemma-2-9b-it-4bit, Qwen2.5-14B-Instruct-4bit, phi-4-4bit, Llama-3.1-8B-Instruct-4bit, Qwen2.5-7B-Instruct-4bit, stable-code-instruct-3b-4bit, Qwen2.5-0.5B-Instruct-4bit

Common questions about M3 Pro (18-core GPU)

Direct Q&A drawn from the runs above, covering the fastest LLM, supported model classes, backend rankings, and quantization guidance.

Read the M3 Pro (18-core GPU) FAQ →