M2 Max — LLM benchmarks
No M2 Max benchmarks yet.
Run it on your hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
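If you want to sanity-check a single MLX number by hand before submitting a run, the sketch below is one way to do it with the mlx-lm Python package. It is a rough, unofficial check rather than the llm-speed harness: it assumes `pip install mlx-lm`, and the model name is only an example of a 4-bit MLX quant on the mlx-community hub — swap in the one you actually want to test.

```python
# Rough, unofficial sketch: time decode speed for an MLX model on Apple silicon.
# Assumes `pip install mlx-lm`; the model name below is an example, not a fixed choice.
import time
from mlx_lm import load, generate

MODEL = "mlx-community/Qwen2.5-Coder-14B-Instruct-4bit"  # example; replace with your quant

model, tokenizer = load(MODEL)

prompt = "Write a short docstring for a function that merges two sorted lists."
start = time.perf_counter()
# generate() returns the completion text; verbose=True would also print
# mlx-lm's own prompt/generation speed report.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# Crude count: tokenize the output and divide by wall-clock time.
generated_tokens = len(tokenizer.encode(text))
print(f"{generated_tokens} tokens in {elapsed:.1f}s -> {generated_tokens / elapsed:.1f} tok/s")
```

One-off timings like this include prompt processing and warm-up, so expect them to land somewhat below the steady-state decode speeds quoted in the rows below.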
Community folklore on M2 Max
Unverified claims extracted from Reddit/HN comments. These carry lower trust than signed runs above; every row links to its source.
- community confidence 70%
27.00 tok/s — Qwen2.5-coder-14B on M2 Max via mlx
“I have the 32GB M2 Max and am currently working with Qwen2.5-coder-32B Q4 MLX at 11t/s with a context size of 12K while using Xcode, Fork, Safari and Music. I use Qwen2.5-coder-14B MLX Q4 (27t/s) with full context size when I need longer conversation. To me, 32B Q4 is the best ratio quality/speed we can get on our machines.”
- community confidence 70%
11.00 tok/s — Qwen2.5-coder-32B on M2 Max via mlx
“I have the 32GB M2 Max and am currently working with Qwen2.5-coder-32B Q4 MLX at 11t/s with a context size of 12K while using Xcode, Fork, Safari and Music. I use Qwen2.5-coder-14B MLX Q4 (27t/s) with full context size when I need longer conversation. To me, 32B Q4 is the best ratio quality/speed we can get on our machines.”
- community confidence 60%
50.00 tok/s — Qwen 3 30B A3B on M2 Max via mlx
“I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.”
- community confidence 60%
70.00 tok/s — Qwen 3 30B A3B on M2 Max via mlx Q4_K_M
“I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.”
- community confidence 60%
38.00 tok/s — Qwen 3 30B A3B on M2 Max via GGUF Q8
“I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.”
- community confidence 60%
50.00 tok/s — Qwen 3 30B A3B on M2 Max via mlx
“I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.”
- community confidence 55%
18.00 tok/s — Devstral-Small-2 24B on M2 Max via mlx Q4_K_M
“That's a little bit surprising. I'm a daily user of Devstral-Small-2 24B 4Bit MLX on an M2 Max MBP (32GB 400GB/s) and I get 18t/s on average. I use it extensively as an agent in VSCode and Xcode, so no small tasks nor small context. I understand that MLX is faster than”
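Several of the comments above compare GGUF and MLX quants of the same model. If you want to reproduce the GGUF side of such a comparison yourself, a similarly rough sketch with llama-cpp-python is below (assumptions: the package is installed with Metal support, and the GGUF path is a placeholder for a file you have downloaded; nothing here is part of llm-speed).

```python
# Rough, unofficial sketch: time decode speed for a local GGUF model via llama-cpp-python.
# Assumes `pip install llama-cpp-python` built with Metal; the model path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-30b-a3b-Q4_K_M.gguf",  # placeholder path; point at your own file
    n_ctx=4096,
    n_gpu_layers=-1,   # offload all layers to the GPU
    verbose=False,
)

prompt = "Write a short docstring for a function that merges two sorted lists."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# The completion dict reports token usage in OpenAI-style fields.
completion_tokens = out["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s -> {completion_tokens / elapsed:.1f} tok/s")
```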
Common questions about M2 Max
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.