M2 Max — LLM benchmarks
No M2 Max benchmarks yet.
Run it on your hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
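If you want to sanity-check a single MLX number by hand before submitting a run, the sketch below is one way to do it with the mlx-lm Python package. It is a rough, unofficial check rather than the llm-speed harness: it assumes `pip install mlx-lm`, and the model name is only an example of a 4-bit MLX quant on the mlx-community hub — swap in the one you actually want to test.

```python
# Rough, unofficial sketch: time decode speed for an MLX model on Apple silicon.
# Assumes `pip install mlx-lm`; the model name below is an example, not a fixed choice.
import time
from mlx_lm import load, generate

MODEL = "mlx-community/Qwen2.5-Coder-14B-Instruct-4bit"  # example; replace with your quant

model, tokenizer = load(MODEL)

prompt = "Write a short docstring for a function that merges two sorted lists."
start = time.perf_counter()
# generate() returns the completion text; verbose=True would also print
# mlx-lm's own prompt/generation speed report.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# Crude count: tokenize the output and divide by wall-clock time.
generated_tokens = len(tokenizer.encode(text))
print(f"{generated_tokens} tokens in {elapsed:.1f}s -> {generated_tokens / elapsed:.1f} tok/s")
```

One-off timings like this include prompt processing and warm-up, so expect them to land somewhat below the steady-state decode speeds quoted in the rows below.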
Community folklore on M2 Max
Unverified claims extracted from Reddit/HN comments. These carry lower trust than signed runs above; every row links to its source.
- community confidence 70%
27.00 tok/s — Qwen2.5-coder-14B on M2 Max via mlx
“I have the 32GB M2 Max and am currently working with Qwen2.5-coder-32B Q4 MLX at 11t/s with a context size of 12K while using Xcode, Fork, Safari and Music. I use Qwen2.5-coder-14B MLX Q4 (27t/s) with full context size when I need longer conversation. To me, 32B Q4 is the best ratio quality/speed we can get on our machines.”
- community confidence 70%
11.00 tok/s — Qwen2.5-coder-32B on M2 Max via mlx
“I have the 32GB M2 Max and am currently working with Qwen2.5-coder-32B Q4 MLX at 11t/s with a context size of 12K while using Xcode, Fork, Safari and Music. I use Qwen2.5-coder-14B MLX Q4 (27t/s) with full context size when I need longer conversation. To me, 32B Q4 is the best ratio quality/speed we can get on our machines.”
- community confidence 60%
50.00 tok/s — Qwen 3 30B A3B on M2 Max via mlx
“I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.”
- community confidence 60%
70.00 tok/s — Qwen 3 30B A3B on M2 Max via mlx Q4_K_M
“I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.”
- community confidence 60%
38.00 tok/s — Qwen 3 30B A3B on M2 Max via GGUF Q8
“I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.”
- community confidence 60%
50.00 tok/s — Qwen 3 30B A3B on M2 Max via mlx
“I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.”
- community confidence 55%
18.00 tok/s — Devstral-Small-2 24B on M2 Max via mlx Q4_K_M
“That's a little bit surprising. I'm a daily user of Devstral-Small-2 24B 4Bit MLX on an M2 Max MBP (32GB 400GB/s) and I get 18t/s on average. I use it extensively as an agent in VSCode and Xcode, so no small tasks nor small context. I understand that MLX is faster than”
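Several of the comments above compare GGUF and MLX quants of the same model. If you want to reproduce the GGUF side of such a comparison yourself, a similarly rough sketch with llama-cpp-python is below (assumptions: the package is installed with Metal support, and the GGUF path is a placeholder for a file you have downloaded; nothing here is part of llm-speed).

```python
# Rough, unofficial sketch: time decode speed for a local GGUF model via llama-cpp-python.
# Assumes `pip install llama-cpp-python` built with Metal; the model path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3-30b-a3b-Q4_K_M.gguf",  # placeholder path; point at your own file
    n_ctx=4096,
    n_gpu_layers=-1,   # offload all layers to the GPU
    verbose=False,
)

prompt = "Write a short docstring for a function that merges two sorted lists."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# The completion dict reports token usage in OpenAI-style fields.
completion_tokens = out["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s -> {completion_tokens / elapsed:.1f} tok/s")
```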
Common questions about M2 Max
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.