llm-speed

M2 Max — LLM benchmarks

No benchmarks on M2 Max yet.

Run the benchmark on your own hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on M2 Max

15 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • communityconfidence 70%

    27.00tok/s Qwen2.5-coder-14B on M2 Max via mlx

    I have the 32GB M2 Max and am currently working with Qwen2.5-coder-32B Q4 MLX at 11t/s with a context size of 12K while using Xcode, Fork, Safari and Music. I use Qwen2.5-coder-14B MLX Q4 (27t/s) with full context size when I need longer conversation. To me, 32B Q4 is the best ratio quality/speed we can get on our machines.

    source: Reddit · 2024-12-29

  • communityconfidence 70%

    11.00tok/s Qwen2.5-coder-32B on M2 Max via mlx

    I have the 32GB M2 Max and am currently working with Qwen2.5-coder-32B Q4 MLX at 11t/s with a context size of 12K while using Xcode, Fork, Safari and Music. I use Qwen2.5-coder-14B MLX Q4 (27t/s) with full context size when I need longer conversation. To me, 32B Q4 is the best ratio quality/speed we can get on our machines.

    source: Reddit · 2024-12-29

  • communityconfidence 60%

    50.00tok/s Qwen 3 30B A3B on M2 Max via mlx

    I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.

    source: Reddit · u/BumbleSlob · 2025-05-09
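    As a quick sanity check on the quoted figures, the MLX-over-GGUF speedups implied by this comment can be computed directly (the dictionary below just restates the numbers from the quote):

    ```python
    # Throughput figures quoted in the comment above (tokens/sec).
    speeds = {"GGUF Q4KM": 50, "GGUF Q8": 38, "MLX Q4KM": 70, "MLX Q8": 50}

    # Speedup of MLX over GGUF at matched quantization.
    q4_speedup = speeds["MLX Q4KM"] / speeds["GGUF Q4KM"]
    q8_speedup = speeds["MLX Q8"] / speeds["GGUF Q8"]

    print(f"MLX vs GGUF at Q4: {q4_speedup:.2f}x")  # prints 1.40x
    print(f"MLX vs GGUF at Q8: {q8_speedup:.2f}x")  # prints 1.32x
    ```

    In both rows MLX comes out roughly 1.3-1.4x faster than GGUF on this machine, consistent with the poster's choice of MLX backends.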

  • communityconfidence 60%

    70.00tok/s Qwen 3 30B A3B on M2 Max via mlx

    I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.

    source: Reddit · u/BumbleSlob · 2025-05-09

  • communityconfidence 60%

    38.00tok/s Qwen 3 30B A3B on M2 Max via mlx

    I have an M2 Max. With Qwen 3 30B A3B: * GGUF Q4KM = 50Tps * GGUF Q8 = 38Tps * MLX Q4KM = 70Tps * MLX Q8 = 50 Tps I am currently experimenting with using LM Studio to run my backend and using via Open WebUI for MLXQ8.

    source: Reddit · u/BumbleSlob · 2025-05-09

  • communityconfidence 55%

    18.00tok/s Devstral-Small-2 24B on M2 Max via mlx Q4_K_M

    That's a little bit surprising. I'm a daily user of Devstral-Small-2 24B 4Bit MLX on an M2 Max MBP (32GB 400GB/s) and I get 18t/s on average. I use it extensively as an agent in VSCode and Xcode, so no small tasks nor small context. I understand that MLX is faster than

    source: Reddit · u/Ill_Barber8709 · 2026-04-06

See all 15 claims for M2 Max
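For reference, the tok/s figures in these claims are decode throughput: tokens generated divided by wall-clock time. A minimal, backend-agnostic sketch of the measurement (the decode stub is hypothetical; a real run would call into an MLX or llama.cpp model instead):

```python
import time

def tokens_per_second(decode_step, n_tokens: int) -> float:
    """Generate n_tokens one step at a time and return decode throughput."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        decode_step()  # one token of autoregressive decoding
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Hypothetical stub standing in for a model's per-token decode step.
def fake_decode_step():
    time.sleep(0.001)  # pretend each token takes about a millisecond

rate = tokens_per_second(fake_decode_step, 100)
print(f"{rate:.1f} tok/s")
```

Numbers measured this way vary with context length, quantization, and background load, which is why the same machine can legitimately report 11 t/s in one row and 70 t/s in another.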

Common questions about M2 Max

Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.

Read the M2 Max FAQ →