llm-speed

M5 Max — LLM benchmarks

No benchmarks on M5 Max yet.

Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on M5 Max

94 unverified claims extracted from Reddit and HN comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • community confidence 70%

    24.79 tok/s Gemma 3 27B on M5 Max via lm-studio

    Excerpt (truncated): power < 70 W vs < 115 W (+45 W, thermal throttling risk); time to first token (prefill) 89.83 s vs 24.35 s (~3.7x faster); generation speed 23.16 tok/s vs 24.79 tok/s (+1.63 tok/s, marginal); total time 847.87 s vs 787.85 s (~1 minute faster overall); prompt tokens 19,761 vs 19,761 (same)…

    source: Reddit · u/M5_Maxxx · 2026-03-17

  • community confidence 70%

    23.16 tok/s Gemma 3 27B on M5 Max via lm-studio

    Excerpt (truncated): power < 70 W vs < 115 W (+45 W, thermal throttling risk); time to first token (prefill) 89.83 s vs 24.35 s (~3.7x faster); generation speed 23.16 tok/s vs 24.79 tok/s (+1.63 tok/s, marginal); total time 847.87 s vs 787.85 s (~1 minute faster overall); prompt tokens 19,761…

    source: Reddit · u/M5_Maxxx · 2026-03-17

  • community confidence 60%

    31.60 tok/s Qwen 2.5 72B on M5 Max via mlx

    Excerpt (truncated): …theoretical maximum bandwidth utilization. 2. MLX is Dramatically Faster for Qwen 3.5: llama.cpp 16.5 tok/s (Q6_K, 21 GB); MLX 31.6 tok/s (4-bit, 16 GB); delta: MLX is 92% faster (1.9x speedup). This confirms the community reports that llama.cpp has a known pe…

    source: Reddit · u/affenhoden · 2026-03-21

  • community confidence 60%

    16.50 tok/s Qwen 2.5 72B on M5 Max via llama.cpp

    Excerpt (truncated): …consistently achieves ~73-75% of theoretical maximum bandwidth utilization. 2. MLX is Dramatically Faster for Qwen 3.5: llama.cpp 16.5 tok/s (Q6_K, 21 GB); MLX 31.6 tok/s (4-bit, 16 GB); delta: MLX is 92% faster (1.9x speedup). This confirms the community r…

    source: Reddit · u/affenhoden · 2026-03-21

See all 94 claims for M5 Max
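The timing figures in the 70%-confidence excerpts decompose cleanly as total time ≈ time-to-first-token (prefill) plus generated tokens divided by generation speed. A minimal back-of-envelope sketch of that decomposition, using only the figures quoted in the excerpt (the inferred generated-token counts are our arithmetic, not something the poster reported):

```python
# Decompose the totals quoted in the Gemma 3 27B excerpt:
#   total_time ~= ttft + generated_tokens / generation_speed
# Inverting gives the number of generated tokens each run implies.

def implied_generated_tokens(total_s: float, ttft_s: float, tok_per_s: float) -> float:
    """Tokens a run must have generated, given its reported timings."""
    return (total_s - ttft_s) * tok_per_s

slow_run = implied_generated_tokens(847.87, 89.83, 23.16)  # roughly 17.6k tokens
fast_run = implied_generated_tokens(787.85, 24.35, 24.79)  # roughly 18.9k tokens

# Almost all of the ~60 s gap between the runs comes from the faster
# prefill (89.83 s -> 24.35 s); the +1.63 tok/s generation gain is,
# as the excerpt says, marginal over runs of this length.
print(round(slow_run), round(fast_run))
```

This is why the excerpt can report a ~3.7x prefill speedup yet only "~1 minute faster" overall: with a 19,761-token prompt the prefill difference is about a minute, and the long generation phase dominates both totals.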
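The 60%-confidence claims mention "~73-75% of theoretical maximum bandwidth utilization". Decode on a large dense model is roughly memory-bound: each generated token streams approximately the whole weight file, so a claimed generation speed implies an effective bandwidth of about model-file-size × tok/s. A hedged sanity-check sketch (the theoretical-bandwidth input is a placeholder for your own chip's spec, not a figure quoted on this page):

```python
# Decode is roughly memory-bound: each generated token reads ~the whole
# weight file, so effective_bandwidth ~= model_size_gb * tok_per_s.

def effective_bandwidth_gbps(model_size_gb: float, tok_per_s: float) -> float:
    return model_size_gb * tok_per_s

def utilization(effective_gbps: float, theoretical_gbps: float) -> float:
    # theoretical_gbps: whatever your chip's spec sheet claims.
    return effective_gbps / theoretical_gbps

# Figures quoted in the excerpts above:
llama_cpp_gbps = effective_bandwidth_gbps(21.0, 16.5)  # Q6_K file: ~346 GB/s
mlx_gbps = effective_bandwidth_gbps(16.0, 31.6)        # 4-bit file: ~506 GB/s
print(llama_cpp_gbps, mlx_gbps)
```

If the MLX figure really sits at 73-75% utilization as the poster claims, that would imply a theoretical bandwidth somewhere around 670-690 GB/s; treat that as an inference from the claim, not a measured spec.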

Common questions about M5 Max

Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the M5 Max FAQ →