llm-speed

M5 Max — LLM benchmarks

No benchmarks on M5 Max yet.

Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on M5 Max

94 unverified claims extracted from Reddit and HN comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • community confidence 70%

    24.79 tok/s Gemma 3 27B on M5 Max via lm-studio

    Excerpt (truncated): power < 70 W vs < 115 W (+45 W, thermal throttling risk); time to first token (prefill) 89.83 s vs 24.35 s (~3.7x faster); generation speed 23.16 tok/s vs 24.79 tok/s (+1.63 tok/s, marginal); total time 847.87 s vs 787.85 s (~1 minute faster overall); prompt tokens 19,761 vs 19,761 (same)…

    source: Reddit · u/M5_Maxxx · 2026-03-17

  • community confidence 70%

    23.16 tok/s Gemma 3 27B on M5 Max via lm-studio

    Excerpt (truncated): power < 70 W vs < 115 W (+45 W, thermal throttling risk); time to first token (prefill) 89.83 s vs 24.35 s (~3.7x faster); generation speed 23.16 tok/s vs 24.79 tok/s (+1.63 tok/s, marginal); total time 847.87 s vs 787.85 s (~1 minute faster overall); prompt tokens 19,761…

    source: Reddit · u/M5_Maxxx · 2026-03-17

  • community confidence 60%

    31.60 tok/s Qwen 2.5 72B on M5 Max via mlx

    Excerpt (truncated): …theoretical maximum bandwidth utilization. 2. MLX is Dramatically Faster for Qwen 3.5: llama.cpp 16.5 tok/s (Q6_K, 21 GB); MLX 31.6 tok/s (4-bit, 16 GB); delta: MLX is 92% faster (1.9x speedup). This confirms the community reports that llama.cpp has a known pe…

    source: Reddit · u/affenhoden · 2026-03-21

  • community confidence 60%

    16.50 tok/s Qwen 2.5 72B on M5 Max via llama.cpp

    Excerpt (truncated): …consistently achieves ~73-75% of theoretical maximum bandwidth utilization. 2. MLX is Dramatically Faster for Qwen 3.5: llama.cpp 16.5 tok/s (Q6_K, 21 GB); MLX 31.6 tok/s (4-bit, 16 GB); delta: MLX is 92% faster (1.9x speedup). This confirms the community r…

    source: Reddit · u/affenhoden · 2026-03-21

See all 94 claims for M5 Max
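The timing figures in the 70%-confidence excerpts decompose cleanly as total time ≈ time-to-first-token (prefill) plus generated tokens divided by generation speed. A minimal back-of-envelope sketch of that decomposition, using only the figures quoted in the excerpt (the inferred generated-token counts are our arithmetic, not something the poster reported):

```python
# Decompose the totals quoted in the Gemma 3 27B excerpt:
#   total_time ~= ttft + generated_tokens / generation_speed
# Inverting gives the number of generated tokens each run implies.

def implied_generated_tokens(total_s: float, ttft_s: float, tok_per_s: float) -> float:
    """Tokens a run must have generated, given its reported timings."""
    return (total_s - ttft_s) * tok_per_s

slow_run = implied_generated_tokens(847.87, 89.83, 23.16)  # roughly 17.6k tokens
fast_run = implied_generated_tokens(787.85, 24.35, 24.79)  # roughly 18.9k tokens

# Almost all of the ~60 s gap between the runs comes from the faster
# prefill (89.83 s -> 24.35 s); the +1.63 tok/s generation gain is,
# as the excerpt says, marginal over runs of this length.
print(round(slow_run), round(fast_run))
```

This is why the excerpt can report a ~3.7x prefill speedup yet only "~1 minute faster" overall: with a 19,761-token prompt the prefill difference is about a minute, and the long generation phase dominates both totals.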
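The 60%-confidence claims mention "~73-75% of theoretical maximum bandwidth utilization". Decode on a large dense model is roughly memory-bound: each generated token streams approximately the whole weight file, so a claimed generation speed implies an effective bandwidth of about model-file-size × tok/s. A hedged sanity-check sketch (the theoretical-bandwidth input is a placeholder for your own chip's spec, not a figure quoted on this page):

```python
# Decode is roughly memory-bound: each generated token reads ~the whole
# weight file, so effective_bandwidth ~= model_size_gb * tok_per_s.

def effective_bandwidth_gbps(model_size_gb: float, tok_per_s: float) -> float:
    return model_size_gb * tok_per_s

def utilization(effective_gbps: float, theoretical_gbps: float) -> float:
    # theoretical_gbps: whatever your chip's spec sheet claims.
    return effective_gbps / theoretical_gbps

# Figures quoted in the excerpts above:
llama_cpp_gbps = effective_bandwidth_gbps(21.0, 16.5)  # Q6_K file: ~346 GB/s
mlx_gbps = effective_bandwidth_gbps(16.0, 31.6)        # 4-bit file: ~506 GB/s
print(llama_cpp_gbps, mlx_gbps)
```

If the MLX figure really sits at 73-75% utilization as the poster claims, that would imply a theoretical bandwidth somewhere around 670-690 GB/s; treat that as an inference from the claim, not a measured spec.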

Common questions about M5 Max

Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the M5 Max FAQ →