M1 Max — LLM benchmarks
No benchmarks on M1 Max yet.
Run on your hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
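If pipx itself isn't on the machine yet, the standard bootstrap (generic pipx setup, nothing specific to llm-speed) is:

# Installs pipx for the current user and puts its bin directory on PATH.
$ python3 -m pip install --user pipx
$ python3 -m pipx ensurepath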
Community folklore on M1 Max
13 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source. A reproduction sketch for the ollama figures follows the list.
- community confidence: 75%
10.62 tok/s — qwen2.5-32b on M1 Max via ollama (q4_K_M)
“…is the sky blue?’ runs like this (ollama - m1 max 2e/8p/32 gpu): qwen2.5-14b-instruct-q4_K_M - 26.55 tokens/s qwen2.5-32b-instruct-q4_K_M - 10.62 tokens/s”
- community confidence: 75%
26.55 tok/s — qwen2.5-14b on M1 Max via ollama (q4_K_M)
“…if you tell me the prompt you’re using. ‘why is the sky blue?’ runs like this (ollama - m1 max 2e/8p/32 gpu): qwen2.5-14b-instruct-q4_K_M - 26.55 tokens/s qwen2.5-32b-instruct-q4_K_M - 10.62 tokens/s”
- community confidence: 60%
11.84 tok/s — Qwen2.5-32B on M1 Max via mlx
“This is great information! You're getting roughly double the speed as my M1 Max 32-core GPU 64GB. I got 11.84 tokens/s on Qwen2.5-32B-Instruct (4bit) MLX”
- community confidence: 60%
16.00 tok/s — Gemma3 27B on M1 Max via mlx
“My friend just got MacBook Pro M1 Max 64GB for $1200 used. **Gemma3 27B Q4** on MLX does 16tok/s on that. 800GB/s memory bandwidth [sic: M1 Max peaks at 400GB/s]. Maybe consider that?”
- community confidence: 50%
57.00 tok/s — on M1 Max via lm-studio (model not named in the source)
“Max (64GB, 24 GPU)|LM Studio|**17.0** (56.6)|**13.4** (56.8)|**5.9** (54.4)|**38.3** (58.9)| Generation speed is virtually identical (~54-57 tok/s both). The difference is entirely in prefill: oMLX is up to **10x faster** on long contexts. At 8K context (prefill-test turn 4), L…”
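The ollama figures above can be spot-checked without llm-speed: ollama's /api/generate response reports eval_count and eval_duration (the latter in nanoseconds), so decode speed is eval_count / eval_duration × 1e9. A minimal sketch, assuming a local ollama server on its default port with the model already pulled; the model tag below is illustrative, not taken from this page:

# Sketch: read decode and prefill speed from ollama's own counters.
# Assumes ollama is serving on localhost:11434 and jq is installed.
$ curl -s http://localhost:11434/api/generate \
    -d '{"model": "qwen2.5:32b-instruct-q4_K_M", "prompt": "why is the sky blue?", "stream": false}' \
    | jq '{decode_tok_s: (.eval_count / .eval_duration * 1e9), prefill_tok_s: (.prompt_eval_count / .prompt_eval_duration * 1e9)}'

The prompt_eval_* fields isolate prefill, which is exactly the prefill-versus-decode split the lm-studio row above describes.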
Common questions about M1 Max
Direct Q&A drawn from the claims above: the fastest LLM, supported model classes, backend rankings, and quantization guidance.