M1 Max — LLM benchmarks
No benchmarks on M1 Max yet.
Run on your hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
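If pipx itself isn't on the machine yet, the standard bootstrap (generic pipx setup, nothing specific to llm-speed) is:

# Installs pipx for the current user and puts its bin directory on PATH.
$ python3 -m pip install --user pipx
$ python3 -m pipx ensurepath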
Community folklore on M1 Max
13 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source. A reproduction sketch for the ollama figures follows the list.
- community confidence: 75%
10.62 tok/s — qwen2.5-32b on M1 Max via ollama (q4_K_M)
“…is the sky blue?’ runs like this (ollama - m1 max 2e/8p/32 gpu): qwen2.5-14b-instruct-q4_K_M - 26.55 tokens/s qwen2.5-32b-instruct-q4_K_M - 10.62 tokens/s”
- community confidence: 75%
26.55 tok/s — qwen2.5-14b on M1 Max via ollama (q4_K_M)
“…if you tell me the prompt you’re using. ‘why is the sky blue?’ runs like this (ollama - m1 max 2e/8p/32 gpu): qwen2.5-14b-instruct-q4_K_M - 26.55 tokens/s qwen2.5-32b-instruct-q4_K_M - 10.62 tokens/s”
- community confidence: 60%
11.84 tok/s — Qwen2.5-32B on M1 Max via mlx
“This is great information! You're getting roughly double the speed as my M1 Max 32-core GPU 64GB. I got 11.84 tokens/s on Qwen2.5-32B-Instruct (4bit) MLX”
- community confidence: 60%
16.00 tok/s — Gemma3 27B on M1 Max via mlx
“My friend just got MacBook Pro M1 Max 64GB for $1200 used. **Gemma3 27B Q4** on MLX does 16tok/s on that. 800GB/s memory bandwidth [sic: M1 Max peaks at 400GB/s]. Maybe consider that?”
- community confidence: 50%
57.00 tok/s — on M1 Max via lm-studio (model not named in the source)
“Max (64GB, 24 GPU)|LM Studio|**17.0** (56.6)|**13.4** (56.8)|**5.9** (54.4)|**38.3** (58.9)| Generation speed is virtually identical (~54-57 tok/s both). The difference is entirely in prefill: oMLX is up to **10x faster** on long contexts. At 8K context (prefill-test turn 4), L…”
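The ollama figures above can be spot-checked without llm-speed: ollama's /api/generate response reports eval_count and eval_duration (the latter in nanoseconds), so decode speed is eval_count / eval_duration × 1e9. A minimal sketch, assuming a local ollama server on its default port with the model already pulled; the model tag below is illustrative, not taken from this page:

# Sketch: read decode and prefill speed from ollama's own counters.
# Assumes ollama is serving on localhost:11434 and jq is installed.
$ curl -s http://localhost:11434/api/generate \
    -d '{"model": "qwen2.5:32b-instruct-q4_K_M", "prompt": "why is the sky blue?", "stream": false}' \
    | jq '{decode_tok_s: (.eval_count / .eval_duration * 1e9), prefill_tok_s: (.prompt_eval_count / .prompt_eval_duration * 1e9)}'

The prompt_eval_* fields isolate prefill, which is exactly the prefill-versus-decode split the lm-studio row above describes.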
Common questions about M1 Max
Direct Q&A drawn from the claims above: the fastest LLM, supported model classes, backend rankings, and quantization guidance.