llm-speed

M1 Max — LLM benchmarks

No benchmarks on M1 Max yet.

Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on M1 Max

13 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • community confidence: 75%

    10.62 tok/s qwen2.5-32b on M1 Max via ollama (q4_K_M)

    …‘why is the sky blue?’ runs like this (ollama, M1 Max 2E/8P/32 GPU): qwen2.5-14b-instruct-q4_K_M: 26.55 tokens/s; qwen2.5-32b-instruct-q4_K_M: 10.62 tokens/s

    source: Reddit · u/xilvar · 2025-02-28

  • community confidence: 75%

    26.55 tok/s qwen2.5-14b on M1 Max via ollama (q4_K_M)

    …if you tell me the prompt you’re using. ‘why is the sky blue?’ runs like this (ollama, M1 Max 2E/8P/32 GPU): qwen2.5-14b-instruct-q4_K_M: 26.55 tokens/s; qwen2.5-32b-instruct-q4_K_M: 10.62 tokens/s

    source: Reddit · u/xilvar · 2025-02-28
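The pair of numbers in the claims above allows a rough sanity check: if decode is memory-bandwidth bound, each generated token streams all weights once, so tokens/s at a fixed quantization should scale roughly inversely with parameter count. The parameter counts below are nominal figures for the Qwen2.5 models, not taken from the page, and the comparison is a sketch rather than a rigorous model.

```python
# Sanity check on the two speeds quoted above: bandwidth-bound decode
# predicts tok/s roughly inversely proportional to model size at the
# same quantization, since each token streams all weights once.
speed_14b, speed_32b = 26.55, 10.62   # tok/s, from the comment above
params_14b, params_32b = 14.7, 32.5   # billions (nominal Qwen2.5 sizes)

predicted_ratio = params_14b / params_32b   # expected speed_32b / speed_14b
observed_ratio = speed_32b / speed_14b
print(f"predicted {predicted_ratio:.2f}, observed {observed_ratio:.2f}")
# → predicted 0.45, observed 0.40
```

The observed ratio comes in slightly under the naive prediction, which is the usual direction: larger models also pay more in KV-cache traffic and attention compute, so they run a bit slower than pure weight-streaming would suggest.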

  • community confidence: 60%

    11.84 tok/s Qwen2.5-32B on M1 Max via mlx

    This is great information! You're getting roughly double the speed as my M1 Max 32-core GPU 64GB. I got 11.84 tokens/s on Qwen2.5-32B-Instruct (4bit) MLX

    source: Reddit · u/SubstantialSock8002 · 2025-02-28

  • community confidence: 60%

    16.00 tok/s Gemma3 27B on M1 Max via mlx

    My friend just got MacBook Pro M1 Max 64GB for $1200 used. **Gemma3 27B Q4** on MLX does 16tok/s on that. 800GB/s memory bandwidth. Maybe consider that?

    source: Reddit · u/AXYZE8 · 2025-07-08
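The bandwidth figure in the comment above invites a back-of-envelope check. Note that 800 GB/s is Apple's spec for M1 Ultra; M1 Max is rated at 400 GB/s. Under the assumptions that decode is weight-streaming bound and a Q4 quant costs about 4.5 bits per weight including scale overhead (both assumptions, not figures from the page), the theoretical ceiling works out as:

```python
# Back-of-envelope ceiling on decode speed when generation is
# memory-bandwidth bound: every generated token streams all weights once.
# Assumptions (not from the page): ~4.5 bits/weight for a Q4 quant
# including scales, and 400 GB/s (Apple's M1 Max spec; the 800 GB/s in
# the comment above is the M1 Ultra figure).

def decode_ceiling_toks(params_b: float, bits_per_weight: float,
                        bandwidth_gb_s: float) -> float:
    weight_gb = params_b * bits_per_weight / 8  # GB read per token
    return bandwidth_gb_s / weight_gb

print(f"{decode_ceiling_toks(27, 4.5, 400):.0f} tok/s ceiling")
# → 26 tok/s ceiling
```

The claimed 16 tok/s sits comfortably under this bound, which is consistent with real runs reaching roughly 50-70% of theoretical bandwidth.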

  • community confidence: 50%

    57.00 tok/s on M1 Max via lm-studio

    …Max (64GB, 24 GPU)|LM Studio|**17.0** (56.6)|**13.4** (56.8)|**5.9** (54.4)|**38.3** (58.9)| Generation speed is virtually identical (~54-57 tok/s both). The difference is entirely in prefill: oMLX is up to **10x faster** on long contexts. At 8K context (prefill-test turn 4), L…

    source: Reddit · u/arthware · 2026-03-13

See all 13 claims for M1 Max

Common questions about M1 Max

Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the M1 Max FAQ →