M4 Max — LLM benchmarks
No benchmarks on M4 Max yet.
Run the benchmark on YOUR hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
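The tokens/s figures quoted below are decode throughput: generated tokens divided by wall-clock generation time, excluding model load. A minimal sketch of that calculation (the helper names here are illustrative, not part of llm-speed's actual API):

```python
import time

def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens / wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s

def bench(generate, prompt: str) -> float:
    """Time one generation call. `generate` is a hypothetical stand-in
    that returns a list of output tokens; it is not an Ollama or
    LM Studio API."""
    start = time.perf_counter()
    out_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return tokens_per_second(len(out_tokens), elapsed)

# 441 tokens in 20.0 s -> 22.05 tok/s
print(round(tokens_per_second(441, 20.0), 2))
```

Timing only the decode phase matters here: including the load durations reported below (tens of milliseconds) would skew short runs.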
Community folklore on M4 Max
72 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.
- community confidence 75%
22.06 tok/s — gemma3:27b (q4) on M4 Max via ollama
“**Results (All models downloaded from Ollama)**

**gemma3:27b**

|Quantization|Load Duration|Inference Speed|
|:-|:-|:-|
|q4|52.482042ms|22.06 tokens/s|
|fp16|56.4445ms|6.99 tokens/s|

**gemma3:12b**

|Quantization|Load Duration|Inference Speed|
|:-|:-|:-|
|q4|56.818334ms|43.8…”
- community confidence 70%
25.00 tok/s — GPT-OSS 120B on M4 Max via lm-studio
“ants) in LM Studio on an i7 13800H P1G6 with 128GB DDR5-5600 and an RTX 2000 Ada, which is a surprisingly great result when placed next to ~25 tokens/s for the M4 Max. Edit: Especially when it could go even faster if only LM Studio had n-cpu-moe to put some of the load back onto …”
- community confidence 70%
17.00 tok/s — GPT-OSS 120B on M4 Max via lm-studio
“Can you share the prompt to generate the numbers in the graph? I'm getting ~15-17 token/s with 64k context GPT-OSS 120B MXFP4 (no KV quants) in LM Studio on an i7 13800H P1G6 with 128GB DDR5-5600 and an RTX 2000 Ada, which is a s”
- community confidence 60%
101.9 tok/s — Qwen2.5-7B on M4 Max via mlx
“…GPU) |
| --------------------------- | -------------------------------- | ------------------------------ |
| Qwen2.5-7B-Instruct (4bit) | 101.87 tokens/s | 38.99 tokens/s |
| Qwen2.5-14B-Instruct (4bit) | 52.22 tokens/s | 18.88 …”
- community confidence 60%
8.76 tok/s — Qwen2.5:72B (4bit) on M4 Max via lm-studio
“…okens/s |
| Qwen2.5:32B (4bit) | 19.35 tokens/s | 6.95 tokens/s |
| Qwen2.5:72B (4bit) | 8.76 tokens/s | Didn't Test |

#### LM Studio
| MLX models | M4 Max (128 GB RAM, 40-…”
- community confidence 60%
6.95 tok/s — Qwen2.5:32B (4bit) on M4 Max via lm-studio
“…:14B (4bit) | 38.23 tokens/s | 14.66 tokens/s |
| Qwen2.5:32B (4bit) | 19.35 tokens/s | 6.95 tokens/s |
| Qwen2.5:72B (4bit) | 8.76 tokens/s | Didn't Test |

#### LM Studio …”
Common questions about M4 Max
Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.