M4 Max — LLM benchmarks
No benchmarks on M4 Max yet.
Run the benchmark on YOUR hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
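The tokens/s figures quoted below are decode throughput: generated tokens divided by wall-clock generation time, excluding model load. A minimal sketch of that calculation (the helper names here are illustrative, not part of llm-speed's actual API):

```python
import time

def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens / wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s

def bench(generate, prompt: str) -> float:
    """Time one generation call. `generate` is a hypothetical stand-in
    that returns a list of output tokens; it is not an Ollama or
    LM Studio API."""
    start = time.perf_counter()
    out_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return tokens_per_second(len(out_tokens), elapsed)

# 441 tokens in 20.0 s -> 22.05 tok/s
print(round(tokens_per_second(441, 20.0), 2))
```

Timing only the decode phase matters here: including the load durations reported below (tens of milliseconds) would skew short runs.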
Community folklore on M4 Max
72 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.
- community confidence 75%
22.06 tok/s — gemma3:27b (q4) on M4 Max via ollama
“**Results (All models downloaded from Ollama)**

**gemma3:27b**

|Quantization|Load Duration|Inference Speed|
|:-|:-|:-|
|q4|52.482042ms|22.06 tokens/s|
|fp16|56.4445ms|6.99 tokens/s|

**gemma3:12b**

|Quantization|Load Duration|Inference Speed|
|:-|:-|:-|
|q4|56.818334ms|43.8…”
- community confidence 70%
25.00 tok/s — GPT-OSS 120B on M4 Max via lm-studio
“ants) in LM Studio on an i7 13800H P1G6 with 128GB DDR5-5600 and an RTX 2000 Ada, which is a surprisingly great result when placed next to ~25 tokens/s for the M4 Max. Edit: Especially when it could go even faster if only LM Studio had n-cpu-moe to put some of the load back onto …”
- community confidence 70%
17.00 tok/s — GPT-OSS 120B on M4 Max via lm-studio
“Can you share the prompt to generate the numbers in the graph? I'm getting ~15-17 token/s with 64k context GPT-OSS 120B MXFP4 (no KV quants) in LM Studio on an i7 13800H P1G6 with 128GB DDR5-5600 and an RTX 2000 Ada, which is a s”
- community confidence 60%
101.9 tok/s — Qwen2.5-7B on M4 Max via mlx
“…GPU) |
| --------------------------- | -------------------------------- | ------------------------------ |
| Qwen2.5-7B-Instruct (4bit) | 101.87 tokens/s | 38.99 tokens/s |
| Qwen2.5-14B-Instruct (4bit) | 52.22 tokens/s | 18.88 …”
- community confidence 60%
8.76 tok/s — Qwen2.5:72B (4bit) on M4 Max via lm-studio
“…okens/s |
| Qwen2.5:32B (4bit) | 19.35 tokens/s | 6.95 tokens/s |
| Qwen2.5:72B (4bit) | 8.76 tokens/s | Didn't Test |

#### LM Studio
| MLX models | M4 Max (128 GB RAM, 40-…”
- community confidence 60%
6.95 tok/s — Qwen2.5:32B (4bit) on M4 Max via lm-studio
“…:14B (4bit) | 38.23 tokens/s | 14.66 tokens/s |
| Qwen2.5:32B (4bit) | 19.35 tokens/s | 6.95 tokens/s |
| Qwen2.5:72B (4bit) | 8.76 tokens/s | Didn't Test |

#### LM Studio …”
Common questions about M4 Max
Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.