llm-speed

M4 Max — LLM benchmarks

No benchmarks on M4 Max yet.

Run llm-speed on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
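
If you prefer a plain virtualenv to pipx, the equivalent is the sketch below; it assumes the same PyPI package name shown above.

# same package, installed into a project-local virtualenv instead of pipx
$ python3 -m venv .venv && source .venv/bin/activate
$ pip install llm-speed && llm-speed bench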

Community folklore on M4 Max

72 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source, and a quick way to sanity-check figures like these locally follows the list.

  • community confidence 75%

    22.06tok/s gemma3:27b (q4) on M4 Max via ollama

    **Results (All models downloaded from Ollama)**

    **gemma3:27b**
    |Quantization|Load Duration|Inference Speed|
    |:-|:-|:-|
    |q4|52.482042ms|22.06 tokens/s|
    |fp16|56.4445ms|6.99 tokens/s|

    **gemma3:12b**
    |Quantization|Load Duration|Inference Speed|
    |:-|:-|:-|
    |q4|56.818334ms|43.8…

    source: Reddit · u/purealgo · 2025-03-12

  • community confidence 70%

    25.00tok/s GPT-OSS 120B on M4 Max via lm-studio

    ants) in LM Studio on an i7 13800H P1G6 with 128GB DDR5-5600 and an RTX 2000 Ada, which is a surprisingly great result when placed next to ~25 tokens/s for the M4 Max. Edit: Especially when it could go even faster if only LM Studio had n-cpu-moe to put some of the load back onto …

    source: Reddit · u/Dexamph · 2025-08-18

  • community confidence 70%

    17.00tok/s GPT-OSS 120B on M4 Max via lm-studio

    Can you share the prompt to generate the numbers in the graph? I'm getting ~15-17 token/s with 64k context GPT-OSS 120B MXFP4 (no KV quants) in LM Studio on an i7 13800H P1G6 with 128GB DDR5-5600 and an RTX 2000 Ada, which is a s

    source: Reddit · u/Dexamph · 2025-08-18

  • community confidence 60%

    101.9tok/s Qwen2.5-7B on M4 Max via mlx

    GPU) | | --------------------------- | -------------------------------- | ------------------------------ | | Qwen2.5-7B-Instruct (4bit) | 101.87 tokens/s | 38.99 tokens/s | | Qwen2.5-14B-Instruct (4bit) | 52.22 tokens/s | 18.88 …

    source: Reddit · u/purealgo · 2025-02-28

  • community confidence 60%

    8.76tok/s Qwen2.5 on M4 Max via lm-studio

    okens/s | | Qwen2.5:32B (4bit) | 19.35 tokens/s | 6.95 tokens/s | | Qwen2.5:72B (4bit) | 8.76 tokens/s | Didn't Test | #### LM Studio | MLX models | M4 Max (128 GB RAM, 40-…

    source: Reddit · u/purealgo · 2025-02-28

  • community confidence 60%

    6.95tok/s Qwen2.5 on M4 Max via lm-studio

    :14B (4bit) | 38.23 tokens/s | 14.66 tokens/s | | Qwen2.5:32B (4bit) | 19.35 tokens/s | 6.95 tokens/s | | Qwen2.5:72B (4bit) | 8.76 tokens/s | Didn't Test | #### LM Studio …

    source: Reddit · u/purealgo · 2025-02-28

See all 72 claims for M4 Max
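
The per-backend figures quoted above are easy to sanity-check on your own machine. The sketch below is a minimal example, not the llm-speed harness itself: it assumes Ollama and the mlx-lm package are installed, and the model tags (gemma3:27b, mlx-community/Qwen2.5-7B-Instruct-4bit) are illustrative stand-ins for whichever model appears in the quote you want to reproduce.

# Ollama: --verbose appends timing stats, including "eval rate" in tokens/s
$ ollama run gemma3:27b --verbose "Summarize the Apollo program in 200 words."

# mlx-lm: the bundled CLI prints a tokens-per-sec figure after generation
$ pip install mlx-lm
$ mlx_lm.generate --model mlx-community/Qwen2.5-7B-Instruct-4bit \
      --prompt "Summarize the Apollo program in 200 words." --max-tokens 256

Both numbers measure decode throughput only; prompt length, context size, and quantization (q4 vs fp16, as in the gemma3 quote above) shift the results considerably, which is part of why the community figures vary so widely.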

Common questions about M4 Max

Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the M4 Max FAQ →