llm-speed

M4 Pro — LLM benchmarks

No benchmarks on M4 Pro yet.

Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on M4 Pro

21 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.
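The tok/s numbers quoted in these claims are typically measured by timing the decode phase of a generation. A minimal sketch of that measurement (the helper and the dummy token stream are illustrative, not part of llm-speed):

```python
import time

def measure_decode_tok_s(token_stream):
    """Time an iterable of generated tokens; return (count, tokens/sec)."""
    start = time.perf_counter()
    count = sum(1 for _ in token_stream)
    elapsed = time.perf_counter() - start
    return count, count / elapsed

# Illustrative stand-in for a real model's streaming output.
def fake_stream(n_tokens=64, per_token_s=0.001):
    for _ in range(n_tokens):
        time.sleep(per_token_s)  # simulate per-token decode latency
        yield "tok"

count, tok_s = measure_decode_tok_s(fake_stream())
print(f"{count} tokens at {tok_s:.1f} tok/s")
```

Note that this measures decode throughput only; time-to-first-token (prompt processing) is reported separately in several of the claims below.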

  • community confidence 70%

    70.00 tok/s Qwen3-Coder on M4 Pro via mlx

    …purpose-built for code, MoE architecture so only 3B params active per token. Fits in 36GB with room for 16-32K context. On M4 Pro MLX I get ~70 tok/s with it. If you also want a general-purpose model to keep alongside it, **Qwen3.5-35B-A3B** is the same MoE architecture, similar …

    source: Reddit · u/the_real_druide67 · 2026-03-28

  • community confidence 60%

    7.24 tok/s qwen2.5 on M4 Pro via ollama

    …2.5 right up until you hit the 72B parameter size. Due to VRAM it falls apart. The M4 max and M4 Pro could run the 72B, but at 8.8t/s and 7.24t/s. This is certainly better than running on a CPU (which is what happens with the 4090), but it's still too slow to be worth a $4k pu…

    source: Reddit · u/darth_chewbacca · 2024-11-14


  • community confidence 60%

    72.00 tok/s GPT OSS 20B on M4 Pro via mlx

    …edge frozen as of 2024-06, handled rescaling and converting a recipe into metric that Qwen 3 30B 3AB 2507 completely hosed, churns at about 72 tok/s on a MacBook M4 Pro Max 4 bit MLX. And it ALMOST got a side scrolling shooter working, whereas the 120B didn't, and Qwen 3 didn't …

    source: Reddit · u/cspenn · 2025-08-05

  • community confidence 60%

    14.50 tok/s Gemma 3 27b on M4 Pro via mlx

    Mac Mini M4 Pro (20c GPU) 64GB unified RAM; Gemma 3 27b with MLX 14.5t/s, power usage almost 70W (including connected keyboard and mouse). So more efficient than even the Ryzen 395 AI (if those results are accurat…

    source: Reddit · u/Cergorach · 2025-07-11

  • community confidence 60%

    55.00 tok/s Qwen3 30b a3b on M4 Pro via mlx

    Binned M4 Pro/48GB owner here since November--current daily driver is Qwen3 30b a3b 8-bit MLX @ 55t/s ymmv, but I like it a lot and it flies.

    source: Reddit · u/MrPecunius · 2025-07-23

  • community confidence 60%

    11.37 tok/s Qwen3 30b-a3b on M4 Pro via mlx

    That's about three times as fast as my binned M4 Pro/48GB: with Qwen3 30b-a3b 8-bit MLX, I got 180 seconds to first token and 11.37t/s with the same size lorem ipsum prompt. That tracks really well with the 3X memory bandwidth difference.

    source: Reddit · u/MrPecunius · 2025-05-26

  • community confidence 55%

    60.50 tok/s on M4 Pro via lm-studio FP16

    …32 GB DDR5|CUDA 12 llama.cpp (LM Studio)|59.1 tok/s|0.02 s|\-| |M4 Pro|16 GPU cores, MacBook Pro 14”, 48 GB unified memory|MLX (LM Studio)|60.5 tok/s 👑|0.31 s|3.69| # Super Interesting Notes: **1. The neural accelerators didn't make much of a difference. Here's why!** * First …

    source: Reddit · u/TechExpert2910 · 2025-10-27

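Several of the figures above scale almost linearly with memory bandwidth (one commenter notes ~3X tok/s tracking a 3X bandwidth difference). That is expected: LLM decode is typically memory-bandwidth-bound, since each generated token must stream every active weight from memory once. A rough ceiling estimate (the ~273 GB/s M4 Pro figure is Apple's published spec; treating an 8-bit, ~3B-active-parameter MoE as ~3 GB of reads per token is an approximation):

```python
def decode_ceiling_tok_s(bandwidth_gb_s, active_weights_gb):
    # Upper bound on decode speed when each token streams all active
    # weights from memory once (ignores KV-cache reads and compute cost).
    return bandwidth_gb_s / active_weights_gb

# M4 Pro: ~273 GB/s unified memory bandwidth.
# Qwen3 30B-A3B at 8-bit: ~3B active params -> ~3 GB read per token.
print(decode_ceiling_tok_s(273, 3.0))  # → 91.0
```

The observed 55 tok/s claim above sits comfortably below this ceiling, which is consistent once KV-cache traffic and compute overhead are accounted for.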

See all 21 claims for M4 Pro

Common questions about M4 Pro

Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.
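For quantization guidance, a useful first-order check is whether the weights fit in unified memory: weight size is roughly parameter count times bits per weight. A back-of-the-envelope sketch (the flat 2 GB overhead is an assumed allowance; real KV-cache needs grow with context length):

```python
def model_memory_gb(params_billion, bits_per_weight, overhead_gb=2.0):
    # Weights: params * bits / 8 bits-per-byte. overhead_gb is a rough,
    # assumed allowance for KV cache and runtime buffers.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

print(model_memory_gb(30, 4))  # 30B model, 4-bit quant → 17.0
print(model_memory_gb(30, 8))  # 30B model, 8-bit quant → 32.0
```

This matches the claims above: an 8-bit 30B model is a tight fit on a 36GB machine, while 4-bit leaves room for long contexts.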

Read the M4 Pro FAQ →