M4 Pro — LLM benchmarks
No M4 Pro benchmarks yet.
Run on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
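Until signed runs appear here, you can also measure throughput by hand. Below is a minimal sketch, assuming an OpenAI-compatible local server (LM Studio serves one at http://localhost:1234/v1 by default; Ollama at http://localhost:11434/v1). It counts streamed chunks as a proxy for tokens, and the model name is a placeholder for whatever your server actually lists:

# Rough tok/s measurement against an OpenAI-compatible local server.
# Assumptions: LM Studio's default endpoint; one streamed chunk ~ one token.
import json
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"  # adjust for your server
payload = {
    "model": "qwen3-30b-a3b",  # placeholder; use a model your server lists
    "messages": [{"role": "user", "content": "Write a 300-word summary of TCP."}],
    "max_tokens": 512,
    "stream": True,
}

tokens, first_token_at, start = 0, None, time.time()
with requests.post(URL, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue  # skip blank SSE keep-alive lines
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"].get("content")
        if delta:
            if first_token_at is None:
                first_token_at = time.time()
            tokens += 1

elapsed = time.time() - (first_token_at or start)
print(f"time to first token: {(first_token_at or start) - start:.2f}s")
print(f"~{tokens / max(elapsed, 1e-9):.1f} tok/s over {tokens} tokens")

Numbers from a loop like this are decode throughput only; prompt-processing speed is a separate figure.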
Community folklore on M4 Pro
21 unverified claims extracted from Reddit/HN comments. These carry lower trust than the signed runs above; every row links to its source. A back-of-the-envelope bandwidth check on the numbers follows the list.
- community confidence 70%
70.00 tok/s — Qwen3-Coder on M4 Pro via mlx
“Purpose-built for code, MoE architecture so only 3B params active per token. Fits in 36GB with room for 16-32K context. On M4 Pro MLX I get ~70 tok/s with it. If you also want a general-purpose model to keep alongside it, **Qwen3.5-35B-A3B** is the same MoE architecture, similar …”
- community confidence 60%
7.24 tok/s — qwen2.5 72B on M4 Pro via ollama
“2.5 right up until you hit the 72B parameter size. Due to VRAM it falls apart. The M4 max and M4 Pro could run the 72B, but at 8.8t/s and 7.24t/s. This is certainly better than running on a CPU (which is what happens with the 4090), but it's still too slow to be worth a $4k pu…”
- community confidence 60%
8.80 tok/s — qwen2.5 72B via ollama (per the quote, the 8.8 t/s figure is for the M4 Max; the M4 Pro gets 7.24 t/s)
“lly at qwen2.5 right up until you hit the 72B parameter size. Due to VRAM it falls apart. The M4 max and M4 Pro could run the 72B, but at 8.8t/s and 7.24t/s. This is certainly better than running on a CPU (which is what happens with the 4090), but it's still too slow to be wor…”
- community confidence 60%
72.00 tok/s — GPT OSS 20B on M4 Pro via mlx
“knowledge frozen as of 2024-06, handled rescaling and converting a recipe into metric that Qwen 3 30B 3AB 2507 completely hosed, churns at about 72 tok/s on a MacBook M4 Pro Max 4 bit MLX. And it ALMOST got a side scrolling shooter working, whereas the 120B didn't, and Qwen 3 didn't …”
- community confidence 60%
14.50 tok/s — Gemma 3 27b on M4 Pro via mlx
“Mac Mini M4 Pro (20c GPU) 64GB unified RAM; Gemma 3 27b with MLX 14.5t/s power usage almost 70W (including connected keyboard and mouse). So more efficient then even the Ryzen 395 AI (if those results are accurate …”
- community confidence 60%
55.00 tok/s — Qwen3 30b a3b on M4 Pro via mlx
“Binned M4 Pro/48GB owner here since November--current daily driver is Qwen3 30b a3b 8-bit MLX @ 55t/s ymmv, but I like it a lot and it flies.”
- community confidence 60%
11.37 tok/s — Qwen3 30b-a3b on M4 Pro via mlx
“That's about three times as fast as my binned M4 Pro/48GB: with Qwen3 30b-a3b 8-bit MLX, I got 180 seconds to first token and 11.37t/s with the same size lorem ipsum prompt. That tracks really well with the 3X memory bandwidth difference.”
- community confidence 55%
60.50 tok/s — on M4 Pro via lm-studio FP16
“32 GB DDR5|CUDA 12 llama.cpp (LM Studio)|59.1 tok/s|0.02 s|-| |M4 Pro|16 GPU cores, MacBook Pro 14”, 48 GB unified memory|MLX (LM Studio)|60.5 tok/s 👑|0.31 s|3.69| # Super Interesting Notes: **1. The neural accelerators didn't make much of a difference. Here's why!** * First …”
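Several claims above line up with a simple memory-bandwidth model: at decode time each generated token streams the model's active weights through memory, so tok/s is capped at roughly bandwidth divided by bytes of weights read per token. A back-of-the-envelope sketch, using Apple's published 273 GB/s figure for the M4 Pro and assuming quantization widths where the quotes omit them:

# Upper-bound decode speed if generation is memory-bandwidth-bound.
# 273 GB/s is Apple's published unified-memory bandwidth for the (non-binned) M4 Pro.
BANDWIDTH_GBS = 273

def est_tok_s(active_params_b: float, bits_per_weight: int) -> float:
    """Ceiling tok/s for a model streaming `active_params_b` billion
    parameters per token at the given quantization width."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return BANDWIDTH_GBS * 1e9 / bytes_per_token

print(f"qwen2.5 72B dense @4-bit: <= {est_tok_s(72, 4):.0f} tok/s")  # ~8; 7.24 claimed
print(f"Gemma 3 27B dense @4-bit: <= {est_tok_s(27, 4):.0f} tok/s")  # ~20; 14.5 claimed
print(f"Qwen3 30B-A3B MoE @8-bit: <= {est_tok_s(3, 8):.0f} tok/s")   # ~91; 55 claimed

The MoE rows land well under their ceiling but far above dense models of similar total size, consistent with only ~3B parameters streaming per token. The 4-bit widths for qwen2.5 and Gemma are assumptions, since the quotes don't state them.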
Common questions about M4 Pro
Direct Q&A drawn from the data above: the fastest LLM on M4 Pro, supported model classes, backend rankings, and quantization guidance.