Best Mac for running local LLMs
M-series chips trade off memory bandwidth, GPU core count, and unified-memory ceiling. Here's the data, ranked by best decode tok/s on 4-bit MLX models.
The fastest submitted M-series result on the leaderboard is the M3 Ultra (60-core GPU) at 168.3 tok/s on mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit. Runner-up is the M3 Pro (18-core GPU) at 30.52 tok/s on mlx-community/Qwen2.5-7B-Instruct-4bit. Coverage spans 2 M-series chips. Decode speed scales roughly with memory bandwidth, so the Ultra tier tends to beat the Max tier of the same generation.
Apple Silicon is unusually good at LLM inference because the unified memory architecture sidesteps the VRAM ceiling that limits NVIDIA consumer GPUs. The trade-off is bandwidth: an M3 Pro tops out around 150 GB/s while an M3 Ultra does ~800 GB/s, and decode speed scales roughly with bandwidth. Below is every M-series result we have data for, ranked by best decode tok/s.
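To see why bandwidth dominates, a back-of-envelope check: each decoded token streams roughly the full set of model weights through memory once, so decode tok/s is capped near bandwidth divided by weight size. A minimal sketch of that arithmetic (the bandwidths and model sizes below are illustrative assumptions, not leaderboard measurements):

```python
# Bandwidth-bound decode ceiling: each generated token reads (approximately)
# all model weights once, so tok/s <= bandwidth / weight_bytes.
# Figures are illustrative assumptions, not benchmarked numbers.

GB = 1e9

def decode_ceiling(bandwidth_gbs: float, params_b: float, bits_per_weight: float) -> float:
    """Upper bound on decode tok/s if inference were purely weight-bandwidth-bound."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * GB / weight_bytes

# A 7B model at 4-bit quantization is ~3.5 GB of weights:
for chip, bw in [("M3 Pro", 150), ("M3 Ultra", 800)]:
    print(f"{chip}: <= {decode_ceiling(bw, 7, 4):.0f} tok/s")
# M3 Pro: <= 43 tok/s
# M3 Ultra: <= 229 tok/s
```

Measured numbers land below these ceilings because attention, KV-cache reads, and compute overhead all eat into the budget, but the ratio between tiers tracks the bandwidth ratio fairly well.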
Submitted benchmarks
| Hardware | Best model | decode tok/s | Run |
|---|---|---|---|
| M3 Ultra (60-core GPU) | mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit | 168.3 | r_l_v1-zq_qaz |
| M3 Pro (18-core GPU) | mlx-community/Qwen2.5-7B-Instruct-4bit | 30.52 | r_llzv_g-ymaf |
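To reproduce a number like this on your own Mac, the mlx-lm package reports decode throughput directly. A minimal sketch using its Python API, with the Qwen model from the table (exact defaults and keyword arguments can vary across mlx-lm versions):

```python
# Quick decode-speed check with mlx-lm (pip install mlx-lm).
# API surface as of recent mlx-lm releases; details may differ by version.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Write a haiku about memory bandwidth.",
    max_tokens=256,
    verbose=True,  # prints prompt and generation tokens-per-sec
)
```

The `verbose=True` output includes a generation tokens-per-sec figure, which is the quantity reported in the decode tok/s column above.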
See also: All hardware · All models · Methodology