Best Mac for running local LLMs
M-series chips trade off memory bandwidth, GPU core count, and unified-memory ceiling. Here's the data, ranked by best decode tok/s on 4-bit MLX models.
The fastest submitted M-series result on the leaderboard is the M3 Ultra (60-core GPU) at 168.3 tok/s on mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit. Runner-up is the M3 Pro (18-core GPU) at 30.52 tok/s on mlx-community/Qwen2.5-7B-Instruct-4bit. Coverage spans 2 M-series chips. Decode speed scales roughly with memory bandwidth, so the Ultra tier tends to beat the Max tier of the same generation.
Apple Silicon is unusually good at LLM inference because the unified memory architecture sidesteps the VRAM ceiling that limits NVIDIA consumer GPUs. The trade-off is bandwidth: an M3 Pro tops out around 150 GB/s while an M3 Ultra does ~800 GB/s, and decode speed scales roughly with bandwidth. Below is every M-series result we have data for, ranked by best decode tok/s.
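To see why bandwidth dominates, a back-of-envelope check: each decoded token streams roughly the full set of model weights through memory once, so decode tok/s is capped near bandwidth divided by weight size. A minimal sketch of that arithmetic (the bandwidths and model sizes below are illustrative assumptions, not leaderboard measurements):

```python
# Bandwidth-bound decode ceiling: each generated token reads (approximately)
# all model weights once, so tok/s <= bandwidth / weight_bytes.
# Figures are illustrative assumptions, not benchmarked numbers.

GB = 1e9

def decode_ceiling(bandwidth_gbs: float, params_b: float, bits_per_weight: float) -> float:
    """Upper bound on decode tok/s if inference were purely weight-bandwidth-bound."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gbs * GB / weight_bytes

# A 7B model at 4-bit quantization is ~3.5 GB of weights:
for chip, bw in [("M3 Pro", 150), ("M3 Ultra", 800)]:
    print(f"{chip}: <= {decode_ceiling(bw, 7, 4):.0f} tok/s")
# M3 Pro: <= 43 tok/s
# M3 Ultra: <= 229 tok/s
```

Measured numbers land below these ceilings because attention, KV-cache reads, and compute overhead all eat into the budget, but the ratio between tiers tracks the bandwidth ratio fairly well.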
Submitted benchmarks
| Hardware | Best model | decode tok/s | Run |
|---|---|---|---|
| M3 Ultra (60-core GPU) | mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit | 168.3 | r_l_v1-zq_qaz |
| M3 Pro (18-core GPU) | mlx-community/Qwen2.5-7B-Instruct-4bit | 30.52 | r_llzv_g-ymaf |
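To reproduce a number like this on your own Mac, the mlx-lm package reports decode throughput directly. A minimal sketch using its Python API, with the Qwen model from the table (exact defaults and keyword arguments can vary across mlx-lm versions):

```python
# Quick decode-speed check with mlx-lm (pip install mlx-lm).
# API surface as of recent mlx-lm releases; details may differ by version.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen2.5-7B-Instruct-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Write a haiku about memory bandwidth.",
    max_tokens=256,
    verbose=True,  # prints prompt and generation tokens-per-sec
)
```

The `verbose=True` output includes a generation tokens-per-sec figure, which is the quantity reported in the decode tok/s column above.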
See also: All hardware · All models · Methodology