M1 Ultra — LLM benchmarks
No M1 Ultra benchmarks yet.
Run the benchmark on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
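If pipx itself isn't installed yet, a one-time setup on macOS (assuming Homebrew; both commands below are standard Homebrew/pipx usage) looks like:

# One-time setup, assuming Homebrew is present on the Mac.
$ brew install pipx
$ pipx ensurepath   # puts pipx-managed tools on PATH; open a new shell after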
Community folklore on M1 Ultra
2 unverified claims extracted from Reddit/HN comments. These carry lower trust than signed benchmark runs; every row links to its source.
- Community confidence: 60%
12.00 tok/s: Qwen2.5-72B-Instruct (4-bit) on M1 Ultra via lm-studio (a speculative-decoding sketch follows this list)
“By comparison, my M1 Ultra does about 12 tokens/s for Qwen2.5-72B-Instruct (4bit). The extra bandwidth is just insanely good. BTW One other thing you can try is using speculative decoding”
- Community confidence: 55%
12.70 tok/s: GooseOne 2.9B (fp16) on M1 Ultra via mlx
“nd, vanilla JS frontend, WebSocket streaming

**Some benchmarks on M1 Ultra (64GB):**

| Model | Speed | Notes |
| :- | :- | :- |
| GooseOne 2.9B (fp16) | 12.7 tok/s | Constant memory, no KV cache |
| Z-Image Turbo (Q4) | 77s / 1024×1024 | Metal acceleration via mflux |

The RNN advantage that made me b…”
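The speculative-decoding tip at the end of the first quote is easy to try on Apple Silicon. The commenter named no tool, so the sketch below uses llama.cpp's llama-speculative example as a stand-in; the model paths are placeholders, and flag names can vary between llama.cpp releases.

# Speculative decoding: a small draft model (-md) proposes several tokens,
# which the large target model (-m) verifies in one batched pass, so the
# slow 72B model runs fewer forward steps per generated token.
# Both models must share a tokenizer family; paths are placeholders.
$ llama-speculative \
    -m models/Qwen2.5-72B-Instruct-Q4_K_M.gguf \
    -md models/Qwen2.5-0.5B-Instruct-Q8_0.gguf \
    -p "Explain speculative decoding in one paragraph." \
    -n 256

The speedup depends on how often the draft's guesses are accepted; a small instruct model from the same family as the target is the usual pick.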
Common questions about M1 Ultra
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.