M1 Ultra — LLM benchmarks
No M1 Ultra benchmarks yet.
Run the benchmark on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
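If pipx itself isn't installed yet, a one-time setup on macOS (assuming Homebrew; both commands below are standard Homebrew/pipx usage) looks like:

# One-time setup, assuming Homebrew is present on the Mac.
$ brew install pipx
$ pipx ensurepath   # puts pipx-managed tools on PATH; open a new shell after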
Community folklore on M1 Ultra
2 unverified claims extracted from Reddit/HN comments. These carry lower trust than signed benchmark runs; every row links to its source.
- Community confidence: 60%
12.00 tok/s: Qwen2.5-72B-Instruct (4-bit) on M1 Ultra via lm-studio (a speculative-decoding sketch follows this list)
“By comparison, my M1 Ultra does about 12 tokens/s for Qwen2.5-72B-Instruct (4bit). The extra bandwidth is just insanely good. BTW One other thing you can try is using speculative decoding”
- Community confidence: 55%
12.70 tok/s: GooseOne 2.9B (fp16) on M1 Ultra via mlx
“nd, vanilla JS frontend, WebSocket streaming

**Some benchmarks on M1 Ultra (64GB):**

| Model | Speed | Notes |
| :- | :- | :- |
| GooseOne 2.9B (fp16) | 12.7 tok/s | Constant memory, no KV cache |
| Z-Image Turbo (Q4) | 77s / 1024×1024 | Metal acceleration via mflux |

The RNN advantage that made me b…”
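The speculative-decoding tip at the end of the first quote is easy to try on Apple Silicon. The commenter named no tool, so the sketch below uses llama.cpp's llama-speculative example as a stand-in; the model paths are placeholders, and flag names can vary between llama.cpp releases.

# Speculative decoding: a small draft model (-md) proposes several tokens,
# which the large target model (-m) verifies in one batched pass, so the
# slow 72B model runs fewer forward steps per generated token.
# Both models must share a tokenizer family; paths are placeholders.
$ llama-speculative \
    -m models/Qwen2.5-72B-Instruct-Q4_K_M.gguf \
    -md models/Qwen2.5-0.5B-Instruct-Q8_0.gguf \
    -p "Explain speculative decoding in one paragraph." \
    -n 256

The speedup depends on how often the draft's guesses are accepted; a small instruct model from the same family as the target is the usual pick.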
Common questions about M1 Ultra
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.