llm-speed

M1 Ultra — LLM benchmarks

No M1 Ultra benchmarks yet.

Run llm-speed on your own hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
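What the bench command reports as tok/s boils down to timing a generation call. A minimal sketch of that measurement (the `measure_tok_per_s` helper and dummy backend below are illustrative assumptions, not llm-speed's actual code):

```python
import time

def measure_tok_per_s(generate_fn, prompt, max_tokens=128):
    # generate_fn(prompt, max_tokens) -> list of generated tokens;
    # a stand-in for whichever backend is being benchmarked.
    start = time.perf_counter()
    tokens = generate_fn(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Dummy backend: "generates" one token per millisecond.
def dummy_generate(prompt, max_tokens):
    out = []
    for _ in range(max_tokens):
        time.sleep(0.001)
        out.append("tok")
    return out

print(f"{measure_tok_per_s(dummy_generate, 'hello', 64):.1f} tok/s")
```

Real harnesses usually separate prompt-processing (prefill) time from decode time; the community numbers below are decode throughput.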

Community folklore on M1 Ultra

Two unverified claims extracted from Reddit/HN comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • community confidence: 60%

    12.00 tok/s · Qwen2.5-72B on M1 Ultra via lm-studio

    By comparison, my M1 Ultra does about 12 tokens/s for Qwen2.5-72B-Instruct (4bit). The extra bandwidth is just insanely good. BTW One other thing you can try is using speculative decoding

    source: Reddit · u/Spanky2k · 2025-02-28
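The speculative-decoding tip in the comment above can be sketched in a toy, library-agnostic form (the `target_next`/`draft_next` callables are hypothetical stand-ins, not lm-studio's or MLX's API):

```python
def speculative_decode(target_next, draft_next, prompt, n_tokens, k=4):
    # Toy greedy speculative decoding:
    # target_next(seq) / draft_next(seq) each return the next token.
    # The cheap draft proposes k tokens; the expensive target keeps
    # the agreeing prefix and substitutes its own token at the first
    # mismatch, so output always matches pure target decoding.
    seq = list(prompt)
    while len(seq) - len(prompt) < n_tokens:
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        for t in proposal:
            if target_next(seq) == t:
                seq.append(t)                 # draft guessed right: accept
            else:
                seq.append(target_next(seq))  # mismatch: target corrects
                break
    return seq[len(prompt):][:n_tokens]

# Toy "model": the target just counts upward. A perfect draft gets
# every proposal accepted; a bad draft costs speed, never correctness.
count_up = lambda seq: seq[-1] + 1
print(speculative_decode(count_up, count_up, [0], 5))  # [1, 2, 3, 4, 5]
```

Real implementations verify all k draft tokens in a single batched forward pass of the target model, which is where the speedup comes from; this toy calls the target once per token purely to show the accept/correct logic.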

  • community confidence: 55%

    12.70 tok/s · GooseOne 2.9B on M1 Ultra via mlx (fp16)

    …nd, vanilla JS frontend, WebSocket streaming. **Some benchmarks on M1 Ultra (64GB):**

    |Model|Speed|Notes|
    |:-|:-|:-|
    |GooseOne 2.9B (fp16)|12.7 tok/s|Constant memory, no KV cache|
    |Z-Image Turbo (Q4)|77s / 1024×1024|Metal acceleration via mflux|

    The RNN advantage that made me b…

    source: Reddit · u/habachilles · 2026-03-07

Common questions about M1 Ultra

Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the M1 Ultra FAQ →