M3 — LLM benchmarks
No M3 benchmarks yet.
Run it on your hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
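To cross-check a result without the hosted harness, llama.cpp ships a llama-bench tool that reports the same pp/tg tokens-per-second figures. A minimal sketch, assuming llama.cpp is installed and using a placeholder GGUF path:
# -p 512 measures prompt processing (pp512); -n 128 measures text generation (tg128)
$ llama-bench -m ~/models/your-model-Q4_K_M.gguf -p 512 -n 128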
Community folklore on M3
21 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.
- community confidence: 70%
40.00 tok/s — model not named in excerpt, on M3 via llama.cpp (the excerpt ranks it above Gemma 3 4B and Phi-4 Mini on intelligence per token)
“t/s). Every rival model loses 70% speed beyond 32k **High intelligence per token ratio:** * Scored 0.31 combined intelligence index at ~40 t/s, above Gemma 3 4B (0.20) and Phi-4 Mini (0.22) * Qwen 3 4B ranks slightly higher in raw score (0.35) but runs 3x slower **Outpaces IB…”
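Claims about throughput holding up past 32k can be spot-checked with llama-bench's combined prompt+generation test. A sketch under the same placeholder-model assumption; -pg prefills a 32k prompt and then generates 128 tokens, alongside the default short-context pp512/tg128 runs for comparison:
$ llama-bench -m ~/models/your-model-Q4_K_M.gguf -pg 32768,128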
- community confidence: 60%
86.00 tok/s — Qwen3-30B-A3B on M3 via Metal (tg128)
“w/ MLX: 2,320 t/s 🥈 3090: 2,157 t/s 🥉 M3 w/ Metal: 1,614 t/s tg128: 🥇 3090: 136 t/s 🥈 M3 w/ MLX: 97 t/s 🥉 M3 w/ Metal: 86 t/s https://preview.redd.it/7f1bj2ag4rkf1.png?width=2522&format=png&auto=webp&s=44184256681e46b0bb2d8324c4d6abda8b7f4266”
- community confidence: 60%
97.00 tok/s — Qwen3-30B-A3B on M3 via MLX (tg128)
“results! pp512: 🥇M3 w/ MLX: 2,320 t/s 🥈 3090: 2,157 t/s 🥉 M3 w/ Metal: 1,614 t/s tg128: 🥇 3090: 136 t/s 🥈 M3 w/ MLX: 97 t/s 🥉 M3 w/ Metal: 86 t/s https://preview.redd.it/7f1bj2ag4rkf1.png?width=2522&format=png&auto=webp&s=44184256681e46b0bb2d8324c4d6abda…”
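MLX rows like these can be reproduced with the mlx-lm package, which prints prompt and generation tokens-per-sec after each run. A minimal sketch; the mlx-community repo name below is an assumption, so swap in whichever 4-bit conversion actually exists:
$ pip install mlx-lm
# assumed repo name; reports tokens-per-sec for the prompt and generation phases
$ python -m mlx_lm.generate --model mlx-community/Qwen3-30B-A3B-4bit --prompt "Hello" --max-tokens 128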
- community confidence: 55%
15.00 tok/s — model not named in excerpt, on 2× M3 Ultra, INT4
“nt," it specializes in deep reasoning, autonomous tool orchestration, and coding. After running it on my setup with two M3 Ultras at around 15 tokens per second, I can vouch for its efficiency and capabilities. The 256K context window handled large projects without hiccups, and i…”
- community confidence: 50%
<1.00 tok/s — Qwen 3 4B on M3 at 128k context (the same excerpt puts Llama 3.2 3B at ~5 t/s)
“le:** * Keeps producing ~40 tokens per second on Mac even past 32k context * Still cranks out ~33 t/s at 128k while Qwen 3 4B drops to <1 t/s and Llama 3.2 3B goes down to ~5 t/s **Best long context efficiency:** * From 1k to 128k context, latency barely moves (43 to 33 t/…”
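For scale, the figures quoted here work out to roughly a 23% slowdown (43 → 33 t/s) across a 128× growth in context length (1k → 128k), which is why the claim stands out against models that fall to single-digit t/s at the same depth.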
Common questions about M3
Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.