llm-speed

M3 — LLM benchmarks

No benchmarks on M3 yet.


Run the benchmark on your own hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on M3

21 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • community confidence: 70%

    40.00 tok/s Phi-4 Mini on M3 via llama.cpp

    …t/s). Every rival model loses 70% speed beyond 32k. **High intelligence per token ratio:** scored 0.31 combined intelligence index at ~40 t/s, above Gemma 3 4B (0.20) and Phi-4 Mini (0.22); Qwen 3 4B ranks slightly higher in raw score (0.35) but runs 3x slower. **Outpaces IB…

    source: Reddit · u/zennaxxarion · 2025-10-08

  • community confidence: 60%

    86.00 tok/s Qwen3-30B-A3B on M3 via Metal (the quote attributes 86 t/s to Metal, not MLX)

    …pp512: 🥇 M3 w/ MLX: 2,320 t/s 🥈 3090: 2,157 t/s 🥉 M3 w/ Metal: 1,614 t/s; tg128: 🥇 3090: 136 t/s 🥈 M3 w/ MLX: 97 t/s 🥉 M3 w/ Metal: 86 t/s

    source: Reddit · u/ifioravanti · 2025-08-23

  • community confidence: 60%

    97.00 tok/s Qwen3-30B-A3B on M3 via MLX

    …results! pp512: 🥇 M3 w/ MLX: 2,320 t/s 🥈 3090: 2,157 t/s 🥉 M3 w/ Metal: 1,614 t/s; tg128: 🥇 3090: 136 t/s 🥈 M3 w/ MLX: 97 t/s 🥉 M3 w/ Metal: 86 t/s

    source: Reddit · u/ifioravanti · 2025-08-23

  • community confidence: 55%

    15.00 tok/s on 2× M3 Ultra, INT4 (the quote describes a two-machine setup)

    …nt," it specializes in deep reasoning, autonomous tool orchestration, and coding. After running it on my setup with two M3 Ultras at around 15 tokens per second, I can vouch for its efficiency and capabilities. The 256K context window handled large projects without hiccups, and i…

    source: Reddit · u/Radiant-Act4707 · 2025-11-07

  • community confidence: 50%

    1.00 tok/s Qwen 3 4B on M3 at 128k context (the quote attributes <1 t/s to Qwen 3 4B; Llama 3.2 3B is quoted at ~5 t/s)

    …le:** keeps producing ~40 tokens per second on Mac even past 32k context; still cranks out ~33 t/s at 128k while Qwen 3 4B drops to <1 t/s and Llama 3.2 3B goes down to ~5 t/s. **Best long context efficiency:** from 1k to 128k context, latency barely moves (43 to 33 t/…

    source: Reddit · u/zennaxxarion · 2025-10-08
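
The context-scaling figures quoted above translate into very different wall-clock times for a long reply. A minimal sketch using only the decode rates quoted in the claims (the 33 t/s model is unnamed in the quote; nothing else is assumed):

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to decode num_tokens at a steady rate."""
    return num_tokens / tokens_per_second

# Decode rates at 128k context, as quoted in the claims above
rates_tps = {
    "unnamed model in quote": 33.0,
    "Llama 3.2 3B": 5.0,
    "Qwen 3 4B": 1.0,
}
for model, tps in rates_tps.items():
    minutes = generation_seconds(1000, tps) / 60
    print(f"{model}: ~{minutes:.1f} min for a 1,000-token answer")
```

At ~33 t/s a 1,000-token reply lands in about half a minute; at 1 t/s the same reply takes over 16 minutes, which is the practical gap the quoted comparison is pointing at.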

See all 21 claims for M3

Common questions about M3

Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.

Read the M3 FAQ →