M4 — LLM benchmarks
No benchmarks on M4 yet.
Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
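If you only want a quick, rough number before running the full bench, the sketch below measures generation speed against a locally running Ollama server. It is not llm-speed itself; it assumes Ollama is listening on its default port 11434 and that the model named in the script is already pulled.

```python
# Rough tok/s probe against a local Ollama server (NOT llm-speed itself).
# Assumes: Ollama running on the default port 11434, model already pulled.
import json
import urllib.request

MODEL = "qwen2.5-coder:32b"  # substitute any model you have pulled locally

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": MODEL,
        "prompt": "Write a haiku about unified memory.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# eval_count = tokens generated, eval_duration = generation time in nanoseconds
tok_per_s = body["eval_count"] / (body["eval_duration"] / 1e9)
print(f"{MODEL}: {tok_per_s:.2f} tok/s")
```

Ollama reports eval_duration in nanoseconds, so the printed figure is directly comparable to the tok/s numbers quoted below.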
Community folklore on M4
19 unverified claims extracted from Reddit/HN comments. These carry lower trust than signed benchmark runs; every row links to its source.
- community confidence 60%
180.0 tok/s — Qwen3-0.6B on M4 via lm-studio (the quote credits the 180+ tok/s figure to the 0.6B micro-model, not Qwen3-32B)
“its 79.51% while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups. 5️⃣ The 0.6B micro-model races above 180 tok/s but tops out at 37.56% - that's why it's not even on the graph (50 % performance cut-off). All local runs were done with @lmstudi…”
- community confidence 60%
35.64 tok/s — qwen2.5-coder:32b via ollama (note: the quoted run appears not to be on an M4; the commenter concluded they would not need one. A memory-check sketch follows this list.)
“…ATE: The issue was resolved by killing the processes that kept VRAM occupied, and then Ollama with the qwen2.5-coder:32b model performed at 35.64 tokens/s. I guess I won't need to spend money on M4 for now :)”
- community confidence 55%
39.00 tok/s — gemma-3-text-1B-it-mlx (fp16) on M4 via mlx (a timing sketch follows this list)
“here are some benchmarks for gemma-3-text-1B-it-mlx on a base m4 mbp: q3 - 125 tps q4 - 110 tps q6 - 86 tps q8 - 66 tps fp16 I think - 39 tps Edit: to be clear the models that now are working are called **alexgusevski/gemma-3-text-...** or **mlx-community/gemma-3-text-...**…”
- community confidence 55%
66.00 tok/s — gemma-3-text-1B-it-mlx (q8) on M4 via mlx
(same source comment as the fp16 row above)
- community confidence 55%
86.00 tok/s — gemma-3-text-1B-it-mlx (q6) on M4 via mlx
(same source comment as the fp16 row above)
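The memory-check sketch referenced in the qwen2.5-coder row: before blaming the hardware for a slow run, it helps to see what Ollama already holds resident in unified memory. A minimal sketch, assuming a local Ollama server on its default port 11434:

```python
# List models Ollama currently keeps resident and how much memory each holds,
# mirroring the "kill whatever keeps VRAM occupied" advice in the quote above.
# Assumes a local Ollama server on the default port 11434.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    loaded = json.load(resp).get("models", [])

if not loaded:
    print("Nothing resident; memory is free for the next run.")
for m in loaded:
    gib = m.get("size_vram", 0) / 2**30
    print(f"{m['name']}: {gib:.1f} GiB resident (expires {m.get('expires_at', 'n/a')})")
```

Unloading is a separate step (recent Ollama versions provide `ollama stop <model>`); the sketch only reports what is currently loaded.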
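The timing sketch referenced in the gemma-3 rows: a minimal way to reproduce that quantization ladder yourself with mlx-lm. The repository names below are assumptions for illustration; check Hugging Face for the exact alexgusevski/ or mlx-community/ gemma-3-text variants you want to compare.

```python
# Rough per-quantization timing with mlx-lm. Repo names are ASSUMED /
# illustrative -- substitute the exact gemma-3-text variants you actually use.
import time
from mlx_lm import load, generate

VARIANTS = [
    "mlx-community/gemma-3-text-1b-it-4bit",  # assumed repo name
    "mlx-community/gemma-3-text-1b-it-8bit",  # assumed repo name
]
PROMPT = "Explain unified memory on Apple silicon in two sentences."

for repo in VARIANTS:
    model, tokenizer = load(repo)
    start = time.perf_counter()
    text = generate(model, tokenizer, prompt=PROMPT, max_tokens=128)
    elapsed = time.perf_counter() - start
    n_tokens = len(tokenizer.encode(text))
    # Elapsed time includes prompt processing, so this slightly understates
    # the pure decode rate.
    print(f"{repo}: {n_tokens / elapsed:.1f} tok/s")
```

Numbers measured this way bundle prompt processing into the total, so expect them to sit a little below the pure decode rates quoted above.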
Common questions about M4
Direct Q&A drawn from the data above: fastest LLM, supported model classes, backend rankings, and quantization guidance.