M4 — LLM benchmarks
No benchmarks on M4 yet.
Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
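If you only want a quick, rough number before running the full bench, the sketch below measures generation speed against a locally running Ollama server. It is not llm-speed itself; it assumes Ollama is listening on its default port 11434 and that the model named in the script is already pulled.

```python
# Rough tok/s probe against a local Ollama server (NOT llm-speed itself).
# Assumes: Ollama running on the default port 11434, model already pulled.
import json
import urllib.request

MODEL = "qwen2.5-coder:32b"  # substitute any model you have pulled locally

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": MODEL,
        "prompt": "Write a haiku about unified memory.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# eval_count = tokens generated, eval_duration = generation time in nanoseconds
tok_per_s = body["eval_count"] / (body["eval_duration"] / 1e9)
print(f"{MODEL}: {tok_per_s:.2f} tok/s")
```

Ollama reports eval_duration in nanoseconds, so the printed figure is directly comparable to the tok/s numbers quoted below.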
Community folklore on M4
19 unverified claims extracted from Reddit/HN comments. These carry lower trust than signed benchmark runs; every row links to its source.
- community confidence 60%
180.0 tok/s — Qwen3-0.6B on M4 via lm-studio (the quote credits the 180+ tok/s figure to the 0.6B micro-model, not Qwen3-32B)
“its 79.51% while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups. 5️⃣ The 0.6B micro-model races above 180 tok/s but tops out at 37.56% - that's why it's not even on the graph (50 % performance cut-off). All local runs were done with @lmstudi…”
- community confidence 60%
35.64 tok/s — qwen2.5-coder:32b via ollama (note: the quoted run appears not to be on an M4; the commenter concluded they would not need one. A memory-check sketch follows this list.)
“…ATE: The issue was resolved by killing the processes that kept VRAM occupied, and then Ollama with the qwen2.5-coder:32b model performed at 35.64 tokens/s. I guess I won't need to spend money on M4 for now :)”
- community confidence 55%
39.00 tok/s — gemma-3-text-1B-it-mlx (fp16) on M4 via mlx (a timing sketch follows this list)
“here are some benchmarks for gemma-3-text-1B-it-mlx on a base m4 mbp: q3 - 125 tps q4 - 110 tps q6 - 86 tps q8 - 66 tps fp16 I think - 39 tps Edit: to be clear the models that now are working are called **alexgusevski/gemma-3-text-...** or **mlx-community/gemma-3-text-...**…”
- community confidence 55%
66.00 tok/s — gemma-3-text-1B-it-mlx (q8) on M4 via mlx
(same source comment as the fp16 row above)
- community confidence 55%
86.00 tok/s — gemma-3-text-1B-it-mlx (q6) on M4 via mlx
(same source comment as the fp16 row above)
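The memory-check sketch referenced in the qwen2.5-coder row: before blaming the hardware for a slow run, it helps to see what Ollama already holds resident in unified memory. A minimal sketch, assuming a local Ollama server on its default port 11434:

```python
# List models Ollama currently keeps resident and how much memory each holds,
# mirroring the "kill whatever keeps VRAM occupied" advice in the quote above.
# Assumes a local Ollama server on the default port 11434.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    loaded = json.load(resp).get("models", [])

if not loaded:
    print("Nothing resident; memory is free for the next run.")
for m in loaded:
    gib = m.get("size_vram", 0) / 2**30
    print(f"{m['name']}: {gib:.1f} GiB resident (expires {m.get('expires_at', 'n/a')})")
```

Unloading is a separate step (recent Ollama versions provide `ollama stop <model>`); the sketch only reports what is currently loaded.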
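The timing sketch referenced in the gemma-3 rows: a minimal way to reproduce that quantization ladder yourself with mlx-lm. The repository names below are assumptions for illustration; check Hugging Face for the exact alexgusevski/ or mlx-community/ gemma-3-text variants you want to compare.

```python
# Rough per-quantization timing with mlx-lm. Repo names are ASSUMED /
# illustrative -- substitute the exact gemma-3-text variants you actually use.
import time
from mlx_lm import load, generate

VARIANTS = [
    "mlx-community/gemma-3-text-1b-it-4bit",  # assumed repo name
    "mlx-community/gemma-3-text-1b-it-8bit",  # assumed repo name
]
PROMPT = "Explain unified memory on Apple silicon in two sentences."

for repo in VARIANTS:
    model, tokenizer = load(repo)
    start = time.perf_counter()
    text = generate(model, tokenizer, prompt=PROMPT, max_tokens=128)
    elapsed = time.perf_counter() - start
    n_tokens = len(tokenizer.encode(text))
    # Elapsed time includes prompt processing, so this slightly understates
    # the pure decode rate.
    print(f"{repo}: {n_tokens / elapsed:.1f} tok/s")
```

Numbers measured this way bundle prompt processing into the total, so expect them to sit a little below the pure decode rates quoted above.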
Common questions about M4
Direct Q&A drawn from the data above: fastest LLM, supported model classes, backend rankings, and quantization guidance.