llm-speed

M4 — LLM benchmarks

No benchmarks on M4 yet.

Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
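
If you prefer to run the two steps separately, a minimal sketch (assuming pipx is already on your PATH):

$ pipx install llm-speed    # one-time install of the llm-speed CLI
$ llm-speed bench           # benchmark this machine; results can populate this page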

Community folklore on M4

19 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source. A sketch for reproducing figures like these yourself follows at the end of this section.

  • community confidence 60%

    180.0 tok/s Qwen3-0.6B on M4 via lm-studio

    its 79.51% while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups. 5️⃣ The 0.6B micro-model races above 180 tok/s but tops out at 37.56% - that's why it's not even on the graph (50 % performance cut-off). All local runs were done with @lmstudi…

    source: Reddit · u/ResearchCrafty1804 · 2025-05-07

  • community confidence 60%

    180.0 tok/s Qwen3-0.6B on M4 via lm-studio

    9.51% while sustaining ~64 tok/s - arguably today's best speed/quality trade-off for Mac setups. 5. The 0.6B micro-model races above 180 tok/s but tops out at 37.56% - that's why it's not even on the graph (50% performance cut-off). All local runs were done with LM …

    source: Reddit · u/WolframRavenwolf · 2025-05-07

  • community confidence 60%

    35.64 tok/s qwen2.5-coder on M4 via ollama

    ATE: The issue was resolved by killing the processes that kept VRAM occupied, and then Ollama with the qwen2.5-coder:32b model performed at 35.64 tokens/s. I guess I won't need to spend money on M4 for now :)

    source: Reddit · u/Leonid-S · 2025-01-07

  • community confidence 55%

    39.00 tok/s gemma-3-text-1B-it on M4 via mlx fp16

    here are some benchmarks for gemma-3-text-1B-it-mlx on a base m4 mbp: q3 - 125 tps q4 - 110 tps q6 - 86 tps q8 - 66 tps fp16 I think - 39 tps Edit: to be clear the models that now are working are called alexgusevski/gemma-3-text-... or mlx-community/gemma-3-text-...…

    source: Reddit · u/BaysQuorv · 2025-03-17

  • community confidence 55%

    66.00 tok/s gemma-3-text-1B-it on M4 via mlx q8

    e -text. Just for fun here are some benchmarks for gemma-3-text-1B-it-mlx on a base m4 mbp: q3 - 125 tps q4 - 110 tps q6 - 86 tps q8 - 66 tps fp16 I think - 39 tps Edit: to be clear the models that now are working are called alexgusevski/gemma-3-text-... or mlx-commu…

    source: Reddit · u/BaysQuorv · 2025-03-17

  • community confidence 55%

    86.00 tok/s gemma-3-text-1B-it on M4 via mlx q6

    ar, notice the -text. Just for fun here are some benchmarks for gemma-3-text-1B-it-mlx on a base m4 mbp: q3 - 125 tps q4 - 110 tps q6 - 86 tps q8 - 66 tps fp16 I think - 39 tps Edit: to be clear the models that now are working are called alexgusevski/gemma-3-text-... o…

    source: Reddit · u/BaysQuorv · 2025-03-17

See all 19 claims for M4
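
To sanity-check figures like these on your own machine, here is a minimal sketch using Ollama, matching the qwen2.5-coder row above. It assumes Ollama is installed, that the qwen2.5-coder:32b tag is available, and that the machine has enough memory for a 32B quantized model; the prompt is purely illustrative.

$ ollama pull qwen2.5-coder:32b
$ ollama run qwen2.5-coder:32b --verbose "Write a function that reverses a string."
# --verbose prints timing statistics after the reply; the "eval rate" line is the
# decode speed in tokens/s (tokens generated divided by generation time), which is
# the metric the claims above quote.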

Common questions about M4

Direct Q&A drawn from the data above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the M4 FAQ →