llm-speed

M3 — LLM benchmarks

No benchmarks on M3 yet.


Run the benchmark on your own hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on M3

21 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • community confidence: 70%

    40.00 tok/s Phi-4 Mini on M3 via llama.cpp

    …t/s). Every rival model loses 70% speed beyond 32k. **High intelligence per token ratio:** scored 0.31 combined intelligence index at ~40 t/s, above Gemma 3 4B (0.20) and Phi-4 Mini (0.22); Qwen 3 4B ranks slightly higher in raw score (0.35) but runs 3x slower. **Outpaces IB…

    source: Reddit · u/zennaxxarion · 2025-10-08

  • community confidence: 60%

    86.00 tok/s Qwen3-30B-A3B on M3 via Metal (the quote attributes 86 t/s to Metal, not MLX)

    …pp512: 🥇 M3 w/ MLX: 2,320 t/s 🥈 3090: 2,157 t/s 🥉 M3 w/ Metal: 1,614 t/s; tg128: 🥇 3090: 136 t/s 🥈 M3 w/ MLX: 97 t/s 🥉 M3 w/ Metal: 86 t/s

    source: Reddit · u/ifioravanti · 2025-08-23

  • community confidence: 60%

    97.00 tok/s Qwen3-30B-A3B on M3 via MLX

    …results! pp512: 🥇 M3 w/ MLX: 2,320 t/s 🥈 3090: 2,157 t/s 🥉 M3 w/ Metal: 1,614 t/s; tg128: 🥇 3090: 136 t/s 🥈 M3 w/ MLX: 97 t/s 🥉 M3 w/ Metal: 86 t/s

    source: Reddit · u/ifioravanti · 2025-08-23

  • community confidence: 55%

    15.00 tok/s on 2× M3 Ultra, INT4 (the quote describes a two-machine setup)

    …nt," it specializes in deep reasoning, autonomous tool orchestration, and coding. After running it on my setup with two M3 Ultras at around 15 tokens per second, I can vouch for its efficiency and capabilities. The 256K context window handled large projects without hiccups, and i…

    source: Reddit · u/Radiant-Act4707 · 2025-11-07

  • community confidence: 50%

    1.00 tok/s Qwen 3 4B on M3 at 128k context (the quote attributes <1 t/s to Qwen 3 4B; Llama 3.2 3B is quoted at ~5 t/s)

    …le:** keeps producing ~40 tokens per second on Mac even past 32k context; still cranks out ~33 t/s at 128k while Qwen 3 4B drops to <1 t/s and Llama 3.2 3B goes down to ~5 t/s. **Best long context efficiency:** from 1k to 128k context, latency barely moves (43 to 33 t/…

    source: Reddit · u/zennaxxarion · 2025-10-08
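
The context-scaling figures quoted above translate into very different wall-clock times for a long reply. A minimal sketch using only the decode rates quoted in the claims (the 33 t/s model is unnamed in the quote; nothing else is assumed):

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to decode num_tokens at a steady rate."""
    return num_tokens / tokens_per_second

# Decode rates at 128k context, as quoted in the claims above
rates_tps = {
    "unnamed model in quote": 33.0,
    "Llama 3.2 3B": 5.0,
    "Qwen 3 4B": 1.0,
}
for model, tps in rates_tps.items():
    minutes = generation_seconds(1000, tps) / 60
    print(f"{model}: ~{minutes:.1f} min for a 1,000-token answer")
```

At ~33 t/s a 1,000-token reply lands in about half a minute; at 1 t/s the same reply takes over 16 minutes, which is the practical gap the quoted comparison is pointing at.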

See all 21 claims for M3

Common questions about M3

Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.

Read the M3 FAQ →