llm-speed

M5 — LLM benchmarks

No M5 benchmarks yet.

Run the benchmark on your own hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
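For context on the numbers below: decode throughput ("tok/s") is simply tokens generated divided by wall-clock generation time. A minimal sketch of that arithmetic (the helper name is illustrative, not part of the llm-speed API):

```python
import time

def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: tokens generated per wall-clock second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_s

# Timing a (simulated) decode loop with a monotonic clock:
start = time.perf_counter()
generated = 0
for _ in range(128):  # stand-in for a real token-by-token decode loop
    generated += 1
elapsed = time.perf_counter() - start

# 72 tokens emitted in 1.0 s of decoding is reported as 72.0 tok/s.
print(tokens_per_second(72, 1.0))  # 72.0
```

Note that benchmark tools usually report prefill (prompt processing) and decode throughput separately, since the two phases have very different performance characteristics.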

Community folklore on M5

12 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • community confidence: 60%

    72.00 tok/s Qwen3-Coder-Next on M5 via MLX

    128GB RAM - Qwen3 Coder Next 8-Bit Benchmark # Qwen3-Coder-Next 8-Bit Benchmark: MLX vs Ollama **TLDR**: M5-Max with 128gb of RAM gets 72 tokens per second from Qwen3-Coder-Next 8-Bit using MLX Overview This benchmark compares two local inference backends — **MLX** (Ap…

    source: Reddit · u/paddybuc · 2026-03-29


  • community confidence: 55%

    128.0 tok/s on M5 via llama.cpp FP16

    Excerpt (llama-bench table, truncated): llama 7B Q4_0, 3.56 GiB, 6.74 B params, Vulkan backend, tg128: 22.31 ± 0.06 t/s. Intel comparison row: build b4008, Arc 140V, IPEX-LLM backend, 32.0 FP16 TFLOPS, 136.5 GB/s MBW, pp512 656.5 t/s, tg128 22.98 t/s, 20.52 t/TFLOP, 59.93% MBW. "Admittedly, the Intel data …"

    source: Reddit · u/Noble00_ · 2025-10-27

See all 12 claims for M5
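The 8-bit-on-128GB pairing in the claims above follows from simple arithmetic: weight memory scales with parameter count times bits per weight. A back-of-the-envelope sketch (the function name and parameter counts are illustrative; this ignores KV cache, activations, and framework overhead, so treat results as a lower bound):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight-only memory in decimal GB: params × bits / 8.

    Excludes KV cache, activations, and runtime overhead, which add
    more on top of this figure.
    """
    return params_billion * bits_per_weight / 8

# Why an 8-bit quant of a large model wants lots of unified memory:
# a hypothetical 70B-parameter model at 8 bits needs ~70 GB for
# weights alone, while a 4-bit quant of the same model halves that.
print(weight_memory_gb(70, 8))  # 70.0
print(weight_memory_gb(70, 4))  # 35.0
```

This is why quantization guidance is usually framed around available RAM: halving bits per weight roughly halves the weight footprint, at some cost in output quality.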

Common questions about M5

Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the M5 FAQ →