M5 — LLM benchmarks
No M5 benchmarks yet.
Run the benchmark on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
Community folklore on M5
Unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than the signed runs above; every row links to its source, and a rough reproduction sketch follows the list.
- community confidence 60%
72.00 tok/s: Qwen3-Coder-Next on M5 via MLX
“128GB RAM - Qwen3 Coder Next 8-Bit Benchmark # Qwen3-Coder-Next 8-Bit Benchmark: MLX vs Ollama **TLDR**: M5-Max with 128gb of RAM gets 72 tokens per second from Qwen3-Coder-Next 8-Bit using MLX Overview This benchmark compares two local inference backends — **MLX** (Ap…”
- community confidence 55%
128.0 tok/s: on M5 via llama.cpp (FP16)
“± 3.55| |llama 7B Q4_0|3.56 GiB|6.74 B|Vulkan|100|1|tg128|22.31 ± 0.06| Intel: |Build|Hardware|Backend|FP16 TFLOPS|MBW GB/s|pp512 t/s|tg128 t/s|t/TFLOP|MBW %| |:-|:-|:-|:-|:-|:-|:-|:-|:-| |b4008|Arc 140V|IPEX-LLM|32.0|136.5|656.5|22.98|20.52|59.93| Admittedly. the Intel data …”
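To sanity-check claims like these on your own machine, independently of the llm-speed harness above, both backends ship simple throughput tools. The commands below are a rough sketch: the model arguments are placeholders rather than specific verified repos, and mlx-lm must be installed separately (pip install mlx-lm).
$ mlx_lm.generate --model <path-or-hub-id-of-8bit-mlx-model> --prompt "Reverse a linked list in Python." --max-tokens 256
$ llama-bench -m <model>.gguf -p 512 -n 128
mlx_lm.generate should report prompt and generation tokens-per-second alongside the completion, and llama-bench reports pp512/tg128 throughput, the same columns quoted in the llama.cpp entry above.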
Common questions about M5
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.