M5 — LLM benchmarks
No M5 benchmarks yet.
Run the benchmark on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
Community folklore on M5
Unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than the signed runs above; every row links to its source, and a rough reproduction sketch follows the list.
- community confidence 60%
72.00 tok/s: Qwen3-Coder-Next on M5 via MLX
“128GB RAM - Qwen3 Coder Next 8-Bit Benchmark # Qwen3-Coder-Next 8-Bit Benchmark: MLX vs Ollama **TLDR**: M5-Max with 128gb of RAM gets 72 tokens per second from Qwen3-Coder-Next 8-Bit using MLX Overview This benchmark compares two local inference backends — **MLX** (Ap…”
- community confidence 55%
128.0 tok/s: on M5 via llama.cpp (FP16)
“± 3.55| |llama 7B Q4_0|3.56 GiB|6.74 B|Vulkan|100|1|tg128|22.31 ± 0.06| Intel: |Build|Hardware|Backend|FP16 TFLOPS|MBW GB/s|pp512 t/s|tg128 t/s|t/TFLOP|MBW %| |:-|:-|:-|:-|:-|:-|:-|:-|:-| |b4008|Arc 140V|IPEX-LLM|32.0|136.5|656.5|22.98|20.52|59.93| Admittedly. the Intel data …”
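To sanity-check claims like these on your own machine, independently of the llm-speed harness above, both backends ship simple throughput tools. The commands below are a rough sketch: the model arguments are placeholders rather than specific verified repos, and mlx-lm must be installed separately (pip install mlx-lm).
$ mlx_lm.generate --model <path-or-hub-id-of-8bit-mlx-model> --prompt "Reverse a linked list in Python." --max-tokens 256
$ llama-bench -m <model>.gguf -p 512 -n 128
mlx_lm.generate should report prompt and generation tokens-per-second alongside the completion, and llama-bench reports pp512/tg128 throughput, the same columns quoted in the llama.cpp entry above.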
Common questions about M5
Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.