gpt-oss-20b-MXFP4-Q4
1 workload result across 1 hardware configuration.
Fastest local config
152.7 decode tok/s
on M3 Ultra (60-core GPU) + 96GB unified via mlx — see full run
Local runs (1 run)
Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Signed on the submitter's hardware.
M3 Ultra (60-core GPU) + 96GB unified
| Workload | Backend | Quant | decode tok/s | prefill tok/s | TTFT | Run |
|---|---|---|---|---|---|---|
| chat-short | mlx@0.31.3 | — | 152.7 tok/s | 239.9 tok/s | 692 ms | r_3ijun8ltjnb |
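The prefill and decode figures above are the numbers mlx-lm reports when run with verbose output. A minimal sketch of how a comparable measurement could be taken on Apple Silicon, assuming a recent `mlx-lm` install; the model repo name is a placeholder, not necessarily the exact checkpoint used in the signed run:

```python
# Minimal sketch (assumes `pip install mlx-lm` on Apple Silicon).
from mlx_lm import load, generate

# Placeholder repo name — substitute the MXFP4-Q4 checkpoint you actually want to test.
model, tokenizer = load("mlx-community/gpt-oss-20b-MXFP4-Q4")

prompt = "Summarize the trade-offs of MXFP4 quantization in two sentences."

# verbose=True prints prompt (prefill) and generation (decode) tokens-per-sec,
# which is roughly what the table above reports for a chat-short workload.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```

A real chat workload would normally pass the prompt through `tokenizer.apply_chat_template` first; the raw-prompt form above is just the shortest path to a throughput readout.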
Community folklore
52 unverified claims extracted from Reddit/HN comments. Lower trust than signed runs above — every row links to the source.
- community · confidence 60%
221.0 tok/s — GPT-OSS-20B on RTX 5090 via lm-studio
“GPT-OSS-20B on RTX 5090 – 221 tok/s in LM Studio (default settings + FlashAttention) Just tested **GPT-OSS-20B** locally using **LM Studio v0.3.21-b4** on my machine with an”
- community · confidence 60%
72.0 tok/s — GPT OSS 20B on M4 Pro via mlx
“edge frozen as of 2024-06, handled rescaling and converting a recipe into metric that Qwen 3 30B 3AB 2507 completely hosed, churns at about 72 tok/s on a MacBook M4 Pro Max 4 bit MLX. And it ALMOST got a side scrolling shooter working, whereas the 120B didn't, and Qwen 3 didn't …”
- community · confidence 60%
103.0 tok/s — gpt-oss-20b on M4 Max via mlx
“/gpt-oss-20b-GGUF]() (thanks [unsloth]() 🙌) The qualities are roughly the same for our use case, but performance: On an M4 Max, MLX hit \~103 tok/s, about 25% faster than GGUF. # Quick tip You can paste any Hugging Face repo name into the CLI and pull it directly: nexa …”
- community · confidence 60%
180.0 tok/s — gpt oss 20b on Radeon RX 9070 XT via llama.cpp (note: the quoted comment attributes 180 tok/s to LFM and 140 tok/s to gpt-oss)
“hanics) gpt OSs 20v won by a metric mile. It did run faster than gpt OSs though on llama.cpp (9070xt) with gpt OSs at 140 tok/s and lfm at 180 tok/s”
- community · confidence 60%
140.0 tok/s — gpt oss 20b on Radeon RX 9070 XT via llama.cpp
“cs and structural mechanics) gpt OSs 20v won by a metric mile. It did run faster than gpt OSs though on llama.cpp (9070xt) with gpt OSs at 140 tok/s and lfm at 180 tok/s”
- community · confidence 60%
120.0 tok/s — gpt-oss-20b on RTX 3090 via llama.cpp
“i haven't run 120b, but ran gpt-oss-20b q4/q8 on a 3090 nd saw ~70–120 tokens/sec depending on threads, context length and quant. 395 will fit bigger layouts but limited memory bandwidth often cuts sustained tps, so repro”
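For rough comparisons between the signed run and the folklore claims, end-to-end latency is approximately TTFT plus output tokens divided by the decode rate. A back-of-envelope sketch, using values from the table above; the folklore claims do not report TTFT, so reusing the same 692 ms there is an assumption:

```python
# Back-of-envelope: end-to-end latency ≈ TTFT + output_tokens / decode_rate.
def end_to_end_seconds(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    return ttft_ms / 1000.0 + output_tokens / decode_tok_s

# Signed M3 Ultra run from the table above: 692 ms TTFT, 152.7 decode tok/s.
print(f"{end_to_end_seconds(692, 152.7, 256):.2f} s for 256 tokens")  # ~2.37 s

# A folklore-style 221 tok/s decode rate with the same (assumed) TTFT.
print(f"{end_to_end_seconds(692, 221.0, 256):.2f} s for 256 tokens")  # ~1.85 s
```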