gpt-oss-120b
No benchmarks for gpt-oss-120b yet.
Run the benchmark on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
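If pipx is not already available, one common bootstrap (a hedged example; many distros also package pipx directly) is:
$ # Install pipx into the user site-packages, then put its shim directory on PATH.
$ python3 -m pip install --user pipx
$ python3 -m pipx ensurepath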
Community folklore
155 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than the signed runs above; every row links to its source. A sketch for reproducing the llama.cpp figures locally follows the list.
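As a rough illustration of what "extracted" means here (not the actual extraction pipeline), a throughput mention such as "~47 t/s" can be pulled out of a saved comment dump with a simple pattern match; comments.txt is a hypothetical file name:
$ # Illustrative only: list tokens-per-second figures found in a plain-text comment dump.
$ grep -oiE '~? ?[0-9]+(\.[0-9]+)? ?t(ok(ens)?)?/s(ec)?' comments.txt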
- community confidence 75%
47.00 tok/s — GPT-OSS-120B on Strix Halo via llama.cpp MXFP4
“My go-to models and quants on my Strix Halo laptop with 128GB: - GLM-4.5-Air (106B) MXFP4 with 131072 token context: ~ 25 t/s - Intellect-3 (106B) Q5_K with 131072 token context: ~ 20 t/s - Minimax M2 (172B REAP version) IQ4_S with 150000 token context: ~ 25 t/s - GPT-OSS-120B (120B) MXFP4 with 131072 token context: ~47 t/s - Qwen3-Next (80B) Q6_K with 262144 token context: ~26 t/s I use llama.cpp with 8-bit context quantization for all models to fit these larger contexts in memory comfortably. Dense models run …”
- community confidence 75%
30.00 tok/s — gpt-oss-120b on Ryzen AI MAX+ via llama.cpp FP4
“xcited to see the initial benchmarks rolling in for the AGX Thor following yesterday's release \[1\]. A recent YouTube video showed around 30 tokens/sec generation speed with gpt-oss-120b using llama.cpp \[2\]. Interestingly, users over in r/LocalLLaMA have reported similar perf…”
- community confidence 75%
40.00 tok/s — gpt-oss-120b on M3 Max via llama.cpp Q2
“5-air Both are MoE models which improve speed and just about max out the amount of ram you'd have. On my 128GB M3 Max macbook, these are \~40 t/s which is fine (note that glm is slower and both are faster running with mlx instead of llama.cpp). I think the Ryzen is broadly simi…”
- community confidence 70%
30.00 tok/s — GPT OSS 120B on Together AI via hosted-api
“ely predict 16 tokens of unchanged diff context correctly. But let's say I want to run a 4-bit quant of GLM 4.5 Air with 48-64k context at 30 tokens/second? What's the cheapest option? - An NVIDIA RTX PRO 6000 Blackwell 96GB costs around $8750. That would pay for _years_ of Cla…”
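All of the local figures above come from llama.cpp, so a claim like the ~47 t/s Strix Halo number can be sanity-checked on your own hardware with llama.cpp's bundled llama-bench tool. The sketch below is illustrative: the GGUF path is a placeholder, and flag spellings (especially the KV-cache type option) differ between llama.cpp releases, so consult llama-bench --help on your build.
$ # Rough throughput check with llama-bench (model path is a placeholder).
$ # -p 512: prompt tokens processed, -n 128: tokens generated, -ngl 99: offload all layers to the GPU.
$ # -ctk q8_0 quantizes the K cache to 8 bits, loosely mirroring the "8-bit context quantization"
$ # mentioned in the Strix Halo comment.
$ llama-bench -m ./gpt-oss-120b-MXFP4.gguf -p 512 -n 128 -ngl 99 -ctk q8_0
llama-bench reports prompt processing (pp) and text generation (tg) separately; the tokens-per-second figures quoted in the rows above are generation speeds, not prompt processing.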