
gpt-oss-120b

No benchmarks for gpt-oss-120b yet.

Run on your hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
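
If you want a comparable local number outside this harness, llama.cpp ships its own benchmark tool, llama-bench, which reports prompt-processing and text-generation throughput in tokens/sec. A minimal sketch, assuming a local GGUF of the model (the path is a placeholder):

$ # -p 512: prompt tokens to process, -n 128: tokens to generate
$ llama-bench -m ~/models/gpt-oss-120b-mxfp4.gguf -p 512 -n 128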

Community folklore

155 unverified claims extracted from Reddit/HN comments. Lower trust than signed runs above — every row links to the source.
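
These rows come from pattern-matching free-text comments, which is why one comment can yield several rows and why a speed figure occasionally lands next to the wrong quant or hardware. A rough sketch of the kind of extraction involved, assuming the extractor keys on the "Model (size) Quant with N token context: ~X t/s" shape seen in the claims below (comment.txt is a placeholder):

$ # pull "Model (NB) QUANT with N token context: ~X t/s" spans out of a comment
$ grep -oE '[A-Za-z0-9._+ -]+\([0-9]+B[^)]*\) [A-Za-z0-9_]+ with [0-9]+ token context: ?~ ?[0-9]+ t/s' comment.txt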

  • community confidence 75%

    47.00 tok/s GPT-OSS-120B on Strix Halo via llama.cpp MXFP4

    …lly hard to fault. My go-to models and quants on my Strix Halo laptop with 128GB:
    - GLM-4.5-Air (106B) MXFP4 with 131072 token context: ~25 t/s
    - Intellect-3 (106B) Q5_K with 131072 token context: ~20 t/s
    - Minimax M2 (172B REAP version) IQ4_S with 150000 token context: ~25 t/s
    - GPT-OSS-120B (120B) MXFP4 with 131072 token context: ~47 t/s
    - Qwen3-Next (80B) Q6_K with 262144 token context: ~26 t/s
    I use llama.cpp with 8-bit context quantization for all models to fit these larger contexts in memory comfortably. Dense models run …

    source: Reddit · u/spaceman_ · 2025-12-15

  • community confidence 75%

    30.00 tok/s gpt-oss-120b on AGX Thor via llama.cpp FP4

    …xcited to see the initial benchmarks rolling in for the AGX Thor following yesterday's release [1]. A recent YouTube video showed around 30 tokens/sec generation speed with gpt-oss-120b using llama.cpp [2]. Interestingly, users over in r/LocalLLaMA have reported similar perf…

    source: Reddit · u/Herald_Of_Rivia · 2025-08-26

  • community confidence 75%

    40.00 tok/s gpt-oss-120B on M3 Max via llama.cpp Q2

    …5-air Both are MoE models which improve speed and just about max out the amount of ram you'd have. On my 128GB M3 Max macbook, these are ~40 t/s which is fine (note that glm is slower and both are faster running with mlx instead of llama.cpp). I think the Ryzen is broadly simi…

    source: Reddit · u/RemarkableAd66 · 2025-11-20

  • community confidence 70%

    30.00 tok/s GPT OSS 120B on Together AI via hosted-api

    …ely predict 16 tokens of unchanged diff context correctly. But let's say I want to run a 4-bit quant of GLM 4.5 Air with 48-64k context at 30 tokens/second? What's the cheapest option? - An NVIDIA RTX PRO 6000 Blackwell 96GB costs around $8750. That would pay for _years_ of Cla…

    source: Reddit · u/vtkayaker · 2025-08-26
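
The "8-bit context quantization" mentioned in the Strix Halo claim above refers to quantizing llama.cpp's KV cache so large contexts fit in memory. A minimal sketch of such an invocation, assuming a local MXFP4 GGUF (the path is a placeholder); note that older llama.cpp builds require flash attention to be enabled explicitly (-fa) before the V cache can be quantized:

$ # q8_0 K/V cache roughly halves context memory versus the f16 default
$ llama-server -m ~/models/gpt-oss-120b-mxfp4.gguf -c 131072 --cache-type-k q8_0 --cache-type-v q8_0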
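
The hardware-versus-hosted arithmetic in the Together AI claim is easy to rerun. A back-of-the-envelope sketch, assuming a hypothetical $200/month hosted plan (the comment's actual comparison figure is truncated):

$ # hypothetical plan price; $8750 GPU cost / ($200 x 12) per year
$ awk 'BEGIN { printf "%.1f years\n", 8750 / (200 * 12) }'
3.6 years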

See all 155 claims for gpt-oss-120b