L40S — LLM benchmarks
No L40S benchmarks yet.
Run the benchmark on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
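For a quick sanity check before submitting a run, plain Ollama prints the same timing counters quoted in the folklore below; a minimal sketch (the model tag is only an example):
$ ollama run deepseek-r1:70b --verbose "Summarize KV caching in one sentence."
# --verbose appends timing stats after the response, including
# "prompt eval rate" and "eval rate" in tokens/s.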
Community folklore on L40S
2 unverified claims extracted from Reddit/HN comments. These carry lower trust than signed runs; every row links to its source.
- community confidence: 60%
15.00 tok/s — deepseek-R1 70B on L40 via ollama
“.1 t/s and 1173 t/s eval and prompt eval respectively on this system. Even on an L40 which can load it totally into VRAM, R1-70B only hits 15t/s eval. (gpt-oss 120B doesn't run reliably on my single L40 and gets much slower when it does manage to run partially in VRAM on that s…”
- community confidence: 60%
4.10 tok/s — deepseek-R1 70B on L40S via ollama
“ompt eval using GPT-OSS 120B in ollama, with no special hackery. For comparison, the much smaller deepseek-R1 70B takes the same prompt at 4.1 t/s and 1173 t/s eval and prompt eval respectively on this system. Even on an L40 which can load it totally into VRAM, R1-70B only hits…”
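For reference, the "eval" and "prompt eval" rates quoted above can be recomputed from the raw counters Ollama's API returns (durations are reported in nanoseconds); a minimal sketch, assuming a local Ollama server on its default port:
$ curl -s http://localhost:11434/api/generate \
    -d '{"model": "deepseek-r1:70b", "prompt": "Hello", "stream": false}' |
    jq '{prompt_eval_tps: (.prompt_eval_count / .prompt_eval_duration * 1e9),
         eval_tps: (.eval_count / .eval_duration * 1e9)}'
# eval_tps is decode speed (the tok/s figures above);
# prompt_eval_tps is prefill speed (the quoted 1173 t/s).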
Common questions about L40S
Direct Q&A drawn from verified runs once they exist: fastest LLM, supported model classes, backend rankings, quantization guidance.