L40S — LLM benchmarks
No L40S benchmarks yet.
Run the benchmark on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
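For a quick sanity check before submitting a run, plain Ollama prints the same timing counters quoted in the folklore below; a minimal sketch (the model tag is only an example):
$ ollama run deepseek-r1:70b --verbose "Summarize KV caching in one sentence."
# --verbose appends timing stats after the response, including
# "prompt eval rate" and "eval rate" in tokens/s.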
Community folklore on L40S
2 unverified claims extracted from Reddit/HN comments. These carry lower trust than signed runs; every row links to its source.
- community confidence: 60%
15.00 tok/s — deepseek-R1 70B on L40 via ollama
“.1 t/s and 1173 t/s eval and prompt eval respectively on this system. Even on an L40 which can load it totally into VRAM, R1-70B only hits 15t/s eval. (gpt-oss 120B doesn't run reliably on my single L40 and gets much slower when it does manage to run partially in VRAM on that s…”
- community confidence: 60%
4.10 tok/s — deepseek-R1 70B on L40S via ollama
“ompt eval using GPT-OSS 120B in ollama, with no special hackery. For comparison, the much smaller deepseek-R1 70B takes the same prompt at 4.1 t/s and 1173 t/s eval and prompt eval respectively on this system. Even on an L40 which can load it totally into VRAM, R1-70B only hits…”
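For reference, the "eval" and "prompt eval" rates quoted above can be recomputed from the raw counters Ollama's API returns (durations are reported in nanoseconds); a minimal sketch, assuming a local Ollama server on its default port:
$ curl -s http://localhost:11434/api/generate \
    -d '{"model": "deepseek-r1:70b", "prompt": "Hello", "stream": false}' |
    jq '{prompt_eval_tps: (.prompt_eval_count / .prompt_eval_duration * 1e9),
         eval_tps: (.eval_count / .eval_duration * 1e9)}'
# eval_tps is decode speed (the tok/s figures above);
# prompt_eval_tps is prefill speed (the quoted 1173 t/s).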
Common questions about L40S
Direct Q&A drawn from verified runs once they exist: fastest LLM, supported model classes, backend rankings, quantization guidance.