llm-speed

L40S — LLM benchmarks

No L40S benchmarks yet.

Run it on your own hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on L40S

2 unverified claims extracted from Reddit/HN comments. These carry lower trust than signed benchmark runs; every row links to its source.

  • community confidence: 60%

    15.00 tok/s gpt-oss 120B on L40S via ollama

    "….1 t/s and 1173 t/s eval and prompt eval respectively on this system. Even on an L40 which can load it totally into VRAM, R1-70B only hits 15 t/s eval. (gpt-oss 120B doesn't run reliably on my single L40 and gets much slower when it does manage to run partially in VRAM on that s…"

    source: Reddit · u/RaltarGOTSP · 2025-08-28

  • community confidence: 60%

    4.10 tok/s deepseek-R1 70B on L40S via ollama

    "…ompt eval using GPT-OSS 120B in ollama, with no special hackery. For comparison, the much smaller deepseek-R1 70B takes the same prompt at 4.1 t/s and 1173 t/s eval and prompt eval respectively on this system. Even on an L40 which can load it totally into VRAM, R1-70B only hits…"

    source: Reddit · u/RaltarGOTSP · 2025-08-28
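The tok/s figures quoted in these claims are the same quantity ollama reports in its API responses: token count divided by elapsed time. A minimal sketch of the conversion, assuming the `eval_count`/`eval_duration` (nanoseconds) fields from ollama's `/api/generate` response; the numbers below are illustrative, not measured:

```python
def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Convert a token count plus a nanosecond duration into tok/s."""
    return token_count / (duration_ns / 1e9)


# Hypothetical response, shaped like ollama's /api/generate JSON stats.
resp = {
    "eval_count": 150,                    # generated tokens (illustrative)
    "eval_duration": 10_000_000_000,      # ns spent generating
    "prompt_eval_count": 1173,            # prompt tokens processed
    "prompt_eval_duration": 1_000_000_000,  # ns spent on prompt eval
}

eval_rate = tokens_per_second(resp["eval_count"], resp["eval_duration"])
prompt_rate = tokens_per_second(resp["prompt_eval_count"],
                                resp["prompt_eval_duration"])
print(f"eval: {eval_rate:.2f} tok/s")          # → eval: 15.00 tok/s
print(f"prompt eval: {prompt_rate:.2f} tok/s")  # → prompt eval: 1173.00 tok/s
```

This is how a "15 t/s eval and 1173 t/s prompt eval" pair like the one quoted above decomposes into the two separate rates.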

Common questions about L40S

Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the L40S FAQ →