
RTX 3090 Ti — LLM benchmarks

No benchmarks on RTX 3090 Ti yet.

Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench
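Every figure on this page is a tokens-per-second number, which is conceptually just output tokens divided by wall-clock generation time. A minimal sketch of that measurement — the real `llm-speed bench` internals are not shown here, and `fake_generate`/`count_tokens` are hypothetical stand-ins:

```python
import time

def tokens_per_second(generate, prompt, count_tokens):
    """Time one generation call and return output tokens per second."""
    start = time.perf_counter()
    output = generate(prompt)
    elapsed = time.perf_counter() - start
    return count_tokens(output) / elapsed

# Stand-in generator: pretends to produce 8 tokens in ~40 ms.
def fake_generate(prompt):
    time.sleep(0.04)
    return "one two three four five six seven eight"

rate = tokens_per_second(fake_generate, "hello", lambda s: len(s.split()))
print(f"{rate:.1f} tok/s")  # roughly 200 tok/s, limited by the sleep
```

Real harnesses additionally separate prompt (prefill) throughput from generation (decode) throughput, since the two differ by orders of magnitude.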

Community folklore on RTX 3090 Ti

19 unverified claims extracted from Reddit/HN comments. Lower trust than signed runs above — every row links to the source.

  • community confidence: 75%

    30.00 tok/s Qwen-3 Coder on RTX 3090 Ti via ollama FP16

    on: **FP16** * Hardware: **RTX 5090 + RTX 3090 Ti** * Task: code generation **Results:** * **llama.cpp:** ~52 tokens/sec * **Ollama:** ~30 tokens/sec Both runs use the same model weights and hardware. The gap is ~70% in favor of llama.cpp. Has anyone dug into why this happ…

    source: Reddit · u/Shoddy_Bed3240 · 2026-01-07

  • community confidence: 75%

    52.00 tok/s Qwen-3 Coder on RTX 3090 Ti via llama.cpp FP16

    **Qwen-3 Coder 32B** * Precision: **FP16** * Hardware: **RTX 5090 + RTX 3090 Ti** * Task: code generation **Results:** * **llama.cpp:** ~52 tokens/sec * **Ollama:** ~30 tokens/sec Both runs use the same model weights and hardware. The gap is ~70% in favor of llama.cpp. Has…

    source: Reddit · u/Shoddy_Bed3240 · 2026-01-07
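The "~70% in favor of llama.cpp" figure quoted in the two entries above follows directly from the two rates they report:

```python
# Rates quoted in the excerpt above.
llama_cpp = 52.0  # tok/s
ollama = 30.0     # tok/s

speedup = llama_cpp / ollama - 1  # relative speedup of llama.cpp
print(f"{speedup:.0%}")           # 73%, i.e. "~70%" as claimed
```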

  • community confidence: 75%

    55.00 tok/s Llama 3.1 Instruct Q8 on RTX 3090 Ti via lm-studio

    s on both of the GPUs, so for example Mistral Nemo 13B instruct in FP16 and get 21.5 tokens/sec. Llama 3.1 Instruct Q8 15Gb results in 51 - 55 tokens per second. I also tried running Llama 3.1 Q5 KS, 49GB model which worked but was extremely slow due to some of the model going to…

    source: Reddit · u/Beastdrol · 2024-08-08

  • community confidence: 75%

    21.50 tok/s Mistral Nemo on RTX 3090 Ti via lm-studio FP16

    AI and LLM models, now I can completely offload latest models on both of the GPUs, so for example Mistral Nemo 13B instruct in FP16 and get 21.5 tokens/sec. Llama 3.1 Instruct Q8 15Gb results in 51 - 55 tokens per second. I also tried running Llama 3.1 Q5 KS, 49GB model which wor…

    source: Reddit · u/Beastdrol · 2024-08-08

  • community confidence: 60%

    46.00 tok/s Llama 3 8B on RTX 3090 Ti via ollama

    del locally. I have since tried both mlc-llm as well as ollama (based on llama.cpp). mlc-llm is slightly faster (~51 tok/s) vs ollama (~46 tok/s) for running the 16 bit unquantized version of Llama 3 8B on my RTX 3090 Ti. However, mlc-llm uses about 2GB of VRAM more than the …

    source: Reddit · u/trajo123 · 2024-04-29

  • community confidence: 60%

    51.00 tok/s Llama 3 8B on RTX 3090 Ti via mlc-llm

    s the way to run the model locally. I have since tried both mlc-llm as well as ollama (based on llama.cpp). mlc-llm is slightly faster (~51 tok/s) vs ollama (~46 tok/s) for running the 16 bit unquantized version of Llama 3 8B on my RTX 3090 Ti. However, mlc-llm uses about 2GB…

    source: Reddit · u/trajo123 · 2024-04-29

  • community confidence: 60%

    500.0 tok/s Llama 3 8B on RTX 3090 Ti via vllm

    10 mins I can go through 24.9M prompt tokens and 1.69M generated tokens on 2x rtx 3090 Ti and Llama 3 8B W8A8 in vLLM 0.8.3 V1, so about 41,500 t/s input and 2800 t/s output, both at the same time. I went through like 8B+ tokens on single card this way before I put in the second …

    source: Reddit · u/FullOf_Bad_Ideas · 2025-04-16
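The vLLM entry above reports raw totals; the per-second rates quoted in the excerpt follow directly from them. A quick sanity check, using only the numbers in the excerpt (note the excerpt describes a 2x RTX 3090 Ti batched-serving setup, not a single card):

```python
# Totals quoted in the excerpt: 10 minutes of batched vLLM serving
# on 2x RTX 3090 Ti with Llama 3 8B W8A8.
prompt_tokens = 24_900_000    # prompt (prefill) tokens processed
generated_tokens = 1_690_000  # output tokens generated
wall_seconds = 10 * 60

prefill_rate = prompt_tokens / wall_seconds     # 41500.0 t/s input
decode_rate = generated_tokens / wall_seconds   # ~2816.7 t/s output
print(round(prefill_rate), round(decode_rate))  # 41500 2817
```

Both rates match the "about 41,500 t/s input and 2800 t/s output" claimed in the excerpt; batched serving throughput like this is not comparable to the single-stream tok/s figures elsewhere on this page.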

See all 19 claims for RTX 3090 Ti

Common questions about RTX 3090 Ti

Direct Q&A drawn from the runs above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the RTX 3090 Ti FAQ →