RTX 3090 Ti — LLM benchmarks
No RTX 3090 Ti benchmarks yet.
Run llm-speed on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
Community folklore on RTX 3090 Ti
Unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links to its source.
- 30.00 tok/s — Qwen-3 Coder 32B via Ollama, FP16 (the source ran a dual RTX 5090 + RTX 3090 Ti setup) · community confidence 75%
  “on: **FP16** * Hardware: **RTX 5090 + RTX 3090 Ti** * Task: code generation **Results:** * **llama.cpp:** ~52 tokens/sec * **Ollama:** ~30 tokens/sec Both runs use the same model weights and hardware. The gap is ~70% in favor of llama.cpp. Has anyone dug into why this happ…”
- 52.00 tok/s — Qwen-3 Coder 32B via llama.cpp, FP16 (same dual RTX 5090 + RTX 3090 Ti setup; see the reproduction sketch after this list) · community confidence 75%
  “**Qwen-3 Coder 32B** * Precision: **FP16** * Hardware: **RTX 5090 + RTX 3090 Ti** * Task: code generation **Results:** * **llama.cpp:** ~52 tokens/sec * **Ollama:** ~30 tokens/sec Both runs use the same model weights and hardware. The gap is ~70% in favor of llama.cpp. Has…”
- 55.00 tok/s — Llama 3.1 Instruct Q8 (15 GB) via LM Studio (source reports 51–55 tok/s, with the model split across two GPUs) · community confidence 75%
  “s on both of the GPUs, so for example Mistral Nemo 13B instruct in FP16 and get 21.5 tokens/sec. Llama 3.1 Instruct Q8 15Gb results in 51 - 55 tokens per second. I also tried running Llama 3.1 Q5 KS, 49GB model which worked but was extremely slow due to some of the model going to…”
- 21.50 tok/s — Mistral Nemo 13B Instruct via LM Studio, FP16 (model split across two GPUs per the source) · community confidence 75%
  “AI and LLM models, now I can completely offload latest models on both of the GPUs, so for example Mistral Nemo 13B instruct in FP16 and get 21.5 tokens/sec. Llama 3.1 Instruct Q8 15Gb results in 51 - 55 tokens per second. I also tried running Llama 3.1 Q5 KS, 49GB model which wor…”
- 46.00 tok/s — Llama 3 8B (16-bit, unquantized) via Ollama on RTX 3090 Ti · community confidence 60%
  “del locally. I have since tried both mlc-llm as well as ollama (based on llama.cpp). mlc-llm is slightly faster (~51 tok/s) vs ollama (~46 tok/s) for running the 16 bit unquantized version of Llama 3 8B on my RTX 3090 Ti. However, mlc-llm uses about 2GB of VRAM more than the …”
- 51.00 tok/s — Llama 3 8B (16-bit, unquantized) via mlc-llm on RTX 3090 Ti · community confidence 60%
  “s the way to run the model locally. I have since tried both mlc-llm as well as ollama (based on llama.cpp). mlc-llm is slightly faster (~51 tok/s) vs ollama (~46 tok/s) for running the 16 bit unquantized version of Llama 3 8B on my RTX 3090 Ti. However, mlc-llm uses about 2GB…”
- ≈2,800 tok/s generated (alongside ≈41,500 tok/s prompt intake) — Llama 3 8B W8A8 via vLLM 0.8.3 on 2× RTX 3090 Ti; this is aggregate batch throughput, not single-stream chat speed (see the vLLM sketch after this list) · community confidence 60%
  “10 mins I can go through 24.9M prompt tokens and 1.69M generated tokens on 2x rtx 3090 Ti and Llama 3 8B W8A8 in vLLM 0.8.3 V1, so about 41,500 t/s input and 2800 t/s output, both at the same time. I went through like 8B+ tokens on single card this way before I put in the second …”
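Several rows above compare decode speed across backends on the same weights. Both llama.cpp and Ollama report this number directly, so claims like the ~52 vs ~30 tok/s gap are cheap to re-test. A minimal sketch, assuming a local GGUF file and an Ollama model tag (both are placeholders, not taken from the original comments):

```bash
# llama.cpp: llama-bench reports prompt-processing (pp) and token-generation (tg)
# throughput. -ngl 99 offloads all layers to the GPU. A 32B FP16 model will not
# fit in 24 GB, so this assumes a multi-GPU build or a quantized GGUF.
llama-bench -m ./qwen3-coder-32b-f16.gguf -ngl 99 -p 512 -n 128

# Ollama: --verbose prints timing stats after the reply, including
# "eval rate" in tokens/sec. The model tag is a placeholder.
ollama run qwen3-coder --verbose "Write a binary search in Python."
```

Comparing the `tg` figure against the "eval rate" on identical weights reproduces the backend gap the first two rows describe.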
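The vLLM row measures something different: aggregate throughput across many concurrent batched requests, which is why its numbers dwarf single-stream chat speeds. A hedged sketch using the throughput script bundled in the vLLM repository (the Hugging Face model ID and request sizes below are assumptions; the commenter ran a W8A8-quantized Llama 3 8B, so substitute an INT8 checkpoint to match):

```bash
# Fetch vLLM's benchmark scripts (flags below follow the repo's
# benchmark_throughput.py; adjust if your vLLM version differs).
git clone https://github.com/vllm-project/vllm
cd vllm

# Offline batch throughput, tensor-parallel across both GPUs.
# Output is aggregate input/output tokens per second, the same
# units as the 41,500 / 2,800 t/s figures quoted above.
python benchmarks/benchmark_throughput.py \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --tensor-parallel-size 2 \
  --input-len 1024 \
  --output-len 128 \
  --num-prompts 512
```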
Common questions about RTX 3090 Ti
Direct Q&A drawn from the claims above: fastest model, supported model classes, backend rankings, and quantization guidance.