llm-speed

Coder-V2-Lite-Instruct-4bit

1 workload result across 1 hardware configuration.

Fastest local config

168.3 decode tok/s

on M3 Ultra (60-core GPU) + 96GB unified memory, via MLX

Local runs (1 run)

Runs from contributors' own machines via MLX, llama.cpp, vLLM, exllamav2, or ollama. Each result is signed on the submitter's hardware.

M3 Ultra (60-core GPU) + 96GB unified

Workload    Backend     Quant  Decode tok/s  Prefill tok/s  TTFT    Run
chat-short  mlx@0.31.3         168.3         291.5          449 ms  r_l_v1-zq_qaz
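To get a rough sense of what these figures mean for a single reply, you can combine TTFT with the decode rate. The sketch below is a minimal approximation, not how the leaderboard computes anything. It assumes decode throughput stays constant at the measured rate and that TTFT matches the chat-short prompt. The function name `estimate_response_seconds` and the 500-token reply length are hypothetical.

```python
def estimate_response_seconds(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    """Rough wall-clock estimate for one reply (hypothetical helper).

    Assumption: the first token arrives after TTFT, and every later token
    costs 1 / decode_tok_s seconds at a constant steady-state decode rate.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be >= 1")
    if decode_tok_s <= 0:
        raise ValueError("decode_tok_s must be positive")
    # TTFT covers the first token; the remaining tokens are decoded one by one.
    return ttft_ms / 1000.0 + (output_tokens - 1) / decode_tok_s


# Figures from the chat-short run above; 500 output tokens is a hypothetical reply length.
t = estimate_response_seconds(ttft_ms=449, decode_tok_s=168.3, output_tokens=500)
print(f"{t:.2f} s")  # prints "3.41 s"
```

Prompts longer than chat-short raise TTFT roughly in proportion to prompt length divided by the prefill rate. For such prompts, reusing the 449 ms figure will likely underestimate the total.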

Coder-V2-Lite-Instruct-4bit on hardware