M3 Max — LLM benchmarks
No benchmarks on M3 Max yet.
Run on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
Community folklore on M3 Max
45 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links back to its source.
- community confidence 75%
79.60 tok/s — Qwen3-8B on M3 Max via MLX bf16 (see the MLX timing sketch after this list)
“Happy to coordinate if you open-source yours first — no point duplicating. EDIT: Benchmarks complete on M3 Max (128GB). I am getting 79.6 tok/s on Qwen3-8B-bf16 (3.41x speedup) with confirmed bit-for-bit parity against the baseline. For those interested in the MLX-specific …”
- community confidence 75%
40.00 tok/s — gpt-oss-120B on M3 Max via llama.cpp Q2 (a llama-cpp-python timing sketch follows this list)
“5-air Both are MoE models which improve speed and just about max out the amount of ram you'd have. On my 128GB M3 Max macbook, these are ~40 t/s which is fine (note that glm is slower and both are faster running with mlx instead of llama.cpp). I think the Ryzen is broadly simi…”
- community confidence 65%
8.50 tok/s — Llama 2 70B on M3 Max, Q4_0 (measured; the same comment estimates a theoretical ceiling of a bit over 10 tok/s, worked through after this list)
“with their MBP 14 w/ an M3 Max [1] (128GB, 40CU, theoretical: 28.4 FP16 TFLOPS, 400GB/s MBW) The results for Llama 2 70B Q4_0 (39GB) was 8.5 tok/s for text generation (you'd expect a theoretical max of a bit over 10 tok/s based on theoretical MBW) and a prompt processing of 19 tok/s. On a 4K context conversation, that means you would be waiting about …”
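To reproduce a number like the Qwen3-8B claim above, here is a minimal timing sketch using the mlx-lm package. The model repo id and prompt are assumptions for illustration, not details taken from the claim itself.

# Rough single-stream throughput check with mlx-lm on Apple silicon.
# Assumes `pip install mlx-lm`; swap MODEL for whichever bf16 conversion you use.
import time
from mlx_lm import load, generate

MODEL = "mlx-community/Qwen3-8B-bf16"   # hypothetical repo id
PROMPT = "Explain the difference between threads and processes."

model, tokenizer = load(MODEL)

start = time.perf_counter()
text = generate(model, tokenizer, prompt=PROMPT, max_tokens=256)
elapsed = time.perf_counter() - start

# Note: this lumps prompt processing into the timing; generate(..., verbose=True)
# prints mlx-lm's own separate prompt-processing and generation speeds.
n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")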
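For a llama.cpp figure like the gpt-oss-120B row, a rough check from Python via the llama-cpp-python bindings; the GGUF path and prompt below are placeholders. llama.cpp's own llama-bench tool is the better choice for publishable numbers, since it reports prompt-processing and generation speeds separately.

# Time a short generation with llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder for whatever Q2 GGUF you have locally; a 120B
# MoE at Q2 still needs a high-memory machine (the claim above used 128GB).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/gpt-oss-120b-Q2_K.gguf",  # hypothetical local path
    n_gpu_layers=-1,   # offload all layers to Metal
    n_ctx=4096,
    verbose=False,
)

prompt = "Summarise the trade-offs of mixture-of-experts models."
start = time.perf_counter()
out = llm(prompt, max_tokens=200)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")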
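The 'theoretical max of a bit over 10 tok/s' in the Llama 2 70B comment follows from a standard bound: single-stream decoding is memory-bandwidth-bound, so every generated token streams the whole weight file through memory once. A back-of-the-envelope check with the numbers quoted in that comment:

# Bandwidth-bound decode ceiling: each new token reads all weights once,
# so tok/s <= memory bandwidth / model size (ignoring KV cache and overhead).
mbw_gb_s = 400       # M3 Max theoretical memory bandwidth, GB/s
weights_gb = 39      # Llama 2 70B Q4_0 GGUF size quoted in the comment, GB

ceiling = mbw_gb_s / weights_gb
print(f"ceiling ~= {ceiling:.1f} tok/s")   # ~10.3 tok/s; the measured 8.5 is ~83% of it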
Common questions about M3 Max
Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.