M3 Max — LLM benchmarks
No benchmarks on M3 Max yet.
Run on your own hardware to populate this page:
$ pipx install llm-speed && llm-speed bench
Community folklore on M3 Max
45 unverified claims extracted from Reddit and Hacker News comments. These carry lower trust than signed benchmark runs; every row links back to its source.
- community confidence 75%
79.60 tok/s — Qwen3-8B on M3 Max via MLX bf16 (see the MLX timing sketch after this list)
“Happy to coordinate if you open-source yours first — no point duplicating. EDIT: Benchmarks complete on M3 Max (128GB). I am getting 79.6 tok/s on Qwen3-8B-bf16 (3.41x speedup) with confirmed bit-for-bit parity against the baseline. For those interested in the MLX-specific …”
- community confidence 75%
40.00 tok/s — gpt-oss-120B on M3 Max via llama.cpp Q2 (a llama-cpp-python timing sketch follows this list)
“5-air Both are MoE models which improve speed and just about max out the amount of ram you'd have. On my 128GB M3 Max macbook, these are ~40 t/s which is fine (note that glm is slower and both are faster running with mlx instead of llama.cpp). I think the Ryzen is broadly simi…”
- community confidence 65%
8.50 tok/s — Llama 2 70B on M3 Max, Q4_0 (measured; the same comment estimates a theoretical ceiling of a bit over 10 tok/s, worked through after this list)
“with their MBP 14 w/ an M3 Max [1] (128GB, 40CU, theoretical: 28.4 FP16 TFLOPS, 400GB/s MBW) The results for Llama 2 70B Q4_0 (39GB) was 8.5 tok/s for text generation (you'd expect a theoretical max of a bit over 10 tok/s based on theoretical MBW) and a prompt processing of 19 tok/s. On a 4K context conversation, that means you would be waiting about …”
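To reproduce a number like the Qwen3-8B claim above, here is a minimal timing sketch using the mlx-lm package. The model repo id and prompt are assumptions for illustration, not details taken from the claim itself.

# Rough single-stream throughput check with mlx-lm on Apple silicon.
# Assumes `pip install mlx-lm`; swap MODEL for whichever bf16 conversion you use.
import time
from mlx_lm import load, generate

MODEL = "mlx-community/Qwen3-8B-bf16"   # hypothetical repo id
PROMPT = "Explain the difference between threads and processes."

model, tokenizer = load(MODEL)

start = time.perf_counter()
text = generate(model, tokenizer, prompt=PROMPT, max_tokens=256)
elapsed = time.perf_counter() - start

# Note: this lumps prompt processing into the timing; generate(..., verbose=True)
# prints mlx-lm's own separate prompt-processing and generation speeds.
n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")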
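For a llama.cpp figure like the gpt-oss-120B row, a rough check from Python via the llama-cpp-python bindings; the GGUF path and prompt below are placeholders. llama.cpp's own llama-bench tool is the better choice for publishable numbers, since it reports prompt-processing and generation speeds separately.

# Time a short generation with llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder for whatever Q2 GGUF you have locally; a 120B
# MoE at Q2 still needs a high-memory machine (the claim above used 128GB).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/gpt-oss-120b-Q2_K.gguf",  # hypothetical local path
    n_gpu_layers=-1,   # offload all layers to Metal
    n_ctx=4096,
    verbose=False,
)

prompt = "Summarise the trade-offs of mixture-of-experts models."
start = time.perf_counter()
out = llm(prompt, max_tokens=200)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")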
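The 'theoretical max of a bit over 10 tok/s' in the Llama 2 70B comment follows from a standard bound: single-stream decoding is memory-bandwidth-bound, so every generated token streams the whole weight file through memory once. A back-of-the-envelope check with the numbers quoted in that comment:

# Bandwidth-bound decode ceiling: each new token reads all weights once,
# so tok/s <= memory bandwidth / model size (ignoring KV cache and overhead).
mbw_gb_s = 400       # M3 Max theoretical memory bandwidth, GB/s
weights_gb = 39      # Llama 2 70B Q4_0 GGUF size quoted in the comment, GB

ceiling = mbw_gb_s / weights_gb
print(f"ceiling ~= {ceiling:.1f} tok/s")   # ~10.3 tok/s; the measured 8.5 is ~83% of it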
Common questions about M3 Max
Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, and quantization guidance.