llm-speed

M3 Max — LLM benchmarks

No benchmarks on M3 Max yet.

Run on YOUR hardware to populate this page:

$ pipx install llm-speed && llm-speed bench

Community folklore on M3 Max

45 unverified claims extracted from Reddit/HN comments. Lower trust than signed runs above — every row links to the source.

  • community confidence: 75%

    79.60 tok/s Qwen3-8B on M3 Max via mlx bf16

    Happy to coordinate if you open-source yours first — no point duplicating. EDIT: Benchmarks complete on M3 Max (128GB). I am getting 79.6 tok/s on Qwen3-8B-bf16 (3.41x speedup) with confirmed bit-for-bit parity against the baseline. For those interested in the MLX-specific …

    source: Reddit · u/Remarkable_Jicama775 · 2026-04-11

  • community confidence: 75%

    40.00 tok/s gpt-oss-120B on M3 Max via llama.cpp Q2

    5-air Both are MoE models which improve speed and just about max out the amount of ram you'd have. On my 128GB M3 Max macbook, these are ~40 t/s which is fine (note that glm is slower and both are faster running with mlx instead of llama.cpp). I think the Ryzen is broadly simi…

    source: Reddit · u/RemarkableAd66 · 2025-11-20

  • community confidence: 65%

    10.00 tok/s Llama 2 70B on M3 Max Q4_0 (theoretical ceiling; the quote's measured figure is 8.5 tok/s)

    OPS, 400GB/s MBW) The results for Llama 2 70B Q4_0 (39GB) was 8.5 tok/s for text generation (you'd expect a theoretical max of a bit over 10 tok/s based on theoretical MBW) and a prompt processing of 19 tok/s. On a 4K context conversation, that means you would be waiting about …

    source: HN · u/lhl · 2024-09-21

  • community confidence: 65%

    8.50 tok/s Llama 2 70B on M3 Max Q4_0

    with their MBP 14 w/ an M3 Max [1] (128GB, 40CU, theoretical: 28.4 FP16 TFLOPS, 400GB/s MBW) The results for Llama 2 70B Q4_0 (39GB) was 8.5 tok/s for text generation (you'd expect a theoretical max of a bit over 10 tok/s based on theoretical MBW) and a prompt processing of 19 …

    source: HN · u/lhl · 2024-09-21
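The "a bit over 10 tok/s" ceiling in the quote above comes from simple bandwidth arithmetic: each decoded token streams roughly the full weights through memory once, so the ceiling is memory bandwidth divided by weight size. A minimal sketch of that back-of-envelope math, using only numbers from the quote (400 GB/s, 39 GB weights, 8.5 tok/s measured, 19 tok/s prompt processing):

```python
def decode_ceiling_tok_s(mem_bw_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode tok/s when generation is memory-bandwidth-bound:
    every token reads (approximately) all weights once."""
    return mem_bw_gb_s / weights_gb

# M3 Max: 400 GB/s bandwidth; Llama 2 70B Q4_0: 39 GB of weights.
ceiling = decode_ceiling_tok_s(400.0, 39.0)
print(f"theoretical ceiling: {ceiling:.1f} tok/s")   # ~10.3 tok/s

# The measured 8.5 tok/s lands at roughly 83% of that ceiling.
print(f"bandwidth efficiency: {8.5 / ceiling:.0%}")

# Prompt processing at 19 tok/s: a full 4K-token context takes
# about 4096 / 19 ≈ 216 s (~3.6 min) before the first output token.
print(f"4K-context prefill wait: {4096 / 19 / 60:.1f} min")
```

This also shows why the 10.00 tok/s and 8.50 tok/s rows are two readings of the same source: one is the bandwidth-derived ceiling, the other the measured throughput.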

See all 45 claims for M3 Max

Common questions about M3 Max

Direct Q&A drawn from the claims above: fastest LLM, supported model classes, backend rankings, quantization guidance.

Read the M3 Max FAQ →