AMD Narrows The Gap With Nvidia In New MLPerf Benchmarks

New benchmark results from AMD, Untether AI, Google, Intel, and Nvidia show how the AI silicon performance race is converging.

8/6/2024

Finally, I can stop whining about AMD’s lack of open AI benchmarks. AMD has published excellent MLPerf inference results for its MI300X GPU, which is competitive with the Nvidia H100, albeit on only a single benchmark. Canadian startup Untether AI also published new inference benchmarks showcasing its power efficiency. Let’s take a look.

The MLPerf Inference 4.1 Benchmark Suite

The MLCommons industry consortium, which develops and publishes the MLPerf benchmarks, has extended the twice-yearly inference suite with a new benchmark, based on Mixtral 8x7B, for the increasingly popular mixture-of-experts (MoE) AI models. MoE models combine multiple expert sub-networks, routing each token to only a few of them, to improve accuracy and lower the training cost of huge LLMs like OpenAI’s GPT-4. AMD did not publish an MoE benchmark, but now that it has broken the benchmarking ice, an AMD spokesperson indicated we could see more shortly.
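For readers new to the architecture, here is a minimal sketch of the routing idea at the heart of an MoE layer. It is illustrative only: the names and sizes are hypothetical, and production MoE models such as Mixtral train the gating network jointly with the experts inside a transformer.

```python
# Minimal sketch of mixture-of-experts (MoE) routing, using NumPy.
# Illustrative only: sizes and names are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1  # router weights

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ gate_w                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # per token
        chosen = logits[t, top[t]]
        weights = np.exp(chosen) / np.exp(chosen).sum()  # softmax over chosen experts
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ experts[e])      # only k of n experts actually run
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 16): same shape, ~k/n of the dense FLOPs
```

The point of the design is that only k of the n experts run per token, so total parameter count, and hence model capacity, grows much faster than per-token compute.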

It is certainly encouraging to see submissions to MLPerf for new processors. In addition to Nvidia’s Blackwell and the first AMD submissions, we now have selected benchmarks for Untether AI, AMD’s next-generation Turin CPU, Google’s Trillium (TPU v6e) accelerator, and Intel’s Granite Rapids Xeon CPU. We will focus here on Nvidia, AMD, and Untether AI.

AMD is roughly on par with the Nvidia H100, while the H200 is 43% faster

While AMD has previously disclosed microbenchmarks that highlight raw theoretical performance, such as the MI300X’s peak math throughput, those numbers do not reflect the complexity of real AI software stacks. AMD’s marketing claim that the MI300X is the fastest AI GPU was not validated by this new benchmark, but the chip is in the ballpark of the H100 when running a real AI workload. The Nvidia H200, however, beat the MI300X by some 43% on the same benchmark.
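To illustrate why peak math throughput and delivered inference performance diverge, here is a rough model-FLOPs-utilization (MFU) estimate. All numbers are illustrative assumptions, not measured results; the 2-FLOPs-per-parameter-per-token rule is a standard approximation for decoder inference.

```python
# Back-of-the-envelope model FLOPs utilization (MFU) for LLM inference.
# All numbers below are illustrative assumptions, not benchmark results.

params = 70e9                 # Llama 2 70B parameter count
flops_per_token = 2 * params  # ~2 FLOPs per parameter per generated token

peak_flops = 1.0e15           # hypothetical accelerator peak: 1,000 TFLOPS
tokens_per_sec = 3000         # hypothetical measured serving throughput

achieved = tokens_per_sec * flops_per_token
print(f"MFU ~ {achieved / peak_flops:.1%}")  # often well under 50% in practice
```

Memory bandwidth, batching, kernel quality, and software maturity all sit between the datasheet number and what a served model actually delivers, which is exactly what MLPerf is designed to capture.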

We note that the Llama 2 70B benchmark doesn’t really let AMD strut its stuff with respect to its larger HBM capacity for supporting bigger models. Hopefully we will see AMD run the new Mixtral MoE benchmark in a future MLPerf release.
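To see why HBM capacity matters, consider a quick weight-memory estimate: the MI300X carries 192 GB of HBM3 versus 80 GB on the H100, and the arithmetic below (which ignores KV cache and activation overhead) shows what that buys.

```python
# Rough check of whether model weights alone fit in GPU memory.
# Ignores KV cache, activations, and framework overhead, which add more.
import math

def min_gpus(params_billions: float, bytes_per_param: int, hbm_gb: int) -> int:
    weights_gb = params_billions * bytes_per_param  # billions of params -> GB
    return math.ceil(weights_gb / hbm_gb)

for name, hbm in [("MI300X (192 GB)", 192), ("H100 (80 GB)", 80)]:
    print(name, "->", min_gpus(70, 2, hbm), "GPU(s) for Llama 2 70B at FP16")
# MI300X: 1 GPU; H100: 2 GPUs just for the 140 GB of weights
```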

Nvidia also published the first Blackwell benchmarks, demonstrating about four times the performance of the H100 on medium-sized models (Llama 2 70B). Nvidia recently shared more details on the Blackwell NVL72 at Hot Chips, where the NVSwitch-interconnected rack is supposed to deliver 30 times the inference performance of the H200. Can’t wait to see actual MLPerf benchmarks for the flagship NVL72.