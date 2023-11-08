Every six months, NVIDIA, Intel, and Google show off how much their AI training hardware and software has improved, while their competitors remain hiding in the bushes. NVIDIA is beginning to see some competitive threats, but remains the leader, particularly in the scale of its supercomputers and benchmark breadth.

The semiannual benchmarking marathon for AI training has just published its latest results, MLPerf Training 3.1. Unsurprisingly, most competitors shy away from that transparency. Nevertheless, Intel once again demonstrated that it is the only viable alternative to NVIDIA for AI, at least until AMD releases its MI300 GPU next month. Here are some observations.

NVIDIA wins massively

NVIDIA is justifiably proud of its new EOS supercomputer, which it claims is the fastest AI facility in the universe; I mean the world. The system is now in production, helping NVIDIA research new AI methods, run benchmarks, and, perhaps most importantly, develop the next generation of GPUs and Arm CPUs for AI. This is a great resource to have in-house. However, the platform is matched by Microsoft Azure’s own H100 estate, which has a similar number of H100 GPUs available for rent to well-heeled customers.

As a baseline, it’s clear that the industry is making great progress in improving AI performance over time. The chart below compares the best performance over time on various benchmarks developed by the MLCommons community against Moore’s Law. The green line on the right is GPT-3 training time, which has improved by 3x over the past five months, although most of that gain came from increasing the number of GPUs used to run the benchmark. Scale of this kind is typical of hyperscaler use cases but is not particularly relevant to enterprises.

Looking specifically at LLMs, the NVIDIA H100 delivered a surprising 4x improvement at the same number of GPUs, and a 73x improvement when scaled up to 10,000 GPUs.
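One way to read the scale caveat above is to split an overall speedup into the part explained by GPU count and the part left over per GPU. The sketch below is a minimal illustration only: the function name is hypothetical, and the GPU counts in the usage lines are placeholders, not actual MLPerf submissions.

```python
def per_gpu_gain(total_speedup: float, gpus_before: int, gpus_after: int) -> float:
    """Speedup left over after dividing out the growth in GPU count.

    A value near 1.0 means the gain came almost entirely from scale;
    a value above 1.0 means each GPU is also doing more work.
    """
    if total_speedup <= 0 or gpus_before <= 0 or gpus_after <= 0:
        raise ValueError("all inputs must be positive")
    scale_factor = gpus_after / gpus_before
    return total_speedup / scale_factor


# Hypothetical inputs: a 3x faster run on 3x as many GPUs.
print(per_gpu_gain(3.0, 1000, 3000))  # 1.0
# Hypothetical inputs: a 4x faster run on the same number of GPUs.
print(per_gpu_gain(4.0, 1000, 1000))  # 4.0
```

A result of 1.0 corresponds to the "mostly from more GPUs" case, while a same-GPU-count comparison attributes the whole gain to hardware and software improvements.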

For the first time, MLCommons has added a benchmark for Stable Diffusion, covering image generation, the class of AI models behind applications like Midjourney and DALL-E 3. These apps are becoming mainstream for content creators, and the H100 delivers roughly double the performance of the Intel Gaudi 2 in training these models.

As the HPC world gathers in Denver next week for Supercomputing ’23, the MLPerf HPC benchmarks are timely and dramatically faster. The three HPC benchmarks have improved by 10-16x since the first round. AI has become an important tool for scientists using supercomputers, making GPUs from NVIDIA and AMD (which once again ignored the opportunity to benchmark its chips) a requirement for most supercomputers around the world. (AMD’s absence is strange, because it has better double-precision floating-point performance than NVIDIA and has won two of the three US DOE exascale installations.)

Intel’s Habana Labs Gaudi 2 doubles performance with software

NVIDIA did not show a head-to-head comparison of its GPU against Intel’s Habana Gaudi, preferring instead to promote its impressive scale on EOS. According to Intel’s math, however, adding support for FP8 doubled Gaudi 2’s previous performance, bringing it to about 50% of the per-node results of NVIDIA’s H100. Intel claimed this equates to better price-performance, a claim our channel checks supported: sources said Gaudi 2 performs quite well and is much more affordable and more readily available than NVIDIA’s GPUs.
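The price-performance argument comes down to simple arithmetic, sketched below under stated assumptions. The 0.5 performance ratio is the "about 50%" figure quoted above; the function name and the price ratios are hypothetical placeholders, not real list prices for either product.

```python
def price_performance_ratio(relative_perf: float, relative_price: float) -> float:
    """Performance per dollar of a challenger, relative to a baseline of 1.0.

    relative_perf:  challenger throughput / baseline throughput
    relative_price: challenger price / baseline price
    """
    if relative_perf <= 0 or relative_price <= 0:
        raise ValueError("ratios must be positive")
    return relative_perf / relative_price


# 0.5 is the "about 50% of per-node results" figure quoted above.
# The price ratios are hypothetical placeholders, not real list prices.
for relative_price in (0.6, 0.5, 0.4):
    ratio = price_performance_ratio(0.5, relative_price)
    verdict = "better" if ratio > 1 else "not better"
    print(f"price at {relative_price:.0%} of baseline -> {ratio:.2f}x perf per dollar ({verdict})")
```

At half the performance, the break-even point is half the price; anything cheaper than that tips performance per dollar in the challenger's favor.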

These results will help pave the way for Gaudi 3, expected in 2024. By then, of course, Intel will have to compete with NVIDIA’s next-generation GPU, the B100, aka Blackwell. We don’t know anything about the B100 except that it will be produced on TSMC’s 3nm process. Rumor has it that it’s a relatively minor upgrade (more transistors and more flops enabled by more HBM) over its predecessor, the H100, but we’ll see.

We would like to point out that Gaudi 2 is manufactured on TSMC’s 7nm process, while the H100 is produced on TSMC’s 5nm production line. Additionally, the much-touted NVIDIA Transformer Engine is expected to increase the H100’s performance by 3-5x. Given those disadvantages, we believe that Gaudi 3, using industry-standard 100Gb Ethernet rather than more expensive InfiniBand networking, has a good chance of catching up with, if not surpassing, the H100.

We should also mention that Intel presented results for its 4th Gen Xeon CPUs, which dedicate quite a bit of die area to the matrix operations required to run AI. One large Korean NVIDIA customer, Naver Corporation, maker of South Korea’s top search portal, has switched from NVIDIA GPUs to Intel CPUs, citing AI GPU shortages and rising prices.

Conclusion

It seems to us that the combination of Gaudi and the upcoming AMD MI300 will, for the first time, provide a competitive alternative to NVIDIA, at least from a hardware standpoint. (NVIDIA’s software and ecosystem will remain unmatched for years to come.) While not as fast as the H100, the Gaudi 2 is more affordable, more readily available, and can get the job done, especially in enterprise use cases. And hundreds of Hugging Face models are ready for deployment on Gaudi 2.

It should now be clear why NVIDIA recently decided to double the pace at which it releases new CPUs and GPUs in order to extend its competitive advantage. Time will tell how well customers adjust to NVIDIA’s new roadmap, since an annual stream of new products that outperform the ones customers have just bought will create churn and could shorten the useful life of each generation.