Over just the past couple of years, supercomputing has accelerated into the exascale era—with the world’s most massive machines capable of performing over a billion billion operations per second. But unless big efficiency improvements can intervene along its exponential growth curve, computing is also anticipated to require increasingly impractical and unsustainable amounts of energy—even, according to one widely cited study, by 2040 demanding more energy than the world’s total present-day output.
Fortunately, the high-performance computing community is shifting focus now toward not just increased performance (measured in raw petaflops or exaflops) but also higher efficiency, boosting the number of operations per watt.
The Green500 list saw newcomers enter into the top three spots, suggesting that some of the world’s newest high-performance systems may be chasing efficiency at least as much as sheer power.
The newest ranking of the Top500 supercomputers (a list of the world’s most powerful machines) and its cousin the Green500 (ranking instead the world’s highest-efficiency machines) came out last week. The leading 10 of the Top 500 largest supercomputers remains mostly unchanged, headed up by Oak Ridge National Laboratory’s Frontier exascale computer. There was only one new addition in the top 10, at No. 6: Swiss National Supercomputing Center’s Alps system. Meanwhile, Argonne National Laboratory’s Aurora doubled its size, but kept its second-tier ranking.
On the other hand, The Green500 list saw newcomers enter into the top three spots, suggesting that some of the world’s newest high-performance systems may be chasing efficiency at least as much as sheer power.
Heading up the new Green500 list was JEDI, Jülich Supercomputing Center’s prototype system for its impending JUPITER exascale computer. The No. 2 and No. 3 spots went to the University of Bristol’s Isambard AI, also the first phase of a larger planned system, and the Helios supercomputer from the Polish organization Cyfronet. In fourth place is the previous list’s leader, the Simons Foundation’s Henri.
A Hopper Runs Through It
The top three systems on the Green500 list have one thing in common—they are all built with Nvidia’s Grace Hopper superchips, a combination of the Hopper (H100) GPU and the Grace CPU. There are two main reasons why the Grace Hopper architecture is so efficient, says Dion Harris, director of accelerated data center go-to-market strategy at Nvidia. The first is the Grace CPU, which benefits from the ARM instruction set architecture’s superior power performance. Plus, he says, it incorporates a memory structure, called LPDDR5X, that’s commonly found in cellphones and is optimized for energy efficiency.
The second advantage of the Grace Hopper, Harris says, is a newly developed interconnect between the Hopper GPU and the Grace CPU. The connection takes advantage of the CPU and GPU’s proximity to each other on one board, and achieves a bandwidth of 900 gigabits per second, about 7 times as fast as the latest PCIe gen5 interconnects. This allows the GPU to access the CPU’s memory quickly, which is particularly important for highly parallel applications such as AI training or graph neural networks, Harris says.
All three top systems use Grace Hoppers, but Jülich’s JEDI still leads the pack by a noticeable margin—72.7 gigaflops per watt, as opposed to 68.8 gigaflops per watt for the runner-up (and 65.4 gigaflops per watt for the previous champion). The JEDI team attributes their added success to the way they’ve connected their chips together. Their interconnect fabric was also from Nvidia—Quantum-2 InfiniBand—rather than the HPE Slingshot used by the other two top systems.
The JEDI team also cites specific optimizations they did to accommodate the Green500 benchmark. In addition to using all the latest Nvidia gear, JEDI cuts energy costs with its cooling system. Instead of using air or chilled water, JEDI circulates hot water throughout its compute nodes to take care of the excess heat. “Under normal weather conditions, the excess heat can be taken care of by free cooling units without the need of additional cold-water cooling,” says Benedikt von St. Vieth, head of the division for high-performance computing at Jülich.
JUPITER will use the same architecture as its prototype, JEDI, and von St. Vieth says he aims for it to maintain much of the prototype’s energy efficiency—although with increased scale, he adds, more energy may be lost to interconnecting fabric.
Of course, most crucial is the performance of these systems on real scientific tasks, not just on the Green500 benchmark. “It was really exciting to see these systems come online,” Nvidia’s Harris says, “But more importantly, I think we’re really excited to see the science come out of these systems, because I think [the energy efficiency] will have more impact on the applications even than on the benchmark.”
From Your Site Articles
Related Articles Around the Web