My laptop didn’t make the latest edition of Top500.org’s list of the most powerful supercomputers in the world. But 25 years ago, it might have. Such is the pace of advancement in computing that last year’s top dog is this year’s also-ran.
Faster and more integrated interconnects play a key role in this ‘Asian invasion’ of the semi-annual TOP500 Supercomputing list.
“The first petaflop system (quadrillions of calculations per second) was Roadrunner at [Los Alamos National Laboratory] based on IBM BladeCenter,” said Dan Olds, principal at Gabriel Consulting Group. “From No. 1 18 months ago, it is now in 7th place.”
In the past six months, the entry level to make it onto the list jumped to 31.1 teraflop/s (trillions of calculations per second) from 24.7 Tflop/s. No. 305 from June has fallen all the way to 500. So it will be interesting to see how long the brand new No. 1 — the Chinese Tianhe-1A system — can hold onto its lead.
Tianhe-1A is stationed at the National Supercomputer Center in Tianjin, where it achieved a performance level of 2.57 petaflop/s running Linpack, the benchmark application used to determine the Top500. With 229,376 GB of memory, it is based on Nvidia graphics processing units (GPUs) as well as Intel Xeon 5600 series processors.
A big trend emerging in supercomputing is the development of faster and more integrated interconnects. The interconnect in this Chinese system, for example, can handle data at about twice the speed of InfiniBand. This is vital, as there are thousands of chips in the system. A faster interconnect means data can move back and forth more rapidly than in other systems.
Next on the list is the Cray XT5 “Jaguar” system, which is deployed at the U.S. Department of Energy’s (DOE) Oak Ridge Leadership Computing Facility in Tennessee. Jaguar achieved 1.75 petaflop/s. Third place is held by yet another Chinese system, called Nebulae, that sits at the National Supercomputing Centre in Shenzhen. It performed at 1.27 petaflop/s.
The Asian invasion continues with a new No. 4 on the list, which scored 1.19 petaflop/s. Tokyo Institute of Technology (Tokyo Tech) teamed with HP to build the company’s first petascale cluster, the Tsubame 2.0. It runs applications for climate and weather forecasting, tsunami simulations and computational fluid dynamics. NEC, Microsoft, Nvidia, Intel, Mellanox, Voltaire and DataDirect Networks also collaborated on this project.
Occupying 200 square meters of floor space and drawing 1.8 megawatts of power, Tsubame 2.0 includes 1,357 HP ProLiant SL390s servers, each with three Nvidia Tesla M2050 GPUs, and the HP Modular Cooling System.
“We built InfiniBand directly onto the motherboard to lower cost and make it more power efficient,” said Marc Hamilton, vice president for high-performance computing at HP.
This push toward power efficiency matters more than it did in previous years. In fact, Top500.org now highlights it in its semi-annual announcements. IBM’s prototype of the BlueGene/Q system took the honors, setting a power-efficiency record of 1,680 Mflops/watt, more than twice that of the next best system.
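To make the Mflops/watt metric concrete, here is a minimal sketch of the arithmetic in Python. The function names are illustrative, not part of any Top500 tooling. The Tsubame 2.0 calculation uses this article's 1.19 petaflop/s and 1.8 megawatt figures, on the assumption that the quoted power draw applies during the benchmark run, which may not be the case.

```python
def mflops_per_watt(flops_per_second, watts):
    """Return power efficiency in millions of flop/s per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return flops_per_second / watts / 1e6


def watts_at_efficiency(flops_per_second, mflops_per_watt_value):
    """Return the power draw implied by a given efficiency."""
    if mflops_per_watt_value <= 0:
        raise ValueError("efficiency must be positive")
    return flops_per_second / (mflops_per_watt_value * 1e6)


if __name__ == "__main__":
    # Assumption: Tsubame 2.0 draws its quoted 1.8 MW while running Linpack.
    tsubame = mflops_per_watt(1.19e15, 1.8e6)
    print(f"Tsubame 2.0 (illustrative): {tsubame:.0f} Mflops/watt")

    # Hypothetical 1 petaflop/s machine at the BlueGene/Q prototype's efficiency.
    draw = watts_at_efficiency(1e15, 1680)
    print(f"1 Pflop/s at 1,680 Mflops/watt: {draw / 1e6:.2f} MW")
```

Under those assumptions, Tsubame 2.0 works out to roughly 661 Mflops/watt, while a hypothetical one-petaflop system at 1,680 Mflops/watt would need about 0.6 megawatts.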
Dave Purek, vice president of deep computing at IBM, said this was achieved by looking for efficiencies across the entire system as well as by offloading more technical functions onto silicon. This removes wiring and cards, thereby cutting down on space while increasing efficiency, he said. This is backed up by a networking architecture harnessing a variety of networking technologies, each tuned to specific functions within the system as a whole.
Purek noted that IBM has orders for 10 Pflop and 20 Pflop systems, which will be delivered during the next year or so. However, he questioned the mechanism behind the Top500 listing, the Linpack benchmark.
“Linpack doesn’t really tell you much, and I’ve never had a client buy a system based on a Linpack score,” said Purek.
Olds, too, believes Linpack is becoming less representative of how customers use the machines in the real world.
“As software and the tasks they are used for become more complex, different parts of the system are stressed,” said Olds. “Linpack basically tests how well boxes do linear algebra, not how they do other things like stream processing.”
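For readers wondering what "doing linear algebra" means in practice, here is a rough Python sketch of a Linpack-style measurement. It solves a dense system of equations, then divides the conventional operation count (2/3 n^3 + 2 n^2) by the elapsed time. It is not the actual benchmark code, which is heavily tuned and runs across thousands of processors. The function names and the small problem size are assumptions chosen for illustration.

```python
import random
import time


def linpack_flop_count(n):
    """Conventional operation count for solving an n-by-n dense system."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def solve_dense(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies so the inputs stay intact
    b = b[:]
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back-substitute through the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measure(n=120, seed=0):
    """Return (flop/s, max residual) for one random n-by-n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return linpack_flop_count(n) / elapsed, residual


if __name__ == "__main__":
    rate, residual = measure()
    print(f"{rate / 1e6:.1f} Mflop/s, max residual {residual:.2e}")
```

The sketch exercises little beyond floating-point arithmetic and memory access on one dense matrix, which is the narrowness Olds describes. Nothing in it resembles stream processing or the other workloads he mentions.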
Purek ties this in to the rise of the GPU. The two highly rated Chinese systems and the Japanese Tsubame 2.0 all use Nvidia GPUs to accelerate computation. In all, 17 systems on the Top500 use accelerators: six use the Cell processor, 10 use Nvidia chips and one uses ATI Radeon chips.
“Big Linpack numbers are not indicative of actual work that can be done,” said Purek. “GPUs can create a bottleneck between memory and processing so it requires a lot of work to get these systems in balance and productive.”
Further, GPU systems tend to require custom programming and thus thrive in markets where people write their own codes.
“The jury is still out on the utility of GPUs, and this is shown by the absence of commercial software code that exploits them,” said Purek.