Raising (and Lowering) the Bar for High Performance Computing
High performance computing (HPC) systems regularly hit the news, especially when the semiannual release of the Top 500 Supercomputer list is on the horizon. Topping the first list in 1993 was a Thinking Machines CM-5 with 1,024 processors and 60 gigaflops of performance. The current leader, at 280 teraflops, is IBM's Blue Gene/L. That is roughly 4,700 times the performance. Several vendors are looking to cross the petaflop threshold within a year.
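As a quick check on the growth figure, the ratio between the two list leaders can be computed from the round numbers quoted in the paragraph above (about 60 gigaflops for the 1993 CM-5 and about 280 teraflops for Blue Gene/L). This is a back-of-the-envelope sketch; the variable names are illustrative only.

```python
# Rough speedup between the first and the current Top 500 leaders,
# using the round figures quoted in the article (illustrative names).
CM5_FLOPS = 60e9            # Thinking Machines CM-5 (1993): ~60 gigaflops
BLUE_GENE_L_FLOPS = 280e12  # IBM Blue Gene/L: ~280 teraflops

ratio = BLUE_GENE_L_FLOPS / CM5_FLOPS
print(f"Blue Gene/L is about {ratio:,.0f} times faster than the 1993 leader")
```

Running it prints a ratio of about 4,667: several thousand times the performance, not tens of thousands.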
Never has high performance computing been attainable for so many enterprises. At the same time, organizations that need even more computing power have no shortage of options.
A variety of factors are behind this trend. For one thing, falling hardware prices are making these systems affordable to organizations outside the Fortune 500 and the national research laboratories. For another, the emphasis has shifted from hardware to software, with Linux clusters replacing monolithic supercomputer designs.
"We went from figuring out how to design the hardware so the systems work together to taking off-the-shelf parts and figuring out how to design the software so the machines work together," said Donald Becker, co-builder of the world's first Linux cluster at NASA's Goddard Space Flight Center in the mid-90s and currently CTO of Penguin Computing in San Francisco.
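Becker's point about software making off-the-shelf machines work together is easiest to see in a common cluster demonstration: split one calculation into pieces, let each node compute its piece, then combine the partial results. The following is a single-process simulation under stated assumptions: the "nodes" are just loop iterations, the function names are hypothetical, and on a real cluster each piece would run on a separate machine, with the final sum performed by a message-passing library such as MPI.

```python
import math


def partial_pi(rank: int, num_nodes: int, steps: int) -> float:
    """Midpoint-rule integral of 4/(1+x^2) over this node's share of [0, 1].

    `rank` identifies the (hypothetical) node; the integral over [0, 1] is pi.
    """
    h = 1.0 / steps
    total = 0.0
    # Cyclic distribution: node `rank` handles steps rank, rank+num_nodes, ...
    for i in range(rank, steps, num_nodes):
        x = (i + 0.5) * h
        total += 4.0 / (1.0 + x * x)
    return total * h


def cluster_pi(num_nodes: int = 8, steps: int = 1_000_000) -> float:
    # Simulated sequentially here. On a real cluster each partial_pi call
    # would run on its own machine, and this sum would be a network
    # reduction (for example MPI_Reduce) back to one node.
    partials = [partial_pi(r, num_nodes, steps) for r in range(num_nodes)]
    return sum(partials)


if __name__ == "__main__":
    estimate = cluster_pi()
    print(f"pi ~ {estimate:.10f} (error {abs(estimate - math.pi):.2e})")
```

Each node needs only its rank, the node count and the step count, and sends back a single number. Because so little data moves between machines, problems shaped like this scale well on loosely coupled clusters built from commodity PCs.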
Linux clusters, in fact, have revolutionized both the top and bottom ends of the market. The June 2007 Top 500 list featured 373 clusters, compared with one a decade earlier. These days, individuals can build a Beowulf cluster at home out of used PCs. At the high end, however, the trend may be running its course. Willard said clusters have begun to approach their limits there, and that future improvements will come from special-purpose processors designed to offload certain routine algorithms from the main CPU. Some vendors are already starting to do this.
Let's take a look at recent developments within the HPC space to see what vendors are doing to boost performance on their systems.
Cray of Seattle, Wash., has been making a comeback since it split from SGI in 2000. It landed 11 systems on the Top 500 list, including the No. 2 and No. 3 spots. Oak Ridge National Laboratory's (ORNL) Jaguar system, a mix of Cray XT3 and XT4 systems, has 11,706 AMD Opteron compute nodes, 46 TB of memory and a peak performance of more than 119 teraflops. The lab plans to double its speed later this year by upgrading to quad-core chips, doubling the memory and migrating to a stripped-down version of Linux on the compute nodes.
"The Cray XT4 is an exceptionally well-balanced system that provides the best overall performance on our mix of applications," said ORNL project director Arthur Bland. "It is highly scalable and easily upgradable."
Cray's XMT platform is a massively multithreaded processing system designed to deliver more than 1 million concurrent processing threads. The Cray XT4 supercomputer is a massively parallel processing system that uses AMD's HyperTransport technology and Opteron processors, along with Cray's three-dimensional torus interconnect network. It scales to more than 1 petaflop. The Cray X1E supercomputer uses proprietary vector processors.
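To make the "three-dimensional torus" concrete: nodes sit on a 3D grid, each is linked to its six nearest neighbors, and the links wrap around at the edges, so no node sits on a boundary. The following is a generic sketch of that topology (a 4 x 4 x 4 grid in the example), not Cray's actual routing logic; the function names are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six direct neighbors of `node` in a 3D torus of size `dims`.

    Each axis wraps around, so a node on one face links to the opposite face.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]  # wraparound link
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum hop count between two nodes, going the shorter way round each axis."""
    hops = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        hops += min(d, n - d)
    return hops


if __name__ == "__main__":
    dims = (4, 4, 4)
    print(torus_neighbors((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (3, 3, 3), dims))  # 3 hops; a plain mesh needs 9
```

In the example, the corner-to-corner trip takes 3 hops instead of the 9 a non-wrapping mesh would need. Keeping worst-case distances short in this way is one reason torus networks are used as machines grow to many thousands of nodes.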
According to company President Pete Ungaro, Cray is planning to make another major HPC product announcement before the end of the year.
IBM continues to dominate the very top of the HPC space, producing six of the top 10 systems on the Top 500 list and 46 of the top 100. The largest is the Blue Gene/L computer at Lawrence Livermore National Laboratory, which delivers a sustained performance of 280 teraflops. In June, the company released Blue Gene/P, the second generation of its Blue Gene platform. It scales up to 3 petaflops.
"Blue Gene/P nearly triples the performance of its predecessor Blue Gene/L, currently the world's fastest computer, while remaining the most energy-efficient and space-saving computing package ever built," said David Turek, IBM vice president for Deep Computing.
Like its predecessor, the new supercomputer uses IBM PowerPC processors; the difference is that each chip has four cores rather than two. Germany's Max Planck Institute installed the world's first Blue Gene/P in September, but with only 8,192 processors, the new system will not be setting any records. Argonne National Laboratory is installing one four times that size later this year. Even so, at 111 teraflops, it will still rank below Lawrence Livermore's Blue Gene/L.
Penguin Computing makes clusters, servers and storage for both the high and low ends of the HPC market. It also produces Scyld cluster management software. In January, Penguin released a new server designed for low-cost HPC clusters. The Altus 600 holds up to two AMD Opteron processors and, in dual-processor configurations, up to 64GB of RAM. It is priced from $1,499. The servers omit features not needed for HPC, making them more efficient than general-purpose servers.
SGI's high-performance computers can incorporate its RASC (Reconfigurable Application Specific Computing) technology, which uses Field Programmable Gate Arrays (FPGAs) that owners can configure to accelerate particular algorithms. For those algorithms, this can greatly reduce processing time. In June, SGI released its Altix ICE platform.
"Companies today are dealing with increasingly difficult compute problems, with ever-growing amounts of data to manage and process, and pressure to deliver results in record time," said Joe Mansour, SGI's Director of Strategic Accounts and Program Capture. "One way to attack this critical problem is to build large clusters, but this approach often results in unforeseen and unmanageable expenses for administration, space and power."
SGI Altix ICE, on the other hand, is specifically designed for HPC applications with cable-free blade enclosures, integrated switches and a high-performance interconnect structure. This architecture allows faster operations and greater blade density, while cutting power and heat. ICE also includes SGI's new water-cooled door design.
"With the SGI Altix ICE platform, SGI has made green HPC a reality, helping customers conquer the challenges of soaring electric and cooling expenses, and maximizing the reliable life of the new platform by ensuring it runs cool," said Mansour.
Sun Microsystems offers rackmount and blade HPC clusters with x86 or SPARC processors running Linux or Solaris. In June, the company announced its Constellation System, a petascale system that uses Solaris 10 and was developed in collaboration with the University of Texas at Austin. The Texas Advanced Computing Center is building a 500-plus teraflop Constellation system with 123TB of RAM and 1.7PB of raw disk storage. It is scheduled to go into production in December.
Watch out, though, for more interesting developments on the cluster front from Sun. In September, the company acquired the majority of Cluster File Systems' intellectual property, including the Lustre File System. Lustre is a parallel file system designed for large-scale computing clusters such as the Constellation system being built in Texas.
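Part of what makes Lustre suited to large clusters is striping: a file is cut into fixed-size chunks that are dealt out round-robin across several storage servers, so large reads and writes can proceed in parallel. The following models that layout in simplified form. `stripe_size` and `stripe_count` mirror concepts Lustre exposes (for example through its `lfs setstripe` command), but the functions themselves are hypothetical and ignore details such as which server holds the first stripe.

```python
from typing import Set, Tuple


def locate(offset: int, stripe_size: int, stripe_count: int) -> Tuple[int, int]:
    """Map a byte offset in a file to (stripe index, offset within that stripe's object)."""
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_count must be > 0")
    chunk = offset // stripe_size   # which stripe_size-sized chunk of the file
    stripe = chunk % stripe_count   # chunks are dealt out round-robin
    row = chunk // stripe_count     # full rounds that precede this chunk
    return stripe, row * stripe_size + offset % stripe_size


def stripes_touched(offset: int, length: int, stripe_size: int, stripe_count: int) -> Set[int]:
    """Set of stripe indices (i.e. storage servers) a read of `length` bytes hits."""
    if length <= 0:
        raise ValueError("length must be > 0")
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return {chunk % stripe_count for chunk in range(first, last + 1)}


if __name__ == "__main__":
    MiB = 1 << 20
    print(locate(5 * MiB + 10, MiB, 4))        # lands on stripe 1
    print(stripes_touched(0, 4 * MiB, MiB, 4))  # a 4 MiB read spans all four stripes
```

With 1 MiB stripes over four servers, a 4 MiB read is served by all four servers at once, while a small read touches only one. That trade-off is why stripe settings are typically tuned to the access pattern of the application.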