Hardware Today: Clusters Catch on in the Enterprise
Tang isn't the only product to come out of the space program. Server clustering started there as well. And, unlike the beverage, it delivers benefits directly related to ROI. Clustering provides a scalable approach for enterprises to achieve supercomputing power by simply running commodity servers in parallel over an Ethernet connection. Out of the NASA lab and into an enterprise near you, clustering technology is practically plug-and-play.
Thomas Sterling and Don Becker built the first such cluster, known as Beowulf, in 1994 at NASA's Goddard Space Flight Center in Greenbelt, Maryland. That unit consisted of 16 Intel DX4 processors connected by 10 Mbps Ethernet. By the end of that decade, more than 100 clusters were in use at research universities, laboratories, and corporations. The most recent list of the 500 fastest supercomputers includes 360 clusters, up from two just seven years earlier. And the growth isn't limited to top-end systems.
"The big news is that server clustering is going mainstream, moving from purely the domain of the scientists and academics in federal labs and universities to commercial aerospace, auto design, oil and gas, financial services and into the general enterprise," says Pauline Nist, senior vice president for product development and management at Penguin Computing. "CIOs are increasingly moving to managing large clusters and grids of servers for applications, such as Web services and business logic."
The initial driver for developing clusters was to reduce the cost of building high-performance computers. And the technology can still do that: many clusters are built with off-the-shelf Intel or AMD boxes. Virginia Tech even has a cluster consisting of 1,100 dual-processor Apple Xserve units that produces 12 teraflops, enough to rank it the 20th fastest supercomputer in the world. But for many organizations, cost cutting is no longer the only motivator. Instead, clustering offers them a way to improve scalability and reliability.
"Having the ability to add a node into the configuration and redistribute the workload can be very appealing," says Chip Nickolett, president of Comprehensive Consulting Solutions in Brookfield, Wisc. "While not fully fault tolerant, they do provide companies with the ability to resume business operations and/or processing with a minimal amount of disruption, downtime, and loss of data."
Setting up a cluster involves more than just wiring together a bunch of servers. "Clusters are complex to set up and complex to manage," says John Humphreys, research manager with IDC.
Nickolett cites several misconceptions customers frequently have about clustering, which can lead them to underestimate what is involved in setting up and operating a cluster. The first is the assumption that clusters scale linearly. They do not: they incur a fair amount of overhead, typically in the 15 to 25 percent range.
"They [customers] expect near-linear results when combining systems and may soon realize that this is not a realistic expectation," he says. "This can result in having to acquire more hardware than originally planned or enhancing the software to perform more efficiently."
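To see how that overhead turns into extra hardware, here is a rough sizing sketch in Python. It rests on a deliberately simple assumption that is not Nickolett's: a flat fraction of the cluster's capacity is lost to overhead, so N nodes deliver N x (1 - overhead) nodes' worth of useful work. Real overhead usually varies with node count and workload, and the function names are illustrative only.

```python
import math


def effective_capacity(nodes: int, overhead: float) -> float:
    """Useful work, in single-node equivalents, under a flat overhead fraction.

    Assumed model: every node loses the same share of its capacity to
    cluster overhead (locking, messaging, coordination).
    """
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in the range [0, 1)")
    return nodes * (1.0 - overhead)


def nodes_needed(target_capacity: float, overhead: float) -> int:
    """Smallest node count whose effective capacity reaches the target."""
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in the range [0, 1)")
    raw = target_capacity / (1.0 - overhead)
    # Subtract a tiny epsilon so exact results are not rounded up by float noise.
    return math.ceil(raw - 1e-9)


if __name__ == "__main__":
    # A buyer who expects 16 nodes' worth of work, at both ends of the
    # 15 to 25 percent range quoted in the article.
    for overhead in (0.15, 0.25):
        print(overhead, effective_capacity(16, overhead), nodes_needed(16, overhead))
```

Under this model, a 16-node cluster at 25 percent overhead delivers the work of 12 nodes, and reaching a true 16 nodes' worth of work takes 22 machines.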
"When you have a computer cluster with 10,000 files created, you might have 1,000 storage servers attached. A modular approach makes a better approach than big iron," says David Freund, an analyst at Illuminata.
And then there is the matter of software. Managing a cluster involves far more than just sharing disk space or distributing a processing workload.
"Software needs to be cluster aware," Nickolett continues. "That means that a distributed lock manager (which causes most of the overhead) needs to be built into the software to manage concurrency and data consistency when being accessed by more than one physical system."
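The bookkeeping a lock manager performs can be illustrated with a toy example in Python. This is a minimal single-process sketch, not how Ingres, Oracle, or any other product implements locking: the class name, the node and resource names, and the two lock modes are all assumptions made for illustration. In a real cluster this state is spread across machines and every request crosses the network, which is consistent with Nickolett's point about where the overhead comes from.

```python
from collections import defaultdict


class ToyLockManager:
    """Tracks which node holds which resource, in shared or exclusive mode."""

    def __init__(self) -> None:
        # resource name -> {node name: "shared" | "exclusive"}
        self._holders = defaultdict(dict)

    def acquire(self, node: str, resource: str, mode: str) -> bool:
        """Grant the lock if compatible with other holders; never blocks."""
        if mode not in ("shared", "exclusive"):
            raise ValueError("mode must be 'shared' or 'exclusive'")
        holders = self._holders[resource]
        others = {n: m for n, m in holders.items() if n != node}
        # An exclusive lock is incompatible with any other holder.
        if mode == "exclusive" and others:
            return False
        # A shared lock is incompatible with another node's exclusive lock.
        if mode == "shared" and "exclusive" in others.values():
            return False
        holders[node] = mode
        return True

    def release(self, node: str, resource: str) -> None:
        """Drop the node's lock on the resource, if it holds one."""
        self._holders[resource].pop(node, None)


if __name__ == "__main__":
    dlm = ToyLockManager()
    print(dlm.acquire("node-a", "row:42", "shared"))     # readers can coexist
    print(dlm.acquire("node-b", "row:42", "shared"))
    print(dlm.acquire("node-c", "row:42", "exclusive"))  # writer must wait
```

Two nodes can read the same row at once, but a node that wants to write is refused until the readers release their locks. That refusal is what keeps data consistent when more than one physical system touches it.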
He believes Ingres Corporation, based in Redwood City, Calif., is one of the leaders in this area due to its database software. In addition, Oracle has the Real Application Clusters version of its enterprise database, and IBM has the DB2 Integrated Cluster Environment for Linux, which consists of its DB2 Universal Database running on an eServer 1350 cluster.
But no matter what changes are made to the databases or applications, there is still the matter of managing the cluster itself, something Penguin Computing's Nist acknowledges is "still very complex, time consuming, and error prone." To simplify matters, one growing area is "stateless provisioning," where the operating system, middleware, and application stacks are loaded into memory rather than installed on each server's hard disks.
"There is a growing recognition among organizations that stateless provisioning is a promising approach to dramatically improving the management of large pools of servers," says Nist. "It loads orders of magnitude faster, guards against having the wrong versions, and is much more effortless. And repurposing can occur on-demand as service and business demands change and resource allocation must be adjusted."