Hardware Today: Computing for a Cure
Silver anniversaries are normally joyous occasions, but not this one. June 5, 2006 marked the 25th anniversary of the discovery of AIDS. Amid the calls for greater awareness and increased funding was recognition that, from a medical viewpoint, billions of dollars spent on research had produced little progress in recent years.
This may soon change. Researchers at the State University of New York's (SUNY's) campus at Stony Brook are using a new SGI supercomputer to model the behavior of the molecules that lead to the creation of the human immunodeficiency virus (HIV). Although the structure of these molecules has been known for more than 10 years, experiments could not determine how the drugs get in and out of them. These simulations fill in the missing piece and will open the door to creating drugs that block the virus' activities.
"People have tried to create simulations of this before but weren't able to," says Dr. Carlos Simmerling, associate professor at SUNY Stony Brook's Center for Structural Biology. "The convergence of a lot of technologies that have been developed brought us to the point where we thought it would be a good time to try to solve this problem."
To create the simulation, SUNY Stony Brook uses an SGI Altix system with 1024 Intel Itanium 2 processors. It runs on Linux and has 3 TB of memory.
Unlocking the Door
The key to blocking the development of HIV lies in something called the HIV protease. A protease is an enzyme that breaks down a protein into smaller molecules. In this case, it breaks down protein molecules to create the virus that leads to AIDS. Scientists have known about this sequence of events for years and created drugs called protease inhibitors that had mixed success blocking the actions of the HIV protease.
To create more effective protease inhibitors, it is necessary to know the molecular structure of the protease. According to Simmerling, the problem with developing such drugs is that most of the time the protease molecules are in one of several "closed" states, where they are not susceptible to attack from the drugs. The research team, therefore, had to determine the structure of the molecules when they were in an "open" state.
"You have to watch in detail how the molecules move and every now and then you see it opening up, then it would close down again," he says. "That process of opening and closing is what is thought to be a lot of the drug resistance in AIDS patients."
He compares it to watching someone's daily activities. You may see that the person spends about half of her day inside the house and the other half outside. What you need to do is zero in on the point where she transitions from an outside state to an inside state: the few seconds out of the 24 hours when she unlocks the door and enters the house. In the case of the HIV protease, the open state is measured in nanoseconds and is not likely to be found by physical observation. It can, however, be found through simulation.
"These simulations show the behavior of the molecule and where all the atoms are moving as a function of time," says Simmerling.
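To make the idea of tracking atom positions "as a function of time" concrete, here is a minimal toy sketch, not the SUNY Stony Brook code. It assumes two atoms on a line joined by a spring-like bond, integrated with the standard velocity-Verlet scheme in arbitrary units. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
def forces(pos, k, rest):
    """Spring forces on two atoms in 1D; equal in size, opposite in direction."""
    stretch = (pos[1] - pos[0]) - rest
    return [k * stretch, -k * stretch]


def simulate(steps, dt=0.01, k=1.0, mass=1.0, rest=1.0):
    """Return a trajectory: a list of (time, (x_atom0, x_atom1)) snapshots."""
    pos = [0.0, 1.5]   # start with the bond stretched beyond its rest length
    vel = [0.0, 0.0]   # both atoms initially at rest
    f = forces(pos, k, rest)
    trajectory = [(0.0, tuple(pos))]
    for step in range(1, steps + 1):
        # Velocity Verlet: half-step velocities, full-step positions,
        # recompute forces, then finish the velocity update.
        vel = [v + 0.5 * dt * fi / mass for v, fi in zip(vel, f)]
        pos = [x + dt * v for x, v in zip(pos, vel)]
        f = forces(pos, k, rest)
        vel = [v + 0.5 * dt * fi / mass for v, fi in zip(vel, f)]
        trajectory.append((step * dt, tuple(pos)))
    return trajectory


if __name__ == "__main__":
    for t, (x0, x1) in simulate(steps=500)[::100]:
        print(f"t={t:5.2f}  atom0={x0:.3f}  atom1={x1:.3f}  separation={x1 - x0:.3f}")
```

A production simulation of a protein follows the same basic loop, but with many thousands of atoms in three dimensions and a far more detailed force model, which is why it needs supercomputer time.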
Getting the Cobalt Treatment
The SUNY Stony Brook laboratory has its own 100-unit Linux cluster to run protein simulations. However, it would take too long to run the simulations necessary to observe just 50 nanoseconds of the protease's activities.
"We go through the initial stages of building the molecules and getting everything set up on our own cluster," says Simmerling. "Then, when we are ready for the real production, we move them to the computer centers where we can do the large simulations for a long time."
With the HIV simulation, the first step involved creating the molecule and running enough of the simulation to see that the protein didn't fall apart. Then, it was moved to the Mercury cluster at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. Since NCSA has a faster interconnect than the cluster at Stony Brook (Myrinet rather than gigabit Ethernet) as well as faster processors, the simulations could be done in half the time. To get them to run even faster, Simmerling looked into running the simulations on the NCSA's new Cobalt supercomputer, which is geared to performing these types of simulations.
"It is very convenient to use something that is just a single system image," Simmerling says. "Cobalt is really just one computer, so you don't have to deal with issues of queuing systems or multiple nodes, and handling things like MPI [Message Passing Interface] is easier."
The NCSA purchased the Cobalt system in July 2004 and got it online the following year. The SGI Altix system contains 1024 Intel Itanium 2 processors running the Linux operating system, 3 TB of globally accessible memory, and an SGI InfiniteStorage TP9500 disk array to hold a 370 TB shared file system. It has a peak performance in excess of 6 teraflops.
"The Altix has the same CPUs as the other clusters, but it has low-latency, high-bandwidth interconnects," says Simmerling. "Then, having scientific staff at SGI who really understand the machine well enough that they could help us optimize our code made a huge difference."
The NCSA recommends Cobalt be used for applications that have a moderate to high level of parallelism (32 to 512 processors), particularly applications that require more than 250 GB of shared memory. It is intended for large-scale simulations, high-end visualization, large-scale interactive data analysis, and codes that perform better in an SMP environment. For applications that run on a smaller number of processors or run well in a distributed cluster environment, the NCSA recommends one of its other clusters.
The project took a total of 20,000 CPU hours to run the necessary simulations. Cobalt accomplished this within a month. The simulation allowed the SUNY Stony Brook team to not only see the protease move into the open state, but also to observe the drugs latching on to the protease. Further simulations are planned for HIV, but Simmerling is also working on other projects, such as drug-resistant tuberculosis. He says that the newer computers, like Cobalt, open up whole new areas of research that simply weren't possible before. He recommends other researchers start thinking bigger in their project plans to take advantage of the processing power now available.
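As a rough back-of-the-envelope illustration of what 20,000 CPU hours means in elapsed time, the sketch below divides the total by a processor count. It assumes ideal linear scaling, which real codes rarely achieve. The processor counts are simply taken from the NCSA's recommended 32-to-512 range and are not the counts the team actually used, and the function name is hypothetical.

```python
def wall_clock_hours(cpu_hours, processors):
    """Elapsed hours if the work divides perfectly across processors (idealized)."""
    return cpu_hours / processors


if __name__ == "__main__":
    total_cpu_hours = 20_000  # figure quoted for the project
    for n in (32, 128, 512):  # illustrative counts only
        hours = wall_clock_hours(total_cpu_hours, n)
        print(f"{n:4d} processors: {hours:8.1f} hours (~{hours / 24:.1f} days)")
```

Under these assumptions, the work amounts to about 26 days of continuous computing on 32 processors and under two days on 512.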
"There are not enough people taking advantage of the national centers," he says. "There is computer time available and it is not that difficult to get if you have a good project."