Improving mod_perl Driven Site's Performance -- Part I: Choosing Operating System and Hardware Page 5
Redundant Array of Inexpensive Disks (RAID): An array of physical disks, usually treated by the operating system as one single disk, and often forced to appear that way by the hardware. The reason for using RAID is often simply to achieve a high data transfer rate, but it may also be to get adequate disk capacity or high reliability. Redundancy means that the system is capable of continued operation even if a disk fails. There are various types of RAID array and several different approaches to implementing them. Some systems provide protection against failure of more than one drive and some ('hot-swappable') systems allow a drive to be replaced without even stopping the OS.
If you are building a fan site and you want to amaze your friends with
a mod_perl guest book, any old 486 machine could do it. If you are in
a serious business, it is very important to build a scalable server.
If your service is successful and becomes popular, the traffic could
double every few days, and you should be ready to add more resources
to keep up with the demand. While we can define the webserver
scalability more precisely, the important thing is to make sure that
you can add more power to your
webserver(s) without investing much
additional money in software development (you will need a little
software effort to connect your servers, if you add more of them).
This means that you should choose hardware and OSs that can talk to
other machines and become a part of a cluster.
On the other hand if you prepare for a lot of traffic and buy a monster to do the work for you, what happens if your service doesn't prove to be as successful as you thought it would be? Then you've spent too much money, and meanwhile faster processors and other hardware components have been released, so you lose.
Wisdom and prophecy, that's all it takes :)
Let's start with a claim that a four-year-old processor is still very powerful and can be put to a good use. Now let's say that for a given amount of money you can probably buy either one new very strong machine or about ten older but very cheap machines. I claim that with ten old machines connected into a cluster and by deploying load balancing you will be able to serve about five times more requests than with one single new machine.
Why is that? Because generally the performance improvement on a new machine is marginal while the price is much higher. Ten machines will do faster disk I/O than one single machine, even if the new disk is quite a bit faster. Yes, you have more administration overhead, but there is a chance you will have it anyway, for in a short time the new machine you have just bought might not stand the load. Then you will have to purchase more equipment and think about how to implement load balancing and web server file system distribution anyway.
Why I am so convinced? Look at the busiest services on the Internet: search engines, web-email servers and the like -- most of them use a clustering approach. You may not always notice it, because they hide the real implementation details behind proxy servers.