Redundant Array of Inexpensive Disks (RAID): An array of physical
disks, usually treated by the operating system as one single disk, and
often forced to appear that way by the hardware. The reason for using
RAID is often simply to achieve a high data transfer rate, but it may
also be to get adequate disk capacity or high reliability. Redundancy
means that the system is capable of continued operation even if a disk
fails. There are various types of RAID array and several different
approaches to implementing them. Some systems provide protection
against failure of more than one drive and some (‘hot-swappable’)
systems allow a drive to be replaced without even stopping the OS.
Machine Strength Demands According to Expected Site Traffic
If you are building a fan site and you want to amaze your friends with
a mod_perl guest book, any old 486 machine could do it. If you are in
a serious business, it is very important to build a scalable server.
If your service is successful and becomes popular, the traffic could
double every few days, and you should be ready to add more resources
to keep up with the demand. Although web server scalability can be
defined more precisely, the important thing is to make sure that you
can add more power to your web server(s) without investing much
additional money in software development (you will need a little
software effort to connect your servers if you add more of them).
This means that you should choose hardware and OSs that can talk to
other machines and become a part of a cluster.
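To get a feel for what "traffic could double every few days" means for capacity, here is a rough sketch. Every number in it is an assumption made up for illustration (50 requests per second to start, doubling every 3 days, 100 requests per second per machine), and the function name servers_needed is hypothetical. Real capacity should be measured by benchmarking your own service.

```python
import math

def servers_needed(initial_rps, doubling_days, day, per_server_rps):
    """Machines required on a given day if traffic doubles every `doubling_days`.

    All inputs are illustrative assumptions, not measurements.
    """
    # Exponential growth: the load doubles once per doubling period.
    load = initial_rps * 2 ** (day / doubling_days)
    # Round up: a fraction of a machine still means buying a whole one.
    return math.ceil(load / per_server_rps)

if __name__ == "__main__":
    for day in (0, 3, 6, 9, 12):
        print(day, servers_needed(50, 3, day, 100))
```

Under these assumed figures a single machine is enough for the first few days, but the count keeps doubling after that, which is why being able to add machines cheaply matters more than the power of the first one.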
On the other hand, if you prepare for a lot of traffic and buy a
monster to do the work for you, what happens if your service doesn’t
prove to be as successful as you thought it would be? Then you’ve
spent too much money, and meanwhile faster processors and other
hardware components have been released, so you lose.
Wisdom and prophecy, that’s all it takes 🙂
Single Strong Machine vs. Many Weaker Machines
Let’s start with a claim that a four-year-old processor is still very
powerful and can be put to good use. Now let’s say that for a given
amount of money you can probably buy either one new very strong
machine or about ten older but very cheap machines. I claim that with
ten old machines connected into a cluster and by deploying load
balancing you will be able to serve about five times more requests
than with one single new machine.
Why is that? Because generally the performance improvement on a new
machine is marginal while the price is much higher. Ten machines
together will deliver more aggregate disk I/O than one single machine,
even if the new machine’s disk is quite a bit faster. Yes, you have
more administration overhead, but
there is a chance you will have it anyway, for in a short time the new
machine you have just bought might not stand the load. Then you will
have to purchase more equipment and think about how to implement load
balancing and web server file system distribution anyway.
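The arithmetic behind the "ten old machines serve about five times more" claim can be sketched as follows. This is only a back-of-the-envelope model: the function names and the efficiency parameter (the fraction of raw capacity left after load-balancing overhead) are hypothetical, and no figure here is a benchmark result.

```python
def cluster_throughput(per_machine_rps, machines, efficiency=1.0):
    """Aggregate requests per second for a load-balanced cluster.

    `efficiency` is an assumed fraction of raw capacity that survives
    load-balancing and coordination overhead (1.0 means no loss).
    """
    return per_machine_rps * machines * efficiency

def implied_relative_speed(cluster_gain, machines, efficiency=1.0):
    """How fast each old machine must be, relative to the single new
    machine, for the cluster to reach `cluster_gain` times its throughput."""
    return cluster_gain / (machines * efficiency)

if __name__ == "__main__":
    # Ten machines, five times the throughput, no overhead assumed.
    print(implied_relative_speed(5, 10))
    # The same claim if 20% of capacity were lost to overhead.
    print(implied_relative_speed(5, 10, efficiency=0.8))
```

With no overhead, the claim holds as long as each old machine handles about half the requests of the new one. If you assume some capacity is lost to load balancing, each old machine has to be somewhat faster than that.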
Why am I so convinced? Look at the busiest services on the Internet:
search engines, web-email servers and the like — most of them use a
clustering approach. You may not always notice it, because they hide
the real implementation details behind proxy servers.