Introduction:
One of the greatest worries today is the high availability of your IT systems.
Downtime can be caused by hardware failures, software errors and much more, but
industry studies show that 80 percent of system failures can be traced to human
error or flawed processes. I think everyone knows someone who has lost vital
information because they forgot to make a backup. This is a classic example
of the kind of problem a rigorous IT operations environment can help avoid.
W2K Advanced Server and W2K Datacenter Server are not the first products that
Microsoft has brought to market for high availability: NT 4.0 Server, Enterprise
Edition, already supported a two-node failover scenario.
Microsoft claims that with these new products an uptime of 99.999 percent should
be possible, which means roughly 5 minutes of downtime per year. In this article
I argue that high availability requires not only good products, but also an
organization that is ready for it.
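The "five nines" figure is easy to check with a little arithmetic: the allowed downtime is the unavailable fraction (100 minus the availability percentage, divided by 100) multiplied by the number of minutes in a year. The following Python sketch does that calculation; the function name downtime_minutes_per_year is hypothetical, and the sketch assumes a 365-day year (leap years ignored).

```python
# Minutes in a non-leap year (assumption: leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR

if __name__ == "__main__":
    # Compare some common availability targets.
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_minutes_per_year(target):.1f} minutes/year")
```

For 99.999 percent this comes to about 5.26 minutes a year, which matches the "5 minutes" claim; each extra nine cuts the allowed downtime by a factor of ten.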
What about high availability:
Clearly, a good operating system and the proper hardware are a good start towards
better uptime, but what about the following points:
- Procedures
- Employees
- Management
- Infrastructure
- Buildings
Procedures:
One of the best things you can do is to base the services of your IT department
on standardized best practices. At the moment, one of the leading sets of best
practices for IT is documented in the Central Computer and Telecommunications
Agency’s (CCTA) IT Infrastructure Library (ITIL). ITIL is not simply a book
or a tool but a set of practices that describe procedures. It is also a
mindset, so implementing ITIL is not done on a rainy afternoon; it takes time
before people have adopted the proper mindset. Don’t underestimate an ITIL
implementation.
Employees:
Certification pays off. Before you give somebody a car, you make sure that he
or she knows how to drive it. The same goes for IT: you can give your employees
the best software or hardware, but if they don’t know how to work with it, even
the best solution is worthless. Make sure that your employees have the proper
training to work with the systems, e.g. MCSE, CCNA or CNE training.
Management:
Not only the administrators and engineers need to be trained, but management
as well. Managers need training so that they understand the problems of their
administrators and engineers, not in detail but at a basic level. They also need
to know how to follow the procedures so that they can set the proper lines for
the customers of the IT department and for the employees of the department.
Infrastructure:
You can have the best operating system and hardware, but if your infrastructure
has a single point of failure, they are worthless. So make sure that you
investigate your entire infrastructure when you are thinking about high
availability. Use redundant router/switch solutions. Create backup routes in
your WAN and to the Internet. This is not simple, but it is worth doing.
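The single-point-of-failure argument can be made concrete with standard availability arithmetic: components that must all work (in series) multiply their availabilities, while for redundant components (in parallel) the system fails only if every one of them fails. The Python sketch below assumes that failures are independent and that failover is instantaneous, which real setups only approximate. The function names series and parallel, and the figures used (0.99999 for a clustered server, 0.999 for a single router), are hypothetical and chosen only for illustration.

```python
from math import prod

def series(*availabilities: float) -> float:
    """Availability of a chain in which every component must work (no redundancy)."""
    return prod(availabilities)

def parallel(*availabilities: float) -> float:
    """Availability of redundant components where any single working one is enough."""
    # The group is down only if all members are down at the same time.
    return 1 - prod(1 - a for a in availabilities)

if __name__ == "__main__":
    cluster = 0.99999   # hypothetical clustered server
    router = 0.999      # hypothetical single router
    print(f"cluster behind one router:   {series(cluster, router):.6f}")
    print(f"cluster behind two routers:  {series(cluster, parallel(router, router)):.6f}")
```

With a single router the whole chain drops to roughly 0.999, so the five-nines cluster is effectively wasted; adding a second, independent router brings the chain back close to the cluster's own availability.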
Buildings:
This is one of the things people often forget when talking and thinking about
high availability. How often do we see that anyone can enter the server room
without any authentication, or that there is no air conditioning in that room?
Another thing to consider is the geographical location of your building
(earthquakes, floods, fire), so make sure that you have a good total backup
solution. This is also part of the ITIL set.
In this article, which is not a very technical one, I have pointed out some of
the factors besides the operating system and hardware that matter for high
availability.