Why You Should Be Concerned With Disaster Recovery


October 25, 2001

Introduction: Plan We Must

At the heart of a computer-related disaster is the loss of corporate servers and their connections. With them go an enterprise's data, internal and external links, and the vital day-to-day operations of applications such as order entry, manufacturing, and accounting. Although total disasters are rare, 50 percent of enterprises that do not have a disaster recovery plan go out of business following a disaster.

So developing a plan is essential.

The broad outline of a disaster recovery plan follows common sense: 1) assess the risk, 2) develop solutions, 3) implement the plan, and 4) maintain the plan.

Of course, in practice it's not quite that simple. For example, large enterprises with multiple locations (especially multiple IT locations) have disaster recovery requirements quite different from those of small businesses.

More to the point, when developing a plan the key question to ask is "How much downtime can the company afford?" Businesses that don't rely on computers for communications and other business transactions may be only minimally affected. Others may be able to withstand a loss of server operation as long as data (and to a lesser extent application software) has been adequately protected. Still others must have constant real-time computer operation to survive.

Cost Is a Factor

The cost of a disaster can, of course, be enormous. But assessing the risk and impact of disaster often sets up the question: How much will disaster recovery and protection cost? If the risk is relatively low or the cost of disaster not so high, then the ongoing cost of protection should be kept low. Most enterprises cannot afford unlimited protection, but they must find the right balance between protection and risk.

Assessing the Risk

Disaster risk assessment is often a complex and sometimes subjective endeavor, so this tutorial can only scratch the surface of the approaches and topics involved. Risk assessment is one area where the differences between enterprises are extremely important and where consultants can be invaluable.

Generally, however, a common approach is to break down a company's IT risks into two main categories:

  • Computer system problems (often including telecommunications), which cover events such as internal breakdowns, sabotage, and accidental damage
  • Environmental problems, which usually cover hurricanes, tornadoes, floods, fires, terrorist attacks, and similar events that affect more than just IT assets

The next step: For each of these categories, draw up a list of computing and telecommunications equipment (starting with servers), and assign risk factors and a recovery priority to each piece.

After going through the hardware, a corresponding list of software (applications) and data should be crafted within the same framework. When complete, all of this should provide a reasonably good picture of an enterprise's vulnerability and suggest approaches for recovering from a loss of equipment, applications, and data.
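The inventory described above can be kept as a simple table and sorted into a recovery order. The following Python sketch is only an illustration: the field names, the 1-to-5 risk scale, the priority numbering (1 = restore first), and the sample entries are all assumptions, not part of any standard method.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    kind: str                 # "hardware", "software", or "data"
    system_risk: int          # hypothetical scale: 1 (low) to 5 (high)
    environmental_risk: int   # hypothetical scale: 1 (low) to 5 (high)
    recovery_priority: int    # hypothetical: 1 = restore first


def recovery_order(assets):
    """Sort by recovery priority; break ties by the higher of the two risk factors."""
    return sorted(
        assets,
        key=lambda a: (a.recovery_priority,
                       -max(a.system_risk, a.environmental_risk)),
    )


# Made-up sample entries for illustration only.
inventory = [
    Asset("file server", "hardware", 3, 2, 2),
    Asset("order-entry server", "hardware", 4, 2, 1),
    Asset("accounting application", "software", 2, 1, 3),
    Asset("customer data", "data", 2, 5, 1),
]

for asset in recovery_order(inventory):
    print(asset.recovery_priority, asset.kind, asset.name)
```

Keeping hardware, software, and data in one list with the same fields makes it easy to see which items share a priority and which of them carries the greater risk.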

Developing a Recovery Plan

At the heart of most recovery plans for server operations is physical separation. At a minimum, data, and often servers, are maintained offsite (i.e., physically separated from the enterprise's facilities so that any local disaster will not affect them).

The place of storage is usually called the recovery site (or sites), and there are several types.

  • Hot Site: Servers, applications, and data are maintained in real-time synchronization (mirrored) with the main IT operations. Disaster recovery is, ideally, seamless and immediate. Since this usually means duplicating hardware and software, the approach is typically very expensive.
  • Cold Site: Data, applications, and often servers are maintained on a standby basis with regular updates or synchronization. These systems must be brought up to speed and put online for disaster recovery, so response times are usually measured in hours or even days. This approach can be relatively expensive, although (depending on the equipment required) it is less expensive than a hot site.
  • Web Site (Vault) or Other Offsite Data Storage: This does not involve duplicate hardware and is therefore much less expensive; however, recovery depends on re-creating damaged hardware environments, and this may take considerable time.
  • Reciprocal Site: Occasionally an agreement or contract can be worked out with a "friendly" enterprise to share servers and backup. This approach can be very inexpensive but brings with it numerous security and reliability drawbacks.
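The trade-off among these site types comes back to the earlier question of how much downtime the company can afford. As a rough sketch, the choice can be expressed as "the least expensive site type that recovers within the tolerable downtime." The recovery times and cost rankings below are hypothetical placeholders chosen only to match the ordering described in the list; real figures vary widely. The reciprocal site is left out because the list gives no recovery time for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteType:
    name: str
    recovery_hours: float  # hypothetical typical time to resume operations
    relative_cost: int     # hypothetical ranking: higher = more expensive


# Placeholder values; only the ordering reflects the list above.
SITE_TYPES = [
    SiteType("hot site", 0.0, 3),
    SiteType("cold site", 48.0, 2),
    SiteType("offsite data storage", 168.0, 1),
]


def cheapest_site(max_downtime_hours: float) -> Optional[SiteType]:
    """Return the least expensive site type that meets the downtime limit."""
    candidates = [s for s in SITE_TYPES
                  if s.recovery_hours <= max_downtime_hours]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.relative_cost)


for limit in (0, 72, 500):
    site = cheapest_site(limit)
    print(limit, "hours ->", site.name if site else "no option")
```

A business that cannot tolerate any downtime is pushed toward the hot site, while one that can wait a week or more can settle for offsite data storage alone.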

Selecting an appropriate type of site and developing the logistics, policies, and procedures for it is the most critical aspect of disaster recovery. At this point, employees, management, and partners are usually closely involved in discussing how the plan will be implemented.

In-House or Hosted?

Because of the expense of re-creating hardware and software environments for recovery plans, most enterprises must consider the advantages and disadvantages of operating their own recovery site or using third-party services -- i.e., a hosted recovery site.

Thanks to the Internet, many application service providers specialize in disaster recovery and offer a wide variety of services and prices. On the other hand, security and operational issues may make working with outside companies difficult or even impossible. This is another area where the enterprise's business practices and recovery requirements may override cost factors.

Implementing and Maintaining the Plan

Implementing a disaster recovery plan is not a typical project. For one thing, the plan must cover more than just the IT concerns of physical protection for servers, data, and communications. If things go wrong -- very wrong -- the effect on people and business processes will be equally profound. So the entire company, from executives to general employees, must be included, and often trained, in their part of a recovery plan.

When it comes to the company's servers, no plan can be allowed to sit in the wiring closet for months (or years) without running the risk of becoming obsolete. The physical aspects of the plan (e.g., links, storage, and synchronization) must be periodically tested, and the procedures, and even the assumptions behind the recovery, must be regularly updated.

Disaster recovery for servers and related equipment, as well as IT operations in general, must be seen as part of an enterprisewide recovery plan. Developing such a plan can be a major undertaking. Recent events, however, demonstrate that now is a good time to find general background information and inexpensive advice from resources such as the Web and various publications.

Quick Resources:
Disaster Recovery Journal www.drj.com
United States Disaster Preparedness Council www.usdpc.org
Disaster Recovery Institute International www.dr.org
Disaster Recovery Information Exchange (Canada) www.drie.org