This is the first article of our new series about high availability solutions based on Windows Server 2003. Although some of the technologies we will describe here were available in earlier versions of Windows (as inherent components, add-on programs, or third-party offerings that made their way into Microsoft’s portfolio through acquisitions), their latest incarnations are superior in functionality, stability, and manageability.
We kick off a new series dedicated to Windows Server 2003 high availability solutions with a look at server clustering.
Two basic approaches to achieving high availability are built into the operating system. The first, known as Server Clustering, requires Windows Server 2003 Enterprise or Datacenter Edition. The second, known as Network Load Balancing (NLB), is incorporated into all Windows Server 2003 editions (including Standard and Web).
Each represents a unique approach to eliminating “a single point of failure” in computer system design. They also share one important underlying feature that serves as the basis for their redundancy: Both increase availability by relying on multiple physical servers, hosting identically configured instances of a particular resource (such as a service or application). The main difference lies in the way these instances are defined and implemented.
In the case of NLB, each instance is permanently tied to the physical server that hosts it, and it remains active as long as this server is functional. In other words, all instances operate simultaneously during cluster uptime. With Server Clustering, on the other hand, there is only a single active instance of each highly available resource, regardless of the total number of servers that are members of the cluster. The server that currently hosts this resource becomes its owner and is responsible for processing all requests for its services.
These underlying architectural principles introduce a number of challenges. Since an NLB cluster consists of up to 32 instances running in parallel on separate servers, additional mechanisms are needed to let them decide which one handles a client request targeting any of the highly available resources at any given time. This determination must be made for every new incoming request and, depending on the configuration, might have to be performed independently for each of them.
With Server Clustering, the equivalent process is trivial since there is only one instance of each highly available resource. The cost, however, is increased complexity of the logic that dictates which member server hosts this resource, especially following a failure of its previous owner.
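The single-owner model described above can be illustrated with a small toy sketch. It is not how the Cluster service is implemented; the class name, the node and resource names, and the "first surviving node" failover policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ServerCluster:
    """Toy model: every resource has exactly one owner node at a time."""
    nodes: list
    owners: dict = field(default_factory=dict)  # resource name -> owning node

    def bring_online(self, resource, node):
        if node not in self.nodes:
            raise ValueError(f"{node} is not a cluster member")
        self.owners[resource] = node

    def route(self, resource):
        # Routing is trivial: the single owner handles every request.
        return self.owners[resource]

    def fail_node(self, failed):
        # The hard part is choosing a new owner after a failure.
        # Picking the first surviving node is a placeholder policy only.
        self.nodes.remove(failed)
        for resource, owner in list(self.owners.items()):
            if owner == failed:
                if not self.nodes:
                    raise RuntimeError("no surviving nodes to take ownership")
                self.owners[resource] = self.nodes[0]


cluster = ServerCluster(nodes=["node1", "node2", "node3"])
cluster.bring_online("FileShare", "node1")
cluster.fail_node("node1")
print(cluster.route("FileShare"))
```

Request routing is a dictionary lookup, while all of the complexity sits in `fail_node`, which is where the real product needs the arbitration logic covered in the rest of this article.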
This article will describe this logic, along with the details of its implementation. A future article will discuss NLB technology in a similar manner.
To function as a unit, servers participating in a cluster (also referred to as nodes) must be able to interact with each other. This is accomplished by setting up redundant network connections so as to minimize the possibility of failure. Thus, each node should have at least two network adapters. The connections are organized into two groups, private and public, also referred to as “Internal Cluster communications only” and “All Communications,” respectively. They are identified and configured during cluster installation on each member server.
The first group contains links dedicated to internode, intracluster traffic. Although the primary purpose of the second group is to carry service requests and responses between clients and the cluster, it also serves as a backup to the first. Depending on the number of nodes in a cluster (and your budget), you can employ different technologies to implement node interconnects. In the simplest case (limited to two nodes), a crossover cable is sufficient. When a larger number of servers participate in a cluster (up to a total of eight, supported by Windows Server 2003 Enterprise and Datacenter Editions), a hub or switch, preferably a dedicated one, is needed.
To optimize internode communication, which is critical for a cluster to operate properly, we recommend eliminating any unnecessary network traffic on the private network interfaces. This is accomplished by:
- Disabling NetBIOS over TCP/IP: the relevant options are listed in the NetBIOS section on the WINS tab of the Advanced TCP/IP Settings dialog box of the interface properties
- Removing File and Printer Sharing for Microsoft Networks: configurable on the General tab of the interface properties dialog box
- Setting the appropriate speed and duplex mode, rather than relying on the Autodetect option: done from the Advanced tab of the network adapter Properties dialog box
- Using statically assigned IP addresses: instead of Dynamic Host Configuration Protocol or Automatic Private IP Addressing
- Leaving the default gateway empty and clearing the entries under the “Use the following DNS server addresses” option: both are present on the Internet Protocol Properties dialog box for the connection
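The checklist above can be expressed as a small validation routine. This is only a sketch: the `PrivateInterface` record and its field names are hypothetical, and the values would have to be collected by hand or by a tool of your choice, since the code does not query Windows itself.

```python
from dataclasses import dataclass, field


@dataclass
class PrivateInterface:
    """Hypothetical snapshot of one private (internode) adapter's settings."""
    netbios_over_tcpip: bool = True
    file_and_printer_sharing: bool = True
    speed_duplex: str = "Auto"       # e.g. "100 Mbps Full Duplex"
    ip_assignment: str = "dhcp"      # "static", "dhcp", or "apipa"
    default_gateway: str = ""        # empty string means not set
    dns_servers: list = field(default_factory=list)


def check_private_interface(nic):
    """Return a list of deviations from the recommendations listed above."""
    problems = []
    if nic.netbios_over_tcpip:
        problems.append("NetBIOS over TCP/IP is enabled")
    if nic.file_and_printer_sharing:
        problems.append("File and Printer Sharing is bound")
    if nic.speed_duplex.strip().lower().startswith("auto"):
        problems.append("speed/duplex left on autodetect")
    if nic.ip_assignment != "static":
        problems.append("IP address is not statically assigned")
    if nic.default_gateway:
        problems.append("default gateway is set")
    if nic.dns_servers:
        problems.append("DNS servers are configured")
    return problems


heartbeat_nic = PrivateInterface(False, False, "100 Mbps Full Duplex", "static")
print(check_private_interface(heartbeat_nic))
```

An empty list means the adapter matches every recommendation; each string in a non-empty list names one setting to revisit.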
On Windows Server 2003, it is no longer necessary to disable the Media Sensing feature. On Windows 2000-based cluster members, this had to be done through a registry modification.
Despite these extra measures, communication between nodes can still fail. This makes it necessary to provide an additional safety mechanism that prevents a so-called “split-brain” scenario, in which individual nodes, unable to determine the status of clustered resources, attempt to activate them at the same time. This would violate the principles of server clustering described above and could have serious consequences, such as data corruption in the case of disk-based resources.
To prevent this, every cluster contains one designated resource, called Quorum, implemented as a dedicated disk volume. Most frequently, this volume consists of a pair of mirrored disks, which increases its fault tolerance. Its optimum size is 500 MB (due to NTFS characteristics), although the data stored on it typically occupies only a fraction of this capacity. As with other resources, only one server owns the Quorum at any given time. The Quorum owner has the ultimate responsibility for making decisions regarding ownership of all other resources.
More specifically, nodes exchange “heartbeat” signals, formatted as User Datagram Protocol (UDP) packets, at preconfigured intervals (every 1.2 seconds) to confirm that their network interfaces are operational. The absence of two consecutive packets triggers a reaction intended to address potential cluster problems. In Windows 2000 Server-based implementations, this consisted of activating all resources on the current owner of the Quorum and, simultaneously, deactivating them on all other nodes. This effectively ensured that only a single instance of each resource remained online. However, under certain circumstances, it could lead to an undesirable outcome.
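The "two consecutive missed packets" rule can be sketched as a simple counter. This is an illustration only: the class and method names are hypothetical, and the code models the counting rule rather than the actual UDP exchange performed by the Cluster service.

```python
HEARTBEAT_INTERVAL_SECONDS = 1.2  # interval quoted in the text
MISSED_LIMIT = 2                  # consecutive misses that trigger a reaction


class HeartbeatMonitor:
    """Tracks consecutive missed heartbeats from one peer interface."""

    def __init__(self, missed_limit=MISSED_LIMIT):
        self.missed_limit = missed_limit
        self.missed = 0

    def on_interval(self, packet_received):
        """Call once per heartbeat interval.

        Returns True when the number of consecutive misses reaches the limit.
        """
        if packet_received:
            self.missed = 0      # any received packet resets the count
        else:
            self.missed += 1
        return self.missed >= self.missed_limit


monitor = HeartbeatMonitor()
for received in (True, False, True, False, False):
    suspect = monitor.on_interval(received)
print(suspect)
```

A single dropped packet followed by a good one does not trip the monitor; only two misses in a row do, which with a 1.2-second interval corresponds to roughly 2.4 seconds of silence.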
Although it is a rather rare occurrence, the Quorum owner can lose connectivity on all of its interfaces while the remaining nodes are still able to communicate with the client network. As a result, user requests cannot reach cluster resources, which are still active but reside on a node that is no longer accessible. The remaining nodes, however, would be fully capable of handling these requests if they could take ownership of the Quorum and all other resources.
The introduction of additional logic in the way Windows Server 2003-based clusters handle the absence of heartbeat traffic resolved this issue. Rather than following the legacy procedure when missing heartbeat signals are detected, nodes first check whether any of their network interfaces designated as public are operational and, if so, whether client networks are still reachable. This is accomplished by sending ICMP (Internet Control Message Protocol) echo requests (i.e., executing PING) to external systems, typically the default gateway configured for these interfaces. If the node hosting the Quorum fails any of these tests, it voluntarily deactivates all its resources, including the Quorum. If the remaining nodes discover that their network links are still working, they have no problem establishing a new Quorum owner and transferring control of all cluster resources to it.
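The decision described above can be summarized as a small function. This is a sketch under stated assumptions: the function name and the returned labels are hypothetical, the two connectivity results are passed in as inputs rather than measured, and the last two branches cover cases the text does not spell out.

```python
def react_to_lost_heartbeat(owns_quorum, public_nic_up, client_network_reachable):
    """Pick a node's reaction once heartbeats from its peers have stopped."""
    # Both tests must pass: a public interface is up AND the ping target answers.
    passes_tests = public_nic_up and client_network_reachable

    if owns_quorum and not passes_tests:
        # Described in the text: the isolated owner gives everything up.
        return "release_all_resources"
    if not owns_quorum and passes_tests:
        # Described in the text: healthy nodes establish a new Quorum owner.
        return "arbitrate_for_quorum"
    if owns_quorum:
        # Assumption: a healthy owner simply keeps what it already holds.
        return "keep_resources"
    # Assumption: an isolated non-owner does not try to take over.
    return "stay_passive"


# Quorum owner whose public interface is up but whose gateway does not answer.
print(react_to_lost_heartbeat(True, True, False))
```

Keeping the reachability checks outside the function makes the rule easy to reason about on its own; in practice those inputs would come from the interface status and the ICMP echo test mentioned above.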
Besides assisting with resource arbitration following communication failure, Quorum serves another important function: providing storage for up-to-date cluster configuration. This configuration resides in two files in the MSCS folder on the quorum volume, the cluster hive checkpoint file (Chkxxx.tmp) and the Quorum log (Quolog.log). The first stores a copy of the configuration database, which mirrors the content of the Cluster registry hive on the server hosting the Quorum resource; that hive is stored in the %SystemRoot%\Cluster\CLUSDB file on that server. This database is replicated to all remaining nodes and loaded into their Registry (maintaining a single “master” copy of this information ensures its consistency). Replication takes place for every new cluster configuration change, as long as all nodes are operational. If this is not the case, timestamped changes are recorded in the Quorum log file and applied to the configuration database once the offline nodes are brought back online. Being familiar with these facts is important when troubleshooting some of the most severe cluster problems.
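The replicate-or-log behavior described above can be modeled in a few lines. This is a toy model that follows the description literally; the class and its attributes are hypothetical, plain dictionaries stand in for the checkpoint file and registry hives, and nothing here reflects the real on-disk formats.

```python
class ClusterConfig:
    """Toy model of configuration replication with a pending-change log."""

    def __init__(self, nodes):
        self.master = {}                            # stands in for the checkpoint file
        self.pending_log = []                       # stands in for the Quorum log
        self.node_copies = {n: {} for n in nodes}   # per-node registry copies
        self.online = set(nodes)

    def _all_online(self):
        return self.online == set(self.node_copies)

    def _replicate(self, key, value):
        self.master[key] = value
        for copy in self.node_copies.values():
            copy[key] = value

    def change(self, timestamp, key, value):
        if self._all_online():
            self._replicate(key, value)             # replicate immediately
        else:
            self.pending_log.append((timestamp, key, value))

    def set_offline(self, node):
        self.online.discard(node)

    def set_online(self, node):
        if node not in self.node_copies:
            raise KeyError(f"{node} is not a cluster member")
        self.online.add(node)
        if self._all_online():
            # Replay logged changes in timestamp order, then empty the log.
            for _, key, value in sorted(self.pending_log, key=lambda e: e[0]):
                self._replicate(key, value)
            self.pending_log.clear()


config = ClusterConfig(["node1", "node2"])
config.change(1, "GroupA.Owner", "node1")
config.set_offline("node2")
config.change(2, "GroupA.Owner", "node2")   # logged, not replicated yet
config.set_online("node2")                  # log replayed to every copy
print(config.node_copies["node2"])
```

The point of the sketch is the ordering guarantee: changes made while a node is away are kept with their timestamps and replayed in order, so every copy converges on the same final state.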
As already mentioned, Quorum is implemented as a volume on a physical disk. However, details of this implementation vary depending on a number of factors, such as the number of nodes, the server cluster type, and the storage technology. We will discuss these factors, and continue our coverage of the principles of Server Clustering, in the next article in this series.