by Marcin Policht
Make sure that your planned cluster configuration is listed on Microsoft HCL.
Verify that the entire cluster configuration, not just its individual components, is supported by Microsoft.
for the latest version of HCL.
New Windows NT/2000 columnist Marcin Policht gives you some very practical advice on Windows NT 4.0 and Windows 2000 clustering. If you are planning a cluster or already running a cluster, you don’t want to miss this!
The Cluster Service (ClusSvc) must run under a domain account that is a member of
each node’s local Administrators group (directly, not via a global group) and that holds
the “Log on as a service” and “Lock pages in memory” user rights. Make sure the password
for this account does not expire.
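The account requirements above amount to a checklist that has to hold on every node. Below is a minimal Python sketch of that checklist. It assumes you fill in each node’s direct group memberships and user rights by hand. `NodeAccountView` and `account_problems` are hypothetical names, and nothing here queries Windows.

```python
from dataclasses import dataclass, field

# Right names as given in the text; assumed to match the local policy labels.
REQUIRED_RIGHTS = {"Log on as a service", "Lock pages in memory"}

@dataclass
class NodeAccountView:
    """How the ClusSvc account looks on ONE node (hypothetical, filled in by hand)."""
    node: str
    # Only DIRECT memberships; membership through a global group does not count.
    direct_local_groups: set[str] = field(default_factory=set)
    user_rights: set[str] = field(default_factory=set)

def account_problems(is_domain_account: bool,
                     password_never_expires: bool,
                     views: list[NodeAccountView]) -> list[str]:
    """Return human-readable problems; an empty list means the checklist passes."""
    problems = []
    if not is_domain_account:
        problems.append("ClusSvc must run under a domain account")
    if not password_never_expires:
        problems.append("the account password must not expire")
    for v in views:
        if "Administrators" not in v.direct_local_groups:
            problems.append(f"{v.node}: add the account directly to local Administrators")
        # Report every required right this node is missing.
        for right in sorted(REQUIRED_RIGHTS - v.user_rights):
            problems.append(f"{v.node}: missing user right '{right}'")
    return problems
```

Pass one `NodeAccountView` per cluster node. Group membership and user rights are local to each machine, so a node that passes the check says nothing about the other node.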
Set the boot delay to a different value on each server (e.g., 5 s on one and 30 s on the other).
Do not power on both servers at the same time while they are connected to the shared storage unit until the Cluster Service is running on at least one of them.
Ensure reliable connectivity to a domain controller, or configure both servers as
backup domain controllers.
Increase the cluster (quorum) log size from the default 64 KB to 128 KB.
Do not configure immediate automatic failback. Instead, allow failback only during off-peak hours (e.g., between 21:00 and 06:00), or perform it
manually when needed.
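An off-peak window such as 21 to 6 wraps past midnight, so a naive `start <= hour < end` test gets it wrong. The Python sketch below shows the wrap-around check. The function name and the choice to treat the end hour as exclusive are assumptions; the sketch makes no claim about how MSCS evaluates its own failback window.

```python
def failback_allowed(hour: int, start: int = 21, end: int = 6) -> bool:
    """True if `hour` (0-23) falls in the failback window [start, end).

    The window may wrap past midnight, as in the 21-to-6 example.
    Treating `end` as exclusive is an assumption of this sketch.
    """
    for h in (hour, start, end):
        if not 0 <= h <= 23:
            raise ValueError("hours must be in the range 0-23")
    if start == end:
        return False  # treat an empty window as "never fail back" (assumption)
    if start < end:
        # Ordinary window within one day, e.g. 9 -> 17.
        return start <= hour < end
    # Wrapping window, e.g. 21 -> 6 covers 21..23 and 0..5.
    return hour >= start or hour < end
```

For example, `failback_allowed(23)` and `failback_allowed(3)` are both true for the default window, while `failback_allowed(12)` is false.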
It is recommended to place the quorum on its own disk resource
(pick a small, mirrored drive, if possible – the quorum uses no more than a few
Make sure that the group containing the cluster name holds only the cluster IP address and cluster name resources.
For every resource, set the advanced “Looks Alive” and “Is Alive” settings to “Use value from resource type”.
If you are running NT 4.0 EE with SP5, run Cluster Administrator only from machines that have SP5 installed (or running the SP5 version of
Starting with SP4, the total number of resources is limited to 1,600.
Network Configuration Tips
The only supported configuration includes a private interconnect for cluster-only communication and a public network used for both client-to-cluster and cluster-to-cluster communication (this requires at least two NICs in each node). A configuration without a private network for cluster communication is not supported.
Do not use DHCP-assigned addresses for any of the cluster network interfaces.
Set all NICs to a specific speed (DO NOT use autodetect) and specify the appropriate duplex setting.
For the private interconnect in Windows NT 4.0, unbind WINS and do not set a default gateway. Also disable the Server, Workstation, and NetBIOS interface bindings for the interconnect NIC.
In Windows 2000, disable NetBIOS over TCP/IP for the interconnect NIC on the WINS tab of the advanced TCP/IP properties.
The Cluster Service uses Windows Sockets with RPC, not NetBIOS, for internal communication.
To optimize response times on the public network, move the public adapter higher in the
TCP/IP binding order.
The subnet mask for the heartbeat and client networks should be the same.
The private interconnect should use a crossover cable or an isolated hub.
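The network rules in this section can be collected into one configuration check per node. Below is a minimal Python sketch. It assumes you describe each NIC by hand using the hypothetical `Nic` record, and it encodes only the rules stated here: a private and a public NIC, no DHCP, fixed speed and duplex, no default gateway on the interconnect, matching subnet masks, and the public adapter first in the binding order.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class Nic:
    """Hand-written description of one adapter (hypothetical record)."""
    name: str
    role: str                 # "private" (heartbeat) or "public" (client + cluster)
    address: str              # static address with prefix, e.g. "10.0.0.1/24"
    uses_dhcp: bool = False
    speed: str = "auto"       # "auto" or a fixed value such as "100"
    duplex: str = "auto"      # "auto", "full" or "half"
    default_gateway: Optional[str] = None

def nic_problems(nics: list[Nic], binding_order: list[str]) -> list[str]:
    """Return rule violations for one node; an empty list means no problems found."""
    problems = []
    if not {"private", "public"} <= {n.role for n in nics}:
        problems.append("need at least one private and one public NIC")
    for n in nics:
        if n.uses_dhcp:
            problems.append(f"{n.name}: use a static address, not DHCP")
        if n.speed == "auto" or n.duplex == "auto":
            problems.append(f"{n.name}: set a fixed speed and duplex")
        if n.role == "private" and n.default_gateway:
            problems.append(f"{n.name}: no default gateway on the interconnect")
    # The text recommends the same subnet mask on heartbeat and client networks.
    masks = {ipaddress.IPv4Interface(n.address).netmask for n in nics}
    if len(masks) > 1:
        problems.append("heartbeat and client networks use different subnet masks")
    public = [n.name for n in nics if n.role == "public"]
    if public and binding_order and binding_order[0] not in public:
        problems.append("move the public adapter to the top of the binding order")
    return problems
```

The adapter names and addresses in any usage are placeholders. The binding order has to be read from each node’s network settings and passed in; the sketch does not read it from the system.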
In Windows NT 4.0 EE, failure of the public connection on a node that owns a resource will
NOT cause the cluster group containing that resource to fail over to the other node. This is a more general problem which, unfortunately, is
still not resolved in MSCS for NT 4.0 EE. Windows 2000 uses its Plug and Play capabilities to detect disconnected network cables and other connectivity
problems, which allows the cluster to fail over properly. It does this by extending connectivity testing beyond the simple heartbeat
(the Cluster Service communicates between nodes by sending a heartbeat signal, a
single UDP packet, every 1.2 seconds to confirm connectivity) and by pinging an external host on the same subnet (typically the local gateway).
If the heartbeat does not provide conclusive information, the failover decision
depends on which node receives an ICMP echo reply.
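The Windows 2000 behaviour described above can be modelled, very loosely, as two steps: first decide whether heartbeats have stopped, then use the ping results to break the tie. The Python sketch below is a simplified illustration and not the actual Cluster Service algorithm. The 1.2-second interval comes from the text; `missed_allowed` (how many intervals may pass without a heartbeat) is a hypothetical parameter, and the function names are invented.

```python
from typing import Optional

HEARTBEAT_INTERVAL_S = 1.2  # one UDP heartbeat packet every 1.2 seconds (from the text)

def heartbeat_overdue(last_heartbeat: float, now: float, missed_allowed: int) -> bool:
    """True if more than `missed_allowed` heartbeat intervals have passed.

    `missed_allowed` is a hypothetical tuning value, not an MSCS setting.
    """
    return now - last_heartbeat > missed_allowed * HEARTBEAT_INTERVAL_S

def choose_survivor(ping_replies: dict[str, bool]) -> Optional[str]:
    """Pick the node that keeps the groups when heartbeats are inconclusive.

    `ping_replies` maps node name -> whether that node received an ICMP echo
    reply from the external host (typically the local gateway). Returns None
    if the result is still inconclusive (both or neither node got a reply).
    """
    reachable = [node for node, replied in ping_replies.items() if replied]
    return reachable[0] if len(reachable) == 1 else None
```

In this model, a node whose public cable is unplugged stops getting echo replies from the gateway, so the node that still gets replies is chosen to own the groups.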
Installing File Share Resource Tips
Do not create file shares with NT Explorer or Server Manager; use Cluster Administrator instead.
When setting up File Share resources, make sure they are not set to “Affect the Group”.
Sharing subdirectories requires SP4 or later, and their dynamic discovery requires SP5.
When assigning share permissions to the resource, use the Cluster Administrator interface rather than Explorer or Server Manager. Make sure that the Cluster Service account has Full Control at both the share and NTFS levels.
Installing Print Spooler Resource
The Print Spooler resource depends on Physical Disk and Network Name resources, and the Network Name resource in turn depends on an IP Address resource.
To configure the Print Spooler resource:
– Install the ports on both nodes (printer ports must have the same names on both nodes)
– Install the printer drivers on both nodes (after installation, the printers themselves can be deleted)
– Run the Add Printer wizard over the network
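The dependency chain above determines the order in which resources can come online: a resource starts only after everything it depends on. The Python sketch below computes that order with the standard library’s `graphlib.TopologicalSorter` (Python 3.9+). It also flags printer port names that exist on only one node. The dictionary layout and function names are illustrative assumptions; port names used in any call are placeholders.

```python
from graphlib import TopologicalSorter

# resource -> resources it depends on, as described in the text
PRINT_DEPENDENCIES = {
    "Print Spooler": {"Physical Disk", "Network Name"},
    "Network Name": {"IP Address"},
    "Physical Disk": set(),
    "IP Address": set(),
}

def online_order(deps: dict[str, set[str]]) -> list[str]:
    """Dependencies first: each resource follows everything it depends on.

    TopologicalSorter raises graphlib.CycleError on circular dependencies.
    """
    return list(TopologicalSorter(deps).static_order())

def offline_order(deps: dict[str, set[str]]) -> list[str]:
    """Dependents first: the reverse of the online order."""
    return online_order(deps)[::-1]

def mismatched_ports(node_a_ports: set[str], node_b_ports: set[str]) -> set[str]:
    """Port names present on only one node; should be empty for a working spooler."""
    return node_a_ports ^ node_b_ports
```

With `PRINT_DEPENDENCIES`, the IP Address comes before the Network Name, and the Print Spooler always comes last, matching the chain described above.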
Testing and Troubleshooting
In Windows NT 4.0, enable cluster logging by creating the system environment variable %ClusterLog%, which points to the location and name of the log file, e.g. %WINDIR%\cluster\cluster.log. You can set the logging level with another system environment variable, %ClusterLogLevel%, by assigning it a value between 0 (no log) and 3 (full log).
By default, the maximum log size is 8 MB; after the log reaches this size, it circularly overwrites
the oldest entries in 64 KB increments. The %ClusterLogSize% system environment variable lets you set the size of the log in MB. You need to restart the computer for the changes to take effect.
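The logging variables and the circular-overwrite rule can be sketched in Python as follows. The sketch reads ClusterLog, ClusterLogLevel and ClusterLogSize from an environment mapping (for example `os.environ`), enforces the documented 0 to 3 level range, and falls back to the 8 MB default. It then estimates how many bytes of the oldest entries would be overwritten. It assumes 1 KB = 1024 bytes and 1 MB = 1024 × 1024 bytes, and it models the 64 KB increments as simple round-up arithmetic, not the real file format.

```python
from typing import Mapping, Optional

DEFAULT_LOG_SIZE_MB = 8      # default maximum size quoted in the text
TRIM_CHUNK = 64 * 1024       # oldest entries are overwritten in 64 KB increments

def log_settings(env: Mapping[str, str]) -> dict:
    """Read the NT 4.0 cluster-logging variables from an environment mapping."""
    # Windows treats environment variable names case-insensitively,
    # so normalise the keys before looking them up.
    upper = {k.upper(): v for k, v in env.items()}
    level: Optional[int] = None
    if "CLUSTERLOGLEVEL" in upper:
        level = int(upper["CLUSTERLOGLEVEL"])
        if not 0 <= level <= 3:
            raise ValueError("ClusterLogLevel must be 0 (no log) to 3 (full log)")
    size_mb = int(upper.get("CLUSTERLOGSIZE", DEFAULT_LOG_SIZE_MB))
    return {
        "path": upper.get("CLUSTERLOG"),
        "level": level,
        "max_bytes": size_mb * 1024 * 1024,
    }

def bytes_to_overwrite(current: int, incoming: int, max_bytes: int) -> int:
    """Oldest bytes that must be overwritten to fit `incoming` new bytes,
    rounded up to whole 64 KB increments (simplified model)."""
    overflow = current + incoming - max_bytes
    if overflow <= 0:
        return 0
    return ((overflow + TRIM_CHUNK - 1) // TRIM_CHUNK) * TRIM_CHUNK
```

For example, a full 8 MB log that receives one more byte gives up one 64 KB increment of its oldest entries in this model.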
In Windows 2000, logging is enabled by default to %SystemRoot%\Cluster\Cluster.log. The logs are more
reader-friendly and refer to resources and groups by name rather than by GUID
(as NT 4.0 does).
If you run into problems connecting to the cluster with Cluster Administrator, connect to a node name rather than to the cluster name.
If the Clusdb file (which contains a backup of the cluster registry hive) becomes corrupted, it can be restored from its copy, Chkxxxx.tmp
(“xxxx” varies), located in the MSCS folder on the quorum drive. After stopping the Cluster Service on both nodes, simply copy this file over the damaged Clusdb file in the Winnt\Cluster folder on one of the nodes. Once that node starts successfully, copy the
.tmp files from the MSCS folder to the other node and restart it.
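The copy step of this procedure can be scripted. Below is a minimal Python sketch that locates a Chk*.tmp checkpoint in the MSCS folder and copies it over Clusdb. It assumes that if several checkpoint files exist, the most recently modified one is the right one; the text does not say this. Both function names and the extra `Clusdb.bad` backup of the damaged file are hypothetical additions. Stopping the Cluster Service on both nodes is still a manual prerequisite, and `dry_run` defaults to True so nothing is copied by accident.

```python
import shutil
from pathlib import Path
from typing import Optional

def newest_checkpoint(mscs_dir: Path) -> Optional[Path]:
    """Most recently modified Chk*.tmp file in the MSCS folder, or None."""
    candidates = [p for p in mscs_dir.glob("Chk*.tmp") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)

def restore_clusdb(mscs_dir: Path, cluster_dir: Path, dry_run: bool = True) -> Path:
    """Copy the newest checkpoint over <cluster_dir>/Clusdb and return its path.

    Run for real only after the Cluster Service is stopped on BOTH nodes.
    """
    src = newest_checkpoint(mscs_dir)
    if src is None:
        raise FileNotFoundError(f"no Chk*.tmp file in {mscs_dir}")
    dest = cluster_dir / "Clusdb"
    if not dry_run:
        if dest.exists():
            # Keep the damaged copy as Clusdb.bad (an extra precaution, not in the text).
            shutil.copy2(dest, dest.with_suffix(".bad"))
        shutil.copy2(src, dest)
    return src
```

Run it first with the default `dry_run=True` to see which checkpoint it would use, then again with `dry_run=False` on the one node you intend to start first.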
Before you shut down a node, use Cluster Administrator to move all groups to
the other node. Alternatively, use a batch file that first stops the Cluster Service (with the
“net stop clussvc” command) and then runs Shutdown.exe from the Resource Kit to shut down the node.
If this procedure is not followed, you might receive the following event log
entry during the shutdown of the cluster node that owns a resource:
System Process – Lost Delayed-Write Data
The system was attempting to transfer file data from buffers to \Device\Harddisk#\Partition#. The write operation failed, and only some of the data may have been written to the file.
This happens because, by design, the Cluster Service stops sending heartbeats to the other node while the first node is shutting down. This,
in turn, initiates failover to the surviving node, but if the first node is still writing data to a disk resource it owns, this message is generated and data corruption can occur.
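The batch-file approach described above depends on strict ordering: stop ClusSvc first, and shut down only if that succeeded. Below is a minimal Python sketch of that ordering using `subprocess.run`. The Shutdown.exe arguments are left to the caller as a hypothetical `shutdown_args` parameter because the sketch does not assume any particular Resource Kit switches. By default it only returns the command list and does not execute anything.

```python
import subprocess
from typing import Sequence

def shutdown_commands(shutdown_args: Sequence[str] = ()) -> list[list[str]]:
    """Commands in the recommended order: stop ClusSvc, then shut the node down."""
    return [
        ["net", "stop", "clussvc"],
        ["shutdown.exe", *shutdown_args],  # Resource Kit tool; arguments supplied by caller
    ]

def run_shutdown(shutdown_args: Sequence[str] = (), execute: bool = False) -> list[list[str]]:
    """Return the commands and, only if `execute` is True, run them in order."""
    cmds = shutdown_commands(shutdown_args)
    if execute:
        for cmd in cmds:
            # check=True raises CalledProcessError if "net stop clussvc" fails,
            # so Shutdown.exe is never started while the service may still be running.
            subprocess.run(cmd, check=True)
    return cmds
```

Failing loudly was chosen over continuing because the event described above occurs precisely when a node goes down while the Cluster Service is still active.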
Replacing Failed Disk in a Shared Disk Resource
The Cluster Service may fail to start, logging Event ID 1034, because it relies on disk signatures to identify and mount volumes. Refer to
http://support.microsoft.com/support/kb/articles/q217/2/24.asp for the
Service packs should be installed the same way on both nodes, after a proper backup, of course. Before
installing on a node, move all resource groups to the other node.
Once the failover completes, stop all non-critical services on that node, including the Cluster Service.
Then follow the same process on the other node.
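The rolling procedure above can be expressed as an ordered checklist generated for a two-node cluster. The Python sketch below produces that list. The node names are placeholders. The restart step after installation is an assumption added for completeness; it is marked in the code because the text does not spell it out.

```python
def rolling_update_plan(nodes: list[str]) -> list[str]:
    """Ordered checklist for applying a service pack to a two-node cluster."""
    if len(nodes) != 2:
        raise ValueError("this sketch assumes a two-node cluster")
    first, second = nodes
    plan = ["Back up both nodes"]
    # Update one node at a time while the other node owns all resource groups.
    for node, other in ((first, second), (second, first)):
        plan += [
            f"Move all resource groups from {node} to {other}",
            f"Wait for the failover to {other} to complete",
            f"Stop non-critical services on {node}, including the Cluster Service",
            f"Install the service pack on {node}",
            # Assumed step: the text does not spell out the restart.
            f"Restart {node} and confirm it is running again before continuing",
        ]
    return plan
```

Calling `rolling_update_plan(["NODE1", "NODE2"])` returns eleven steps, with NODE1 fully updated before any group is moved off NODE2.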