Replacing a PDC and BDC with an NT Cluster in an Existing Domain
To provide high availability for one of our customers, we chose to implement a cluster based on Microsoft Cluster Server (MSCS) and Compaq hardware. It also served as a test case for a cluster based on Digital hardware, which we implemented later the same year. The migration was very straightforward, except that this cluster would replace both the primary domain controller (PDC) and the backup domain controller (BDC) of the customer's domain. We therefore had to make sure that there would be no interference with the existing logon procedures and batch programs. Implementing two virtual servers gave us the tools to do this without any big problems.
The customer had a PDC and a BDC running in their master domain. Both machines were running at the limit of their capacity, and the customer had problems with the uptime of both systems. By migrating from two separate systems to one cluster, there were two goals to achieve:
- Upgrade the hardware
- Implement the cluster server for higher availability
Because this was a production site, we had to be very careful not to disrupt the existing situation: there was a user database, and user data was stored on the systems. The main condition was that the users should not encounter any problems due to the migration, so we had to do this over the weekend.
The major advantage was that we could keep the names and IP addresses of the two existing servers and assign them to the virtual servers of the cluster.
Before we started the migration from two separate systems (PDC and BDC) to a single cluster, we did a site survey. We checked the following things:
- Which folders on both systems are shared (hidden or not), and how are they offered?
- Which applications are installed on both systems? Are they cluster aware, and if not, which have to be cluster aware and how mission critical are they?
- Which services are running on both systems? Again, are they cluster aware, and if not, how can we turn them into cluster-aware applications?
- How are the several services configured?
After the site check we had the following data from both systems:
- Both servers were running DHCP/WINS. These services are not cluster aware, but we could set them up as a fault-tolerant service.
- There were no mission-critical applications running on either server, so the user data had our greatest attention.
- Most of the user data was stored on the BDC and most of the applications were stored on the PDC.
With this information at hand we started to develop a strategy for the migration.
Strategy to follow:
We came up with the following strategy for this implementation:
- Build and test the hardware in-house. This meant that we assembled and configured the hardware and installed both servers at our company, and that the customer checked the installation.
- Transport the hardware to the customer and rebuild the entire cluster so that the two servers integrate into their domain. The two servers become BDCs in the existing domain.
- Disconnect the two old servers from the domain. Demote the existing PDC to BDC and afterwards promote one of the two nodes to PDC, so that we keep the existing user database.
- Set up the virtual servers and the virtual resources.
- Restore the backup to the cluster.
It looks very easy, but the major risk was the backup of the user data. If the backup made the night before was not good, the whole operation would be canceled and rescheduled for another weekend. The customer's demand was that the users should notice nothing: they would log out on Friday and start work again on Monday as if there had been no migration.
These rules demanded a strategy that gave us the chance to back out at any time and return to the old situation without serious problems. We also had to be sure that the user database we copied was the most recent one.
Communication between the customer and us was essential and had to be very direct to achieve the desired results.
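The back-out requirement can be pictured as a small runbook: check the backup first, run the steps in order, and undo the completed steps in reverse order as soon as one fails. The sketch below is only an illustration of that idea. The `Step` class, the `run_migration` function and the placeholder callables are hypothetical and are not tools we used during the project.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One migration step with its own way back (hypothetical helper)."""
    name: str
    run: Callable[[], bool]       # returns True when the step succeeded
    back_out: Callable[[], None]  # restores the situation before the step


def run_migration(steps: List[Step], backup_ok: bool) -> List[str]:
    """Run the steps in order; on a failure, back out in reverse order."""
    log: List[str] = []
    if not backup_ok:
        # Go/no-go gate: without a good backup the weekend is rescheduled.
        log.append("abort: backup not verified")
        return log
    done: List[Step] = []
    for step in steps:
        if step.run():
            log.append(f"ok: {step.name}")
            done.append(step)
        else:
            log.append(f"failed: {step.name}")
            for prev in reversed(done):
                prev.back_out()
                log.append(f"backed out: {prev.name}")
            break
    return log


if __name__ == "__main__":
    # Placeholder steps; the lambdas stand in for the real manual work.
    plan = [
        Step("install nodes as BDCs", lambda: True, lambda: None),
        Step("promote one node to PDC", lambda: True, lambda: None),
        Step("set up virtual servers", lambda: True, lambda: None),
        Step("restore backup to cluster", lambda: True, lambda: None),
    ]
    for line in run_migration(plan, backup_ok=True):
        print(line)
```

In our case the way back was simple because the two old servers stayed untouched until the new cluster had proven itself.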
Step 1: Building the hardware
We offered the following hardware solution: the Compaq Proliant Cluster Series S Model 100. This hardware consists of two Compaq Proliant 3000 systems, each with a single PII 450 MHz processor and 512 MB of memory on board. For the shared SCSI devices we used two Compaq Proliant Storage systems, each with 9.4 GB disks in a RAID 5 configuration, and replaced the original controller with a Compaq Recovery Server Option (RSO) board, which is needed for a cluster configuration. Each Proliant system had three Compaq Smart 2 DH controllers: one for the internal RAID 1 configuration and one for each Compaq Storage system.
Using this configuration made manual load balancing possible. Figure 1 shows the configuration of the two systems and their storage.
We built up the hardware and started the installation with the Compaq SmartStart CD version 4.21. Using the SmartStart CD makes setting up the Compaq hardware pretty straightforward. After setting up the hardware (at this point only the internal RAID was built), we started to install NT Enterprise Edition. Because we used the Microsoft Select CD, which is not bootable, we used the three NT setup disks. In this process, the only challenge we encountered was choosing the right driver for the NIC. NT recognized the Intel chipset, but we noticed that installing this driver caused problems later on. After a little searching we installed the Compaq 3120 driver, and this one gave the results we were looking for.
After installing NT we installed the Compaq Support Software and Service Pack 3 (SP3) for NT from the NT Enterprise CD.
After setting up both systems we configured the RAID sets for the Proliant storage. This was done using the Compaq Array Configuration Utility (ACU) from the Compaq Support Software. To make sure that we could install MSCS, we used Disk Administrator to give the disks of the Proliant storage systems the same drive letters on both nodes. Once all the disks had the same drive letters, we started the installation of MSCS. On the Compaq Cluster Series S Model 100, the cluster installation begins by installing the driver for the Compaq hardware. The setup wizard starts by searching for the drives used as shared SCSI devices. After the wizard has installed the software for the hardware, it asks for NT Enterprise CD 2, which contains the cluster software. From here the setup of the cluster is very straightforward. We used the following naming convention for the cluster group and nodes:
- Cluster group: Keersop (the name of a small river in the Netherlands)
- Node 1: KeersopN1
- Node 2: KeersopN2
This was in accordance with the naming convention used by the customer. Because we wanted to keep the IP addresses of the PDC and BDC for the virtual servers, we used new addresses for the cluster group and the cluster nodes. For the cluster interconnect (the connection for the heartbeat) we used two addresses in the private Class A range, 10.0.100.1 and 10.0.100.2 with subnet mask 255.0.0.0, to make sure that they would not interfere with the existing IP range.
After the hardware was set up we configured two virtual servers on the nodes:
- Dieze, the old PDC, with the same IP address as in the old situation
- Beerze, the old BDC, also with the same IP address as in the old situation
The customer paid us a visit, and we demonstrated that everything worked by simulating several failovers. After this we dismantled the configuration and made it ready for transport.
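Whether the heartbeat addresses stay clear of the production range can be checked with a few lines of Python and the standard `ipaddress` module. This is a minimal sketch: the heartbeat addresses and mask are the ones given above, but the production network `192.168.1.0/24` is purely an assumed example, since the customer's real range is not listed here, and the helper names are hypothetical.

```python
import ipaddress

# Heartbeat addresses and subnet mask as used on the interconnect.
HEARTBEAT = ["10.0.100.1", "10.0.100.2"]
MASK = "255.0.0.0"

# Assumed production network, for illustration only.
PRODUCTION = ipaddress.ip_network("192.168.1.0/24")


def heartbeat_network(addr: str, mask: str):
    """Return the network that an address/mask pair belongs to."""
    return ipaddress.ip_interface(f"{addr}/{mask}").network


def interferes(addr: str, mask: str, production) -> bool:
    """True if the heartbeat network overlaps the production network."""
    return heartbeat_network(addr, mask).overlaps(production)


for addr in HEARTBEAT:
    net = heartbeat_network(addr, MASK)
    # Prints the address, its network, whether it is private,
    # and whether it overlaps the assumed production network.
    print(addr, net, net.is_private, interferes(addr, MASK, PRODUCTION))
```

Note that with mask 255.0.0.0 the heartbeat network covers all of 10.0.0.0/8, so the check would report an overlap for any production subnet inside the 10.x.x.x range.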
Step 2: Building the hardware on site
The hardware was transported to the site, and we checked the backup log to make sure that the backup had gone well. It had, so nothing stood in the way of starting the operation.
We started the operation by mounting the hardware in 19-inch racks. After this was done we connected the two servers to the existing network. We began the installation by deleting the existing configuration with the Compaq configuration delete utility. After this step we rebuilt the two servers by installing NT 4 Enterprise Edition as two BDCs in the existing domain. This was done the same way we had built the two servers in-house. It showed that good preparation is half the work. The user database came over well, and after the installation there were two extra BDCs in the domain. We were ready for the next step.
Step 3: Disconnecting the old servers and promoting a BDC
The next step in the operation was disconnecting the two old servers from the network. We set both servers aside; they were one of our back-out options if the new servers gave unexpected errors. We then promoted one of the nodes to PDC so that we were able to make changes to the user database. This caused no problems, so the new situation came very close to the old one. As a test of its functionality, we checked that it was possible to make changes in the user database.
Several services were running on the old servers, with DHCP and WINS as the main ones. We configured the DHCP servers manually in the same way the old service was configured. For the WINS service we copied the WINS database from the old server to the new one, according to article Q1722153 from the Microsoft TechNet Knowledge Base. This gave no further problems.
With this test passed and the services set up, we decided to continue the operation. So far, so good.
Step 4: Setting up the cluster and its virtual servers and resources
The installation of MSCS was a very straightforward process. We installed MSCS and named the cluster group Keersop. Because we had done an in-house installation we did not meet any serious problems, and within the hour we had the cluster up and running. As a test we performed several failovers, even the toughest one, pulling the plug, and every time the cluster came back up as we wanted.
After this hour we started to build the virtual servers on the nodes. Because we wanted to take advantage of manual load balancing, we configured the two virtual servers with a different node each as preferred owner. We built the following configuration:
- Virtual server Dieze with preferred owner KeersopN1
- Virtual server Beerze with preferred owner KeersopN2
Both nodes were able to take over from the other in case of a failure. For the virtual servers we used the IP addresses they had in the old situation.
After the virtual servers were created and configured we installed SP4 for NT, because this service pack gave us the tools to create virtual shares very easily, which was the next step in the operation.
Creating the virtual shares on Dieze was a lot of work because there were so many. On Beerze it was very easy because it only contained the user data: we only had to create the user data share and could then share the directories below this level with one click on the Advanced button on the Parameters tab of the wizard.
After creating the virtual shares we installed the backup program and restored one user, with applications and data, to test the functionality. The test consisted of logging on to the domain and checking that the user had full functionality.
We were very glad to see that there were no big problems with the cluster. The user could log on and got the proper scripts and all his applications.
We went out for dinner and decided to restore the backup overnight and come back on Sunday to check whether the restore had caused any problems.
The first thing we did on Sunday morning was check the backup log to see whether anything strange had happened during the restore. Nothing had, so we started the final step, setting the proper rights. After finishing this step the work was done and we could go home with a good feeling.
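The effect of preferred owners on manual load balancing and failover can be shown with a small model. This is a simplified sketch of the idea, not the actual MSCS failover logic: the `place` function is hypothetical, and only the virtual server and node names come from our configuration.

```python
from typing import Dict, Set

# Preferred owners as configured on the cluster.
PREFERRED: Dict[str, str] = {"Dieze": "KeersopN1", "Beerze": "KeersopN2"}


def place(preferred: Dict[str, str], online: Set[str]) -> Dict[str, str]:
    """Map each virtual server to the node that hosts it.

    A virtual server runs on its preferred owner when that node is
    online, otherwise on a surviving node (simplified model).
    """
    placement: Dict[str, str] = {}
    for vserver, owner in preferred.items():
        if owner in online:
            placement[vserver] = owner
        else:
            survivors = sorted(online)
            if not survivors:
                raise RuntimeError("no node online; cluster is down")
            placement[vserver] = survivors[0]
    return placement


# Normal operation: the load is split over both nodes.
print(place(PREFERRED, {"KeersopN1", "KeersopN2"}))
# KeersopN1 fails: both virtual servers run on KeersopN2.
print(place(PREFERRED, {"KeersopN2"}))
```

In normal operation each node carries one virtual server, and after a failure the surviving node carries both, which is why each node has to be sized for the combined load.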
The day after:
Monday was a very exciting day, because it was the first day of production with the newly built cluster. There were no big performance problems, so everybody was very satisfied with the result. We faced several minor WINS problems and solved them on the fly.
I would like to thank everybody who contributed to this project, especially Jos and Arno, the guys on site, and Stephan Kloots, Wibo Muljono, Ben Cats and Dick v.d. Linden for the meaningful discussions during the preparation phase of this project.