Win 2003 High Availability Solutions, SAN-based Storage
August 17, 2006
In the previous article in our series on Windows 2003 Server High Availability solutions, we reviewed the SCSI architecture, which has provided shared storage capabilities since the earliest Microsoft server cluster deployments. Although this approach is still available and frequently used in lower-end, two-node clustering implementations (a higher number of nodes is not supported on this hardware platform), its popularity has declined, in part due to the introduction of other, considerably more efficient, flexible, stable, scalable and secure options.
The undisputed lead in this area now belongs to Fibre Channel storage area network (FC SAN) solutions, although iSCSI and Network Attached Storage are quickly catching up. FC SANs are the subject of this article.
FC SANs represent a considerable shift from the directly attached storage paradigm, and they offer significant functionality and performance improvements. The basic idea is to use a network infrastructure for connecting servers to their disks, allowing the two to be physically separated by far greater distances than was previously possible. This separation has other, equally important advantages. Managing storage in larger environments no longer requires dealing with each individual system, as was the case with directly attached models. Disks are grouped together, which simplifies their administration (e.g., monitoring, backups, restores, provisioning and expansion) and makes it more efficient through such innovations as LAN-free or server-free backups and restores, or booting from a SAN.
In addition, since a large number of servers and storage devices can participate in the same SAN, it is possible to attach new ones as needed, making allocation of additional space a fairly easy task. This is especially true when comparing a SAN with a SCSI-based setup, where the limited number of internal or external connectors and the adjacent physical space available must be taken into account. Allocation is further simplified by the DISKPART.EXE Windows 2003 Server utility, which is capable of dynamically extending basic and dynamic volumes, as explained in Microsoft Knowledge Base Article Q325590.
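As a rough illustration of the DISKPART step, the Python sketch below only builds the text of a DISKPART script that selects a volume and extends it. The function name build_extend_script, the volume number and the size are hypothetical placeholders chosen for this example; the script uses the DISKPART commands "select volume" and "extend". The resulting text could be saved to a file and passed to DISKPART with its /s switch.

```python
def build_extend_script(volume_number, size_mb=None):
    """Return DISKPART script text that extends one volume.

    volume_number and size_mb are illustrative placeholders. When size_mb
    is None, EXTEND is issued without a size argument, which leaves the
    amount of added space to DISKPART's default behavior.
    """
    if volume_number < 0:
        raise ValueError("volume number must be non-negative")
    lines = [f"select volume {volume_number}"]
    if size_mb is None:
        lines.append("extend")
    else:
        if size_mb <= 0:
            raise ValueError("size must be a positive number of megabytes")
        lines.append(f"extend size={size_mb}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: grow volume 3 by 1024 MB.
    print(build_extend_script(3, 1024), end="")
```

Generating the script as text, rather than running DISKPART directly, keeps the sketch safe to try on any machine; review the output before applying it to a real volume.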
Fibre Channel SAN technology leverages SCSI-3 specifications for communication between hosts and target devices, since its implementation is based on the SCSI command set. The commands themselves, however, are carried by the FC transport protocol. Transmission is serial, typically over fiber optic cabling (although copper-based media are allowed), which eliminates the distance limitations inherent to parallel SCSI.
Note, however, that the term "network" should not be interpreted in the traditional sense, since SANs do not offer routing capabilities, primarily because they are intended for high-speed, low-latency communication. SANs also use a distinct end node identification mechanism, which does not rely on the Media Access Control (MAC) addresses associated with each network adapter but instead employs 64-bit World Wide Names (WWNs), usually expressed as eight pairs of hexadecimal characters and burned into fibre host bus adapters (HBAs) by their manufacturers. FC interconnecting devices handle dynamic address allocation on the fabric level. In addition, unlike the majority of IP-based networks, FC SANs have a primarily asymmetric character, with active servers on one end connecting mostly to passive devices, such as disk arrays or tape drives, on the other, arranged in one of the following topologies:
The increased performance, flexibility and reliability of switched implementations come with their own set of drawbacks. Besides considerably higher cost (compared to arbitrated loops) and interoperability issues across components from different vendors, one of the most significant is the increased complexity of configuration and management. In particular, it is frequently necessary to provide an appropriate degree of isolation between the multiple hosts connected to the same fabric and the shared devices with which they are supposed to interact.
As mentioned earlier, this exclusive access is required to avoid data corruption, which is bound to happen with unarbitrated, simultaneous writes to the same disk volume. In general, three mechanisms deliver this functionality: zoning, LUN masking (also known as selective presentation), and multipath configurations.
Zoning can be compared to Virtual LANs (VLANs) in traditional networks, since it defines logical boundaries (known in SAN terminology as zones) that encompass arbitrarily designated switch ports. Zone definitions in clustered deployments are typically stored and enforced by the switch port ASIC (Application-Specific Integrated Circuit) firmware, with communication permitted only between nodes attached to switch ports that belong to the same zone. Zones can also be implemented by referencing the WWNs of host bus adapters. In addition to preventing accidental data corruption, zoning also offers an additional level of security by protecting servers from unauthorized access. In clustered configurations, cluster nodes, along with the shared disks that constitute clustered resources, should belong to the same zone.
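The WWN-based variant of zoning can be modeled in a few lines of Python. This is a minimal sketch, not a description of any switch vendor's firmware: the helper names format_wwn and can_communicate, the zone names and all WWN values are invented for illustration, and the colon-separated notation is just one common way of writing the eight hexadecimal pairs.

```python
def format_wwn(value):
    """Render a 64-bit WWN as eight colon-separated pairs of hex digits."""
    if not 0 <= value < 2**64:
        raise ValueError("a WWN must fit in 64 bits")
    raw = f"{value:016x}"
    return ":".join(raw[i:i + 2] for i in range(0, 16, 2))


def can_communicate(zones, wwn_a, wwn_b):
    """Return True if both WWNs are members of at least one common zone."""
    return any(wwn_a in members and wwn_b in members
               for members in zones.values())


# Hypothetical WWNs, invented for illustration only.
NODE1 = format_wwn(0x10000000C9000001)
NODE2 = format_wwn(0x10000000C9000002)
ARRAY = format_wwn(0x50060160000000A1)
OTHER = format_wwn(0x10000000C9000099)

# Both cluster nodes and their shared array sit in one zone;
# an unrelated server sits in a zone of its own.
ZONES = {
    "cluster_a": {NODE1, NODE2, ARRAY},
    "file_server": {OTHER},
}

if __name__ == "__main__":
    print(NODE1)
    print(can_communicate(ZONES, NODE1, ARRAY))
    print(can_communicate(ZONES, OTHER, ARRAY))
```

In this model the unrelated server cannot reach the array because the two never share a zone, which is the isolation property described above.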
LUN (an acronym for Logical Unit Number, describing a logical disk defined in an FC SAN) masking makes it possible to limit access to individual, arbitrarily selected LUNs within a shared storage device. Such functionality is typically required in configurations involving large multidisk systems, where port-level zoning does not offer sufficient granularity. LUN masking also provides the necessary isolation in cases of overlapping zones, where hosts or storage devices belong to more than one zone. The relevant configuration is performed and stored at the storage controller level.
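To make the idea of selective presentation concrete, here is a toy Python model of a masking table held by a storage controller. The class name StorageController, its methods and the host names are assumptions made for this sketch; real controllers expose this configuration through their own vendor-specific tools.

```python
class StorageController:
    """Toy model of a controller that stores a LUN masking table."""

    def __init__(self):
        # Maps a host identifier (for example a WWN) to the LUNs it may see.
        self._mask = {}

    def present(self, host, lun):
        """Allow the given host to see the given LUN."""
        self._mask.setdefault(host, set()).add(lun)

    def visible_luns(self, host):
        """Return the sorted LUNs presented to a host (none by default)."""
        return sorted(self._mask.get(host, set()))


if __name__ == "__main__":
    controller = StorageController()
    # Hypothetical hosts that share a zone with the same controller.
    controller.present("cluster-node-1", 0)
    controller.present("cluster-node-2", 0)
    controller.present("backup-server", 5)
    print(controller.visible_luns("cluster-node-1"))
    print(controller.visible_luns("backup-server"))
    print(controller.visible_luns("unknown-host"))
```

Although all three hosts can reach the controller, each sees only the LUNs explicitly presented to it, and a host with no entry sees nothing.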
Multipath technology is the direct result of the drive for full redundancy in a SAN environment. Such redundancy is available on the storage side (through fault-tolerant disk configurations and dual controllers with their own dedicated battery-backed caches and power supplies) and on the server side (through server clustering, with each of the member servers featuring dual, hot-swappable components). It is reasonable to expect the same when it comes to SAN connectivity.
Unfortunately, the solution is not as simple as installing two FC host bus adapters (HBAs) and connecting them to two redundant switches, each of which, in turn, attaches to a separate FC connection on the storage controller. Without additional provisions, Windows would detect two distinct I/O buses and separately enumerate the devices connected to each (resulting in a duplicate set of drives presented to the operating system), which could potentially lead to data corruption. To resolve this issue, Microsoft Windows 2003 Server includes native support for Multipath I/O, which makes it possible to connect dual HBAs to the same target storage device with support for failover, failback and load balancing functionality.
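The duplicate-enumeration problem and the effect of a multipath layer can be sketched in Python as follows. This is a conceptual model only and does not reflect how the Windows Multipath I/O drivers are implemented; every name in it (MultipathDevice, the hba0/hba1 labels, the lun0/lun1 identifiers) is hypothetical, and load balancing is left out for brevity.

```python
from collections import defaultdict


def enumerate_without_multipath(paths):
    """Each (hba, device_id) path is reported as its own disk."""
    return [f"{hba}:{device_id}" for hba, device_id in paths]


class MultipathDevice:
    """One logical disk reachable through several HBAs."""

    def __init__(self, device_id, hbas):
        self.device_id = device_id
        self.paths = list(hbas)   # preferred path first
        self.failed = set()

    def active_path(self):
        """Return the first path that has not been marked as failed."""
        for hba in self.paths:
            if hba not in self.failed:
                return hba
        raise RuntimeError(f"no working path to {self.device_id}")

    def fail(self, hba):
        self.failed.add(hba)       # triggers failover on the next lookup

    def restore(self, hba):
        self.failed.discard(hba)   # allows failback to the preferred path


def enumerate_with_multipath(paths):
    """Group paths by device identifier so each disk appears only once."""
    grouped = defaultdict(list)
    for hba, device_id in paths:
        grouped[device_id].append(hba)
    return {dev: MultipathDevice(dev, hbas) for dev, hbas in grouped.items()}


# Two HBAs, each seeing the same two LUNs (hypothetical identifiers).
PATHS = [("hba0", "lun0"), ("hba1", "lun0"),
         ("hba0", "lun1"), ("hba1", "lun1")]

if __name__ == "__main__":
    print(len(enumerate_without_multipath(PATHS)))
    disks = enumerate_with_multipath(PATHS)
    print(len(disks))
    disks["lun0"].fail("hba0")
    print(disks["lun0"].active_path())
```

Without grouping, the four paths look like four disks; with grouping, the operating system would see two disks, each of which survives the loss of one path.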
Each implementation of a Windows 2003 Server server cluster must belong to a dedicated zone, to eliminate the potential adverse effect that the disk access protection mechanism included in the clustering software could have on other devices. This does not apply, however, to storage controllers, which can be shared across multiple zones, as long as they are included on the Cluster/Multi-Cluster Device HCL. In addition, you should avoid collocating disk and tape devices in the same zone, as SCSI bus reset commands can interfere with normal tape operations.
Remember that the rule regarding consistent hardware and software setup across all cluster nodes extends to SAN connections, including host bus adapter models, their firmware revision levels and driver versions.
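The two zoning rules above lend themselves to a simple sanity check. The Python sketch below assumes a made-up inventory format (a dictionary mapping a device name to a kind and an optional cluster name); neither the format nor the function check_zone corresponds to any real management tool.

```python
def check_zone(devices):
    """Return a list of warnings for one zone.

    devices maps a device name to a (kind, cluster) pair, where kind is
    "server", "disk" or "tape" and cluster is a cluster name or None.
    """
    warnings = []
    kinds = {kind for kind, _ in devices.values()}
    clusters = {cluster for kind, cluster in devices.values()
                if kind == "server" and cluster is not None}
    has_standalone = any(kind == "server" and cluster is None
                         for kind, cluster in devices.values())

    # Rule: disk and tape devices should not share a zone.
    if "disk" in kinds and "tape" in kinds:
        warnings.append("disk and tape devices share a zone")

    # Rule: a zone holding cluster nodes should hold one cluster only.
    if len(clusters) > 1 or (clusters and has_standalone):
        warnings.append("zone is not dedicated to a single cluster")

    return warnings


if __name__ == "__main__":
    # Hypothetical zone that breaks both rules.
    zone = {
        "node1": ("server", "clusterA"),
        "web1": ("server", None),
        "array1": ("disk", None),
        "tape1": ("tape", None),
    }
    for warning in check_zone(zone):
        print(warning)
```

A storage controller shared across several zones would simply appear in more than one inventory, which this check permits.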
You should also ensure that the automatic basic disk volume mounting feature is disabled. This does not apply to volumes residing on dynamic disks or removable media, which are always automatically mounted. Earlier versions of Windows would automatically mount every newly detected volume. In a SAN environment, this could create a problem if zoning or LUN masking was misconfigured or if prospective cluster nodes had access to the shared LUNs prior to installation of the clustering software. This feature is configurable, and disabled by default, in Windows 2003 Server. It can be controlled by running the MOUNTVOL command or by using the AUTOMOUNT option of the DISKPART utility.
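The mounting rule described above reduces to a small decision function. The Python sketch below is only a restatement of that rule under our own naming (is_mounted_automatically and the string labels are assumptions, not Windows APIs).

```python
def is_mounted_automatically(disk_type, automount_enabled):
    """Decide whether a newly detected volume is mounted without help.

    disk_type is one of "basic", "dynamic" or "removable" (labels chosen
    for this sketch). automount_enabled mirrors the setting changed with
    MOUNTVOL or with the AUTOMOUNT option of DISKPART.
    """
    if disk_type in ("dynamic", "removable"):
        # Not affected by the setting: always mounted automatically.
        return True
    if disk_type == "basic":
        return automount_enabled
    raise ValueError(f"unknown disk type: {disk_type!r}")


if __name__ == "__main__":
    for disk_type in ("basic", "dynamic", "removable"):
        print(disk_type, is_mounted_automatically(disk_type, False))
```

With the setting disabled, only new basic disk volumes stay unmounted, which is the case that matters for shared LUNs that become visible before the clustering software is installed.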