Win 2003 High Availability Solutions, iSCSI Storage
As part of our series covering Windows 2003 High Availability Solutions, we most recently focused on storage techniques that can be incorporated into Server Clustering. So far, we have discussed the two most common choices: direct-attached SCSI (popular because of its long-standing, widespread commercial presence and low pricing) and Fibre Channel storage-area networks (FC SANs), which are frequently chosen for their superior performance and reliability.
Unfortunately, the cost associated with FC SAN deployments is prohibitive for most smaller or less-critical environments, yet the requirements of those environments often cannot be satisfied with parallel SCSI because of its performance and scalability limitations. The introduction of iSCSI resolves this dilemma by combining the benefits of both technologies while avoiding their biggest drawbacks. This article provides an overview of the general principles of iSCSI storage and describes its clustering characteristics on the Windows 2003 Server platform.
iSCSI is an acronym derived from the term Internet SCSI, which succinctly summarizes its basic premise. iSCSI uses IP packets to carry SCSI commands, status signals, and data between storage devices and hosts over standard networks. This approach offers a considerable advantage by leveraging existing hardware and cabling (as well as expertise). Although iSCSI frequently uses Gigabit Ethernet, with enterprise-class switches and specialized network adapters (containing firmware that processes iSCSI-related traffic, offloading it from host CPUs), its overall cost is lower than that of equivalent Fibre Channel deployments. At the same time, however, features such as addressing or automatic device discovery, which are built into FC SAN infrastructure, must be incorporated into iSCSI specifications and implemented in its components.
iSCSI communication is carried over a TCP session between an iSCSI initiator and an iSCSI target (such as a storage device). In Windows 2003, initiator functionality is provided either in software or as a combination of HBA firmware and a Storport miniport driver. The session is established following a logon sequence, during which security and transport parameters are negotiated. Sessions can be made persistent so that they are automatically restored after host reboots.
On the network level, both initiator and target are assigned unique IP addresses, which allow for node identification. The target itself is accessed through a combination of IP address and TCP port number, which is referred to as a portal. In the iSCSI protocol, naming is typically handled with the iSCSI Qualified Name (IQN) convention. Its format consists of the type identifier ("iqn."), a registration date field (in year-month notation, yyyy-mm), a period, the domain in which the name is registered (in reversed sequence), a colon, and the host (or device) name. The last component can be autogenerated (as is the case with the Microsoft implementation, where it is derived from the computer name), preassigned, or chosen arbitrarily, serving as a descriptor that provides such information as device model, location, purpose, or LUN.
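The naming layout can be checked mechanically. The following is a minimal sketch, assuming the `iqn.yyyy-mm.reversed-domain:name` layout from the iSCSI specification; the function names, the sample host name, and the sample IP address are hypothetical and chosen only for illustration. The default port 3260 is the well-known TCP port for iSCSI targets.

```python
import re

# Layout assumed here: iqn.<yyyy-mm>.<reversed domain>[:<host or device name>]
IQN_PATTERN = re.compile(
    r"^iqn\.(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])"
    r"\.(?P<authority>[a-z0-9][a-z0-9.\-]*)"
    r"(?::(?P<name>.+))?$"
)

DEFAULT_ISCSI_PORT = 3260  # well-known TCP port for iSCSI targets


def parse_iqn(iqn: str) -> dict:
    """Split an IQN into its date, naming authority, and name parts."""
    match = IQN_PATTERN.match(iqn.lower())
    if match is None:
        raise ValueError(f"not a valid IQN: {iqn!r}")
    return match.groupdict()


def format_portal(ip_address: str, port: int = DEFAULT_ISCSI_PORT) -> str:
    """A portal is the combination of IP address and TCP port."""
    return f"{ip_address}:{port}"


# Hypothetical initiator name in the style generated by the Microsoft initiator.
parts = parse_iqn("iqn.1991-05.com.microsoft:node1.example.com")
print(parts["authority"], parts["name"])
print(format_portal("192.168.10.20"))
```

A stricter validator would also enforce the length and character-set limits of the specification, which this sketch leaves out.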
Targets are located in one of three ways: by statically configuring the software initiator with target portal parameters (and corresponding logon credentials), by leveraging functionality built into HBAs on the host, or by automatic discovery using information stored on an Internet Storage Name Service (iSNS) server. This server offers a centralized database of iSCSI resources, where iSCSI storage devices register their parameters and status, which can subsequently be referenced by initiators. Access to individual records can be restricted based on discovery domains, which serve a purpose similar to FC SAN zoning.
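The effect of discovery domains can be illustrated with a toy model. This is only a sketch of the idea, not the iSNS protocol or any Microsoft API: the class, its methods, and all names and addresses below are hypothetical. An initiator sees only the targets that share at least one discovery domain with it.

```python
from collections import defaultdict


class ToyNameServer:
    """In-memory stand-in for an iSNS database (illustration only)."""

    def __init__(self):
        self._targets = {}                # target name -> portal string
        self._domains = defaultdict(set)  # discovery domain -> member names

    def register_target(self, name, portal, domain):
        """A target registers its portal and joins a discovery domain."""
        self._targets[name] = portal
        self._domains[domain].add(name)

    def add_initiator(self, name, domain):
        """An initiator is placed into a discovery domain."""
        self._domains[domain].add(name)

    def discover(self, initiator):
        """Return only targets sharing a discovery domain with the initiator."""
        visible = set()
        for members in self._domains.values():
            if initiator in members:
                visible |= members.intersection(self._targets)
        return {name: self._targets[name] for name in sorted(visible)}


server = ToyNameServer()
server.register_target("iqn.2005-01.com.example:array1", "10.0.0.10:3260", "cluster-a")
server.register_target("iqn.2005-01.com.example:array2", "10.0.0.11:3260", "cluster-b")
server.add_initiator("iqn.1991-05.com.microsoft:node1", "cluster-a")
print(server.discover("iqn.1991-05.com.microsoft:node1"))
```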
In a typical Microsoft iSCSI implementation, the initiator software runs on a Windows host server equipped with a compatible NIC or an HBA that supports the Microsoft iSCSI driver interface. The initiator (currently in version 2.02, downloadable from the Microsoft Web site, and supported on Windows 2000 SP3 or later, Windows XP Professional SP1 or later, and Windows 2003 Server) is used to mount storage volumes located on iSCSI targets and registered with an iSNS server. Microsoft's iSNS server implementation (currently in version 3.0 and supported on Windows 2000 Server SP4 and Windows 2003 Server) is also available as a free download.
Installation of the initiator includes the iSNS client and administrative features, in the form of the iSCSI Initiator applet in the Control Panel, Windows Management Instrumentation (WMI) support, and the iSCSI command-line interface (iSCSICLI). The software-based initiator lacks some of the functionality that might be available with hardware-based solutions (such as support for dynamic volumes or booting from iSCSI disks).
To provide a sufficient level of security and segregation, consider isolating the iSCSI infrastructure on a dedicated storage network (or separating the shared environment with VLANs), as well as applying authentication and encryption methods. With the Microsoft implementation, authentication (as well as segregation of storage) is handled with the Challenge Handshake Authentication Protocol (CHAP), which relies on a password shared between an initiator and a target, provided that the latter supports it. Communication can be encrypted directly on end devices, using built-in features of high-end iSCSI HBAs, third-party encryption methods, or Microsoft's version of IPSec.
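The shared-password mechanism behind CHAP can be sketched in a few lines. This sketch assumes the standard CHAP calculation with MD5 (the response is the hash of the one-byte identifier, the shared secret, and the challenge, concatenated in that order); the function names and the sample secret are hypothetical, and the sketch does not reproduce how the values are carried inside the iSCSI logon exchange.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """Compute MD5(identifier || secret || challenge), as in standard CHAP."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def verify(identifier: int, secret: bytes, challenge: bytes, response: bytes) -> bool:
    """The authenticator recomputes the hash and compares in constant time."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


# The target issues a random challenge; the initiator answers with the hash.
shared_secret = b"example-secret"  # hypothetical value known to both sides
challenge = os.urandom(16)
answer = chap_response(1, shared_secret, challenge)
print(verify(1, shared_secret, challenge, answer))
```

The secret itself never crosses the network, only the challenge and the hash, which is why both sides must be configured with the same password in advance.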
Although network teaming is not supported on iSCSI interfaces, it is possible to enable communication between an initiator and a target via redundant network paths, accommodating setups with multiple local NICs or HBAs and separate interconnects for each. This can be done by implementing multiple connections per session (MCS), which leverages a single iSCSI session, or with Microsoft Multipath I/O (MPIO), which creates multiple sessions. The distribution of I/O across connections (applied to all LUNs involved in the same session) for MCS, or across sessions (referencing individual LUNs) for MPIO, depends on Load Balance Policies configured by assigning the Active or Passive type to each network path. This results in one of the following arrangements:
- Fail Over Only uses a single active path as the primary and treats all others as secondaries, which are attempted in round-robin fashion in case the primary fails. The first available one found becomes the primary.
- Round Robin distributes iSCSI communication evenly to all paths in round-robin fashion.
- Round Robin with Subset functions with one set of paths in the Active mode and the other remaining Passive. The traffic is distributed according to the round robin algorithm across all active paths.
- Weighted Path selects a single active path by picking the one with the lowest value of an arbitrarily assigned weight parameter.
- Least Queue Depth, available only with MCS, sends traffic to the path with the fewest outstanding requests.
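The selection logic behind these policies can be sketched as follows. This is a simplified model for illustration, not the Microsoft implementation: the `Path` class, its fields, and the function names are hypothetical, and Fail Over Only is reduced to "first healthy path in list order".

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    active: bool = True    # Active or Passive type assigned to the path
    healthy: bool = True   # False once the path has failed
    weight: int = 0        # used by Weighted Path
    queue_depth: int = 0   # outstanding requests, used by Least Queue Depth


def _require(candidates):
    if not candidates:
        raise RuntimeError("no usable path")
    return candidates


def fail_over_only(paths):
    # The first healthy path in list order acts as the primary.
    return _require([p for p in paths if p.healthy])[0]


def round_robin(paths, request_number):
    # Rotates over healthy Active paths; marking some paths Passive
    # gives the Round Robin with Subset behaviour.
    usable = _require([p for p in paths if p.healthy and p.active])
    return usable[request_number % len(usable)]


def weighted_path(paths):
    # Picks the healthy path with the lowest assigned weight.
    return min(_require([p for p in paths if p.healthy]), key=lambda p: p.weight)


def least_queue_depth(paths):
    # Picks the healthy Active path with the fewest outstanding requests.
    usable = _require([p for p in paths if p.healthy and p.active])
    return min(usable, key=lambda p: p.queue_depth)


paths = [
    Path("nic1", weight=5, queue_depth=3),
    Path("nic2", weight=1, queue_depth=7),
    Path("nic3", active=False, weight=9),
]
print([round_robin(paths, n).name for n in range(4)])
print(weighted_path(paths).name, least_queue_depth(paths).name)
```

Passing the request number into `round_robin` keeps the functions stateless; a real driver would track the rotation position and queue depths internally.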
The choice of multipathing solution depends on a number of factors, such as support on the target side, the required granularity of the Load Balance Policy (individual LUN or session level), and hardware components (MCS is recommended when a software-based initiator is used without specialized HBAs on the host side). Regardless of your decision, take advantage of this functionality as part of your clustering deployment to increase the level of redundancy.
When incorporating iSCSI storage into your Windows 2003 Server cluster implementation (note that Microsoft does not support it on Windows 2000), also ensure that components on the host side fully comply with iSCSI device logo program specifications and basic clustering principles. Take into account information presented in earlier articles of this series as well as domain and network dependencies. Also bear in mind that besides SCSI RESERVE and RELEASE commands (which provide basic functionality), iSCSI targets must support SCSI PERSISTENT RESERVE and PERSISTENT RELEASE to allow for all of the Load Balance policies and persistent logons.
Persistent reservations require a persistent reservation key to be configured on all cluster nodes. This is done by choosing an arbitrary 8-byte value, with the first 6 bytes unique to each cluster and the remaining 2 bytes varying between its nodes. The value is entered in the PersistentReservationKey REG_BINARY entry of the HKLM\System\CurrentControlSet\Services\MSiSCDSM\PersistentReservation registry key on each cluster member. In addition, the UsePersistentReservation entry of REG_DWORD type is set to 1 in the same registry location. You should also enable the Bind Volumes Initiator Setting (in the Properties dialog box of the iSCSI Initiator Control Panel applet), which ensures all iSCSI-hosted volumes are mounted before the Cluster Service attempts to bring them online.
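The key layout lends itself to a small helper that keeps node keys consistent across a cluster. This is a minimal sketch: the function names and the sample prefix are hypothetical, and encoding the node number as a big-endian 2-byte suffix is an assumption, since the only stated requirement is that the last 2 bytes differ between nodes.

```python
def build_reservation_key(cluster_prefix: bytes, node_id: int) -> bytes:
    """Return an 8-byte key: 6 bytes shared by the cluster + 2 bytes per node."""
    if len(cluster_prefix) != 6:
        raise ValueError("cluster prefix must be exactly 6 bytes")
    if not 0 <= node_id <= 0xFFFF:
        raise ValueError("node id must fit in 2 bytes")
    # Assumption: node number stored big-endian in the final 2 bytes.
    return cluster_prefix + node_id.to_bytes(2, "big")


def as_hex_bytes(key: bytes) -> str:
    """Format the key as comma-separated hex bytes for manual entry."""
    return ",".join(f"{b:02x}" for b in key)


prefix = bytes.fromhex("a1b2c3d4e5f6")  # hypothetical per-cluster value
for node in (1, 2):
    print(node, as_hex_bytes(build_reservation_key(prefix, node)))
```

Each printed value would then be entered as the PersistentReservationKey data on the corresponding node.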
To avoid network congestion-related issues, consider setting up a dedicated Gigabit Ethernet network or implementing VLANs with non-blocking switches that support Quality of Service. Optimize bandwidth utilization by implementing jumbo frames, that is, by increasing the value of the Maximum Transmission Unit (MTU).
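The benefit of a larger MTU can be estimated with simple arithmetic. This is a rough sketch under stated assumptions: 38 bytes of per-frame Ethernet overhead (preamble, header, frame check sequence, and inter-frame gap), 40 bytes of IPv4 and TCP headers without options, and a jumbo MTU of 9000 bytes, which is a commonly used value rather than one given in this article. iSCSI's own header overhead is ignored.

```python
ETHERNET_OVERHEAD = 38  # preamble 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40     # IPv4 20 + TCP 20, assuming no options


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time that carries TCP payload for full-sized frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload")
```

Under these assumptions the gain in raw efficiency is a few percentage points; the larger practical benefit usually comes from handling fewer frames per transferred megabyte, which lowers per-packet processing on hosts that lack offload hardware.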