Understanding RAID

A company’s greatest asset, besides its employees, is its data. Millions of dollars are spent to back up data, replicate data and so on, all in an attempt to protect against data loss. But backups and replication don’t actually prevent data loss; they are ways to recover from it. The only true defense against losing data in the first place is to implement a disk solution based on RAID technology.

RAID is a necessary building block when undertaking a data protection initiative.

RAID (redundant array of independent disks; the original paper said "inexpensive disks") was first defined in a paper published by researchers at U.C. Berkeley in 1988. The paper defined RAID levels 1, 2, 3, 4 and 5. Since then, additional levels have been defined.

RAID Level Zero – RAID Level Zero is not one of the originally defined RAID levels and there is some debate if it should even be considered a RAID level since the disks are not redundant.

RAID Level Zero, a.k.a. disk striping, spreads each stripe of data evenly across a group of disks. If any one of these disks fails, all of the data on the group is lost.


While not a safe way to protect data, it does deliver higher performance compared to an equal number of independent disks. RAID Zero is rarely used alone but is frequently used with other RAID levels to provide faster performance.
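To make the striping idea concrete, here is a minimal sketch of how a logical block number might be mapped onto a group of disks. The function name `locate_block` and the one-block stripe unit are assumptions made for illustration; real implementations use configurable stripe sizes.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes a stripe unit of exactly one block (an illustrative simplification).
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must not be negative")
    disk = logical_block % num_disks      # consecutive blocks rotate across disks
    offset = logical_block // num_disks   # each full pass moves one block deeper
    return disk, offset


# Six consecutive logical blocks spread over a three-disk group.
for block in range(6):
    print(block, locate_block(block, num_disks=3))
```

Because consecutive blocks land on different disks, a large transfer keeps every disk busy at once, which is where the performance gain comes from. It also shows why a single failure is fatal: every disk holds a slice of every large file.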

RAID Level 1 – This RAID level is where the same data is written (or mirrored) to two disks. If a disk fails, data is read off the mirrored disk. When the failed disk is replaced, the data on the surviving disk is used to recreate the mirrored pair.

All of this happens with no loss of data for the host applications. RAID Level 1 is one of the most commonly used RAID levels and performs very well for reads and writes.
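The mirror, failure and rebuild sequence described above can be sketched with a toy model. The `MirroredPair` class and its method names are hypothetical, and each "disk" is just an in-memory dictionary, so this illustrates the logic only, not any real controller.

```python
from typing import Dict, List, Optional


class MirroredPair:
    """Toy two-disk mirror; each disk is a dict of block number -> data."""

    def __init__(self) -> None:
        # None marks a failed (missing) disk.
        self.disks: List[Optional[Dict[int, bytes]]] = [{}, {}]

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to all healthy disks in the pair.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Serve the read from the first healthy disk.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("both disks in the mirror have failed")

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def replace(self, index: int) -> None:
        # Rebuild the new disk by copying everything from the survivor.
        survivor = self.disks[1 - index]
        if survivor is None:
            raise IOError("no surviving disk to copy from")
        self.disks[index] = dict(survivor)


mirror = MirroredPair()
mirror.write(0, b"payroll")
mirror.fail(0)          # lose one disk
print(mirror.read(0))   # still readable from the surviving disk
mirror.replace(0)       # rebuild the pair from the survivor
```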

RAID Level 2 – RAID level 2 is not used by any commercial RAID systems on the market and will not be discussed.

RAID Level 3 – RAID Level 3 uses an error correcting code called parity to protect against the loss of a single disk. Data is written in parallel in bytes to the data disks (at least two) while parity is written to a dedicated disk.

The disk spindles are synchronized (each byte of a stripe of data, and that data’s parity, occupies the same area on each disk) which increases throughput by minimizing disk head movement.

When a data disk fails, the data from the dedicated parity disk is used to recreate the data to serve host requests and to rebuild the failed drive when replaced. If the parity disk should fail, the data disks are used to recreate parity and written to the replaced parity disk.
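The rebuild described above can be illustrated with a few lines of code. This sketch assumes parity is computed with XOR, as is typical for single-parity RAID; the helper name `xor_blocks` and the sample data are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, each holding one four-byte block of the stripe.
data_disks = [b"RAID", b"uses", b"XOR!"]
parity = xor_blocks(data_disks)  # contents of the dedicated parity disk

# Simulate losing data disk 1, then rebuild it from the survivors plus parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_disks[1])  # True
```

The same function handles both cases in the text: XOR-ing the surviving data with parity recreates a lost data disk, and XOR-ing all the data disks recreates a lost parity disk.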

RAID Level 3 is best for large sequential data access (e.g., video streaming). Performance for small, random access of the data is slow since every I/O requires activity on every disk. RAID 3 is rarely used today since better performance and identical protection can be achieved with RAID level 5.

RAID Level 4 – RAID Level 4 is similar to RAID level 3 (striped parity with a dedicated parity disk) except the data is written in blocks, not bytes.

Writing blocks of data increases random access performance, since an I/O may only require access to one disk instead of every disk in the group, as with RAID 3. But the dedicated parity disk can be a bottleneck for writes. Recovery for a lost drive works the same as RAID level 3. RAID level 4 is not widely adopted.
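One way to see why the parity disk becomes a write bottleneck is to look at how a small write is commonly handled with XOR parity: the old data and old parity are read, and the new parity is derived from them. The sketch below assumes XOR parity and uses hypothetical helper names; it is not taken from any particular product.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity after overwriting one block: P' = P xor old xor new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_bytes(xor_bytes(blocks[0], blocks[1]), blocks[2])

# Overwrite block 1 without touching blocks 0 and 2.
new_block = b"ZZZZ"
parity_after = updated_parity(parity, blocks[1], new_block)
blocks[1] = new_block

# The shortcut matches a full recomputation over the whole stripe.
full = xor_bytes(xor_bytes(blocks[0], blocks[1]), blocks[2])
print(parity_after == full)  # True
```

Only one data disk is involved, but the parity disk is read and written on every such update, so with a dedicated parity disk all small writes queue up behind it.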

RAID Level 5 – RAID Level 5, like RAID levels 3 and 4, uses parity to protect the data from a single disk failure. Unlike levels 3 and 4, the parity is rotated or distributed across all of the drives in the volume.

Read performance is substantially better than for a single disk because there is independent access to each disk. As with levels 3 and 4, write performance can be impacted due to the complexity of parity processing but with parity being striped across all the drives, there is no single disk bottleneck with RAID 5.

RAID Level 5 performance is scalable, as more disks provide more independent access. In the case of a disk failure, data from the lost drive is recomputed, using an XOR function, from the surviving data and parity stored on the other drives in the disk group.
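The rotation of parity can be pictured with a small layout sketch. The specific rotation order used here (parity starts on the last disk and moves one disk to the left per stripe) is an assumption chosen for illustration; actual implementations offer several layouts, and the function names are hypothetical.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Index of the disk holding parity for a given stripe.

    Assumed rotation: last disk first, then one disk to the left per stripe.
    """
    return (num_disks - 1 - stripe) % num_disks


def stripe_layout(stripe: int, num_disks: int) -> list:
    """Return 'P' for the parity disk and 'D' for data disks in one stripe."""
    p = parity_disk(stripe, num_disks)
    return ["P" if disk == p else "D" for disk in range(num_disks)]


# Print the first five stripes of a four-disk group.
for stripe in range(5):
    print(stripe, " ".join(stripe_layout(stripe, num_disks=4)))
```

Over any run of four consecutive stripes, each disk holds parity exactly once, so parity updates are spread across the group instead of piling up on one drive.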

RAID Level 6 – RAID Level 6 is much like RAID 5 (striped parity) except instead of one parity block per stripe there are two. With two independent parity blocks, RAID 6 can survive the loss of two disks in the group.

Read performance will be similar to RAID 5 but write performance will be slower since the second block of parity needs to be calculated and written. RAID 6 is not widely adopted today but more and more companies are bringing solutions to market.

The need for RAID 6 came to light as serial ATA (SATA) drives grew in size, but not in performance. SATA drives are usually configured in RAID 5 groups (it does not make sense to mirror 500GB SATA drives since the data is usually a second or third copy and the overhead is excessive).

When one of these disks fails, it is so large and slow that the rebuild can take a very long time, and a second drive can fail during this period. RAID 6 provides the highest level of protection against drive failures of all the RAID levels. However, its widespread acceptance will depend on whether companies are willing to pay the extra capacity cost and suffer the performance impact to insure against the relatively rare event of two drives in a group being down at the same time.
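The capacity trade-off can be put into numbers with a rough calculation. The function below is a sketch that assumes identical disks and the usual overheads (one disk's worth of parity for RAID 5, two for RAID 6, half the raw space for a two-disk mirror). The eight-disk group size is an arbitrary example; the 500 GB drive size echoes the figure mentioned above.

```python
def usable_capacity_gb(level: int, num_disks: int, disk_gb: float) -> float:
    """Usable capacity in GB for a group of identical disks."""
    min_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in min_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < min_disks[level]:
        raise ValueError(f"RAID {level} needs at least {min_disks[level]} disks")
    if level == 0:
        return num_disks * disk_gb       # no redundancy, no overhead
    if level == 1:
        if num_disks != 2:
            raise ValueError("this sketch models a simple two-disk mirror")
        return disk_gb                   # the second disk is a full copy
    parity_disks = 1 if level == 5 else 2
    return (num_disks - parity_disks) * disk_gb


# Eight 500 GB drives (the group size is an assumption for the example).
for level in (0, 5, 6):
    print(f"RAID {level}: {usable_capacity_gb(level, 8, 500):.0f} GB usable")
```

In this example RAID 6 gives up one more drive's worth of space than RAID 5 (3000 GB usable instead of 3500 GB) in exchange for surviving a second failure during a rebuild.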

Plaid RAID – With the maturity of hardware-based and software-based RAID and the requirement for increased performance, a new, unofficial RAID level has cropped up: plaid.

There are three places to implement RAID: software, RAID controllers and storage arrays.

Software RAID – RAID implemented on a server by software uses internal drives or external JBOD (just a bunch of disks). The software, usually a logical volume manager, manages all of the mirroring of data or parity calculations.

The overhead associated with the parity calculations can be excessive and may cause applications to run slowly. Software RAID is good for a single server and is not recommended for I/O intensive applications. Software RAID is usually used in conjunction with a storage array to create plaid RAID levels.

RAID Controller – Another way to implement RAID on a server is to use RAID controllers. These are cards that can be added to a server and offload the overhead of RAID from the CPUs.

RAID controllers are a far better solution for a single server than software RAID solutions since server CPUs spend no processing power calculating parity or managing the mirrored data. Like software RAID, RAID controllers use either internal drives or JBOD. However, a server-based RAID controller can itself fail, making it a single point of failure.

Storage Array – A storage array usually consists of two high-performance, redundant RAID controllers and trays of disks. All pieces of the array are redundant and built to withstand the rigors of a production environment with many servers accessing the storage at the same time. They support multiple RAID levels and different drive types and speeds.

Storage arrays also usually offer snapshots, volume copy and the ability to replicate from one array to another. If the servers need high performance or large capacities, storage arrays are the right choice.

RAID is a necessary building block for any company’s data protection needs. Without RAID, even a small glitch in a disk drive could cause data loss. Thankfully, with software RAID, server-based RAID controllers and external storage arrays, all companies—from the smallest to the largest—can find a RAID solution to protect their data.

This article was originally published on CIO Update.

Jim McKinstry
