Particularly if you use Serial ATA (SATA)-based disk arrays, you may be a candidate for RAID 6, which provides an added level of data protection versus more-conventional RAID configurations such as RAID 5. RAID 6, which is sometimes referred to as “double-parity RAID,” has been receiving a lot of attention recently, in part because of the (perceived) reliability issues with SATA drives and because of the long rebuild times associated with the high capacity of SATA drives (up to 500GB per drive).
RAID 6, which protects against a second drive failure occurring during a drive rebuild, is not new. For years, Hewlett-Packard has been selling what is essentially a RAID-6 implementation, which it refers to as Advanced Data Guarding (ADG). Vendors such as Hitachi Data Systems, IBM, Network Appliance, and InoStor (a division of Tandberg Data) have also offered RAID 6 in some of their disk arrays for some time. Earlier this year, Ciprico introduced a 4Gbps Fibre Channel array with RAID-6 functionality, and JMR Electronics is shipping a “triple-active” RAID-6 implementation in its Fortra Turbo RAID array, which combines Fibre Channel host connections and SATA disk drives.
A number of RAID controller vendors, and their OEM customers, are readying RAID-6 implementations that will be available later this year or early next year. Examples include Adaptec, AMCC, LSI Logic, Promise Technology, and Xyratex (which is acquiring nStor). Taiwan-based Areca is already shipping a controller with RAID-6 functionality, and Infortrend was expected to ship SATA and Serial Attached SCSI (SAS) systems this month with RAID 6.
RAID 6 is sometimes defined as the ability to recover from simultaneous failures of two (or, in some cases, more) disk drives in an array without data loss. However, since the chances of two drives failing at exactly the same moment are virtually zero, it’s more helpful to think of RAID 6 as the ability of a disk array to survive the failure of a second drive while another failed drive is being rebuilt. In addition, RAID 6 protects against unrecoverable read errors during drive rebuilds.
This can be particularly important with SATA drives because rebuild times can be very long due to the high capacity of the drives. Although controller manufacturers are reducing rebuild times, rebuilding a failed SATA drive can take four to eight hours, or more. This is one of the primary reasons why vendors are moving to RAID 6 and why you should consider the technology, particularly if you’re using SATA drives (although RAID 6 can be used with any type of disk drive). Another reason is the perceived reliability problems with SATA drives, although that is a highly debatable issue.
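The arithmetic behind those rebuild times is straightforward. The sketch below is a back-of-envelope estimate, not a vendor specification; the 25MB/s effective rebuild rate is an assumed figure, since real rates depend on the controller, the rebuild priority setting, and concurrent host I/O:

```python
# Back-of-envelope rebuild-time estimate for a failed drive.
# The rebuild rate is an assumed figure; real rates depend on the
# controller, rebuild priority, and concurrent host workload.

def rebuild_hours(capacity_gb: float, rate_mb_per_s: float) -> float:
    """Hours to rewrite every sector of a replacement drive."""
    seconds = (capacity_gb * 1000) / rate_mb_per_s
    return seconds / 3600

# A 500GB SATA drive rebuilt at an effective 25MB/s:
print(round(rebuild_hours(500, 25), 1))  # ≈ 5.6 hours
```

At that assumed rate, a 500GB drive lands squarely in the four-to-eight-hour range cited above, and halving drive capacity halves the window.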
“RAID 6 is not about protecting against simultaneous drive failures, which is highly unlikely; it’s about protecting against errors during a rebuild of a failed drive,” explains Luca Bert, director of product management and system tests in LSI Logic’s RAID storage adapters division. LSI Logic demonstrated RAID-6 functionality at last month’s Intel Developer Forum conference and will include RAID 6 in its upcoming SAS controllers that will be rolled out over the next few months. (In addition to SAS disk drives, SAS controllers also support SATA drives.)
Although some vendors say that RAID 6 is coming to the fore because of the relatively low reliability of SATA drives (compared to SCSI, SAS, and Fibre Channel), other vendors contend that drive-level reliability isn’t the real issue. “RAID 6 is not so much an issue of the perceived reliability issues with SATA drives; it’s more an issue of the large capacity of drives such as SATA and the longer rebuild times,” notes Scott Cleland, director of marketing at AMCC. “And RAID 6 protects against an unrecoverable read/write error during the rebuild.” (SATA drives typically have higher bit-error rates than SCSI or Fibre Channel drives.) AMCC plans to ship RAID controllers for PCI Express systems next year.
It’s unclear whether RAID 6 will remain a niche technology or if it will find widespread acceptance among end users, but some vendors think the technology has promise if vendors can clear a few hurdles. “If the vendors can solve the performance issue, RAID 6 could replace RAID 5,” says Tom Treadway, CTO of components and RAID at Adaptec.
Adaptec has been shipping SAS controllers with RAID-6 functionality to IBM for about three months. IBM will offer Adaptec’s Zero Channel SAS controllers and RAID-6 software as an option on its eServer xSeries 366 line of Intel Xeon-based servers.
Promise Technology, another RAID controller/subsystem vendor, plans to deliver RAID 6 on its SuperTrak RAID controllers in November (for PCI Express and PCI-X systems). RAID 6 for Promise’s VTrak subsystems (including the new M-Class systems) is due next year. Promise’s RAID-6 technology will be available via free download.
Most RAID-6 implementations use a P+Q design, where P is RAID-5-style parity and Q is a second, independently computed parity. As such, one potential penalty of RAID 6 is the requirement of an extra drive’s worth of capacity (which increases overall array costs). The usable capacity of a RAID-6 array is n-2 drives, where n is the total number of drives in the array. (The usable capacity of a RAID-5 array is n-1 drives.) A RAID-6 array must have a minimum of four drives.
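The P+Q scheme can be illustrated with a toy example. The sketch below is an illustration of the general technique, not any vendor’s actual implementation: P is simple XOR parity, Q is a Reed-Solomon-style weighted sum over the Galois field GF(2^8), and together they allow any two lost data blocks to be recovered by solving two equations:

```python
# Toy P+Q dual parity in the style of RAID 6: P is XOR parity,
# Q is a weighted sum over GF(2^8). Illustrative only -- real
# controllers do this in hardware across full stripes.
from functools import reduce

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8), reducing by the polynomial 0x11D."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
    return p

def gf_pow(a: int, n: int) -> int:
    return reduce(gf_mul, [a] * n, 1)

def gf_inv(a: int) -> int:
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

# One byte per data drive; a real stripe repeats this for every byte.
data = [0x11, 0x5A, 0xC3, 0x7E]
P = reduce(lambda a, b: a ^ b, data)
Q = reduce(lambda acc, i: acc ^ gf_mul(gf_pow(2, i), data[i]),
           range(len(data)), 0)

# Drives x and y fail; rebuild both from the survivors plus P and Q.
x, y = 1, 3
survivors = [d if i not in (x, y) else 0 for i, d in enumerate(data)]
A = P ^ reduce(lambda a, b: a ^ b, survivors)            # Dx ^ Dy
B = Q ^ reduce(lambda acc, i: acc ^ gf_mul(gf_pow(2, i), survivors[i]),
               range(len(survivors)), 0)                 # g^x*Dx ^ g^y*Dy
gx, gy = gf_pow(2, x), gf_pow(2, y)
Dx = gf_mul(B ^ gf_mul(gy, A), gf_inv(gx ^ gy))
Dy = Dx ^ A
assert (Dx, Dy) == (data[x], data[y])  # both lost blocks recovered
```

Note that P alone (as in RAID 5) yields only the single equation Dx ^ Dy = A; it is the independent Q parity that supplies the second equation needed to solve for two unknowns.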
Because disk drives are inexpensive, the additional costs associated with additional drives are not seen as a major drawback to RAID 6. What may be a drawback is the performance penalty inherent in RAID 6.
RAID 6 does not have a performance penalty for read operations, but it does have a performance penalty on write operations due to the overhead associated with the additional parity calculations. How big the penalty is depends on a number of factors, such as the size of the I/Os, the read-write ratio, and the random-versus-sequential nature of the I/Os.
Performance degradation will vary widely, but Adaptec’s Treadway says that while sequential write operations may impose only a 10% performance penalty vs. RAID 5, random write operations may incur as much as a 33% penalty. LSI’s Bert basically agrees, saying that sequential write operations (e.g., video streaming) may exact a 5% to 10% penalty, while random writes may lead to a 25% to 30% penalty. And AMCC’s Cleland says the write penalty could be as high as 50% compared to RAID 5. “If you need high-performance writes, you may not want to go with RAID 6,” says Cleland.
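Those random-write figures follow from a standard read-modify-write I/O count (a back-of-envelope model, not any vendor’s benchmark): a small RAID-5 write costs four disk I/Os, while RAID 6 adds a read and a write of the Q parity, for six:

```python
# Small-write I/O counts for read-modify-write updates. Illustrative
# model only; caching and full-stripe writes change the real numbers.
RAID5_IOS = 4  # read data, read P, write data, write P
RAID6_IOS = 6  # read data, read P, read Q, write data, write P, write Q

# With the same disks, random-write throughput scales inversely with
# I/Os per write, so RAID 6 delivers about a third fewer small writes:
penalty = 1 - RAID5_IOS / RAID6_IOS
print(f"{penalty:.0%}")  # 33%
```

That simple model lands at 33%, consistent with Treadway’s and Bert’s random-write estimates; sequential workloads fare better because full-stripe writes let the controller compute P and Q without the preliminary reads.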
Another drawback to RAID 6 is that, at least for now, there aren’t any standards governing how vendors implement the technology. However, the Storage Networking Industry Association’s Disk Data Format (DDF) group is working on RAID-6 recommendations in its Common RAID DDF specification. (For more information on SNIA’s DDF, see “Solving the RAID compatibility puzzle,” InfoStor, January 2005, p. 38, or visit www.snia.org.)
RAID 6 typically requires specialized hardware, so it’s more expensive to implement than, say, RAID 5, but it’s unclear whether there will be a price premium for RAID 6 at the end-user level beyond the additional parity drive. Most likely, the cost of RAID 6 will be buried in the controller price rather than tacked on as a premium.
In general, RAID 6 is appropriate for the same types of applications as RAID 5, but it may be a better solution where the highest fault tolerance is required and the highest write performance is not critical.
Alternatives to RAID 6
Note: This sidebar is adapted from a white paper, “SATA Recasts the Spotlight on RAID 6, But is it Right for Your Network?”, by Scott Cleland, director of marketing at AMCC (www.amcc.com).
It is possible to guard against the vulnerability of a disk array in degraded mode without incurring the penalties associated with RAID 6. (The vulnerability window is the period of time between a drive failure and the completion of the rebuild onto a spare drive.) In general, the faster the rebuild, the lower the risk of a second drive failure during the rebuild. RAID-5 systems with reduced rebuild times minimize the chances of a second drive failure causing data loss. The RAID controller plays a critical role in this process.
There are several alternatives to implementing RAID 6:
- Hot sparing with automatic rebuild. This does not speed up the rebuild, but does remove the time delay between drive failures and drive replacement. Multiple arrays on a single controller can share a single hot spare for automatic rebuild;
- Set the rebuild priority to the highest level. This will slow the application down during rebuilds but will minimize the exposure time;
- Minimize the number of drives per array in line with capacity requirements. The greater the number of drives in a single array, the higher the probability of a second drive failure;
- Choose drives with high MTBF (mean time between failures) ratings. The higher the MTBF, the lower the probability of a drive failure; and
- Use a higher number of lower-capacity drives. The bigger the drive, the longer the rebuild time. Lower-capacity drives shorten drive rebuild time. In addition, lower-capacity drives tend to be less expensive, so the cost savings may cover the cost of a hot spare.
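The risk these measures reduce can be estimated with a simple exponential failure model. The sketch below uses illustrative assumptions throughout (the MTBF, array size, and rebuild times are hypothetical, and real drive failures are neither independent nor memoryless):

```python
# Rough probability that a second drive fails during the rebuild
# window. MTBF, array size, and rebuild time are assumed figures.
import math

def second_failure_prob(surviving_drives: int, rebuild_hours: float,
                        mtbf_hours: float) -> float:
    """P(at least one survivor fails before the rebuild completes),
    assuming independent drives with exponential lifetimes."""
    return 1 - math.exp(-surviving_drives * rebuild_hours / mtbf_hours)

# 8-drive array (7 survivors), 6-hour rebuild, 500,000-hour MTBF:
p = second_failure_prob(7, 6.0, 500_000)
# More drives and a longer rebuild widen the vulnerability window:
p_big = second_failure_prob(15, 12.0, 500_000)
assert p_big > p
```

The model makes the sidebar’s advice concrete: the exposure grows with both the number of surviving drives and the rebuild time, which is why fewer, smaller, higher-MTBF drives all shrink the window.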