RAID was developed for HDDs but works with flash storage as well. The question is: how well does RAID work for flash? Are standard RAID configurations enough or are other approaches better for flash?
RAID (Redundant Array of Independent Disks) is ubiquitous in the storage world. The technology covers a variety of disk-based redundancy approaches that mirror, stripe, and/or distribute parity so that no single disk failure will render the data unrecoverable. Failed disks can be rebuilt from information located on other disks.
RAID is not a monolithic approach and different vendors will implement it in different ways. However, there are popular classifications that use striping, mirroring and/or parity to protect storage media in HDD and flash systems.
· RAID 0: Striping. RAID 0 evenly stripes data across two or more HDDs. RAID 0 is the exception to redundancy: it was created for performance gains and has no redundancy on its own. It is often combined with RAID 1 to add redundancy to speed, and is also a method of creating a single logical disk from two or more physical disks.
· RAID 1: Mirroring. Array disks are split into mirrored sets with copies of stored data residing on each of the mirrors. It is popular for improving read performance or reliability at the cost of lower capacity. A classic RAID 1 mirrored pair contains two disks. Since the array is exactly mirrored, its usable capacity is only that of a single copy of the data set. Nested RAID 1 and RAID 0 is expressed as 1+0 or RAID 10.
· RAID 5: Distributed parity. RAID 5 stripes block-level data across the RAID set and distributes parity data evenly across all drives. Parity builds fault tolerance into a data set by making it possible to recompute the values that were located on a failed drive. RAID 5 usually uses XOR calculations: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be recreated from the blocks on the remaining disks. In this configuration the disk set can survive the loss of one drive; a second failure loses data. RAID 5 must have at least 3 disks (2+1). Read performance does not suffer from parity calculations, but write performance does.
· RAID 6: Dual parity. RAID 6 is similar to RAID 5. It stripes block-level data across disks and writes parity data across drives. The difference is that it uses XOR plus a second, independent parity calculation (typically Reed-Solomon). This results in two parity blocks per stripe, distributed across all disks. As with RAID 5, parity calculations do not affect read performance but do slow writes. Calculation overhead can also affect rebuilds, which becomes a big issue with large HDDs in big RAID sets. RAID 6 does, however, provide higher availability than RAID 5. How the vendor implements RAID 6 can make a difference in performance, such as combining firmware with specialized high-performance ASICs. A RAID 6 disk set can survive the loss of two drives and is written as N+2, for example 14+2 (16 drives).
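The XOR parity idea behind RAID 5 can be illustrated with a minimal sketch. The block contents, the 3+1 layout and the helper name `xor_blocks` are illustrative assumptions, not any vendor's implementation, and a real array would also rotate the parity block across drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of a hypothetical 3+1 set: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1], then rebuild that block
# from the surviving data blocks and the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # True
```

Because XOR is its own inverse, the same function serves for both computing parity and rebuilding a lost block. Single parity can only solve for one unknown block per stripe, which is why RAID 6 adds a second, independent calculation.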
General RAID Issues
RAID has several issues that can impact both HDD and flash drive RAID. The first is the size of the stripe set, which is the number of disks in the RAID group. The downside to big stripe sets is the risk of failed rebuilds from unrecoverable read errors, where the rebuild process cannot read a block of data. RAID rebuild time is also an issue, since the bigger the disk set, the more time rebuilds take. Background workloads also affect rebuild times, as does the amount of data to be rebuilt.
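To see why large stripe sets raise the risk of a failed rebuild, here is a rough back-of-the-envelope sketch. It assumes a single-parity set in which every surviving drive must be read in full, independent bit errors, and an unrecoverable read error rate of 1 in 10^14 bits. That rate is an assumed figure of the kind quoted on drive spec sheets, not a number from this article, and the function name is hypothetical.

```python
import math

def rebuild_ure_probability(surviving_drives, drive_tb, ure_per_bit=1e-14):
    """Chance of hitting at least one unrecoverable read error (URE)
    while reading every surviving drive in full during a rebuild.

    Assumes independent bit errors at a fixed rate, which is a simplification.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # decimal TB -> bits
    # 1 - (1 - p)^n, using log1p/expm1 to stay accurate for very small p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

for drives in (3, 7, 15):
    p = rebuild_ure_probability(surviving_drives=drives, drive_tb=4)
    print(f"{drives} surviving 4 TB drives: {p:.0%} chance of a URE")
```

Under these assumptions the probability climbs quickly with the number and size of drives in the set, which is one reason dual parity is favored for large groups.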
There are also trade-offs in RAID between performance, protection levels and capacity. RAID 1 works well for high performance data that does not need high capacity (mirroring takes a lot of disk space). Balanced performance/capacity workloads with high redundancy needs can use RAID 5, and RAID 6 is good for large capacity data sets with high redundancy requirements and lower performance needs.
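The capacity side of that trade-off is simple arithmetic, sketched below for identical drives. The function name is hypothetical, mirroring is assumed to be two-way, and vendor implementations may reserve additional space for spares or metadata.

```python
def usable_tb(level, drives, drive_tb):
    """Usable capacity in TB for a RAID set built from identical drives."""
    minimum = {"RAID 0": 2, "RAID 1": 2, "RAID 5": 3, "RAID 6": 4, "RAID 10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} drives")

    if level == "RAID 0":
        data_drives = drives          # no redundancy at all
    elif level in ("RAID 1", "RAID 10"):
        if drives % 2:
            raise ValueError("two-way mirroring needs an even drive count")
        data_drives = drives // 2     # half the drives hold copies
    elif level == "RAID 5":
        data_drives = drives - 1      # one drive's worth of parity
    else:  # RAID 6
        data_drives = drives - 2      # two drives' worth of parity
    return data_drives * drive_tb

for level in ("RAID 10", "RAID 5", "RAID 6"):
    print(level, usable_tb(level, drives=8, drive_tb=2), "TB usable of 16 TB raw")
```

With eight 2 TB drives, mirroring yields 8 TB usable, while RAID 5 and RAID 6 yield 14 TB and 12 TB respectively.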
Traditional RAID is just the start; vendors build in different implementations to improve performance and/or capacity in RAID groups. For example, hyperscale computing implements RAID at the server level using redundant server groups. Wide striping is another RAID method that stripes data across many drives. For example, IBM’s grid storage XIV does not use traditional RAID 5 or 10. It partitions HDDs into 1MB chunks and mirrors them across all array disks. Each chunk is mirrored to separate drives in separate modules across the grid. And NEC’s D-series has RAID-TM for triple mirroring. NEC mirrors data to three separate HDDs, so even with two failed drives the data can be rebuilt.
What Happens when Flash Storage Enters the Picture?
Flash does not present a radical change for RAID, which was invented for spinning disk but still works with flash. Here’s the thing to remember about RAID and flash media: flash does not have the moving parts that can down a hard disk, but it is physical and it will fail. A finite number of program/erase (P/E) cycles is the primary culprit. Wear leveling distributes write I/O evenly across cells to extend media life, but the media will not last forever. Data still needs protection against flash failure.
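As a toy illustration of wear leveling and the finite P/E budget, the sketch below always writes to the least-worn block and stops when every block has reached its limit. The class name, block count and P/E limit are made-up values for illustration; real flash controllers use far more elaborate schemes.

```python
class WearLeveler:
    """Toy model: spread writes across blocks by erase count."""

    def __init__(self, num_blocks, pe_limit):
        self.erase_counts = [0] * num_blocks
        self.pe_limit = pe_limit

    def pick_block(self):
        """Pick the least-worn block that still has P/E cycles left."""
        candidates = [i for i, count in enumerate(self.erase_counts)
                      if count < self.pe_limit]
        if not candidates:
            raise RuntimeError("all blocks have reached their P/E limit")
        return min(candidates, key=lambda i: self.erase_counts[i])

    def write(self):
        """Consume one P/E cycle on the chosen block and return its index."""
        block = self.pick_block()
        self.erase_counts[block] += 1
        return block

flash = WearLeveler(num_blocks=4, pe_limit=3)
for _ in range(8):
    flash.write()
print(flash.erase_counts)  # [2, 2, 2, 2]
```

Wear is spread evenly, but the total budget is still fixed: in this model the thirteenth write raises an error because no block has cycles left.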
RAID 1, 10, 5 and 6, along with additional configurations, still work in this environment. For example, RAID 10 combines mirrored sets into a striped configuration across the flash system to protect against failures caused by P/E wear. However, although traditional RAID works for flash, it is not optimized for flash. For example, RAID 5 and 6 parity operations slow down even HDDs with their high-overhead cycle: read the old data block and the old parity block; compute the new parity from the old data, the old parity and the write request; write the updated data block and the updated parity block. The parity operation’s impact on high-performance flash can be profound.
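The read-modify-write cycle can be sketched for a single-parity stripe as follows. The block values and function names are illustrative assumptions, and the code models only the parity arithmetic, not the device I/O itself.

```python
def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_data, old_parity, new_data):
    """Parity after overwriting one data block in a stripe.

    I/O cost in a real array: read old data, read old parity,
    write new data, write new parity (two reads plus two writes).
    """
    delta = xor(old_data, new_data)   # which bits of the block changed
    return xor(old_parity, delta)     # flip the same bits in the parity

# A hypothetical stripe with three data blocks.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor(xor(stripe[0], stripe[1]), stripe[2])

# Overwrite the middle block using only the old block and the old parity.
new_block = b"\xff\x00"
new_parity = updated_parity(stripe[1], parity, new_block)
stripe[1] = new_block

# Cross-check against parity recomputed from the whole stripe.
full_parity = xor(xor(stripe[0], stripe[1]), stripe[2])
print(new_parity == full_parity)  # True
```

A single small write therefore costs four device operations, and on flash the two writes also consume P/E cycles.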
This is not necessarily a major problem in hybrid flash systems that combine a few SSDs with HDDs. Most flash storage vendors come from an HDD array background and start with standard RAID configurations on their flash systems. They might layer RAID-optimized processes on top of standard RAID. This approach, however, can be a performance and capacity problem in all-flash or heavily flash-enabled hybrid systems.
Let’s take a look at flash vendors and see how they are handling RAID with their flash systems.
EMC uses standard RAID on its VNX and VMAX flash-enabled families. Its all-flash XtremIO is a different kettle of fish. XtremIO Data Protection (XDP) bills itself as flash-optimized data protection that does not use traditional RAID. XDP uses very wide striping for better capacity usage and an N+2 scheme similar to RAID 6 for fault tolerance of up to two failures. Its parity algorithms are specific to flash and take fewer I/O cycles.
HP’s 3PAR StoreServ and StorVirtual platforms contain flash arrays. HP runs traditional RAID on the systems plus RAID MP (multiparity) in StoreServ. RAID MP offsets RAID 6’s high parity overhead with high-performance ASICs.
Violin Memory uses vRAID on its NAND flash systems. vRAID avoids RAID 5 and 6’s high overhead parity cycles, and its erase-write cycle does not affect other I/O performance.
IBM combines RAID 5 and Variable Stripe RAID in its FlashSystem arrays. RAID 5 stripes data and parity across flash media at the array level, while Variable Stripe RAID mirrors and provides parity across 10 flash chips in a set.
Pure Storage uses traditional RAID plus proprietary RAID 3D. 3D remediates transient SSD failures, which present as a read I/O delay on a single flash drive. When sensing the failure, 3D rebuilds the read request from other devices in the same parity group.
HDS’s VSP with Accelerated Flash Modules can use RAID 1, RAID 5 and RAID 6. Its HUS line (Hitachi Unified Storage) with SSDs can use RAID 0, RAID 1, RAID 10, RAID 5 and RAID 6.
Nimble Storage employs Cache-Accelerated Sequential Layout (CASL) on its SSD arrays. Economical data handling enables Nimble to support dual-parity RAID 6 without slowing performance.
Nimbus Data’s Gemini flash systems support RAID 5, 6 and 10. Nimbus claims that RAID does not affect their flash speeds and rebuilds take advantage of high flash performance.
At the end of the day, flash is a physical medium, and media fail. Even traditional RAID will work its physical protection magic in flash systems, particularly in hybrid arrays. As the industry turns to purpose-built flash systems, we expect to see more RAID-like products that are optimized for the flash environment. Until then, don’t be afraid to use traditional RAID on your flash systems, but look out for optimized flash protection that will preserve your flash system’s high performance and capacity.