The benefits of hardware versus software RAID
Frank W. Poole
As servers become more powerful and memory becomes cheaper and more plentiful, can software RAID, which draws on the host's multiple processors and gigabytes of RAM, help cut costs? For simple data striping or mirroring applications involving only a couple of hard drives, it may very well be the best choice. But for very large volumes of disk space and extremely rapid file transfer, intelligent RAID controllers are the best answer.
In fact, the surest way to build very large volumes (over 500 gigabytes) is with multiple-channel intelligent RAID controllers. Spanning several large hard drives over multiple channels on a single card provides the necessary disk space and redundancy, and the additional SCSI I/O channels add a further layer of redundancy. As a result, a single channel or cable can fail without taking the RAID array completely offline. Also, the operating system (such as Windows, NetWare, or Unix) no longer deals with individual drives, but with the entire volume, or virtual disk, created by the intelligent controller.
Better availability is a major objective of RAID, since it prevents downtime in the event of a hard-disk failure; however, RAID cannot recover data that has been deleted by the user or destroyed by a major event such as theft or fire. To protect your data against such events, regular backups are critical.
Although there are many ways to use the various RAID levels in different applications, this article concentrates on the more commonly used levels: RAID 0 (data striping), RAID 1 (mirroring), and RAID 5. RAID combines several independent hard drives to form one larger logical array. Data is stored on the array, and "redundancy information" is added. The redundancy information may be either the data itself (RAID 1 or mirroring) or parity information calculated from several data blocks (RAID 3, 4, or 5).
RAID 0 (data striping)
Calling data striping a level of RAID is actually a misnomer since there is no redundancy. In fact, the lack of redundancy is stated in the number 0. Data striping combines two or more hard drives into one large volume or virtual disk drive, which can then be partitioned into smaller drives by the operating system.
Unlike drive chaining, where the data of one hard drive takes over where the last one left off, RAID 0 stripes data across the drives. By combining two or more hard drives in this fashion, read/write performance, especially for sequential access, can be dramatically improved. Data striping also results in uniform load balancing across all the disks in a system, eliminating disk hot spots, which arise when one disk is saturated with I/O requests while the rest lie idle.
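The difference between chaining and striping can be illustrated with a minimal sketch. The function names, the zero-based numbering, and the `blocks_per_stripe_unit` parameter are assumptions made for illustration; a real controller uses its own block sizes and internal layout.

```python
def chained_location(logical_block, blocks_per_drive):
    """Drive chaining: one drive is filled completely before the next is used.

    Returns (drive index, block offset on that drive).
    """
    return logical_block // blocks_per_drive, logical_block % blocks_per_drive


def striped_location(logical_block, num_drives, blocks_per_stripe_unit=1):
    """Striping: consecutive stripe units rotate across all drives.

    Returns (drive index, block offset on that drive).
    """
    unit = logical_block // blocks_per_stripe_unit    # stripe unit number overall
    within = logical_block % blocks_per_stripe_unit   # position inside that unit
    drive = unit % num_drives                         # units rotate across drives
    row = unit // num_drives                          # stripe row on that drive
    return drive, row * blocks_per_stripe_unit + within


if __name__ == "__main__":
    # Six consecutive logical blocks on a three-drive set.
    for block in range(6):
        print(block, chained_location(block, 1000), striped_location(block, 3))
```

With chaining, all six blocks land on the first drive; with striping, they are spread over all three drives, so a sequential transfer keeps every drive busy.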
Typical applications for data striping include video or desktop publishing workstations in which large files are stored on hard drives. Large files mean sequential data transfer, which can be improved using a striping configuration. This is extremely important when considering video workstations since rapid sequential file transfer is imperative for smooth video.
RAID 1 (mirroring)
In a RAID-1 system, identical data is stored on two or more hard drives (several hard disks can be chained together and mirrored to several other hard disks), providing 100% redundancy. If any drive fails within the mirror, the system continues working with the remaining drives of the "unbroken" side of the mirror.
This can be taken one step further by mirroring the drives of a striped set (combining RAID 0 with RAID 1), which is known as RAID 10. Of course, 100% redundancy is expensive, since twice the used disk capacity is needed.
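The basic mirroring behavior can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not how any particular controller is implemented; the `Mirror` class and its method names are hypothetical.

```python
class Mirror:
    """Toy RAID-1 model: every write goes to all healthy members."""

    def __init__(self, num_drives=2, num_blocks=8):
        # Each "drive" is simply a list of blocks held in memory.
        self.drives = [[None] * num_blocks for _ in range(num_drives)]
        self.failed = set()

    def fail(self, drive_index):
        """Mark one member of the mirror as failed."""
        self.failed.add(drive_index)

    def write(self, block, data):
        """Store identical data on every member that is still healthy."""
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[block] = data

    def read(self, block):
        """Read from the first healthy member."""
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                return drive[block]
        raise OSError("all members of the mirror have failed")


if __name__ == "__main__":
    mirror = Mirror()
    mirror.write(0, b"payroll")
    mirror.fail(0)            # one side of the mirror breaks
    print(mirror.read(0))     # the data is still served from the other side
```

The sketch also makes the cost visible: two drives are occupied, but only one drive's worth of data can be stored.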
For smaller server systems, or whenever the necessary storage capacity can be handled by one disk, RAID 1 is the best security solution. A single-channel mirroring controller is a relatively cost-effective solution with two mirrored hard drives attached to a single channel. A two-channel controller, however, provides even more performance and redundancy. The mirrored drives can be duplexed across separate SCSI channels (see figure). Two hard drives can then be accessed simultaneously by the controller, thereby increasing performance.
Another area receiving increasing attention is placing the boot drive and swap files on one mirror and the user data, such as a database, on another, or mixing two RAID levels on one multi-channel controller. In the RAID-0 examples above, the striped set could be placed on separate channels from the boot disks, thereby keeping the operating system and user data completely separate. And, since operating systems like Windows NT or Linux use swap files intensively, these swap I/Os would not interfere with the user I/Os because they would be located on separate drives.
The greatest advantage of hardware over software mirroring is 100% data mirroring. Unlike most software mirroring, hardware mirroring copies every byte from the first hard drive to the second right down to the master boot record. Hardware mirroring also supports hot swapping, hot fixing (spare drives waiting for a disk failure), and even auto hot plugging to swap a failed drive during operation without server downtime. And, of course, hardware mirroring, as with all intelligent hardware RAID, places no additional load on the motherboard components (CPU, memory, etc.) or operating system.
RAID 5 (striping with distributed parity)
RAID 5 is the standard RAID level for most high capacity servers and should remain so for the near future. In a RAID-5 configuration, the data is cut into blocks (16KB, 32KB, 64KB, or 128KB), which are written in turns across the data drives (similar to striping). A parity block (comparable to the "sum" of the data in that row) is generated from each row of data and stored on the next drive in succession.
In this way, the parity is distributed across all the hard drives in the array. At least three hard drives are necessary for RAID 5, since the parity generated will equal the capacity of one drive. With this parity information, lost data can be recalculated in the event of a drive failure.
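The parity mechanism can be illustrated with a short sketch. It assumes the parity block is the byte-wise XOR of the data blocks in a row, which is the usual way RAID 5 parity is computed, and it assumes a simple rotation in which the parity for row r sits on drive r modulo the number of drives. Real controllers may rotate parity differently; the function names are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_row(data_blocks, row, num_drives):
    """Lay out one stripe row: the data blocks plus one parity block.

    Returns a list with one block per drive.
    """
    if len(data_blocks) != num_drives - 1:
        raise ValueError("a row holds one data block per drive, minus one for parity")
    parity_drive = row % num_drives          # assumed rotation scheme
    stripe = list(data_blocks)
    stripe.insert(parity_drive, xor_blocks(data_blocks))
    return stripe


def rebuild_block(stripe, failed_drive):
    """Recompute the block of a failed drive from all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_drive]
    return xor_blocks(survivors)


if __name__ == "__main__":
    row = build_row([b"\x01\x02", b"\x04\x08"], row=0, num_drives=3)
    lost = row[1]
    print(rebuild_block(row, failed_drive=1) == lost)
```

Because XOR is its own inverse, the same routine recovers a lost data block or a lost parity block, which is why one failed drive can be tolerated.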
When building a server with 18GB or more, it makes sense to consider RAID 5. Combining just three 9GB drives gives a user capacity of 18GB (capacity of one drive is lost to parity).
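The capacity arithmetic for the three levels discussed in this article can be written out as a small helper. The function name is hypothetical, the sketch assumes identical drives, and it treats RAID 1 as simple two-way mirroring.

```python
def usable_capacity_gb(level, num_drives, drive_gb):
    """Usable capacity for RAID 0, 1, or 5 built from identical drives."""
    if level == 0:
        if num_drives < 2:
            raise ValueError("striping needs at least two drives")
        return num_drives * drive_gb          # no redundancy, nothing is lost
    if level == 1:
        if num_drives < 2 or num_drives % 2:
            raise ValueError("two-way mirroring needs an even number of drives")
        return (num_drives // 2) * drive_gb   # half the drives hold copies
    if level == 5:
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (num_drives - 1) * drive_gb    # one drive's worth goes to parity
    raise ValueError("only RAID 0, 1, and 5 are covered here")


if __name__ == "__main__":
    # The example from the text: three 9GB drives in a RAID 5.
    print(usable_capacity_gb(5, 3, 9))
```

The relative cost of parity shrinks as drives are added: with three drives a third of the raw capacity goes to parity, with five drives only a fifth.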
Again, for operating systems that work with swap files (e.g., NT, Unix, and Linux), it is advisable to separate the boot/swap file drive from the normal user I/Os, for example by using a pair of mirrored hard drives for NT and a RAID 5 array for user files. A multi-channel intelligent RAID controller adds further security and performance (see figure). For even more security, add a pooled hot spare drive to the system, which is immediately available to replace any failed drive in either array.
Probably the biggest use of disk arrays now and in the near future is for large database and web servers. The issue of separating different I/Os on mechanically independent arrays becomes even more important in these situations. Database managers frequently recommend "load balancing," or using independent hard drives for different database files. The same also holds true for web servers. A disk array automatically balances the load over several hard drives, but one can achieve even more performance by using independent arrays for independent I/Os.
Another important consideration is that intelligent RAID systems can be run on any major operating system or hardware platform. And when more capacity is required, the intelligent RAID controller allows for expansion by adding more disks.
Even though a machine that is fast, secure, and inexpensive all at once is not yet possible, intelligent RAID can be an efficient way to build a fast and reliable system that can be optimized for almost any situation.
A two-channel configuration boosts both performance and redundancy.
Spanning several large hard drives over multiple channels on a single card provides the necessary redundancy and disk space for large volumes.
Frank Poole is senior tech support engineer at ICP vortex Corp. (www.icp-vortex.com), in Phoenix, AZ.