Evaluating RAID for VLDB
In the first of a three-part case study, IT administrators examine the basics of RAID and the tradeoffs of various configurations.
By Edwin E. Lehr and Christopher Schultz
As a database grows, manageability and the probability of hardware failures become increasingly important issues. Disk drive manufacturers advertise mean time between failures (MTBF) values as high as one million hours. However, with a system the size of our bank's marketing and analysis system--750GB+--at least four drive failures are expected each year, resulting in significant downtime. RAID reduces this exposure to the equivalent of one drive failure in 25 years.
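The arithmetic behind an estimate like this can be sketched as follows. This is a rough illustration only: it assumes every drive fails at a constant rate of 1/MTBF, and the drive count of 500 is a hypothetical figure, not the bank's actual configuration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def expected_failures_per_year(drive_count, mtbf_hours):
    """Expected drive failures per year, assuming a constant failure rate of 1/MTBF per drive."""
    return drive_count * HOURS_PER_YEAR / mtbf_hours


# Hypothetical array: 500 drives, each rated at 1,000,000 hours MTBF.
print(expected_failures_per_year(500, 1_000_000))  # 4.38
```

Even at the most optimistic advertised MTBF, a few hundred drives are enough to make several failures a year the normal case.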
The primary purpose of RAID is to provide a fault-tolerant array of drives. Should any one drive fail, there is enough redundant information stored across the other drives to reconstruct the lost data on the fly. And because today's systems accommodate hot-swappable components, most hardware failures can be repaired without interrupting other operations. Additionally, RAID increases the manageability of a system by making the drives in an array appear as a single, large drive to the host computer. Instead of manipulating hundreds of small drives to maintain file systems, administrators deal with only dozens of RAID arrays, even in today's Very Large Database (VLDB) environments.
This series of articles discusses why and how to implement RAID in UNIX systems supporting VLDB environments. Though the discussion is limited to VLDBs that use Oracle RDBMS, much of the information can be extrapolated for other environments and applications, provided some technical caveats are observed. We have tried to consider relatively nontechnical readers while presenting the detail required by database and systems administrators.
RAID is a system of drives, controlling CPUs, memory, and software. Two or more independent drives appear to the host operating system as one larger drive. This "virtual" drive attaches to the host as a single drive would, requiring no changes to the operating system or the application. In its simplest form, RAID can be implemented by most UNIX operating systems with native software using a logical volume manager. The result: improved data availability and performance and simplified maintenance and storage management.
In most implementations, data is evenly distributed across all the drives. Rather than filling up disks one at a time, files are split so that they span multiple drives--e.g., Record 1 is saved to Drive 1; Record 2, to Drive 2; Record 3, to Drive 3, etc. This arrangement of data, known as RAID level-0, or disk striping, improves performance because multiple drives can access different parts of a file at the same time. On the downside, RAID level-0 does not increase data availability. In fact, when a single drive fails, multiple files become unavailable because their data is intermingled across the drives.
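The round-robin placement described above can be sketched in a few lines. The function name `locate_block` is hypothetical, and the sketch numbers drives and blocks from zero; real arrays stripe in fixed-size chunks rather than database records, so treat this as an illustration of the idea only.

```python
def locate_block(logical_block, drive_count):
    """Map a logical block number to (drive index, block offset on that drive), round-robin."""
    return logical_block % drive_count, logical_block // drive_count


# Six consecutive logical blocks spread across a three-drive stripe set.
for block in range(6):
    drive, offset = locate_block(block, 3)
    print(f"logical block {block} -> drive {drive}, offset {offset}")
```

Consecutive blocks land on different drives, which is why a large sequential read can keep every drive busy and why losing one drive damages every file that spans the set.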
RAID level-1, or mirroring, takes redundancy to the extreme. This data arrangement calls for a duplicate set of disks: a primary set and a secondary set. Because the two sets can service different read requests at the same time, read performance improves. RAID-1 is also highly reliable. Data remains available even after multiple drive failures, as long as both drives of the same mirrored pair are not lost. The one drawback: RAID-1 doubles the cost of storage. Typically, it is too expensive to be practical for most applications.
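The availability rule for mirroring can be expressed as a simple check. The function `data_available` and the drive labels are hypothetical names used only for this sketch.

```python
def data_available(failed_drives, mirror_pairs):
    """Return True if every mirrored pair still has at least one working drive."""
    failed = set(failed_drives)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


# Three mirrored pairs: primary drives p0..p2 and their secondaries s0..s2.
pairs = [("p0", "s0"), ("p1", "s1"), ("p2", "s2")]

print(data_available(["p0", "s1"], pairs))  # True: two failures, but in different pairs
print(data_available(["p0", "s0"], pairs))  # False: both halves of one pair are lost
```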
The way RAID levels 2, 3, 4, 5, 6, 10, 53, etc., store data on these virtual drives makes all the difference. Most of these RAID levels employ some form of parity checking. Simply put, parity checking is a mathematical method of recovering a missing piece of data from a check value that was stored when the data was written. When data is lost, the remaining pieces of data are summed and then compared to the original sum--the difference is the value of the missing data.
Remember, we are talking about binary values, i.e., ones and zeros in a particular order. To put it simply, let's say you have eight buckets, each of which contains a "1" or a "0". Suppose the sum of the buckets is six when written to the array. This parity, the value six, is stored independently of the buckets. Now suppose bucket number seven is lost. What was the value ("1" or "0") of bucket seven? If the sum of the remaining buckets is now five, then bucket seven must have had a value of one. If the current sum is six, then the bucket must have had a value of zero.
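The bucket arithmetic is easy to try out. In the sketch below, the function name `recover_bucket` and the particular bucket values are hypothetical; any eight bits that sum to six would do.

```python
def recover_bucket(remaining_buckets, stored_sum):
    """Recover one lost bit: the stored sum minus the sum of the surviving bits."""
    return stored_sum - sum(remaining_buckets)


buckets = [1, 1, 1, 0, 1, 1, 1, 0]   # eight buckets, each holding a 1 or a 0
stored_sum = sum(buckets)            # 6, kept separately from the buckets

# Lose bucket number seven (index 6) and rebuild it from the other seven.
survivors = buckets[:6] + buckets[7:]
print(sum(survivors))                         # 5
print(recover_bucket(survivors, stored_sum))  # 1
```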
This illustration is a simple way of describing how parity works; the actual implementation, however, is not so straightforward.
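To give a flavor of what a real array does, parity in levels such as RAID-5 is commonly computed as a bitwise exclusive-OR (XOR) across the corresponding bytes of each drive's block, not as an arithmetic sum. The sketch below illustrates that general technique; the block contents and the `xor_blocks` helper are hypothetical, and no particular vendor's implementation is implied.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per drive, plus a parity block on a fourth drive.
data = [b"ORCL", b"VLDB", b"RAID"]
parity = xor_blocks(data)

# The drive holding the second block fails; rebuild it from the survivors and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'VLDB'
```

XOR works here because applying it twice with the same value cancels out, so combining the parity with every surviving block leaves only the missing one.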
A RAID array's approach to parity can be used to compare and contrast various implementations. Precisely how the parity function is implemented creates RAID's frustrating matrix of performance, reliability, capacity, and cost tradeoffs.
Now, just in case you think you understand RAID and parity, throw this in the mix: RAID Levels 0, 1, and 0+1 do not use parity at all. RAID-0 has no redundancy and even increases your exposure! RAID-1 stores everything twice, so parity isn't necessary. It is also important to understand that RAID levels are not hierarchical, as the numbers would have you believe. Higher RAID levels are just different implementations of data protection; they are not necessarily better than lower levels.
The table describes the RAID levels applicable to most environments and the tradeoffs of the various levels. (Levels 2 and 4 have been omitted. Aside from being more expensive, RAID-2 is not significantly different from RAID-3, and almost all potential RAID-4 applications are better served by RAID-5.)
Deciding which implementation to use can be daunting. However, enough money makes almost any technical decision easy. The RAID decision is particularly responsive to input parameters beginning with $. If money is no issue, use RAID 0+1. However, if you're implementing a VLDB requiring hundreds of gigabytes to terabytes of storage and you're working under the typical IT budget constraints, read on.
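A rough drive count shows why the budget matters. The sketch below rests on several assumptions that are not taken from the bank's system: hypothetical 9GB drives, RAID-5 groups of five drives with one drive's worth of capacity per group given over to parity, and a mirrored configuration that simply doubles the data drives. The function name `drives_needed` is also hypothetical.

```python
import math


def drives_needed(usable_gb, drive_gb, level, group_size=5):
    """Estimate the total drives required to provide usable_gb of protected storage."""
    data_drives = math.ceil(usable_gb / drive_gb)
    if level == "0+1":
        # Every data drive has a mirror.
        return 2 * data_drives
    if level == "5":
        # Each group of group_size drives gives up one drive's worth of space to parity.
        groups = math.ceil(data_drives / (group_size - 1))
        return data_drives + groups
    raise ValueError(f"unsupported RAID level: {level}")


print(drives_needed(750, 9, "0+1"))  # 168
print(drives_needed(750, 9, "5"))    # 105
```

Under these assumptions, mirroring a 750GB database takes roughly 60 percent more drives than a parity-based layout, before counting the extra controllers, cabinets, and power needed to house them.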
Why RAID at all?
The four main benefits of RAID are improved data availability, better performance, simplified maintenance, and simplified storage management.
- Data availability. Nearly all RAID vendors tout improved data availability, but database and systems administrators should remember that RAID in and of itself does not buy increased availability unless the vendor's implementation includes component redundancy. Without component redundancy, a RAID implementation suffers from single points of failure (SPOF), which expose you to more catastrophic failures than you were risking without RAID!
Let's say you currently have a JBOD (just a bunch of disks) configuration. The more disks you have, the higher the likelihood of a failure. Compare the impact of a single drive failure on your operation with the impact of losing all your drives at once (due to, say, a power supply failure).
Or, how about having your database files corrupted by a failing cache? Redundant power supplies and mirrored cache are as important as data redundancy, as are a host of other redundant components.
- Simplified storage management. RAID simplifies a system administrator's job tremendously. With RAID, system administrators have at most a handful of file systems to manage--instead of dozens or more--and these file systems can be easily managed with graphical user interfaces.
If you think this aspect of RAID is only a windfall for the administrator--not a real system benefit--think again. Complexity spawns errors on an exponential curve. Storing terabytes of data on JBOD is a daunting management task, involving hundreds of drives, dozens of controller cards, yards of cabling, and tens of file systems. Component-redundant RAID reduces the number of potential failure points significantly.
- Simplified maintenance. The best RAID vendors supply user-serviceable products, which means failed drives can be removed and replaced on the fly and other components (such as fans, power supplies, and even controllers) can be replaced by the user. Some products even have "hot-swappable" components, which allow users to change them while the array is operating. These features lower costs and reduce downtime; however, their implementation depends highly on the vendor. Some vendors only provide a minimal level of user serviceability, such as drive replacement.
- Performance improvements. RAID improves performance through efficient load balancing and parallel access. With JBOD, database and system administrators must constantly monitor I/Os to make sure no one drive gets too much of the load. Database files must be created, data loaded, and files allocated in such a way as to increase parallel access. All this is doable but requires significant effort in VLDB environments, and it must be redone whenever data-access patterns change. With RAID, the load is balanced automatically and there are as many paths to the data as there are drives. Therefore, partitioning of data is less of a concern to database administrators (though the technique can still be used with RAID to good effect).
Now, do you need improved reliability, simplified storage management and maintenance, or improved performance? As your database grows in size, the number of drives increases and so does the frequency of drive failure. VLDBs are not created for insignificant work and they are large by definition. Therefore, they are generally good candidates for solutions designed to improve reliability.
The size of VLDBs also drives the need for simplified storage management and maintenance. The need for simplified maintenance depends on your up-time requirements and the skill level of your support staff. Performance may or may not be an issue, and depending on the application, RAID may or may not improve it significantly.
Part 2 of this series, coming in next month's InfoStor, provides a technical look at how to optimize RAID for Oracle VLDB performance.
At the time this article was written, Edwin E. Lehr was the Oracle database administrator and Christopher Schultz was the UNIX system administrator at a major bank in the eastern U.S. The bank has since been acquired. Lehr is now a principal consultant at Oracle Corporation and Schultz is a systems engineer at Silicon Graphics, Inc.