Ultra-ATA RAID provides compelling price/performance
The data-storage requirements of enterprise applications, which are often placed on dedicated servers, are exploding. The next generation of servers also sports extended PCI-bus capabilities for greater I/O. Ultra-ATA disk technology brings down the cost of terabyte-class RAID storage arrays.
By Keith Walls
Conventional wisdom about disks puts Fibre Channel and Ultra2 SCSI at the top of the performance heap. Ultra ATA, the successor to IDE, is considered a relic of the desktop. Certainly Ultra ATA, with its master/slave configuration restrictions, is totally ill-suited for RAID on a high-end server. Why else would Fibre Channel and Ultra2 SCSI command a three-to-one premium in the cost per megabyte of drives?
Why, indeed? Traditionally, the answer has been that PC CPUs were simply too anemic to process all of the overhead needed to support a RAID configuration. This is especially true for RAID5, which requires the calculation of parity bits for redundancy on each write. As a result, SCSI-based hardware RAID controllers with considerable firepower were introduced to off-load the processing of RAID disks. A classic example of such a controller is the Dell PowerEdge RAID Controller (PERC), with an Intel i960 CPU and a 32MB cache.
A lot has changed in the past few years, however, that makes that analysis suspect. First, CPU power has exploded. The Pentium Pro, then the Pentium II, and now the Xeon chip have doubled and redoubled the processing power found in the original Pentium-based PCs, leaving plenty of headroom for handling RAID. Even at the very high end, the X/ORAID Module used by Box Hill, which produced throughput of just over 100MBps, is a software-based RAID solution.
The second factor changing the face of RAID is the precipitous drop in disk-storage costs. The idea behind RAID is first to gain performance by spreading data across as many drive spindles as possible to increase throughput, and second to add extra bits to gain a measure of redundancy at minimal extra cost.
The most cost-effective way to do this is with RAID5, which adds a parity bit on each write. This scheme provides the lowest storage overhead--just 1/nth of the data for n drives is overhead.
This scheme allows any one of the n drives to fail and the system will continue. The overhead of calculating and writing those extra bits, however, is horrific. With the price of disks now down to pennies per megabyte, an alternative called RAID10 is becoming popular. In this scheme, each disk is first mirrored (RAID1) to a second disk, and then all of the mirrored pairs are grouped and data is striped (RAID0) across the new logical volume. Mirroring provides redundancy, so any one member of each pair can fail--up to 50% of the disks--and striping provides the multispindle performance boost. Better yet, there are no parity bits to calculate.
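The parity trick behind RAID5 can be seen in a few lines. The sketch below is purely illustrative (it is not the RAIDzone or PERC implementation): parity is the XOR of the data blocks in a stripe, and XOR-ing the survivors rebuilds any one lost block.

```python
def parity(blocks):
    """XOR a set of equal-length blocks together to form the parity block."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# A four-drive stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(data)

# Simulate losing the second drive: rebuild its block from the survivors.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == b"BBBB"
```

This also shows why RAID5 writes are so expensive: every small write forces the old data and old parity to be read, XOR-ed, and rewritten, while a RAID10 write is simply two plain writes.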
The third change factor is the evolution of the PCI bus, which now pushes data at a burst rate of 132MBps. New servers are coming with multiple PCI buses as the standard configuration--including the latest iteration, a 64-bit-wide PCI bus. So with RAID10, the new bottleneck is clearly the I/O connection to the PCI bus. In the case of Ultra SCSI, that limitation is 40MBps. Even the new Ultra2 SCSI, at 80MBps, can deliver only about 60% of the PCI bus's burst bandwidth. And if you are not using the incredible connectivity of Fibre Channel to place multiple CPU nodes on an arbitrated loop, then you are throttling the PCI bus for no advantage.
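The arithmetic behind that bottleneck claim is simple, using the burst rates cited above:

```python
# How much of the 132MBps PCI burst rate each secondary I/O bus can feed.
PCI_BURST = 132  # MBps, 32-bit/33MHz PCI

for name, rate in [("Ultra SCSI", 40), ("Ultra2 SCSI", 80)]:
    share = rate / PCI_BURST
    print(f"{name}: {rate}MBps, about {share:.0%} of the PCI burst rate")
```

Ultra SCSI leaves roughly 70% of the PCI bus idle, and even Ultra2 SCSI caps a single channel at about 60% of what the bus can burst.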
The question, then, is: Why not design an ASIC interface chip to connect a drive directly to the PCI bus rather than to a secondary I/O bus such as SCSI? The answer is simple: The standard disks for desktop PCs have for years done just that. These drives, however, have languished as low-end commodity storage devices. With the introduction of the Ultra-ATA interface, these drives now sport rotation speeds of 7,200rpm and capacities of 18.2GB. Also, the disk can master the PCI bus and transfer data to main memory at a rate of 14MBps (bus-master DMA). That's hardly the description of a low-end device.
More importantly, since each Ultra-ATA interface is independent, the throughput is additive--limited not by the bandwidth of a secondary controller, but by the bandwidth of the full PCI bus, which could shortly reach 1GBps when Intel introduces PCI-X.
A great concept, but there is still one significant problem to be solved: How do you plug all of these drives directly into the PCI bus of your server, which was originally designed to accept just four add-in cards? After all, with single-ended SCSI you can extend the bus out to six meters, and with differential SCSI the cable can stretch a full 25 meters--while PCI was never meant to leave the chassis.
If you are using a notebook PC with a docking station that has all those necessary peripherals, like a CD-ROM drive and extra ports, then you know the answer: the PCI-to-PCI bridge chips from the likes of Texas Instruments and Intel. All you need is a PCI card for your server that extends the PCI bus out to a RAID storage tower, from which the PCI bus can be daisy-chained to another RAID tower.
In the zone
That's just where Consensys comes into the picture. The Consensys product suite includes all the necessary hardware to extend a computer's PCI bus out to a large array of potential RAIDzone canisters, which contain Ultra-ATA drives in multiples of five. Each group of five drives is dubbed a SmartCan, after SMART, the advanced management protocol for ATA drives. In addition there are, of course, the RAIDzone drivers and an administrator's console to configure the drives in a RAID0, 1, 5, or 10 configuration. This software also handles all of the RAID overhead and presents the Windows NT OS with a single logical drive.
With single Ultra-ATA drives now street-priced at just three cents per MB, the price is right for RAIDzone. The questions are: What kind of performance will you get and is there any penalty for abandoning hardware-based RAID? To answer these questions, CTO Labs configured a Dell 4200 PowerEdge Server with two 333MHz Pentium II CPUs with Fibre, Ultra SCSI, and the RAIDzone Ultra-ATA storage in both RAID5 and RAID10 scenarios.
Our principal goal was to compare SCSI to ATA in a single server. The results of the Fibre Channel are for comparison only, as the implications of that mode of connectivity go far beyond single servers.
For our Ultra-SCSI configuration, we used a Dell PowerEdge RAID Controller (PERC), which does hardware RAID via an i960 CPU chip and enhances throughput with 32MB of on-board cache. Connected to this controller were eight Seagate Cheetah drives, which spin at 10,000rpm. To do a RAID10 configuration with the firmware on this controller, we actually had to configure the logical drive backwards. The firmware can only "span" two logical drives, which translates to striping over just one pair of mirrored drives. To overcome this problem, we first striped four drives and then mirrored the stripe sets via the spanning function.
In the Ultra-ATA configuration, we used a single PCI bridge adapter and connected it to a 10-drive tower. These drives were Seagate Ultra-ATA Medallist Pro drives, which spin at 7,200rpm. In the case of the RAIDzone software, the RAID10 configuration was done correctly: the drives in the tower's SmartCans were first paired into mirror sets, and then data was striped across the mirrored pairs.
We began by testing the raw (unbuffered) throughput of data reads for each system in a RAID10 configuration. As expected, the Box Hill Fibre Box Array was the best performer, topping out at 100MBps. The surprise second-place finisher, however, was the Consensys RAIDzone subsystem. At one-tenth the cost of the Fibre Box system, the RAIDzone was pushing out 70% of the performance.
The problem with RAID, however, is not in reading data but in writing it. When we compared writing data in both a RAID5 and a RAID10 scenario, the results were interesting. In a RAID5 scenario, the i960 on the Dell PERC did worse than the RAIDzone. However, neither system demonstrated stellar performance.
Surprisingly, when we switched to the lower overhead of a RAID10 scenario, the PERC outperformed the RAIDzone at large block writes. In both cases, performance with RAID10 was two to three times better than with RAID5.
The big shock came when we turned to our load benchmark. This benchmark fires off an increasing number of I/O daemons, which pound the disk in a "controlled random" pattern that includes hot spots to simulate index file areas. With RAIDzone entirely dependent on the host computer's CPU, conventional wisdom would expect the RAIDzone subsystem to have greater overhead and support fewer daemon processes. In our tests, quite the opposite proved true. The RAIDzone blew away the competition, running out to 500 processes before the average access time exceeded 100ms.
In our last set of tests, we examined sequential access throughput with one and four simultaneous processes. With one user, the interaction of the PERC cache and the Win NT cache actually improved performance. Typically, with Win NT caching on, reading a file for the first time significantly degrades performance, as the OS must first fill its cache with the new data. For this reason, applications that know the data will be read just once, such as the CA-ARCserve backup package, turn off the Win NT cache to enhance performance.
Thus, the 34.8 MB per minute rate off of the RAIDzone array will pay major dividends during backup and SQL Server operations.
Surprisingly, the single-user performance boost of the PERC cache disappeared in a four-user scenario. In this case, the RAIDzone subsystem provided a 50% performance boost.
Clearly, ATA-based RAID is a technology whose time has come. This is especially true when the architecture of the new Xeon servers is taken into account and these servers are placed into specific roles, such as supporting I/O intense applications like Exchange, SQL Server, and SAP/R3.
Note: This article is reprinted with permission from BackOfficeCTO magazine, a sister publication of INFOSTAR. For more information or to subscribe, visit www.backofficemag.com
LEFT: CTO Labs began by testing the raw I/O throughput of RAID on three storage interconnect technologies: Fibre Channel, Ultra SCSI, and Ultra ATA. Surprisingly, throughput on reads for RAID10 on Ultra-ATA was 70% of RAID10 on Fibre. RIGHT: As expected, RAID10 writes were much more efficient than RAID5 writes.
The most surprising results came running the CTO Labs disk load benchmark. This benchmark launches an ever-increasing load of daemon I/O processes, and yet the RAIDzone subsystem, which is the most dependent on the computer's CPU, dramatically demonstrated the best performance.
Reading a single file with the Win NT cache on dramatically slowed the RAIDzone. This is a normal characteristic. Many programs, such as backup software, read with the OS cache off when the program knows it will not make use of the cache. With multiple I/O processes, the anomalous performance boost from combining Win NT caching with PERC caching evaporated.