RAID Boosts Data Availability

Fault tolerance and continuous data availability are the battle cries for IT managers increasingly faced with 7x24 operations.

By Rick Cook

RAID is moving uptown. One of the striking trends in the RAID market over the last year has been the increasing sophistication of high-end arrays. Today's RAID systems are more fault-tolerant and better able to compete in enterprise-critical applications, which were once the province of mainframes and expensive DASD. While high-end RAID arrays aren't cheap by traditional RAID standards, they represent a major price-performance improvement over existing RAID and non-RAID systems.

RAID's development is taking place against a background of paradoxes. Even though the cost per megabyte of RAID storage is dropping sharply, total spending on storage is rising, both in absolute terms and as a share of server cost.

"Storage is becoming the most costly hardware component in the server environment," says Thomas Lahive, senior storage analyst at Dataquest, a market research firm in San Jose, CA. "A couple of years ago, storage amounted to 30% of the value of a server. Now, in NT and UNIX environments, storage is more than 45% of the server value, and I expect soon it will be more than 55%."

At the same time, "the cost per megabyte of storage has dropped 30 percent per year," says Lahive, "but total capacity is going through the roof. The cost to buy that capacity on a unit basis has decreased significantly, yet users are still spending more on IT storage."
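The arithmetic behind Lahive's paradox can be sketched in a few lines of Python. Only the 30% annual price decline comes from the quote above. The starting capacity, starting price, and the doubling of capacity each year are assumptions chosen purely for illustration.

```python
def break_even_growth(price_decline):
    """Capacity growth rate above which total spending rises
    even though the cost per megabyte is falling."""
    return 1 / (1 - price_decline) - 1


def yearly_spend(start_capacity_mb, start_cost_per_mb,
                 capacity_growth, price_decline, years):
    """Return total storage spending for year 0 through the given year."""
    spend = []
    capacity = start_capacity_mb
    cost = start_cost_per_mb
    for _ in range(years + 1):
        spend.append(capacity * cost)
        capacity *= 1 + capacity_growth   # installed capacity grows
        cost *= 1 - price_decline         # cost per megabyte falls
    return spend


# Assumed figures: 1,000MB at $1.00 per MB, capacity doubling every year.
threshold = break_even_growth(0.30)
spending = yearly_spend(1000, 1.00, capacity_growth=1.0,
                        price_decline=0.30, years=3)
```

With a 30% annual price decline, the break-even point is capacity growth of roughly 43% per year. Any site adding capacity faster than that spends more each year, even though every megabyte is cheaper.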

But as storage arrays get bigger and the information stored on them becomes more mission critical, the problems of reliability, availability, and manageability become increasingly important. Most trends in the high-end RAID market reflect this fact.

Two or three years ago, redundant arrays with dual power supplies and hot-swappable drives were the height of RAID sophistication. For many applications today, that's only the starting point. Customers with mission-critical applications are demanding dual controllers, mirrored arrays, sophisticated monitoring and management, and even remote mirroring to ensure data availability.

Dual redundancy is just a starting point for fault tolerance. Even in the midrange RAID market, some vendors are going beyond dual components. For example, MicroNet Technology ships its seven-drive DataDock 7000 RAID arrays with triple-redundant, hot-swappable power supplies and fans.

At the high end of the RAID market, full redundancy and hot-swap capabilities are de rigueur, and typically include redundant controllers (either active/active or active/passive), caches, disk drives, fans, and load-sharing power supplies. Other fairly common fault-tolerant features at the high end include multiple simultaneous RAID levels, fault/status notification, automatic rebuild capability (without going into degraded mode), and global spare drives (i.e., one "global" spare drive is available to any redundancy group within the array). The main goal of redundancy is to eliminate single points of failure.

"It's one thing to make the drive fault tolerant, but there are other things that have a bad habit of failing too," says Bob Katzive, vice president of Disk/Trend Inc., a storage market research firm in Mountain View, CA. "The industry has started to make available fault-tolerant equipment ranging from drives to controllers. Vendors are also putting in redundant power supplies and fans, and sometimes even the servers are redundant (e.g., with failover software or clustering)."

Vendors such as MTI Technology even include redundant power cords and batteries. MTI also differentiates its RAID arrays with customer serviceability (e.g., users can replace parts without calling on vendor service personnel); a "phone home" capability that pages storage administrators when there is a fault or status change; and a data consistency check capability that proactively looks for bad blocks and corrects them. In addition, MTI's arrays do not require special kernel-resident software, which can potentially compromise operating system availability.

Maintaining data integrity also means preventing undetected data corruption--all the way down to the parity level. For example, Clariion uses a patented technique called "parity shedding" to ensure data integrity. (For details on how parity shedding works, go to www.clariion.com//products/mktinfo/parshed.html.)
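For readers unfamiliar with what "the parity level" means, the following Python sketch shows the generic XOR parity used by parity-based RAID levels. It is not Clariion's parity shedding, which is not described in this article. The function names and the three-block stripe are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


def stripe_is_consistent(data_blocks, parity):
    """Check that the stored parity still matches the data blocks."""
    return xor_blocks(data_blocks) == parity


# Hypothetical stripe of three data blocks plus one parity block.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Lose the middle block and rebuild it from the other two plus parity.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

The same check that rebuilds a lost block also exposes a stripe whose data and parity no longer agree, which is the kind of silent corruption that integrity schemes are designed to catch.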

The trend toward fault tolerance and data availability is being driven by business requirements. Says Anne Murphy, vice president of marketing at Storage Computer Corp.: "Our customers want us to give them the flexibility to ensure that everything in their IT center can run 7x24."

Storage Computer's arrays include a full operating system that runs sophisticated data-management applications. For example, the company's Omniforce array management application lets storage managers decide on a file-by-file basis what items will be mirrored to ensure availability. "With Omniforce, a user can say 'this data is critical and I want to mirror it, but this data is not so critical and I don't want to pay the price of mirrored storage,'" says Murphy.

Mirror, Mirror

RAID also solves other availability problems, especially with disk mirroring. For example, some companies that can't shut down their arrays for backup can do a hot backup more cost effectively using RAID mirroring. The contents of the working array are mirrored to the second array, and the backup is taken from that array. Even though the scheme requires a second disk array, it can have substantial economic and performance advantages. Mirroring is much faster than running a backup, and backing up off a non-active array can substantially lower the need for expensive high-bandwidth networking.
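The hot-backup scheme described above can be modeled in a short Python sketch. The class and method names are hypothetical, and the detach-then-resynchronize step is an assumption about how such a scheme might keep the backup copy stable; real arrays implement this in controller firmware or management software.

```python
class MirroredPair:
    """Toy model of a working array mirrored to a second array."""

    def __init__(self):
        self.primary = {}            # block number -> data on the working array
        self.mirror = {}             # block number -> data on the second array
        self.mirror_attached = True
        self.pending = {}            # writes made while the mirror is detached

    def write(self, block_no, data):
        """Write to the working array and, if attached, to the mirror."""
        self.primary[block_no] = data
        if self.mirror_attached:
            self.mirror[block_no] = data
        else:
            self.pending[block_no] = data

    def split_for_backup(self):
        """Detach the mirror and return its frozen contents for backup."""
        self.mirror_attached = False
        return dict(self.mirror)

    def resync(self):
        """Reattach the mirror and apply the writes it missed."""
        self.mirror.update(self.pending)
        self.pending.clear()
        self.mirror_attached = True


pair = MirroredPair()
pair.write(0, b"orders")
backup_image = pair.split_for_backup()   # tape job reads this copy
pair.write(0, b"orders-v2")              # production keeps running
pair.resync()                            # mirror catches up afterward
```

The working array never stops taking writes, and the backup job reads from a copy that does not change underneath it.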

"There's a lot of interest in mirroring RAID for doing backups," says Craig Harries, product manager for storage systems at Macro Computer Products, a Rochester Hills, MI, systems integrator specializing in high-end systems. "That's something that has existed in the mainframe environment, but now people with increased storage in the enterprise are looking at multiprocessing--doing things two at a time instead of sequentially. A number of our customers require a backup before nightly updates, and mirroring can save them two or three hours," says Harries.

Of course, reliability has always been a major selling point for RAID, and that's even more true as RAID becomes more sophisticated. Storage Computer's Murphy differentiates between business recovery, business continuance, and continuous availability. "With business recovery, you're saying you can restore data, but it may take three days, three weeks, or three months," she says. "For business continuity, you need tools that enable you to replicate sites. You can replicate locally in the building in case of a storage-array failure. Or, you may need another site remotely where you can access data and keep data operational. Finally, you get to where you can't afford five minutes of downtime and cost is not an issue. That's where continuous availability comes into play."

Continuous data availability and protection are often software issues. For example, for disaster recovery and business continuance, EMC's server-independent Symmetrix Remote Data Facility (SRDF) provides real-time mirroring, which allows users to make synchronized copies of data to remote Symmetrix arrays. Last month EMC enhanced SRDF's performance (early tests show up to a 50% improvement, although actual results will vary widely). In addition, SRDF now requires as few as half the communications lines that previous versions needed.

The cost of these strategies varies considerably. IT managers have to choose the approach that offers the best balance of cost and data availability. Fortunately, vendors are giving IT managers and systems integrators more choices with new hardware and software.

Fault-Tolerant Fibre

For companies requiring the utmost reliability, Fibre Channel is an important emerging technology.

Fibre Channel runs at 100MBps over cable distances as long as 10 kilometers. Such distances mean mirrored arrays can be installed in different buildings, providing an added level of data protection. An extension of this trend is to locate the redundant server or array in another city.

Joel Reich, manager of Fibre Channel marketing at Clariion, sums up the fault-tolerant advantages of Fibre Channel:

- Dual-ported disk drives

- Dual, fault-tolerant loops

- Remote mirroring via inexpensive hubs at distances of up to 10 kilometers

- A very effective (compared to SCSI) error correction scheme

Extra reliability is one of the main reasons why large IT organizations are considering moving to Fibre Channel. The early adopters of Fibre Channel tend to be companies that, for security and reliability reasons, need arrays and servers to be located in separate physical locations or require performance boosts.

A Question of Balance

To produce the most cost-effective array for each application, the RAID array configuration has to be balanced against such factors as cost, throughput, and the performance of the rest of the system.

RAID array performance depends heavily on tuning. For maximum performance, for example, the block size read and written to the array needs to be adjusted for the application mix. A graphics workstation that usually handles multi-megabyte files performs best with a much larger block size than a transaction server that updates files in chunks of a few hundred bytes at a time.
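A rough Python sketch of the tradeoff follows. The 4MB file, the 300-byte update, and the two block sizes are assumed values chosen to mirror the graphics-workstation and transaction-server cases above. The model ignores caching, striping, and parity overhead.

```python
def io_operations(transfer_bytes, block_bytes):
    """Number of block-sized I/Os needed to move one transfer (rounded up)."""
    return -(-transfer_bytes // block_bytes)


def transfer_efficiency(transfer_bytes, block_bytes):
    """Fraction of the bytes moved that the application actually asked for."""
    moved = io_operations(transfer_bytes, block_bytes) * block_bytes
    return transfer_bytes / moved


LARGE_BLOCK = 64 * 1024        # assumed block size suited to big files
SMALL_BLOCK = 512              # assumed block size suited to small updates

graphics_file = 4 * 1024 * 1024    # a multi-megabyte image file
record_update = 300                # a few hundred bytes per transaction

# Large files need far fewer I/Os when the block size is large.
ops_large = io_operations(graphics_file, LARGE_BLOCK)   # 64
ops_small = io_operations(graphics_file, SMALL_BLOCK)   # 8192

# Small updates waste most of a large block.
eff_large = transfer_efficiency(record_update, LARGE_BLOCK)
eff_small = transfer_efficiency(record_update, SMALL_BLOCK)
```

In this simplified model the large block cuts the graphics file from 8,192 operations to 64, while the same large block leaves a 300-byte update using less than 1% of the data it moves.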

This kind of performance tweaking requires considerable expertise on the part of vendors, IT professionals, and systems integrators. For RAID VARs, this can be an important source of added value.

Some analysts and vendors see the RAID market splitting into two distinct camps. On one hand are users who want RAID because of its high throughput. On the other hand are users looking for reliability, availability, and security.

Of course, commodity components mean that RAID manufacturers have to find other ways of differentiating their products. One such way is through storage management software. "The software is extremely high margin relative to hardware," says Lahive, "but the advantage to users is that the software decreases storage management time and costs."

Disk drives account for only a little more than half the failures of storage hardware, which is why RAID vendors are offering arrays with redundant everything, right down to power cords in some cases.

Mirroring, the simplest form of fault-tolerant RAID, is becoming increasingly sophisticated. Variations of mirroring include intramirroring, local, cross, remote, reflective, and replicated.

Trends in Low-End RAID

Not all the growth in RAID is at the high end of the market. Inexpensive RAID solutions are growing as well, pushing RAID into applications and onto desktops that until now were not cost-effective. At the low end, the combination of such features as RAID controllers on motherboards, more integrated chip sets, and software RAID is making redundant arrays more useful in departmental servers, and even on desktops.

At the low end, one of the driving forces for RAID is the absolute cost of storage. With 2GB IDE drives available for less than $200, a RAID array becomes economically attractive, even for desktops.

Although the lower end of RAID isn't as exciting as terabyte-level databases with elaborate failover and management capabilities, this market is expanding quickly and there's more room to grow. At the upper end of the server market, Disk/Trend estimates that about 90% of all servers use some kind of RAID. "In the midrange level, it's probably around 60%," says Disk/Trend's Robert Katzive. "In entry-level servers, probably 20% to 50% use RAID, and that's where there's room for improvement."

To that end, vendors are offering a number of different approaches to inexpensive RAID. For example, several motherboard manufacturers are using Adaptec's RAIDport system to put the RAID controller directly on the motherboard. Some of these approaches even use low-cost IDE drives for the array.

RAID on the motherboard addresses a fundamental problem of price-sensitive RAID solutions: the cost of the controller. In inexpensive systems, the price of the controller is important because the cost of the array controller becomes a larger proportion of the overall cost as the cost of drives drops. Making RAID more attractive at the lower end requires cutting the cost of the controller as well as that of the drives.

One approach is to make RAID optional. A number of motherboard manufacturers, such as AMI, are using Adaptec's RAIDport system, which combines Adaptec's SCSI chip set on the motherboard with an inexpensive daughter card that upgrades the server to a RAID array. With RAIDport, Adaptec claims that OEMs can produce a low-cost server that can be upgraded to support a RAID array at a cost of about $300.

The cheapest way to install RAID in hardware is to build the controller into the motherboard. In the last year, vendors such as Mylex have started shipping motherboards with built-in RAID controllers.

According to Disk/Trend's Katzive, it's too early to judge the success of the RAID-on-motherboard concept. "It's been what I'd call a moderate success so far," he says. "It hasn't appeared in a big way, but it's still early." He points out that the boards have only been available for about a year and that it takes time to implement the technology.

Not all inexpensive RAID controllers are based on the SCSI interface. Arco Computer Products, for example, makes an under-$200 RAID controller for IDE drives that supports RAID level 1 (mirroring) with no special drivers.

"We've got about 25 of the Arco units, and for a sub-$200 product, it offers a lot of value," says Richard Acerra, president of Lighthouse Computers, a Huntington, NY, systems integrator that supplies video control systems for cable television stations. In the Lighthouse systems, the Arco controllers are installed in Windows NT machines that run high-end video software to control events, handle wipes and fades, and generally handle local programming for cable television operators.

Lighthouse customers also use tape backup systems, but Acerra points out there is a big difference between the time it takes to recover from tape and the time required to hot swap a failed drive in a mirrored array: "When you do a restore from tape, you may have a three-hour operation."

Acerra says that although Lighthouse employs the Arco controllers for speed, the concept of inexpensive IDE RAID has other applications. "In a workgroup file server, it's an inexpensive way to get extra redundancy for failure protection," says Acerra.

Rick Cook is a freelance writer in Phoenix, AZ.

This article was originally published on January 01, 1998