Benchmarking a DLT Library
A combination of DLT tape drives and a robotic tape library puts high-speed multi-terabyte backup into the enterprise
The modern enterprise data center has to back up and store an ever-increasing quantity of data from desktops and servers. As hard-drive sizes have dramatically increased, the need to store and back up terabyte quantities of data is becoming ever more common. The optimal solution is to couple multiple tape drives with a robotic library system capable of storing and retrieving hundreds, if not thousands, of tapes.
From autoloaders at the low end to libraries at the high end, Quantum's Digital Linear Tape (DLT) technology has become a favorite storage mechanism for enterprises. A top-of-the-line DLT7000 tape drive offers high reliability, sustained throughput rated at 5MBps (300MB per minute), and an uncompressed storage capacity of 35GB per cartridge. Vendors typically quote a compression ratio of 2:1, which puts peak throughput at 10MBps and raises tape storage capacity to 70GB. Tests at CTO Labs, however, suggest that a compression ratio of about 1.6:1 is more realistic.
Assuming an average desktop PC has 4GB of disk space, each tape can store a full backup of 10 to 20 desktop computers. What do you do if you have several hundred, if not thousands, of desktop systems? For most sites, the answer is simple: Ignore the desktop systems and just back up the servers.
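The capacity figures above reduce to simple arithmetic. The short Python sketch below is illustrative only: the helper names are our own, and it assumes every desktop has a completely full 4GB disk. Under those assumptions a cartridge holds roughly 8 desktops uncompressed, 14 at 1.6:1, and 17 at 2:1, broadly in line with the estimate of 10 to 20.

```python
# Illustrative arithmetic only; the helper names are hypothetical.
NATIVE_CAPACITY_GB = 35   # DLT7000 uncompressed capacity per cartridge
NATIVE_RATE_MBPS = 5      # DLT7000 rated sustained throughput, uncompressed
DESKTOP_DISK_GB = 4       # assumed average desktop disk, assumed full


def effective_capacity_gb(ratio):
    """Cartridge capacity in GB at a given compression ratio."""
    return NATIVE_CAPACITY_GB * ratio


def effective_rate_mb_per_min(ratio):
    """Drive throughput in MB per minute at a given compression ratio."""
    return NATIVE_RATE_MBPS * ratio * 60


def desktops_per_tape(ratio):
    """Whole desktops whose full backup fits on one cartridge."""
    return int(effective_capacity_gb(ratio) // DESKTOP_DISK_GB)


# Uncompressed, the CTO Labs ratio, and the commonly quoted ratio.
for ratio in (1.0, 1.6, 2.0):
    print(
        f"{ratio}:1 -> {effective_capacity_gb(ratio):.0f}GB per tape, "
        f"{effective_rate_mb_per_min(ratio):.0f}MB per minute, "
        f"{desktops_per_tape(ratio)} desktops per tape"
    )
```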
That may have worked in the past, but as business intelligence applications such as Plato and SQL Server move onto the corporate desktop, that kind of benign neglect will no longer be acceptable. Neither will the use of simple autoloaders powered by a single tape drive.
Once you couple several DLT tape drives with a robotic system, you move backup-storage technology out of the department and plant it at the enterprise level. At CTO Labs we did just that. We coupled a small Emass AML/S robotic tape library with four DLT7000 tape drives to a four-way Dell 6100 SMP server equipped with nStor hardware RAID controllers attached to 10,000rpm Seagate Cheetah disk drives. We fired up CA's ARCserve with the Tape Library Option and took it all out for a spin to see how this enterprise-class system works in an NT environment.
Perhaps a dozen vendors offer a large robotic tape library compatible with one tape drive type or another. For this test, we chose the AML/S from Emass. The AML/S came with four DLT tape drives and has a tape capacity of 158 cartridges.
Larger systems that Emass sells, like the AML/2, can be configured with up to 256 drives and serve up to 46,656 DLT tapes. At 35GB per cartridge, the biggest AML/2 system has a capacity of about 1,633TB (uncompressed). A number of government agencies maintain several of these units in-house.
An important characteristic of the AML/S and the other Emass systems is the ability to support mixed-media libraries. You can configure Emass libraries with various drives that support such tape formats as DLT7000, IBM 3590 Magstar, 8mm AIT, 4mm DAT, VHS, and BetaCAM.
These libraries also support optical disks and CD-ROMs. Emass supplies the housing, robotics, and firmware that make this system work and OEMs the different drives, configured as your enterprise requires.
The AML/S contains a laser reader on its robotic arm that sweeps the frames, columns, and tape drives to locate any tapes in the library and read the bar codes that identify the tapes. Using the firmware-based controls on the AML/S, systems administrators can move tapes from one slot or address to another and can even unload tapes to a tray that is accessible through a small side window.
This allows library operations to continue during the loading and unloading of tape cartridges. After the inventory, the AML/S firmware knows which tape is in which slot and communicates this information to any controlling software, like CA's ARCserve.
Emass rates its library mechanism as capable of making 350 picks an hour, which is the equivalent of moving a tape every 10.3 seconds.
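The pick rating converts to a time per move with one division. The sketch below is our own illustration, not vendor code. The second function makes an additional assumption that is ours alone: each of the 158 slots needs exactly one pick, which gives a rough lower bound on the time the robot needs to handle every cartridge once.

```python
# Illustrative arithmetic only; the helper names are hypothetical.
SECONDS_PER_HOUR = 3600


def seconds_per_pick(picks_per_hour):
    """Average time between tape moves at a given pick rate."""
    return SECONDS_PER_HOUR / picks_per_hour


def minutes_to_touch_all_slots(slots, picks_per_hour):
    """Time to handle every cartridge once, assuming one pick per slot."""
    return slots * seconds_per_pick(picks_per_hour) / 60


print(f"{seconds_per_pick(350):.1f} seconds per pick")
print(f"{minutes_to_touch_all_slots(158, 350):.0f} minutes for 158 slots")
```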
Our Test Bed
With four tape drives to keep busy, we connected the Emass library via an Adaptec ultra-wide differential SCSI controller to a four-way Dell PowerEdge 6100 server. With four 200MHz CPUs, the Dell server would not present a potential bottleneck from limited CPU cycles.
For disk I/O, we used an nStor CR8e RAID subsystem. The foundation of this subsystem is a dual-channel i960-based caching controller. We assigned four of the eight 10,000-rpm Seagate Cheetah drives to each of the two channels and then created two RAID-1 volumes on each channel. This gave us four very fast virtual disks from which to launch simultaneous backup operations.
We copied the same 2.55GB folder to each volume. Our intent was to run concurrent backups of these folders using ARCserve, one to each of the four DLT tape drives in the AML/S. We had no difficulty doing this once we had updated the ARCserve software for the Emass subsystem.
ARCserve and the Tape Library Option performed well with the AML/S system. The software was able to initialize and format tapes, schedule and run backup jobs, and monitor backups without incident. ARCserve's "wizard-like" interface makes these tasks easy to perform, and the software is easy to understand.
Tale of the Tape
To set up a baseline on I/O for our test system, we first ran the CTO Labs Disk Benchmark on each of the four virtual drives using a 64KB read size. With one drive, we were able to read data at a rate of 582MB per minute.
With two drives split over the two SCSI channels, we registered a throughput of 714MB per minute, a 23% increase. Finally, with four drives, we recorded a throughput rate of 869MB per minute, a 22% increase over the two-drive scenario. While those numbers don't double, they consistently scale upwards and indicate that disk I/O was not a bottleneck in our test system.
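The step-to-step gains quoted for the disk benchmark can be reproduced from the three measured rates. This is a minimal sketch with a helper name of our own choosing; the throughput values are the ones reported above.

```python
# Disk benchmark results from the text: volumes read -> MB per minute.
disk_read_mb_per_min = {1: 582, 2: 714, 4: 869}


def percent_gain(before, after):
    """Percentage increase from one measurement to the next."""
    return (after - before) / before * 100


steps = sorted(disk_read_mb_per_min)
for prev, curr in zip(steps, steps[1:]):
    gain = percent_gain(disk_read_mb_per_min[prev], disk_read_mb_per_min[curr])
    print(f"{prev} -> {curr} volumes: {gain:.0f}% increase")
```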
We then turned to running backups. With a single job, we achieved a throughput of 359MB per minute, which is about 20% faster than the rated sustained throughput for uncompressed data. Our second backup paralleled our second disk benchmark.
For data sources we used one virtual disk on each SCSI channel. Total throughput rose to 530MB per minute, an increase of 48%. Moving up to four drives, however, swamped the single SCSI controller serving the Emass library. With four simultaneous backup jobs, throughput actually fell to roughly 480MB per minute, as we were no longer able to keep the drives streaming.
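Dividing each aggregate result by the number of active drives shows how quickly the per-drive rate falls on a shared controller. In the sketch below we use the DLT7000's rated 300MB per minute as a rough proxy for the rate needed to keep a drive streaming. That threshold is our assumption for illustration, not a measured figure, and the helper name is hypothetical.

```python
# Aggregate backup results from the text: active drives -> MB per minute.
backup_mb_per_min = {1: 359, 2: 530, 4: 480}

# Assumption: the rated uncompressed speed approximates the streaming threshold.
RATED_MB_PER_MIN = 300


def per_drive_rate(total_mb_per_min, drives):
    """Average throughput each drive receives from the shared controller."""
    return total_mb_per_min / drives


for drives, total in backup_mb_per_min.items():
    rate = per_drive_rate(total, drives)
    status = "at or above" if rate >= RATED_MB_PER_MIN else "below"
    print(f"{drives} drive(s): {rate:.0f}MB per minute each, {status} rated speed")
```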
Clearly, daisy-chaining the four DLT7000 drives on one controller was the problem. For the best price/performance ratio, two drives per controller is the maximum.
We also monitored the instantaneous read rates of data off of the RAID arrays, to ascertain whether the limitation for data transfer was disk related or controller related.
When we observed the Performance Monitor while a single backup was in progress, we saw a peak throughput rate of about 8MBps. During two backups, we observed an overall rate of reads from both disks to be on the order of 9 to 10MBps.
To put these numbers in perspective, you might want to consider the price/performance in dollars per MB per minute for each of these scenarios, based on the library's $87,700 price. A four-drive system daisy-chained to one controller and operating at 484MB per minute would provide a price/performance ratio of $181 per MB per minute. If you were able to sustain 359MB per minute on each of the four drives by giving each drive its own controller, for a total of 1,436MB per minute, then your price/performance would improve significantly to $61 per MB per minute.
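The two price/performance figures follow from dividing the library price by aggregate throughput. The sketch below is illustrative; it uses the $87,700 list price from the product summary and, as a simplifying assumption on our part, ignores the cost of the additional SCSI controllers.

```python
# Illustrative arithmetic only; the helper name is hypothetical.
LIBRARY_PRICE_USD = 87_700   # AML/S with four DLT7000 drives
SINGLE_DRIVE_MB_PER_MIN = 359


def dollars_per_mb_per_min(price_usd, throughput_mb_per_min):
    """Price/performance: dollars paid per MB per minute of throughput."""
    return price_usd / throughput_mb_per_min


# Four drives daisy-chained to one controller.
shared = dollars_per_mb_per_min(LIBRARY_PRICE_USD, 484)

# Four drives, each sustaining the single-drive rate (extra controllers not priced).
dedicated = dollars_per_mb_per_min(LIBRARY_PRICE_USD, 4 * SINGLE_DRIVE_MB_PER_MIN)

print(f"one controller:   ${shared:.0f} per MB per minute")
print(f"four controllers: ${dedicated:.0f} per MB per minute")
```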
Furthermore, you should consider that in any 7x24 mission-critical application setting, you might have a narrow window to back up your enterprise's data. If that window is eight hours, then the different rates would translate into being able to transfer 232GB with the daisy-chained configuration vs. 689GB with an optimized four-drive DLT library.
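The backup-window comparison is throughput multiplied by time. The following sketch uses helper names of our own and treats 1GB as 1,000MB, which matches the figures quoted above. The second function inverts the calculation so you can estimate the window a given data set would need.

```python
# Illustrative arithmetic only; the helper names are hypothetical.
MB_PER_GB = 1000  # decimal gigabytes, consistent with the quoted figures


def gb_in_window(mb_per_min, hours):
    """Data that can be written in a backup window of the given length."""
    return mb_per_min * hours * 60 / MB_PER_GB


def hours_needed(data_gb, mb_per_min):
    """Backup window required to write a data set at a sustained rate."""
    return data_gb * MB_PER_GB / mb_per_min / 60


print(f"one controller:   {gb_in_window(484, 8):.0f}GB in eight hours")
print(f"four controllers: {gb_in_window(4 * 359, 8):.0f}GB in eight hours")
```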
NOTE: This article is reprinted with permission from BackOfficeCTO magazine, a sister publication of InfoStor. For more information or to subscribe, visit www.backofficemag.com
Product: AML/S robotic tape library
Price: $87,700, for 158 cartridge slots with four DLT7000 drives
CA ARCserve with the Tape Library Option can correctly view and manipulate the tape drives (1) in the AML/S and recognizes all of the tapes in each slot (2).
The top graph shows the disk read activity on the two disks serving as sources of data being written to two DLT7000 tape drives. Total sustained throughput peaks around 9MBps (1). The bottom graph shows the higher level of throughput the same two drives provide when the CTO Labs Disk Benchmark is run against them. This time peak sustained throughput is approximately 10MBps (2).
We compared the average throughput for tape backup to one, two, and four DLT7000 drives in the AML/S against the performance of the CTO Labs Disk Benchmark reading from one, two, and four volumes in the nStor disk array.