DLT-S4 reveals ‘super-tape paradox’

With a native throughput rate that’s 20% slower than IBM’s LTO-3 drive, Quantum’s new DLT-S4 performs backup jobs 30% faster.

By Jack Fegreus

Quantum recently began shipments of fourth-generation SDLT tape drives and media, the DLT-S4 and DLTtape S4. Sporting some interesting manageability features and 2x the native capacity of LTO-3 tape drives, the DLT-S4 is ideally suited for a leading role in archival operations.

To examine potential performance of the DLT-S4, openBench Labs installed the tape drive via an Ultra320 SCSI interconnect on a high-end server featuring a PCI Express (PCIe) I/O architecture.

For comparison, we also installed an IBM LTO-3 tape drive, which inexplicably sports an Ultra160 SCSI interface. We used Appro’s XtremeServer for all of our tests. This 3U server has four AMD Opteron CPUs and nVidia’s nForce Professional 2200 and 2050 communications processors for PCIe. The nForce Professional 2200 also acts as a bridge for internal I/O connections, including Serial ATA (SATA), for which the server provides six hot-swap bays. We populated these bays with Western Digital’s 10K Raptor SATA drives: One was reserved for SuSE Linux 10 Professional and the remaining five were combined as a high-throughput, software-based RAID device.

Running oblDisk, our benchmark for disk throughput, we found that the SATA array could sustain extraordinary levels of continuous throughput, averaging 263MBps on reads and 187MBps on writes. With the Appro XtremeServer’s internal SATA RAID array able to sustain 4x the native throughput rates of the tape drives we were testing, we did not have to stream data from multiple disks simultaneously to satisfy the input needs of Quantum’s DLT-S4 tape drive. As a result, we were able to log backup throughput, and hence data compression, on a file-by-file basis.

We monitored I/O throughput for backup-and-restore operations running BakBone Software’s NetVault 7.4.

Compared to our benchmark data, the results were surprising.

Using 256KB data blocks, we measured native (uncompressed) throughput at 76MBps for the IBM LTO-3 drive and 61MBps for the Quantum DLT-S4. Nonetheless, backup throughput peaked at 137MBps on the IBM LTO-3 drive but reached 175MBps on Quantum’s DLT-S4 drive. The DLT-S4 also outpaced the LTO-3 drive when running restore operations. That leaves quite a paradox: The DLT-S4 has a native throughput rate that is 20% slower than the LTO-3 drive, but completes a 10GB backup 30% more quickly and a restore 33% faster.
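
The size of that gap is easy to check from the figures quoted above. The short Python sketch below is only an illustration of the arithmetic; the function and variable names are ours and are not part of any benchmark tool. It compares the peak backup figures, so its second result differs slightly from the 30% quoted for the complete backup job.

```python
def percent_change(new: float, old: float) -> float:
    """Percentage change from old to new (negative means new is lower)."""
    return (new - old) / old * 100.0


# Figures quoted in the text, in MBps.
LTO3_NATIVE, DLT_S4_NATIVE = 76.0, 61.0
LTO3_PEAK, DLT_S4_PEAK = 137.0, 175.0

# How the DLT-S4 compares with the LTO-3 on each measure.
native_gap = percent_change(DLT_S4_NATIVE, LTO3_NATIVE)
peak_gap = percent_change(DLT_S4_PEAK, LTO3_PEAK)

print(f"Native throughput, DLT-S4 vs. LTO-3: {native_gap:+.1f}%")
print(f"Peak backup throughput, DLT-S4 vs. LTO-3: {peak_gap:+.1f}%")
```

The sign flips between the two lines: the drive that trails by roughly a fifth on uncompressed data leads by more than a quarter at peak.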

Throughput for the DLT-S4 tape drive rapidly converged on theoretical limits for 2x- and 3x-compressible data. In contrast, throughput for the LTO-3 drive never converged on its 2x limit. Based on the backup tests, throughput translated into an average compression rate of just 1.3:1 for the LTO-3 and 2.1:1 for the DLT-S4.

Since the introduction of Quantum’s DLT 7000, all of our real-world file sets for backup testing have consistently demonstrated a narrow range for average compression, from 1.7:1 to 1.9:1. We leveraged that fact in our oblTape benchmark, with which we have been able to consistently and accurately determine the probable upper and lower bounds for backup-and-restore throughput. To make that determination, we streamed a mix of non-compressible random data with data that was calibrated to produce a 2:1 compression ratio using a limited set of characters in a frequency that was determined by using a normal distribution pattern. Invariably, backup-and-restore tests would match oblTape results within a narrow range around a mix of 80% 2x-compressible data and 20% non-compressible data.
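
To see why an 80/20 mix lands near the observed range, the blended ratio can be worked out directly. The text does not say whether the 80/20 split is measured by input volume or is a simple weighting of the two ratios, so the Python sketch below computes both as labeled assumptions. The helper names are hypothetical and are not oblTape functions.

```python
from typing import Iterable, Tuple

Mix = Iterable[Tuple[float, float]]  # (fraction of the mix, compression ratio)


def volume_weighted_ratio(mix: Mix) -> float:
    """Assumption A: fractions are shares of the uncompressed input volume.

    Each share shrinks to fraction / ratio, so the blended ratio is the
    input volume (1.0) divided by the total compressed volume.
    """
    return 1.0 / sum(fraction / ratio for fraction, ratio in mix)


def simple_weighted_ratio(mix: Mix) -> float:
    """Assumption B: fractions are plain weights applied to the ratios."""
    return sum(fraction * ratio for fraction, ratio in mix)


# 80% 2x-compressible data and 20% non-compressible data, as in the text.
OBLTAPE_MIX = [(0.80, 2.0), (0.20, 1.0)]

by_volume = volume_weighted_ratio(OBLTAPE_MIX)
by_weight = simple_weighted_ratio(OBLTAPE_MIX)

print(f"Volume-weighted blend: {by_volume:.2f}:1")
print(f"Simple weighted blend: {by_weight:.2f}:1")
```

Under either reading the blend falls at or near the 1.7:1 to 1.9:1 band reported for real-world file sets.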

In all of those tests, the determining factor was the tape drive’s attempt to compress non-compressible data. Compression schemes need to add metadata about the compressibility of the original data to reconstitute that data in a restore operation. Ideally, the total number of bits of metadata and compressed data is less than the number of bits in the original data. In that case, fewer bits will be written to tape than are read from disk: Throughput is calculated using the number of raw bits read and the time spent writing the compressed data. When the original data cannot be compressed, the metadata becomes pure overhead. Both the metadata and original data are written to tape and the perceived throughput rate drops to less than the drive’s native throughput specification.
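
A simple model makes the effect concrete. The Python sketch below assumes the drive writes bits to tape at its native rate and that metadata adds a fixed fraction of the raw data size. The 2% overhead is a purely illustrative value, not a measured or published figure, and the function name is hypothetical.

```python
def perceived_throughput(native_mbps: float,
                         compression_ratio: float,
                         metadata_overhead: float) -> float:
    """Raw data read per second, as seen by the backup application.

    native_mbps       -- rate at which bits are physically written to tape
    compression_ratio -- raw size divided by compressed size (1.0 = none)
    metadata_overhead -- metadata size as a fraction of raw size (assumed)
    """
    # Amount written to tape for every unit of raw data read from disk.
    written_per_raw_unit = 1.0 / compression_ratio + metadata_overhead
    return native_mbps / written_per_raw_unit


NATIVE = 61.0     # DLT-S4 native rate measured in our tests, in MBps
OVERHEAD = 0.02   # illustrative assumption only

compressible = perceived_throughput(NATIVE, 2.0, OVERHEAD)
incompressible = perceived_throughput(NATIVE, 1.0, OVERHEAD)

print(f"2x-compressible data: {compressible:.1f}MBps")
print(f"Non-compressible data: {incompressible:.1f}MBps")
```

With compressible data the metadata is a small tax on a large gain. With non-compressible data the same tax pushes the result below the native rate.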

Like LTO tape drives, Quantum’s DLT-S4 has electronics that can compare buffered data before and after compression without any perceptible impact on throughput performance. In effect, this scheme allows DLT-S4 and LTO-3 drives to turn data compression on and off inline. With both drives able to choose the optimal data set and eliminate the possibility of the drives slowing to less than their native throughput rate, the key area for differentiating these drives shifts to high-end throughput.
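
The before-and-after comparison can be sketched in software. The Python example below uses zlib purely as a stand-in, since the drives’ hardware compression engines are not described here and do not use this library. The function name and buffer contents are hypothetical.

```python
import random
import zlib
from typing import Tuple


def choose_buffer(raw: bytes) -> Tuple[bytes, bool]:
    """Return the smaller of the raw and compressed buffer.

    The boolean reports whether the compressed form was selected.
    """
    compressed = zlib.compress(raw)
    if len(compressed) < len(raw):
        return compressed, True
    return raw, False  # compression did not help, so write the data as-is


rng = random.Random(0)  # fixed seed so the example is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
text = b"backup and restore " * 3000

for label, data in (("random", noise), ("repetitive", text)):
    chosen, used_compression = choose_buffer(data)
    print(f"{label}: {len(data)} bytes in, {len(chosen)} bytes out, "
          f"compressed={used_compression}")
```

Because the smaller buffer always wins, the amount written never exceeds the amount read, which is the property that keeps a drive from dropping below its native rate.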

A number of strategies are available to accomplish that task. One is to increase linear tape speed. That strategy raises the native throughput rate of the drive, but it also increases the overhead of repositioning tape. Alternatively, today’s faster ASICs provide the opportunity to compress data more quickly, which is Quantum’s approach with the DLT-S4.

To test this new aspect of tape drive performance with our oblTape benchmark, we needed to make the compressibility of the patterned data generated by oblTape a variable with discrete values: 1x (non-compressible), 2x, and 3x. Once we associated a discrete input set with the values used to generate specific distributions of characters for the data pattern streamed to the tape drive, we were able to examine tape throughput as a function of both formatted block size and data compressibility.

With the advent of firmware that changes a drive’s tape speed based on data flow, the formatted size of data blocks directly affects uncompressed data throughput, which represents the effective native throughput rate. Using small data blocks increases the number of interrupts in the flow of data and triggers a slowing of the drive’s speed. A number of backup packages with versions for both Windows and Linux have a fixed block size of 64KB, the default maximum I/O size for Windows. As a result, the rate at which the drive’s electronics can ramp up throughput to its maximum potential vis à vis block size is very significant for performance on Microsoft Windows Server 2003.

Plotting the actual benchmark results provides a good macro view of relative drive throughput. Both the Quantum DLT-S4 and the IBM LTO-3 tape drives essentially reached their maximum native throughput rates with a block size of 64KB. With 2x compressible data, raw throughput writing data to tape was quite similar for both drives, with the LTO-3 typically holding a 5% edge. With 3x compressible data, however, raw throughput diverged dramatically. Interestingly, the Ultra160 SCSI connection never impinged on throughput for the IBM LTO-3 drive, which never reached its 2x-compression limit even with 3x-compressible data.

That throughput data was very consistent with the results of our backup-and-restore tests using BakBone’s NetVault 7.4 software and our standard 10GB backup test set. This set includes a mix of Microsoft Outlook, Access, HTML, JPEG, and GIF data files, which fall into a typical range for compression, from non-compressible to highly (3x) compressible. In those tests, we monitored throughput on a file-by-file basis. Peak file throughput reached 175MBps using the DLT-S4 and 137MBps using the LTO-3 drive, which approximated the oblTape results for both drives using 3x-compressible data. Average backup throughput was pegged at 127MBps using the DLT-S4 and 98MBps using the LTO-3 drive.
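
Dividing average backup throughput by measured native throughput gives a rough estimate of the average compression each drive achieved. The Python sketch below assumes that the tape-side write rate stays at the native figure throughout the job. The function name is ours.

```python
def effective_compression(avg_backup_mbps: float, native_mbps: float) -> float:
    """Estimate average compression as backup rate over native rate."""
    return avg_backup_mbps / native_mbps


# (average backup throughput, measured native throughput), in MBps.
DRIVES = {
    "Quantum DLT-S4": (127.0, 61.0),
    "IBM LTO-3": (98.0, 76.0),
}

estimates = {
    name: effective_compression(avg, native)
    for name, (avg, native) in DRIVES.items()
}

for name, ratio in estimates.items():
    print(f"{name}: about {ratio:.1f}:1")
```

The results round to 2.1:1 for the DLT-S4 and 1.3:1 for the LTO-3.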

Insight into the differences between the drives really becomes evident, however, when the raw benchmark data is normalized to native throughput. When the data is put in that form, a detailed micro view of how each drive handles data compression emerges.

With 2x- or 3x-compressible data, throughput for the DLT-S4 rapidly converges on its theoretical limits: 120MBps or 180MBps. Notably, 32KB transfers on the DLT-S4 with 2x-compressible data appear to be anomalous on the raw throughput graph as the DLT-S4 shows distinctly better throughput than the LTO-3 drive. When viewed on the normalized throughput graph, that data point is predictable rather than anomalous. In a normalized context, that data point is completely in line with the DLT-S4’s rapid convergence on its theoretical limit. In contrast, the LTO-3 drive increases throughput at a much slower rate and never approaches either its 2x- or 3x-throughput limit.
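
The theoretical limits are simply the native specification multiplied by the compressibility of the data. The Python sketch below takes the DLT-S4 native specification as 60MBps, which is implied by the 120MBps and 180MBps limits above, and uses the peak throughput figures from our backup tests. The 80MBps figure for the LTO-3 is an assumption based on the commonly published native rate for that format, not a number from our tests, and the function names are hypothetical.

```python
def theoretical_limit(native_spec_mbps: float, compressibility: float) -> float:
    """Upper bound on throughput for data of the given compressibility."""
    return native_spec_mbps * compressibility


def fraction_of_limit(measured_mbps: float,
                      native_spec_mbps: float,
                      compressibility: float) -> float:
    """Measured throughput as a share of the theoretical limit."""
    return measured_mbps / theoretical_limit(native_spec_mbps, compressibility)


DLT_S4_SPEC = 60.0  # implied by the 120MBps and 180MBps limits in the text
LTO3_SPEC = 80.0    # assumption: published LTO-3 native rate, not measured here

# Peak backup throughput: 175MBps (DLT-S4) and 137MBps (LTO-3).
dlt_share = fraction_of_limit(175.0, DLT_S4_SPEC, 3.0)
lto_share = fraction_of_limit(137.0, LTO3_SPEC, 2.0)

print(f"DLT-S4 peak vs. its 3x limit: {dlt_share:.0%}")
print(f"LTO-3 peak vs. its 2x limit: {lto_share:.0%}")
```

On these assumptions, the DLT-S4 peak sits within a few percent of its 3x limit, while the LTO-3 peak remains well short of its 2x limit.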

In addition to data throughput, the efficiency of a tape drive’s data compression scheme also affects another important aspect of the drive’s value proposition: cartridge capacity. Since the IBM drive converges on a compression rate of only 1.75:1 with data that the DLT-S4 compresses at 2:1, an IBM LTO-3 cartridge will at best hold 700GB rather than its nominal 800GB of data, compared to 1,600GB for a DLTtape S4 cartridge.
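
The capacity figures follow from multiplying native cartridge capacity by the compression ratio each drive actually achieves. The Python sketch below assumes native capacities of 400GB for LTO-3 and 800GB for DLTtape S4, which are implied by the compressed figures above. The function name is ours.

```python
def effective_capacity_gb(native_gb: float, compression_ratio: float) -> float:
    """Amount of user data a cartridge holds at a given compression ratio."""
    return native_gb * compression_ratio


# Native capacities implied by the compressed figures in the text.
LTO3_NATIVE_GB = 400.0
DLT_S4_NATIVE_GB = 800.0

lto3_gb = effective_capacity_gb(LTO3_NATIVE_GB, 1.75)
dlt_s4_gb = effective_capacity_gb(DLT_S4_NATIVE_GB, 2.0)

print(f"IBM LTO-3 cartridge at 1.75:1: {lto3_gb:.0f}GB")
print(f"DLTtape S4 cartridge at 2:1: {dlt_s4_gb:.0f}GB")
```

On the same 2x-compressible data, the 2:1 advantage in native capacity therefore widens to roughly 2.3:1 in stored data.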

The first generation of LTO Ultrium drives introduced a hybrid scheme of digital and analog electronics to give LTO drives an edge in throughput performance over early SDLT drives. As a result, the market share of LTO drives grew rapidly. Nonetheless, advances in both the electronics and mechanics of tape drives now shift that performance edge to the pure digital circuitry of the DLT-S4 drive. In addition, the edge in compression performance translates directly into an important edge in cartridge capacity.

The importance of software

Complicating tape library comparison is the tight coupling of features with software. Library functionality depends on software just as much as it does on the robotics hardware. For most sites, this software will come as embedded Web/Java utilities resident on the device or as modules that are integrated into the backup-and-archival software.

We ran backup-and-restore operations using BakBone’s NetVault 7.4 software on a 10GB backup test set that includes a mix of Microsoft Outlook, Access, HTML, JPEG, and GIF data files. The compressibility of these files falls into a range that typically runs from non-compressible to highly (3x) compressible. Average data compression on a backup for the entire test set is typically 1.8:1. Using the IBM LTO-3 drive, we estimated average data compression to be 1.3:1; however, using the DLT-S4 we calculated average data compression to be 2.1:1.

For DLT-S4 drives, this opens the door to further distinguish the reliability of libraries using the drive via Quantum’s DLTSage software.

Storage administrators’ notions about reliability, however, are rapidly growing beyond simple media readability and data reproducibility. For larger and more-sophisticated IT sites, reliability is beginning to encompass the wider notion of media integrity. In a growing number of instances, restoring backup data is no longer sufficient: Legislative mandates on data integrity are making it necessary to demonstrate that the restored data is accurate. In this area, requirements call for a compliant storage medium to support integrity protection, accessibility, duplication, and auditing.

Naturally, IT wants any solution to integrate into existing infrastructure. Using DLT drives, sites can easily extend traditional backup operations to secure archival operations. The DLTSage suite includes DLTSage WORM, which exploits the firmware in DLT drives to convert standard DLTtape cartridges into WORM-compliant media for archival operations.

The DLTSage Dashboard presents a high-level summary of the state of the tape media in a DLTtape cartridge, which includes the amount of tape that remains free. The Dashboard is the means by which electronic security keys are applied to devices and cartridges. Once a security key is assigned to a DLTtape S4 cartridge, its data will only be accessible on a device that has a matching key.

Creating a WORM cartridge involves writing a unique, unalterable electronic key to a standard DLTtape cartridge. This unique identifier creates a tamper-proof archive cartridge that meets stringent requirements for integrity protection while providing full accessibility for reliable duplication. The DLT-S4 builds on that electronic key mechanism to create a tape security system that will protect data in the event that the tape cartridge is lost or stolen.

The new DLTSage Tape Security System uses the notion of electronic keys to prevent unauthorized access to data on tape cartridges. Using a new simplified interface called the DLTSage Dashboard, an authorized user can add, change, and remove keys that lock and unlock cartridges with respect to a stand-alone drive or a library with multiple drives. In this way, the DLTSage Tape Security System provides a means to secure a tape from unauthorized access without the added overhead of encrypting and decrypting the data on a backup-or-restore operation.

As businesses become more global and more 24x7, they become more dependent on digital data. That has triggered an evolution in IT practices, which is changing the role of tape from an off-site backup medium to a nearline archive. This emphasizes the need for capacity over performance, shifts the focus of attention to secure self-managing devices, and leaves the DLT-S4 positioned to leverage all of these changes.

Jack Fegreus is technology director at Strategic Communications (www.stratcomm.com). He can be reached at jfegreus@stratcomm.com.

This article was originally published on May 01, 2006