Overland Storage’s REO 9500D is a high-performance, data de-duplicating virtual tape library packaged as a fully configured appliance.
By Jack Fegreus
With no hope of any near-term respite in the growth of storage, enterprise and mid-tier IT sites have embarked on a number of resource management strategies and tactics, which include storage and system consolidation, iSCSI expansion of Fibre Channel SAN fabrics, continuous data protection (CDP), point-in-time snapshots, and a shift away from tape to disk-based backup and recovery.
Prominent among the factors driving the shift from tape to disk are growing constraints on time windows for running backup processes and expanding regulatory requirements for having demonstrable data-recovery procedures in place. What’s more, given the impact of those IT operations on business continuity, it’s hardly surprising that many IT decision-makers insist that any changes be implemented with minimum disruption.
To meet those demands, Overland Storage has introduced the REO 9500D, a disk-based data de-duplicating virtual tape library (VTL) that is packaged as a ready-to-configure appliance. The REO 9500D is compatible with all major backup software packages and does not require the installation of software on backup servers.
With IT under the gun to move to a service-management model for governance, the last thing they want to do is initiate a wholesale replacement of existing backup software or introduce another point solution for data protection.
We ran our oblTape benchmark on a PowerEdge 1900 server running SuSE Linux Enterprise Server 10 SP1. For a virtual DLT 7000 drive on the REO 9500D, write throughput for data exhibiting a pattern for 2:1 compressibility ranged from 46MBps (64KB blocks) to 54MBps (128KB) and 68MBps (256KB). That level of performance is in line with that of an LTO-2 tape drive. However, when we increased compressibility to 3:1, write throughput on the virtual DLT 7000 skyrocketed to 62MBps (64KB blocks), 105MBps (128KB), and 147MBps (256KB).
Such point solutions need to be managed in isolation, which increases operating costs dramatically. The goal for IT is to optimize their existing environment transparently, and the most-effective way to accomplish that task is through systems, storage, and network virtualization.
From the PT Manager GUI, we were able to see our initial configuration of a single VTL, which we had dubbed oblVTL-1. This would appear to external servers, such as our PowerEdge backup server, as an ATL P3000 tape library with four DLT 7000 drives, 21 slots, and 19 100GB cartridges.
To meet resource optimization requirements, Overland’s REO 9500D incorporates a virtualization engine with a powerful data de-duplication mechanism that can provide storage optimization on the order of 25:1. More importantly, the 9500D comes pre-configured as an appliance that requires minimal configuration. Once we had given the REO 9500D a LAN address for the management GUI, we were able to run a backup.
In this first assessment, openBench Labs concentrated on the VTL appliance aspects of the 9500D and focused on the need to optimize backup-and-restore operations with minimal change to existing procedures.
From the Backup Exec GUI, we were able to see the logical presentation of our initial VTL configuration. From the perspective of Backup Exec, the PowerEdge 1900 server was connected to an ATL P3000 tape library with four DLT 7000 drives, 21 slots, and 19 100GB cartridges.
To examine the potential performance range of the REO 9500D in a mid-tier IT environment, openBench Labs installed the VTL in a 4Gbps SAN fabric anchored by a QLogic SANbox 9200 switch. To drive I/O through the 9500D, we employed a Dell PowerEdge 1900 server with an Intel quad-core Xeon processor and 4GB of RAM. For Fibre Channel connectivity, we installed a dual-port 4Gbps QLogic QLE 2462 host bus adapter (HBA). Finally, to keep our virtual tape drives streaming, we created multiple virtual work volumes on an IBM DS4100 disk array.
To test the throughput potential of the REO 9500D, we used our openBench Labs tape benchmark, oblTape. The benchmark generates two types of data: purely random, and patterned for compressibility. The patterned data is generated from a fixed set of characters in a normal distribution, which can be set to provide either a 2:1 or 3:1 compression ratio on a tape drive using the Digital Lempel-Ziv (DLZ) algorithm. All data is streamed directly to the device from memory to avoid any bandwidth issues.
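As a rough illustration of the two data types, the sketch below generates a purely random buffer and a patterned buffer and then measures how well each compresses. It is not the oblTape code: zlib's DEFLATE stands in for the drive's DLZ algorithm, and the function names and the `sigma` tuning knob are assumptions made for this example. A narrower distribution yields more compressible data; the value needed to hit exactly 2:1 or 3:1 on a real drive would have to be found by experiment.

```python
import random
import zlib


def random_block(size, rng):
    """Return `size` bytes drawn uniformly at random (essentially incompressible)."""
    return bytes(rng.getrandbits(8) for _ in range(size))


def patterned_block(size, rng, sigma):
    """Return `size` bytes drawn from a normal distribution centred on 128.

    `sigma` is a hypothetical tuning knob: smaller values concentrate the
    data on fewer byte values, which makes it more compressible.
    """
    out = bytearray(size)
    for i in range(size):
        value = int(round(rng.gauss(128, sigma)))
        out[i] = min(255, max(0, value))  # clamp to the valid byte range
    return bytes(out)


def compression_ratio(data):
    """Ratio of original size to zlib-compressed size (a stand-in for DLZ)."""
    return len(data) / len(zlib.compress(data, 6))


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so runs are repeatable
    size = 64 * 1024
    print("random   :", round(compression_ratio(random_block(size, rng)), 2))
    print("sigma=4.0:", round(compression_ratio(patterned_block(size, rng, 4.0)), 2))
    print("sigma=1.5:", round(compression_ratio(patterned_block(size, rng, 1.5)), 2))
```

Generating each buffer once and reusing it keeps data generation out of the timed path, in the same spirit as streaming from memory.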
Data can be streamed in block sizes of 2^n KB, where n ranges from 0 to 8. This simulates the differences in the way backup applications write data to a tape drive. In particular, many midrange packages that run on multiple operating systems only allow 64KB tape data blocks (the maximum default size for Windows) for compatibility. On the other hand, enterprise-class backup applications often provide for writing data in 128KB and 256KB blocks.
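The range of block sizes is easy to enumerate. The short sketch below lists the sizes and shows one way to cut an in-memory buffer into fixed-size blocks; the function names are illustrative and are not taken from oblTape.

```python
def block_sizes_kb(max_exponent=8):
    """Block sizes of 2^n KB for n = 0..max_exponent."""
    return [2 ** n for n in range(max_exponent + 1)]


def iter_blocks(buffer, block_kb):
    """Yield successive blocks of `block_kb` KB from an in-memory buffer.

    The last block may be shorter if the buffer is not a whole multiple
    of the block size.
    """
    step = block_kb * 1024
    for offset in range(0, len(buffer), step):
        yield buffer[offset:offset + step]


if __name__ == "__main__":
    print(block_sizes_kb())  # 1KB up to 256KB
    blocks = list(iter_blocks(bytes(256 * 1024), 64))
    print(len(blocks), "blocks of", len(blocks[0]), "bytes")
```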
Running our oblTape benchmark, characteristic throughput of virtual DLT 7000 tape drives on the REO 9500D was very similar to that of the Quantum DLT-S4, the latest-generation SuperDLT drive. For both virtual DLT 7000 and DLT-S4 drives, data compressibility is by far the most dominant factor in predicting the potential rate of throughput. In particular, as data blocks increase in size, the rate at which throughput increases for both drives is strongly dependent on the compressibility of the data within those blocks.
The QLogic SAN Fabric Suite provided a much more granular view of I/O traffic to and from the REO 9500D VTL. During a backup, I/O throughput rates ranged from 30MBps to around 120MBps. There was even greater variation in throughput during a restore. The high level of throughput observed on verification after backup highlights the need for fast disk I/O to keep pace with, and not bottleneck I/O from, the REO 9500D.
This is particularly true for data blocks with a critical size of 64KB, 128KB, or 256KB. Those are the nominal block sizes used by backup software for Windows, Linux, and Unix, respectively. As a result, the rate at which a drive can ramp up throughput to its maximum potential vis à vis block size is very significant for performance. This is especially true for backup software on Windows Server 2003.
The size of the data blocks used by backup software also has an effect on data de-duplication effectiveness. As the backup software packages the data to be transmitted to a tape drive, whether real or virtual, the software also adds metadata within each tape block. Since the use of smaller tape blocks requires that more blocks be employed, the overall ratio of metadata to user data increases with smaller blocks. In turn, that additional metadata will adversely impact any data de-duplication scheme, which compares sequences of undifferentiated bytes in search of exact patterns.
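The effect of block size on metadata overhead can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, a fixed amount of metadata per tape block; the 512-byte figure is hypothetical, since the amount that any particular backup package writes is not given here.

```python
def metadata_fraction(block_kb, metadata_bytes_per_block):
    """Fraction of each tape block taken up by backup-software metadata.

    Assumes a fixed metadata size per block, which is a simplification.
    """
    return metadata_bytes_per_block / (block_kb * 1024)


if __name__ == "__main__":
    HYPOTHETICAL_METADATA_BYTES = 512  # illustrative value, not a measured one
    for block_kb in (64, 128, 256):
        share = metadata_fraction(block_kb, HYPOTHETICAL_METADATA_BYTES)
        print(f"{block_kb:>3}KB blocks: {share:.4%} metadata")
```

Whatever the real per-block figure is, under this assumption the metadata share at 64KB is four times the share at 256KB.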
To assess the primary concerns of many IT sites evaluating VTL technology for ease of management, scalability, recoverability, and performance, openBench Labs set up a common mid-tier corporate IT scenario for backup and recovery. We installed Windows Server 2003 R2 along with Symantec Backup Exec 11d on a Dell PowerEdge 1900 server. In addition, we installed the ProtecTIER management GUI, PT Manager, on a workstation used for storage and system administration.
Our most complex task during that installation was deciding whether to download Quantum DLT drivers from the Microsoft Windows Update site or use the Symantec DLT drivers supplied with Backup Exec to run the newly discovered DLT 7000 tape drives on our backup server. We chose to use the latter, which is Symantec’s preferred case for Backup Exec when the resource will not be shared with other applications.
From the perspective of the Fibre Channel switch, total I/O running four simultaneous backup jobs through only one of the REO 9500D’s Fibre Channel ports averaged 145MBps. Moreover, on the verification pass following a backup, throughput often reached wire speed for a 4Gbps connection: 320MBps.
More importantly, from the perspective of ongoing storage administration, we would only need to use PT Manager to configure additional VTL devices or to perform the logical equivalents of tasks typically associated with a physical tape library, including the provisioning, importing, and exporting of tape cartridges. System administrators continue to perform all library management tasks associated with backup-and-recovery operations, such as pooling drives and cartridges or associating library resources with backup policies, through their existing backup application. In our case, that was Symantec’s Backup Exec 11d.
To test the REO 9500D VTL’s performance, scalability, and recovery capabilities, openBench Labs performed a series of backup-and-restore operations using 30GB data sets, which were large enough to provide a statistically valid sample. Our backup data set contained a large mix of Microsoft Office data files (documents, spreadsheets, slide shows, e-mail folders, and databases) and a mix of HTML and image files from Websites. All of the backup-and-restore tests were run with verification and hardware compression enabled through Backup Exec. Performance measurements were then taken and compared across Backup Exec, which provided a view of the logical ATL P3000 library; the QLogic Enterprise Fabric Suite, which measured I/O traffic through the SANbox 9200 port through which the REO 9500D was connected; and PT Manager, which provided a view of the VTL appliance.
We began our initial testing with a single backup job directed at one of the virtual DLT 7000 drives. Since Backup Exec strictly observes all Windows conventions, it employs 64KB data blocks when writing to tape. Based on the results of the initial oblTape benchmark tests, we expected our average backup throughput to be in a range of 45MBps to 75MBps. Using our test data, which contains many end-user work files generated by applications in Microsoft’s Office suite, Backup Exec pegged the average backup throughput rate at 68MBps.
Verification following a backup operation was a read-into-memory process that sped data along at an average rate of 106MBps. In contrast, data restoration extended that process by writing to disk, which served to highlight the dependency of the VTL appliance on fast disk I/O. For our single-job backup, throughput on a restore averaged 72MBps.
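To put those single-stream rates in terms of elapsed time, the sketch below estimates how long a 30GB data set would take to move at each average rate. These are derived estimates, not measured job times, and they assume that the data set is exactly 30GB and that 1GB equals 1,024MB.

```python
def transfer_seconds(data_gb, rate_mb_per_s, mb_per_gb=1024):
    """Estimated time to move `data_gb` gigabytes at a steady rate in MBps."""
    return data_gb * mb_per_gb / rate_mb_per_s


if __name__ == "__main__":
    # Average single-stream rates reported by Backup Exec in the tests above.
    for phase, rate in (("backup", 68), ("verify", 106), ("restore", 72)):
        minutes = transfer_seconds(30, rate) / 60
        print(f"{phase:<8}{rate:>4}MBps  about {minutes:.1f} minutes")
```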
From the perspective of the REO 9500D appliance, we were able to assess the relative performance of the four drives during a four-job backup. Despite the high degree of variability in the backup data sets seen in the single-job test, the performance of the four drives remained well balanced throughout the test.
We then followed up the single backup stream tests with multiple simultaneous streams to our four-drive library. It’s important to note that all of this testing was done using just one Fibre Channel port on the REO 9500D and one Fibre Channel port on the PowerEdge 1900 server. When we launched four simultaneous backup jobs from the Backup Exec console, throughput rapidly increased as each of the jobs quickly came online (each of the four cartridges must be loaded sequentially by the virtual robot arm). With all four jobs running in tandem, burst I/O traffic frequently hit peaks of 215MBps.
However, that level of throughput was easily eclipsed as each of the backup jobs was verified. During that process, peak throughput reached wire speed (320MBps) for the Fibre Channel port. At the end of the test, Backup Exec reported average throughput for the four backup jobs as 142MBps.
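One way to read the scaling result is to compare the four-job aggregate with the single-job average. The sketch below does that arithmetic with the two Backup Exec figures reported in these tests (68MBps for one job, 142MBps for four); the function names are illustrative. The result reflects a test that used a single Fibre Channel port on each end.

```python
def speedup(aggregate_rate, single_rate):
    """How many times faster the aggregate is than a single stream."""
    return aggregate_rate / single_rate


def scaling_efficiency(aggregate_rate, single_rate, streams):
    """Aggregate rate as a fraction of perfect linear scaling."""
    return aggregate_rate / (single_rate * streams)


if __name__ == "__main__":
    single, four_jobs = 68, 142  # MBps, as reported by Backup Exec
    print(f"speedup over one stream: {speedup(four_jobs, single):.2f}x")
    print(f"share of linear scaling: {scaling_efficiency(four_jobs, single, 4):.0%}")
```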
Nonetheless, there is another important factor for measuring the effectiveness of scaling and projecting the ability of the REO 9500D to continue to scale with even higher loads. That measure is how well the VTL balances the real-time performance of its four virtual tape drives. Despite all of the variability in each individual job, Backup Exec pegged the four average job rates as varying by less than 5%.
A significant component of IT operating costs goes into backup and disaster-recovery management. The general rule of thumb is that the cost of managing storage on a per-gigabyte basis is three to ten times greater than the capital cost. For IT, the bottom line for controlling storage-management costs then comes down to minimizing the time to manage while maximizing resource utilization.
The solution to that conundrum has been the virtualization of resources. Full realization of that bottom line, however, has often fallen victim to the fact that SAN devices tend to become “siloed” by incompatible management software, which effectively partitions SAN management into device-centric technology silos.
Overland Storage’s REO 9500D avoids that trap by allowing existing backup packages to manage all of the business-centric issues of backup policies and disaster-recovery management. All that is left for the PT Manager GUI is to manage the virtual manual tasks associated with tape cartridge provisioning. Managing multiple tape libraries that can scale in a fashion that exceeds the capabilities of most mechanical devices becomes a virtual breeze for a hard-pressed IT operations staff.
Jack Fegreus is CTO of www.openbench.com. He can be reached at firstname.lastname@example.org.
OpenBench Labs scenario
Virtual tape library (VTL) appliance
WHAT WE TESTED
Overland Storage REO 9500D
- Appliance is pre-configured with a RAID-5 infrastructure featuring dual controllers and hot-swap drives to support the ProtecTIER tape library virtualization framework from Diligent Technologies.
- ProtecTIER virtualizes the RAID-5 storage pool as one to twelve virtual ATL P3000 tape libraries with up to 64 virtual DLT 7000 drives.
- The ProtecTIER HyperFactor data de-duplication process does not affect backup performance or integrity, as HyperFactor finds common data using a highly efficient inline process.
- VTLs exported by the appliance appear to existing IT backup applications as standard ATL libraries with DLT 7000 drives.
HOW WE TESTED
- Dell PowerEdge 1900 server
- Windows 2003 Server SP2
- PT Manager
- QLogic Enterprise Fabric Suite 2007
- Symantec Backup Exec 11d
- SuSE Linux Enterprise Server 10 SP1
- QLogic SANbox 9200 Fibre Channel switch
- IBM DS4100 disk array
KEY FINDINGS
- On performance benchmarks with 2:1 and 3:1 compressible data, the throughput of the virtual DLT 7000 more closely approximated that of the DLT-S4, which is the latest generation of SuperDLT technology.
- Using Symantec’s Backup Exec, average backup throughput was about 68MBps for a single stream and scaled to an average of 145MBps with four streams.
- Using Backup Exec, average restore throughput was about 72MBps for a single stream and scaled to an average of 185MBps-reaching a peak of 320MBps-with four streams.