Here are a few issues to consider before you implement virtual tape libraries (VTLs).
By Dan Tanner
The idea behind a virtual tape library (VTL) is appealingly simple: Emulate slow, serial-access physical tape devices using high-performance random-access disk. The result should be identical appearance and procedures, but with higher performance and greater reliability.
But mappings between the physical and virtual worlds aren’t necessarily simple. Those mappings can become complicated when additional virtual capabilities and vendors’ various VTL implementations are taken into account. Keep two things in mind: 1) If you use a VTL, especially to augment or replace tape, you’re responsible for knowing all the important differences between the VTL (disk acting as tape) and the actual tape it’s replacing, and 2) fortunately, the Storage Networking Industry Association (SNIA) is working on standards for VTLs.
The physical tape world includes tape media, drives, autoloaders, and libraries, with a backup server, backup software, and media management software usually in the mix. VTL implementations must account for all of these, create virtual analogs for them, and make them behave the same way even though disk is being used in place of tape.
A recent InfoStor article pointed out the important distinctions among virtual tape (which virtualizes the media), virtual tape libraries (which emulate a tape library), and the ability to virtualize a tape library itself by dividing it into logical shares (see “Tape virtualization has many faces,” April 2006, p. 38). Consequently, a VTL may even be required to virtualize a physical tape library that has itself been virtualized!
Moving data and making copies are known technologies. The difficulty is integrating multiple copies of data into current backup applications that are aware of physical tape limitations.
And here’s a difficult tradeoff between backup application licensing costs and multiplexed tapes: If the backup software multi-streams backups to multiple drives (and media) in a tape library, should a VTL handle the operation by emulating the same number of drives or simply use one virtual drive?
A VTL can boost backup efficiency when the data rate is limited at the source (e.g., the file or application server) by enabling more streams and backup jobs to occur simultaneously, resulting in a shorter aggregate backup window. But there could be a price to pay. Suppose the VTL merges incremental backups and performs data migration, and that the backup software does not know about those operations. The SNIA VTL special interest group (SIG) is currently exploring a proposal that involves a standard application programming interface (API) that VTLs could use to inform the backup software of such operations and their status.
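To make the backup-window arithmetic concrete, here is a minimal sketch in Python. It assumes every job is limited to the same per-stream rate at the source and that the VTL's disk can absorb all concurrent streams without becoming the bottleneck. The function name, the job sizes, and the 20MB/s rate are hypothetical and chosen only for illustration.

```python
def backup_window_hours(job_sizes_gb, per_stream_mb_s, concurrent_streams):
    """Estimate the aggregate backup window for source-limited jobs.

    Assumption: each job runs at per_stream_mb_s no matter how many
    streams are active, i.e., the source is the bottleneck, not the VTL.
    Jobs are placed greedily: largest job first, onto the stream that
    currently finishes earliest.
    """
    finish_seconds = [0.0] * concurrent_streams
    for size_gb in sorted(job_sizes_gb, reverse=True):
        idx = finish_seconds.index(min(finish_seconds))
        finish_seconds[idx] += size_gb * 1024 / per_stream_mb_s
    # The window closes when the busiest stream finishes.
    return max(finish_seconds) / 3600


if __name__ == "__main__":
    jobs = [200] * 8  # eight hypothetical 200GB backup jobs
    for streams in (1, 2, 8):
        hours = backup_window_hours(jobs, per_stream_mb_s=20,
                                    concurrent_streams=streams)
        print(f"{streams} stream(s): {hours:.1f} hours")
```

Under these assumptions the window shrinks in proportion to the number of streams the VTL presents. In practice the gain levels off once the network, the backup server, or the VTL itself becomes the limiting factor.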
As tape increasingly becomes a disaster-recovery bulwark instead of the primary backup medium, there will be a natural tendency for tape users to want to cram as much data as possible onto each piece of media. This tendency will be heightened by the fact that tape media capacities typically double with each new generation. Although the cost per unit of storage capacity declines with those innovations, the cost of each piece of media rises.
End-user concern about media costs will thus require VTLs to support physical tape media that may contain a mix of compressed and uncompressed files belonging to file systems that had previously remained segregated on virtual tape libraries. Compression is an intra-file method by which to reduce the amount of media required to store a given amount of data. Data reduction is an inter-file method. Compression uses algorithmic substitutions for bit patterns and can reduce the amount of required storage capacity by 50% (or more).
Data reduction can cut required storage by 20x to 100x or more and is implemented in various ways. The simplest approach is “single-instancing” entire files. Improved data reduction techniques may take a file building-block approach and single-instance portions of files. Or, the data reduction technique can involve applying hashing techniques borrowed from cryptography to recognize and store only unique bit-string occurrences within files. All data reduction techniques can support the concept of content-aware storage, and VTL technology that supports data reduction could be extended to accommodate information lifecycle management (ILM) and document lifecycle management.
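The hashing approach can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's implementation: the class name, the fixed-size chunking, and the choice of SHA-256 are all assumptions, and commercial products may chunk and index data quite differently.

```python
import hashlib


class SingleInstanceStore:
    """Toy store that keeps one copy of each unique chunk (hypothetical)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}        # hash digest -> chunk bytes, stored once
        self.files = {}         # file name -> ordered list of digests
        self.logical_bytes = 0  # total bytes written by callers

    def put(self, name, data):
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Keep the chunk only if this digest has not been seen before.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.files[name] = recipe
        self.logical_bytes += len(data)

    def get(self, name):
        # Rebuild the file from its list of chunk digests.
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())

    def reduction_ratio(self):
        stored = self.stored_bytes()
        return self.logical_bytes / stored if stored else 1.0


if __name__ == "__main__":
    store = SingleInstanceStore(chunk_size=4)
    store.put("monday.bak", b"AAAABBBBCCCC")
    store.put("tuesday.bak", b"AAAABBBBDDDD")  # shares two chunks with Monday
    print(store.stored_bytes(), store.reduction_ratio())
```

Setting the chunk size to the size of the largest file reduces this to whole-file single-instancing; smaller chunks correspond to the building-block approach. The ratio actually achieved depends entirely on how much of the data repeats from one backup to the next.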
Just as data migration from a VTL to a physical tape library can be implemented in various ways, so can encryption, compression, and data reduction. Each can be implemented in a separate appliance, in the VTL, on hosts, or in the physical tape library. But encrypted data compresses very poorly, so compression must come before encryption. If the same device both encrypts and compresses, the stream may be delivered to it “in the clear” (i.e., neither encrypted nor compressed).
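The effect of ordering is easy to demonstrate. The Python sketch below uses the standard zlib module for compression and a toy keystream cipher built from SHA-256 as a stand-in for real encryption. The cipher and all function names are hypothetical, exist only to make the example self-contained, and are not secure.

```python
import hashlib
import zlib


def toy_keystream_xor(data, key):
    """XOR data with a SHA-256 counter-mode keystream.

    Illustration only; this is NOT a secure cipher. Applying it twice
    with the same key returns the original bytes.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def compress_then_encrypt(data, key):
    return toy_keystream_xor(zlib.compress(data), key)


def encrypt_then_compress(data, key):
    return zlib.compress(toy_keystream_xor(data, key))


def restore(blob, key):
    """Reverse compress_then_encrypt: decrypt first, then decompress."""
    return zlib.decompress(toy_keystream_xor(blob, key))


if __name__ == "__main__":
    data = b"backup record 0001\n" * 2000
    key = b"example-key"
    print("original:             ", len(data))
    print("compress then encrypt:", len(compress_then_encrypt(data, key)))
    print("encrypt then compress:", len(encrypt_then_compress(data, key)))
```

Because the encrypted stream looks random, the compressor finds no patterns to exploit and its output ends up no smaller than its input. That is why a device that handles both functions needs to receive the data in the clear and compress it first.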
VTL technology and standards will evolve to improve physical compatibility, include new features, and even address compliance and corporate governance rules. VTL cost, performance, and reliability benefits are apparent, and there are many players in the market (see box). It will be difficult to make a VTL selection that will correctly anticipate all the concerns noted above. But a cooperative effort on the part of vendors working through SNIA will benefit both users and vendors.
Dan Tanner is an independent analyst and consultant, and founder and principal of ProgresSmart (www.progressmart.com).