Reducing risk via lifecycle management for backup media

Posted on July 30, 2009


By Jack Fegreus, openBench Labs

As government regulations continue to expand into risk management, with mandates requiring corporate officers to safeguard business data for many years beyond the traditional seven-year period, Spectra Logic is incorporating into its tape libraries data-protection advances that go beyond incremental gains in tape density and performance: encryption key management, media lifecycle management, and automated initiation of SpectraGuard Support.

For IT, the path to higher efficiency in turbulent economic times starts with consolidating resource management and optimizing resource utilization. In this process, many CIOs are discovering that they need to change the way IT views and classifies data. Once these changes in the perception of information and data are understood, the hype over the displacement of tape by disk crumbles like an Alka-Seltzer tablet under Niagara Falls.

Tactically, this realignment has a natural affinity with the adoption of IT Service Management (ITSM) and the use of Service Level Agreements (SLAs), which put application-centric requirements on IT devices. ITSM builds upon classic quality-control (QC) practices for process management and focuses on the standardization and automation of administrator tasks to achieve better operating cost control and improve productivity.

For data protection applications, streaming throughput for reads and writes is the commanding metric. Moreover, for cost-effective I/O streaming, especially when writing data, tape is nonpareil. As a result, IT organizations need to take full advantage of a multiple-tier storage infrastructure. What's more, taking advantage of storage infrastructure in an ITSM context requires significant automation, and that makes a device such as the Spectra T50e tape library a good fit for the next-generation SMB data center.

IT must also reassess its historical view of data as an external resource that must be stored and valued only by the cost of storage. To align with business, IT must take a lifecycle management approach that views data as an internal asset that changes in value as processes -- both end-user and IT -- act upon it. Fueling the urgency of this reassessment, government regulations continue to expand into risk management with mandates requiring corporate officers to safeguard business data for longer and longer periods of time.

For IT, these new government regulations, along with technology advances such as the multidimensional analysis associated with a data warehouse, are driving the growth of a new class of static reference data and leading to the convergence of backup and disaster recovery technologies into a unified data protection regime. When those trends intersect with the growth of server virtualization, it creates a maelstrom of Homeric proportions as the risk of a single virtual operating environment (VOE) server failing cascades down to multiple virtual machines (VMs) running multiple critical applications. As a result, CIOs need to rethink how data is used, secured and managed, rather than just how data is stored.

Innovations in tape density and performance continue to ensure that tape plays a key role in any information lifecycle management hierarchy. Nonetheless, meeting information lifecycle management requirements demands complementary advances in automated tape management to keep operating costs in check. Spectra Logic meets those needs by incorporating into its tape libraries features such as encryption key management, media lifecycle management, and automated initiation of SpectraGuard Support services.

As the volume of data falling under the scrutiny of regulatory demands grows, IT must also introduce its own innovations to ensure that demonstrable recovery procedures are in place. IT innovations, however, often come in the form of a two-edged sword. One such innovation is server virtualization and the growing adoption of VOEs.

Of particular importance is the growing practice of using multiple VMs, each dedicated to running a particular application, to establish the RAS equivalent of a large data center. That practice makes the total amount of data stored within a VOE prodigious. More importantly, under a traditional grandfather-father-son rotation, that data will grow by a factor of about 25-to-1 within the backup archive. That's more than 16 LTO-4 tape cartridges for every TB of active on-line data. As a result, a VOE is a perfect microcosm for examining all factors that impact backup load processing, including data retention requirements and the nature of the data in terms of compressibility and redundancy.
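As a rough check on that figure, the arithmetic is straightforward. In the sketch below, the 25-to-1 archive growth comes from our scenario, while the effective per-cartridge capacity, roughly 1.5TB for LTO-4 media with typical hardware compression, is an assumption rather than a measured result:

import math

active_data_tb = 1.0          # per TB of active on-line data
archive_growth = 25           # grandfather-father-son rotation multiplies archived data ~25x
lto4_effective_tb = 1.5       # assumed LTO-4 capacity with typical compression

cartridges = math.ceil(active_data_tb * archive_growth / lto4_effective_tb)
print(cartridges)             # 17 -- more than 16 cartridges per TB of active data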

Real and virtual unification

In a VOE, each VM has two distinct personas. First, there is the IT-centric persona of an application that needs to run on a VOE server. Second, there is the logical business persona of a VM as a standard computer system. That dichotomy in how a VM is perceived has the potential to disrupt IT operations.

VMware Consolidated Backup (VCB) resolves those perceptual issues by integrating with commercial backup software for VMs running a Windows-based OS. For our tests, we employed Symantec NetBackup (NBU) version 6.5.3, which adds patented technology to enhance integration with VCB. Using this technology, an IT administrator can restore a VM either as a collection of VMFS files, which represents the VM as an ESX Server application, or as a collection of NTFS files, which represents the VM as a Windows system.

The linchpin in a VCB configuration is a Windows server, dubbed the VCB proxy, that shares access to the VMFS datastores holding the vmdk files associated with each VM. VCB installs a logical LUN driver on that server, which enables the proxy to copy VM snapshot images from the VMFS datastore into a local directory. The files in that local directory are what actually get backed up and archived. As a result, for maximum end-to-end throughput, writes to that local directory must be as fast as reads from the shared ESX datastore, and reads from that local directory must be as fast as writes to the backup device.
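It is worth spelling out why both legs of that path matter: in a VCB configuration every byte moves twice, first from the shared datastore into the proxy's local directory and then from that directory to the backup target, so a slow local disk stretches the backup window directly. A minimal sketch, using assumed, purely illustrative transfer rates and an assumed backup-set size:

# Assumed figures for illustration only; substitute measured rates for a real VCB proxy.
backup_set_gb = 296           # e.g., eight VMs x (12GB system + 25GB work) vmdk images
stage1_mbps = 250             # read from shared VMFS datastore, write to local directory
stage2_mbps = 200             # read from local directory, write to the backup device

stage1_hours = backup_set_gb * 1024 / stage1_mbps / 3600
stage2_hours = backup_set_gb * 1024 / stage2_mbps / 3600
print(round(stage1_hours + stage2_hours, 2))   # ~0.76 hours of wall clock for one full pass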

For our tests, eight VMs comprised the backup load for our VOE. We provisioned each VM with a 12GB logical system disk and a 25GB logical work disk. For each VM, the system disk was configured as a vmdk file stored within an ESX datastore. From a data-content perspective, each VM system disk contained 4GB to 5GB of OS and application files. From an IT application perspective, however, each system disk was represented by a physical 12GB vmdk file, which contained about 7GB of "empty" free space. More importantly, even though a backup image can be restored either as a logical system with NTFS files or as an ESX application with VMFS files, the VM backup image is created from the larger ESX application persona, with all of its redundant data.
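A quick tally of those figures shows how much of each full backup is provisioned-but-empty vmdk space rather than live files; the 5GB "used" figure below simply takes the upper end of the range we observed:

vm_count = 8
vmdk_gb = 12                  # provisioned system-disk vmdk per VM
used_gb = 5                   # upper end of the 4GB-5GB of OS and application files in use

image_total_gb = vm_count * vmdk_gb          # 96GB backed up as the ESX application persona
live_total_gb = vm_count * used_gb           # 40GB of files the Windows persona actually holds
print(image_total_gb - live_total_gb)        # 56GB of "empty" space carried in every full backup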

While the snapshot-based VCB process minimizes the impact on the ESX server and the VMs running on it, the wall-clock time associated with the process can be extensive, and the volume of backup data it generates is an even bigger issue. This is precisely why a well-designed policy that focuses on the needs of data security and movement, along with data placement, is essential.

Optimal storage hierarchy

With just two LTO-4 drives in our Spectra T50e library, we began our data protection process with a disk-to-disk (D2D) backup, which maximized throughput at 300MBps; it would take a library with four LTO-4 drives to move data more quickly. While D2D made perfect sense for the initial backup, the cost of multiple disk arrays for long-term storage of backup images was simply not sustainable.
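The four-drive figure follows directly from per-drive streaming rates; a minimal sketch, assuming the roughly 100MBps per LTO-4 drive we measured in these tests:

d2d_mbps = 300                # measured disk-to-disk throughput
per_drive_mbps = 100          # approximate sustained rate to each LTO-4 drive

print(2 * per_drive_mbps)                      # 200MBps: the two-drive T50e trails D2D
print(d2d_mbps // per_drive_mbps + 1)          # 4 drives needed to beat the 300MBps D2D rate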

NetBackup automatically generated a stream of writes to each tape drive in our T50e library. More importantly, the data was transferred in the form of very large backup images, which maximized the streaming capabilities of both the LTO-4 tape drives and the Emprise 5000 disk array. Data rates consistently ran at about 100MBps to each tape drive simultaneously, even while performing secure transfers with data encryption on each tape cartridge.

To bolster our data protection process with a more cost-effective hierarchical storage infrastructure, we configured our NetBackup Server with SAN access to a Spectra Logic T50e tape library. We configured the T50e as a single library with two LTO-4 drives, two 4Gbps Fibre Channel controllers, and 50 tape cartridge slots. Alternatively, we could have used the T50e's support of internal soft partitions to virtualize its presentation to NetBackup as two logical one-drive libraries.

To maximally leverage our storage hierarchy, we utilized the staging function within NetBackup to automate the transfer of backup images from the local disk pool to the Spectra Logic T50e. As a result, we only needed to size a logical disk on the backup server to hold backup image data online for one week. While that amount of data could fit on one LTO-4 cartridge, it represented 15% of our Xiotech Emprise 5000 disk array's storage pool capacity.
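Sizing that staging disk is simple arithmetic once the nightly backup volume and the retention window are known. In the sketch below, the one-week retention comes from our configuration, but the nightly volume and the headroom factor are assumptions, not measured values:

nightly_backup_gb = 150       # assumed nightly volume of backup images
retention_days = 7            # hold images on disk for one week before staging to tape
headroom = 1.25               # assumed margin for growth and delayed tape transfers

print(nightly_backup_gb * retention_days * headroom)   # ~1,312GB, within one LTO-4 cartridge's compressed capacity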

Following best ITSM practices, we automated the transfer of backup images to occur transparently during periods of minimal activity on our backup server. Nonetheless, this tactic introduces an administrative vulnerability should the unattended data movement process fail. For efficient IT operations, the discovery of a failed backup in the midst of a high-processing period is a very unwelcome interruption.

To avoid that situation, NetBackup can be configured to send an email alert should a backup process fail. While that provides good protection for unattended backup processes, it is far from an optimal solution. At issue is the time needed to identify and resolve the underlying hardware problem. In the case of a relatively serious incident, resolution may involve overnight shipment of a replacement part. As a result, the elapsed time between the sending of the email, the reading of the message, and a physical inspection of the problem by an IT administrator could negatively impact IT for several days. Worse yet, the underlying assumption in that scenario is that problems only occur while the backup software is in control of the process.

To provide full monitoring for the entire end-to-end data protection process, all Spectra Logic libraries provide an AutoSupport function to monitor component failure thresholds and critical alerts, including motion restarts, power supply failures, tape drive failures, and library controller initialization failures. When one of these events occurs, IT administrators can configure the library's AutoSupport function to send out email messages with details about the incident.

Spectra Logic's AutoSupport function, however, goes beyond simply sending messages and log files to system/storage administrators. It can also be configured to send an email request to open or update a support ticket with SpectraGuard Support. The support ticket request message will include what Spectra calls an AutoSupport Log or ASL file. The ASL file includes the configuration of the Library Control Module, firmware information, system status and trace logs, the support contract number for the library, and contact information for system/storage administrators at the customer data center. As a result, support technicians have all the information that they need to start resolving the problem, without an IT administrator having to collect data or even make a phone call.

For devices designed to automate critical IT services in an unattended manner, Spectra Logic provides a much-needed standard feature: we easily configured our Spectra library to automatically send an email and open a support ticket with SpectraGuard Support.

Keeping information secure

Managing the cartridge inventory and optimally balancing I/O throughput manually for the Spectra Logic T50e tape library could add significant overhead for an already burdened IT staff. The library, however, simplifies these tasks through seamless integration with NetBackup. As a result, NetBackup is able to co-opt the management of all of the library's standard tasks and features.

NetBackup is able to inventory each T50e partition as a virtual tape library (VTL) and apply unique lifecycle management policies to the media in each partition. For our testing, we used just one partition to maximize throughput using both LTO-4 drives. Nonetheless, two very critical features were beyond the scope of NetBackup's management.

The first feature is the automation of hardware-based encryption. With the T50e, Spectra Logic includes its standard level of data encryption, dubbed BlueScale Encryption Standard Edition, which utilizes the encryption chip on the LTO-4 drive. Encryption key management is handled by Spectra Logic's BlueScale Encryption Key Management software, which can be accessed through the library's touch screen interface, or by making a connection through the library's web GUI.

To ensure that encryption was taking place, openBench Labs ran a BlueScale Media Lifecycle Management inquiry immediately after a backup. Within the health status report, MLM detailed the status of encryption and the identity of the key needed to read the contents of the tape.

Through either interface, setting up encryption is easy for an IT administrator. An administrator enters a password, the BlueScale software generates a key, and the administrator then stores the key on a USB drive. That's all that is necessary to implement the strongest encryption certified by the federal government: AES encryption using a 256-bit key.
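For readers who want to see what that boils down to in software terms, the following is a conceptual sketch only; it uses the open-source Python cryptography package to derive a 256-bit key from a passphrase and encrypt a block of data with AES. It is not Spectra Logic's BlueScale implementation, which performs the equivalent work in the LTO-4 drive's encryption hardware, and every name in it is illustrative:

# Conceptual illustration of AES-256 with a passphrase-derived key; not BlueScale code.
import os
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

salt = os.urandom(16)
kdf = PBKDF2HMAC(algorithm=SHA256(), length=32, salt=salt, iterations=100_000)
key = kdf.derive(b"library-admin-passphrase")   # 32 bytes = a 256-bit AES key

aes = AESGCM(key)
nonce = os.urandom(12)
block = b"backup image data..."
ciphertext = aes.encrypt(nonce, block, None)
assert aes.decrypt(nonce, ciphertext, None) == block   # only the right key reads it back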

Nonetheless, with the Standard Edition of BlueScale Encryption, there are two major limits: There can be only a single encryption key and a single encryption password on a library at a time. While more complex schemes with multiple keys are possible, those schemes require the Professional Edition of BlueScale Encryption.
 
For most sites, and the vast majority of SMB sites, the Standard Edition of BlueScale Encryption Key Management probably offers more security features than IT administrators would normally configure on their own. What's more, it's included with the T50e free of charge. The importance of these security features is underscored by burgeoning data-protection legislation, including Sarbanes-Oxley, the Gramm-Leach-Bliley Act, the USA Patriot Act, and (offshore) the European Union Data Protection Directive. The intent of the Standard Edition is to provide sites with a means of securing data while it is transported to a remote site and stored there.

If setting up encryption is a trivial task, then running with the Standard Edition of BlueScale is incredibly simple. Data encryption can be set so that it is automatically enabled when the library starts. Other than making sure that a USB disk with the proper key is inserted into the library, no further action is required. Encryption is completely transparent to both backup operators and the NetBackup application. Tapes can be set streaming at 120MBps with no impact on either NetBackup or any IT server. Encryption only comes to the forefront when an unauthorized person attempts to read data from the tape without using a drive or library with the correct key. At that point, all tape read and write processes with that cartridge will fail.

With AES encryption that is easy to establish and transparent to run, the logical question is "how does an IT administrator know that it's working?" The answer to that comes in the form of Spectra Logic's other major tape library innovation: BlueScale Media Lifecycle Management (MLM). Integrated directly into the library, BlueScale MLM leverages the internal chip memory on LTO-4 and cleaning cartridges.

Using BlueScale MLM, we were able to monitor the status of all media in the library from either the library's front touch screen or Web GUI. In this manner we could easily detect tapes with deteriorating media characteristics. As a result, we could freeze tapes as read-only media within the NetBackup GUI before NetBackup tried to use the "unhealthy" media in a critical unattended data protection process that would likely fail.

Media health Rx

Every time a tape is loaded into a library with MLM, more than 30 tape media data points are collected and statistically analyzed to help proactively manage the tape's life. With BlueScale MLM, administrators can generate instant on-screen media tracking reports that summarize key characteristics of every MLM-compatible tape and cleaning cartridge in a library.

In addition to summary reports, IT administrators can generate detailed reports with information about compression ratios, load counts, write errors, and encryption status. With BlueScale MLM, administrators can identify tapes with high error rates or other problems. Within the context of NetBackup, a tape with high error rates can either be frozen, which sets the cartridge to read-only until all of its contents expire and the tape is retired from all use, or be suspended immediately.
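Within NetBackup, the freeze and suspend operations can also be driven from a script rather than the GUI, which is handy when an MLM report flags a cartridge outside of a scheduled maintenance window. The sketch below shells out to NetBackup's bpmedia admin command; the install path assumes a default UNIX-style master server, and the media ID and host name are placeholders, so check the NetBackup command reference before relying on it:

# Hypothetical automation sketch around NetBackup's bpmedia command.
import subprocess

BPMEDIA = "/usr/openv/netbackup/bin/admincmd/bpmedia"   # default UNIX install path

def freeze_media(media_id: str, media_server: str) -> None:
    # Mark the cartridge so NetBackup stops writing to it.
    subprocess.run([BPMEDIA, "-freeze", "-m", media_id, "-h", media_server], check=True)

def suspend_media(media_id: str, media_server: str) -> None:
    # Take the cartridge out of rotation immediately.
    subprocess.run([BPMEDIA, "-suspend", "-m", media_id, "-h", media_server], check=True)

freeze_media("A00042", "backup-master.example.com")     # placeholder media ID and host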

Implementing BlueScale MLM requires Spectra Logic's packaged, bar-coded, Certified LTO-4 media and MLM LTO cleaning cartridges. Spectra Logic writes baseline data on each cartridge, including its bar code and the date on which the media was certified. Then, throughout the life of the cartridge, usage data continually accumulates on the memory chip: whenever a drive in a Spectra Logic tape library unloads a cartridge, the drive writes information about the tape to the cartridge memory chip. BlueScale MLM in turn uses the information in the cartridge memory chip to maintain a database of information about each cartridge in the library.
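To make the idea of that database concrete, the following sketch models the kind of per-cartridge record such a system tracks, using the metrics named above; the field names, thresholds, and health rule are our own illustrative assumptions, not Spectra Logic's schema:

# Illustrative per-cartridge record; not the BlueScale MLM schema.
from dataclasses import dataclass

@dataclass
class CartridgeRecord:
    barcode: str
    certified_on: str         # date written to the cartridge chip at certification
    load_count: int
    write_errors: int
    compression_ratio: float
    encrypted: bool

    def needs_attention(self, max_loads: int = 5000, max_errors: int = 10) -> bool:
        # Flag media worth freezing or suspending before the next unattended run.
        return self.load_count > max_loads or self.write_errors > max_errors

tape = CartridgeRecord("A00042L4", "2009-05-01", 312, 2, 1.8, True)   # placeholder values
print(tape.needs_attention())   # False: healthy enough for the next backup window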

For the Cassandras of tape as a storage medium, this kind of proactive media health management can hardly be comforting news. While critics seize upon anecdotes of damaged tapes as proof of the inherent unreliability of the medium, growing regulations insist that offline, offsite data be kept tamper-free. That means a backup solution is not complete without tape to provide the offline, offsite archiving that is fundamental to business continuity best practice.

What's more, the pivotal role of information lifecycle management makes tape automation a key element of any next-generation management infrastructure. Through the integration of device fault management, encryption with key management, and MLM tools into its tape libraries, Spectra Logic offers the managed automation that is essential for a high return on investment.

Jack Fegreus is CTO of openBench Labs.

SIDEBAR:

OPENBENCH LABS SCENARIO

UNDER EXAMINATION: Tape library with integrated media management

WHAT WE TESTED

Spectra Logic T50e
-- BlueScale Encryption Key Management
-- Hardware-based AES-256 encryption
-- Media Lifecycle Management
-- (2) Dual-port 4Gbps FC controllers
-- (1) IBM LTO-4 tape drive per controller

HOW WE TESTED

Dell PowerEdge 1900 server
-- Quad-core Xeon CPU
-- Brocade 815 8Gbps HBA
-- Windows Server 2003
-- VMware Consolidated Backup
-- Veritas NetBackup v6.5.3

HP ProLiant DL560 server
-- Quad-processor Xeon CPU
-- 8GB RAM
-- VMware ESX Server
-- (8) VM application servers
-- Windows Server 2003
-- SQL Server
-- IIS

HP ProLiant DL360 server
-- Windows Server 2003
-- VMware vCenter Server
-- Quad-core Xeon

Brocade 300 8Gbps switch

Xiotech Emprise 5000 System
-- (2) 4Gbps ports
-- (2) DataPacs

KEY FINDINGS

-- T50e includes an integrated Media Lifecycle Management function that records more than 30 tape media metrics on the tape cartridge chip memory and statistically analyzes these metrics to rate the health of a tape cartridge.

-- T50e includes Standard Edition BlueScale Encryption to manage the creation and use of a library encryption key for data security using the embedded encryption engine on LTO-4 tape drives.

-- AutoSupport function provides administrators with a mechanism to automatically open or update a support ticket with SpectraGuard Support, whenever a critical event occurs in the library.

-- Throughput with encryption scaled linearly with the addition of LTO-4 tape drives, which include embedded dual-port 4Gbps FC controllers.

