Lab Review: Bridging the archive-backup divide

Posted on October 30, 2009


As long as backup remained a disk-to-tape process, IT operations could ignore the differences between data that needed to be archived and data that needed to be backed up. However, with the introduction of government regulations mandating strict archiving of particular data, along with new backup technologies such as disk-to-disk backup and data deduplication, IT must now bridge that divide with a flexible archiving solution that integrates easily into current IT procedures.

By Jack Fegreus

Forged by economic turmoil and intense pressure on profitability, today's business environment dictates that organizations go beyond traditional standards of accountability in justifying executive decisions. Just when CIOs are under intense pressure to alleviate anything that fuels greater IT complexity and higher labor costs, the lion's share of the regulatory burdens to maintain and manage the critical information needed to avoid litigation is falling on the IT department. As a result, the new business environment is challenging many of IT's fundamental notions regarding backup and recovery processes.

Among the changes is a sharpening of the distinction between data archiving and data backup -- a distinction that IT has successfully blurred to simplify operations. Now the confluence of regulatory forces and internal demands for operational efficiency is driving a re-examination of data archiving and bringing data preservation systems that are capable of automating regulatory compliance into the limelight.

With the rapid growth in the amount of corporate data that must be stored, administered, and protected comes an equally rapid rise in the costs to manage it. Worse yet, current IT practices, such as the use of disk-to-disk (D2D) backup, compound the problems of storage resource utilization. D2D backups can increase online storage requirements by a factor of 25X if not countered by the introduction of other technologies, such as data compression and data deduplication.

This has IT asking the question: "How often are files and records that have not changed being backed up?" The answer to that question has profound implications for IT operations. This is especially true in a virtualization environment, where the proliferation of virtual servers adds more sources of duplicate and static data.

At the forefront of the answer are the well-known sets of data that external regulatory bodies require corporations to freeze, preserve, and track for extended periods. As a result, most sites focus entirely on compliance data when looking to acquire an archiving solution. Compliance data, however, is only a fraction of the problem. The answer to our backup question also unveils growing sets of stable corporate data that encompass such sources as dimensions of data cubes used in business intelligence applications and business process templates used to support quality control techniques. What unifies both of these classes of data is that they need to be archived rather than backed up.

Archive and backup systems have two distinct and complementary functions. Archiving focuses on managing the long-term retention of data in a way that guarantees the state of the data, while keeping the data continuously available for quick ad hoc access. On the other hand, backup systems focus on providing a recent copy of data that can be used to roll back the impact of human or machine failures or errors. Most importantly, the synergies between data archiving and data backup can be leveraged in concert to maximize the effectiveness and minimize the cost of the storage infrastructure.

To provide for both the rapid deployment and the rapid leveraging of data archiving, the ProStor InfiniVault is delivered as an integrated system that includes hierarchical storage and retention management software. In particular, the InfiniVault multi-tier storage system provides the access and performance benefits of online NAS storage with the economic benefits of offline storage via removable cartridges that utilize high-end laptop disk drives. What's more, the InfiniVault software presents its multi-tier storage hardware as a unified system, while it simplifies the retention and disposition of data for any length of time and to any legal or compliance requirement.

By implementing archiving with ProStor's InfiniVault, IT can leverage the system's capabilities to enforce regulatory compliance on maintaining data securely, while reducing the amount of data on primary storage by upwards of 50%. That cut in primary storage directly translates into significant backup savings that start with an equal cut in the processing time for backups. What's more, for sites not implementing data deduplication, that will also yield an equal cut in storage media requirements for maintaining backup images. For sites implementing data deduplication, InfiniVault will cut the prodigious processing associated with signature comparisons of data segments.

Archiving sans ILM
While the benefits of data archiving are clear, the adoption of data archiving has been constrained by its association with the complexity of information lifecycle management (ILM). The InfiniVault software breaks that traditional association by focusing on policy rules oriented toward data management tasks rather than ILM constructs.

What makes the break from ILM to data management so significant is that data management is clearly an IT task. IT is able to take full ownership of an InfiniVault system and immediately use it to automate compliance with any collection of regulations and optimize backup and recovery processes. On the other hand, ILM deals with IT governance issues that are enterprise-wide in scope, which translates into a long process to establish interdepartmental consensus.

Another distinguishing aspect of ProStor's InfiniVault is the ability to virtualize multiple RDX removable disks behind NAS-exposed archive vaults in an automatic thin provisioning scheme that presents all on-line vaults with a capacity of 2TB. The RDX drives utilize a mobile 2.5-inch SATA hard drive mounted in a shockproof cartridge. As a result, InfiniVault provides IT users with the performance of online storage at a price point comparable to offline storage media, such as tape and optical.

Using the InfiniVault management software, openBench Labs created a compliance vault dubbed IVWORM. This vault had two active copies in slots 1 and 2 of the Removable Disk Unit (RDU) and was exported as a 2TB NAS volume. Within this vault, we then created multiple folders, to which we assigned different content retention periods.

ProStor's InfiniVault allows IT administrators to create two types of vaults for archiving data: a compliance vault, which is hardware-WORM enforced, and a standard read/write vault for static data that does not fall under any regulatory restrictions. When an RDX cartridge is assigned to a compliance vault, the cartridge cannot be reused until the retention time has expired on every file and no file remains stored on the cartridge. Files on an RDX cartridge in a read/write vault do not have strict retention time limits.

As files were ingested into each folder of our IVWORM vault, the appropriate retention period was automatically applied as a file protection. In addition, the InfiniVault software automatically indexed the content and attributes of each file. As a result, we were able to search for files based on content as well as attributes.
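The folder-based retention and content-indexing workflow described above can be sketched in a few lines of Python. This is an illustrative model only, not ProStor's actual implementation; the folder names and retention periods are hypothetical.

```python
import re
from datetime import date, timedelta

# Hypothetical per-folder retention policies, in days, applied at ingest time.
RETENTION_POLICY = {"sec-filings": 7 * 365, "hr-records": 5 * 365, "projects": 365}

# Inverted index: word -> set of vault paths whose content contains it.
content_index = {}

def ingest(folder, name, text, today=date(2009, 10, 30)):
    """Apply the folder's retention period and index the file's words."""
    expires = today + timedelta(days=RETENTION_POLICY[folder])
    for word in set(re.findall(r"[a-z]+", text.lower())):
        content_index.setdefault(word, set()).add(f"{folder}/{name}")
    return expires  # file is WORM-protected until this date

def search(word):
    """Return vault files whose content contains the given word."""
    return sorted(content_index.get(word.lower(), set()))

expiry = ingest("sec-filings", "10k.txt", "Annual report fiscal 2009")
print(expiry)            # retention expiry, 2,555 days after ingest
print(search("report"))  # -> ['sec-filings/10k.txt']
```

Because indexing happens at ingest, a later search by content requires no scan of the archived media, which is the behavior observed in our tests.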

When creating a vault, administrators can assign the new vault up to four active copies for security. Each copy corresponds to a slot in the Removable Disk Unit (RDU). By characterizing vaults by slots, multiple RDX cartridges can be virtualized behind a vault copy and multiple cartridge sizes can be used independently of other copies. In this way, one copy can be created with lower capacity cartridges, which can then be maintained at a secure offsite location using a cartridge rotation scheme similar to tape. Meanwhile, a copy using high-capacity cartridges can be used to keep second-tier files online for users.

More importantly, all of the required interaction with InfiniVault involves standard data management tasks. Placing a legal hold on a file is done by changing a file protection attribute. Even content indexing is an automatic process. As files are ingested into a WORM vault, the InfiniVault management software builds an index that includes all meaningful words in each file to support searching by content as well as by file attributes.

The same simplicity is provided to the business user. Each vault is exposed as a 2TB shared volume. Users always see all of the files stored in a vault via a mix of live files and file stubs, which point to RDX cartridges. If a user accesses a file that is resident neither on the NAS cache (a RAID volume exposed by the System Controller) nor on the RDX cartridge currently mounted in the RDU, a message is sent to insert the correct cartridge. This makes it easy to support large, simple read/write vaults that store less valuable historic data, such as templates and reference materials derived from completed projects.
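The live-file versus stub behavior described above amounts to a simple lookup at access time. The following sketch is ours, not ProStor's; the catalog structure and cartridge identifiers are hypothetical.

```python
# Hypothetical catalog: path -> ("live", data) for files on the NAS cache,
# or ("stub", cartridge_id) for files migrated to an RDX cartridge.
catalog = {
    "reports/q3.doc": ("live", b"...q3 contents..."),
    "archive/2005.zip": ("stub", "RDX-0042"),
}

mounted_cartridge = "RDX-0017"  # cartridge currently in the RDU slot

def open_file(path):
    """Serve the file if it is live or on the mounted cartridge;
    otherwise ask the operator to insert the cartridge the stub names."""
    kind, value = catalog[path]
    if kind == "live":
        return value
    if value == mounted_cartridge:
        return b"...read from mounted cartridge..."
    return f"Please insert cartridge {value}"

print(open_file("reports/q3.doc"))    # served directly from the NAS cache
print(open_file("archive/2005.zip"))  # -> 'Please insert cartridge RDX-0042'
```

The key point is that the namespace the user sees is always complete; only the latency of access varies with where the bytes physically reside.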

With simple NAS access to the InfiniVault, openBench Labs was easily able to access and utilize the IVWORM compliance vault from a VM.

NAS access to the InfiniVault also allowed us to run more sophisticated rules-based ILM software on VM servers to further leverage the InfiniVault. In particular, we installed the BridgeHead FileStore Data Migrator on a VM and established more sophisticated ILM rules for automating the collection and transfer of data into multiple vaults on the InfiniVault system.

Simple LAN access with InfiniVault also makes it easy for IT to implement archiving within a VMware Virtual Infrastructure environment. In addition to solving regulatory compliance issues, archiving with InfiniVault can make the backup process for VMs significantly more efficient when data deduplication is part of the backup process.

Virtual machines (VMs) need to be backed up in their native format so that they can be restored as either a logical server in its file system format or as a VM application on the host server. As a result, data deduplication needs to be done at a fine block level. In tests at openBench Labs, data deduplication typically increases the wall clock time for backup processes by an order of magnitude. As a result, archiving stable data used by VMs to ProStor's InfiniVault will significantly reduce their backup window.

Process perspective
Corporate executives think in terms of business processes and expect the services that support those processes to address issues such as availability, performance, security, and business continuity with specific agreed-upon support levels. For IT to create the policies and procedures needed to support such a service level agreement (SLA), IT must establish rigorous data center process control.

With the InfiniVault technology, IT is able to easily address the issues of data growth, resource utilization, and compliance with government regulations for protecting and archiving data. By introducing data management-oriented file archiving with InfiniVault, administrators are able to provide scalable data protection processes for physical and virtual clients that are policy driven and can be extended at any time to a full ILM environment.

Without the initial complexity of a full ILM environment, we were able to set up and begin using a ProStor InfiniVault for compliance archiving and for archiving stable common data in a matter of hours. From any physical or virtual system, we were then able to load data into a WORM vault for regulatory compliance or a read/write vault for archiving a single instance of common data at 64GB per hour. What's more, we were easily able to invoke all of the necessary compliance functions, from content searches to applying a legal hold, that IT would be called on to perform.

Archiving common data also provides a big performance advantage with respect to the growing use of D2D backup strategies. D2D backup regimes typically require 25 times the volume of the data being backed up to store all of the required daily, weekly, monthly and yearly backup images that are part of a typical backup schedule. As a result, archiving stable common data can take 25% to 50% of a site's data out of the schedule of backup jobs, which inevitably includes at least one full backup each week.
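The 25X figure follows from simple arithmetic over a conventional retention schedule. The grandfather-father-son rotation below is a hypothetical example chosen to show how image counts accumulate; actual multipliers depend on site policy.

```python
# Hypothetical D2D retention schedule:
# each entry is (retained images, fraction of primary data per image).
schedule = [
    (10, 0.05),  # 2 weeks of daily incrementals at a ~5% daily change rate
    (5,  1.0),   # 5 retained weekly full images
    (12, 1.0),   # 12 retained monthly full images
    (7,  1.0),   # 7 retained yearly full images
]

primary_tb = 10.0  # primary data volume in TB (illustrative)
total_tb = sum(count * fraction * primary_tb for count, fraction in schedule)
print(total_tb / primary_tb)  # expansion factor -> 24.5, roughly the 25X cited
```

Removing archived static data from this rotation shrinks every full image in the schedule, which is why the savings compound across daily, weekly, monthly, and yearly copies.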

Many sites, however, now focus on data deduplication as the single magic bullet to stop data expansion on D2D backups. For standard files on physical systems, most deduplication schemes are able to produce sufficiently robust results to alleviate, if not eliminate, the problem. Nonetheless, data deduplication introduces significant processing overhead.

To simulate the effects of backing up static data, openBench Labs ran a series of D2D backups on the same backup set, which contained 62GB of data. We initially ran a standard D2D backup with data compression. In this test the average throughput was 451GB/hour, the elapsed time was under 10 minutes, and the compression ratio was 4 to 1. We then ran two backups with deduplication turned on. In the second pass, our backup of 62GB only required 5MB of metadata -- a deduplication ratio of 155-to-1 for what amounted to 62GB of purely static data. Nonetheless, the processing to reach this level extended the backup to over an hour and dropped the throughput rate to just under 54GB/hour.
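The elapsed times above follow directly from the measured throughput figures. A quick check of the arithmetic:

```python
backup_gb = 62.0  # size of the test backup set

def elapsed_minutes(gb, gb_per_hour):
    """Wall-clock minutes to move a given volume at a given throughput."""
    return gb / gb_per_hour * 60.0

plain = elapsed_minutes(backup_gb, 451.0)  # compression-only run
dedup = elapsed_minutes(backup_gb, 54.0)   # run with inline deduplication
print(round(plain, 1))        # -> 8.2 minutes, i.e. "under 10 minutes"
print(round(dedup, 1))        # -> 68.9 minutes, i.e. "over an hour"
print(round(dedup / plain, 1))  # deduplication slowed this job ~8.4x
```

The roughly 8X slowdown on purely static data is the cost of signature processing, a cost that archiving the static data out of the job avoids entirely.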

All data deduplication processes need to analyze and compare backup job images to decide if the data is already stored. Reaching a data deduplication ratio of 25 to 1, which is needed to fully counter the expansion of storage for a D2D backup scheme, requires breaking a backup image into segments, creating a unique signature for each segment, and then comparing all of the segments in a process that becomes more effective as the segments get smaller and more numerous. The use of small segments is absolutely critical in virtual server environments, in which the content of a backup image is a container file for a virtual disk that is likely fragmented with respect to the VM files.
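The segment-and-signature process just described can be sketched with fixed-size segments and SHA-256 signatures. This is a generic illustration of block-level deduplication, not the algorithm of any particular product; the 4KB segment size is a hypothetical parameter.

```python
import hashlib

def dedup_counts(image: bytes, segment_size: int = 4096):
    """Split a backup image into fixed-size segments, sign each with
    SHA-256, and return (total segments, unique signatures)."""
    signatures = set()
    total = 0
    for off in range(0, len(image), segment_size):
        segment = image[off:off + segment_size]
        signatures.add(hashlib.sha256(segment).hexdigest())
        total += 1
    return total, len(signatures)

# A mostly static image: the same 4KB block repeated 1,000 times.
image = b"\x42" * 4096 * 1000
total, unique = dedup_counts(image)
print(total, unique)  # -> 1000 1: only one unique segment need be stored
```

Note that every segment must still be read, hashed, and compared even when nothing new is stored, which is why the ratio improves while the processing cost does not go away.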

In general, inline data deduplication extends the wall clock time of a backup job. In tests at openBench Labs, backing up VMs with data deduplication has extended backup processing time by an order of magnitude. Worse yet, disk activity during this process raises the power consumption of an array by upwards of two orders of magnitude, driving up both power and cooling costs. What's more, the process of segmenting a backup image, generating signatures, and comparing with past signatures repeats on every full backup. As a result, locating static data outside of the VM environment, while providing full access to the VMs, significantly improves processing efficiency for VM backups.

For the bottom line of IT operations, introducing data management-oriented archiving with InfiniVault resolves a number of critical data protection problems. First and foremost is the issue of meeting external regulatory requirements for data preservation.

There is also a growing volume of stable reference data outside the purview of government regulations that can benefit from the archiving capabilities of InfiniVault. This static data is typically needed by multiple users and needs to be protected; however, because the data is static, there is no need to maintain a fine timeline of backup images. Including this data in a full weekly backup provides excessive protection, but there is no real penalty to pay when backups go strictly to tape.

With the introduction of a D2D backup scheme, with or without data deduplication, backing up this data becomes highly inefficient. In this situation the line between data that needs to be archived and data that needs to be backed up becomes clear and distinct. At this point IT needs to take this new class of archival data out of the normal backup rotation and move it to a cost efficient storage alternative, such as the InfiniVault, in order to meet internal SLA requirements for backup processing efficiency.


OPENBENCH LABS SCENARIO

UNDER EXAMINATION: Data archiving appliance

WHAT WE TESTED: ProStor InfiniVault 30

HOW WE TESTED

Dell 1900 PowerEdge server
-- Quad-core Xeon CPU
-- 4GB RAM
-- Windows Server 2008
-- CommVault Simpana 8
-- VMware Consolidated Backup

Dell 1900 PowerEdge server
-- Quad-core Xeon CPU
-- 8GB RAM
-- VMware ESX Server

(8) VM application servers
-- Windows Server 2003
-- SQL Server
-- IIS

(1) VM application server
-- Windows Server 2003
-- BridgeHead FileStore Data Migrator for InfiniVault

Xiotech Emprise 5000 system
(2) 4Gbps FC ports
-- (2) Managed Resource Controllers
-- MPIO support for Windows and VMware
-- (2) DataPacs


KEY FINDINGS

-- InfiniVault console management software follows a practical data management paradigm for regulatory compliance and archiving stable data to improve backup performance without having to introduce a complex ILM paradigm.
-- InfiniVault allows multiple storage vaults to be characterized by business rules that define data retention, data immutability through hardware-enforced WORM, encryption of removable media, and secure deletion of media when retention time expires.
-- InfiniVault management software integrates with third-party agents to allow IT organizations to deploy ILM constructs using third-party software, such as BridgeHead's FileStore Data Migrator for InfiniVault or CommVault's Simpana iDA Archiver.
-- Indexing of archived files stored on disk-based removable media speeds file retrieval.
-- InfiniVault supports full audit trail reporting for all archived files.
-- I/O throughput when copying data to a vault averaged 64GB per hour.

