Single-solution data protection

Using CommVault's Simpana, IT can deploy a single solution to integrate backup, archive, replication and SRM processes, leveraging data compression and deduplication across physical and virtual servers.

By Jack Fegreus, openBench Labs

While virtualization of resources can help IT establish more efficient operations, it also introduces multiple levels of logical abstraction and redirection of storage resources that can obscure and complicate important IT operations. To help IT deal with critical backup and recovery issues within a virtual infrastructure, CommVault's Simpana provides a Universal Virtual Server Agent (VSA) that can be installed on either a physical or a virtual server running Windows.

To help reduce the impact of backups on physical hosts and virtual machines (VMs), the Simpana VSA simplifies common administration tasks with such features as VM auto-discovery, which discovers newly added VMs and automatically places them into existing backup policies.

Simpana also addresses a major issue for every VM: its two distinct personas. First, there's the IT-centric persona of an application running on a host. Second, there's the logical line-of-business persona of a VM as a standard computer. As a result, service recovery in a virtual operating environment is an extremely important issue, because the failure of a single host cascades to multiple VMs running multiple applications.

To resolve this issue, CommVault offers a suite of data management modules that leverage common services to provide enterprise-class data protection, archive, replication, search and e-Discovery. For backup and recovery processes, IT is able to control and monitor all end-to-end processes in a single pane of glass via the CommCell Console.

Within every CommCell configuration, a CommServe server coordinates all communication among CommCell components. The CommServe server also maintains a database of information relating to the CommCell configuration. One or more media servers, also known as CommVault MediaAgents, manage external storage resources for the CommCell. By adding multiple servers to run the MediaAgents, IT administrators are able to enhance both scalability and availability of storage services.

From a single backup job, administrators can restore a Windows-based VM as either a set of VMware container files or a set of Windows system files. Simpana can also be deployed in a number of ways, including as an agent inside a VM or on a proxy server with the Universal Virtual Server Agent for off-host backups of virtual servers and disk-level recovery.

To get a better perspective on Simpana's ability to support the adoption of a virtual operating environment, openBench Labs set up a data protection test scenario using VMware Virtual Infrastructure to host eight active VMs running Windows Server 2003. Each test VM was configured as an application server running SQL Server and IIS.

Intelligent Data Agents

Client Agents, also known as Intelligent DataAgents or iDAs, are software modules that provide a wide range of data protection services for client systems running a broad array of operating systems and applications. Backup and recovery processes are handled by a family of iDAs, which are content-aware and designed for specific types of data. Other Client Agents support such data protection services as replication, compliance archiving, storage resource management (SRM), content indexing, and search.

Using Simpana, IT administrators build end-to-end processes by linking a series of policies that include backend rules for how media servers work with storage resources and client-side rules that govern how content-aware agents interact with media servers.

For this analysis, openBench Labs explicitly focused on backup and recovery processes. In particular, we built four end-to-end backup and recovery processes using CommVault's File System iDA for Windows and VSA configured to leverage policy-driven disk- and tape-based storage libraries with data deduplication.

Using the CommCell Console, we created a series of policies for both the File System iDA and the Virtual Server iDA. These policies were linked to policies for storage elements and to policies for "subclients," which list the specific targets for backup. Options for iDataAgents include whether to implement software compression and where the hashing function for deduplication should run: on the client or on the media server.
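The way these narrowly focused policies link into one end-to-end process can be pictured with a small data model. This is only an illustrative sketch: the class names, field names, and sample values below are hypothetical and are not part of Simpana's own interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StoragePolicy:
    """Back-end rules: which library receives the data (hypothetical model)."""
    name: str
    library: str
    deduplication: bool = False


@dataclass
class Subclient:
    """The specific targets selected for backup."""
    name: str
    targets: List[str] = field(default_factory=list)


@dataclass
class AgentPolicy:
    """Client-side rules, linked to a storage policy and a subclient."""
    agent: str
    storage_policy: StoragePolicy
    subclient: Subclient
    compression_at: str = "client"  # assumed values: "client", "media_server", "off"
    hashing_at: str = "client"      # assumed values: "client", "media_server"

    def describe(self) -> str:
        return (
            f"{self.agent}: {len(self.subclient.targets)} target(s) -> "
            f"{self.storage_policy.library} "
            f"(dedup={self.storage_policy.deduplication}, "
            f"compress@{self.compression_at}, hash@{self.hashing_at})"
        )


# Sample values are made up for illustration.
d2d = StoragePolicy(name="d2d-dedup", library="example-disk-library", deduplication=True)
targets = Subclient(name="default", targets=["C:\\Data", "D:\\Shares"])
policy = AgentPolicy("File System iDA", d2d, targets,
                     compression_at="client", hashing_at="media_server")
print(policy.describe())
```

Keeping the storage rules, the target list, and the agent options as separate objects mirrors the idea that each can be defined once and reused across several end-to-end processes.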

To secure and minimize data traffic between client systems and media servers, an iDA is able to compress and encrypt data on a client system before that data is transmitted to a MediaAgent. More importantly, the iDA can share deduplication processing, which is centered on a special data store that contains the signature associated with each unique block of data and reference pointers to the instances of each unique element. With shared deduplication processing, the client iDA generates the hash signatures for data blocks, compresses the data, and sends the compressed data along with the signatures to the MediaAgent. The MediaAgent identifies and stores unique hash signatures and corresponding reference pointers.
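The division of labor in shared deduplication can be sketched in a few lines. The example below is a simplified illustration, not CommVault's implementation: the block size, the choice of SHA-256 and zlib, and all class and function names are assumptions made for this sketch.

```python
import hashlib
import zlib

BLOCK_SIZE = 32 * 1024  # illustrative block size (an assumption for this sketch)


def client_prepare(data: bytes, block_size: int = BLOCK_SIZE):
    """Client side: split data into blocks, then hash and compress each block."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        signature = hashlib.sha256(block).hexdigest()
        yield signature, zlib.compress(block)


class DedupStore:
    """Media-server side: one stored copy per signature, plus reference pointers."""

    def __init__(self):
        self.blocks = {}      # signature -> compressed unique block
        self.references = {}  # signature -> list of (job_id, block_index)

    def ingest(self, job_id: str, stream) -> None:
        for index, (signature, payload) in enumerate(stream):
            if signature not in self.blocks:
                self.blocks[signature] = payload  # store only unique blocks
            self.references.setdefault(signature, []).append((job_id, index))

    def restore(self, job_id: str) -> bytes:
        """Rebuild a job's data by following its reference pointers in order."""
        ordered = sorted(
            (index, signature)
            for signature, refs in self.references.items()
            for job, index in refs
            if job == job_id
        )
        return b"".join(zlib.decompress(self.blocks[sig]) for _, sig in ordered)


store = DedupStore()
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store.ingest("job-1", client_prepare(data))
store.ingest("job-2", client_prepare(data))
print(f"blocks received: 8, unique blocks stored: {len(store.blocks)}")
```

Because the client has already computed the signatures, the media server only needs a dictionary lookup per block to decide whether to store the payload or just record another pointer.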

In addition to the tasks of compression and deduplication, the Simpana VSA must also handle access to the data on multiple VMs. When the VSA is installed within a virtual server, it enables a VM to act as a backup appliance on its VMware host. The VM leverages VMware's hot-add mode to mount and copy volume snapshots of logical disks that belong to the other VMs on the host. When deployed on a physical server, the VSA server can work in both SAN and LAN mode to access VM volume snapshots on multiple hosts. For our testing, we utilized the VSA on a Windows-based Proxy Server, which was also our media server, exclusively in SAN mode.

For VMware, VSA leverages the VMware Consolidated Backup (VCB) framework to quiesce VMs, create a VM snapshot, and make the snapshot available on the Proxy Server using the selected transport mode. VSS quiescing can also be enabled in the VMware tools to ensure that VSS-aware applications running within a VM, such as Active Directory, MS Exchange, or SQL Server, are in a consistent state before the VM snapshot is created.

Once the snapshot is created and made available, VSA backs up the VM data to the MediaAgent. If VSA and the MediaAgent are on the same server, as in our test scenario, the process is considered to be a LAN-free backup. In addition, during the backup of Windows guest operating systems, VSA is able to track metadata to catalog and index individual files and folders inside the VM. This provides a single-pass backup for full VM recovery and granular recovery of files. From the perspective of a VM, backup overhead is limited to taking and later removing a snapshot.

Total I/O throughput for eight VMs with the Simpana Virtual Server iDA was typically about 500MBps. From the perspective of our Xiotech storage server, the disk DellStor2 was used for the Simpana D2D library, to which all backup writes were directed. All snapshots of VM virtual disks were read from the ESX datastore disk, XioDS1. Finally, VCB_proxy_dell was the disk on the Simpana proxy server to which VM snapshots were written and from which backup data was read.

Storage policies

Backup began as a disk-to-tape (D2T) process, which made maximizing streaming I/O throughput and minimizing the time to complete a backup the driving issues. In a D2T environment, storage utilization was addressed with a static tape rotation plan designed to limit the number of physical tapes needed to retain backup images at a secure off-site storage location.

With the introduction of disk-to-disk (D2D) backup, however, a problem surfaced with backup retention policies: Storing seven years of backup images in a grandfather, father, son (GFS) rotation consumes much more storage space—25 times more—than the original data.
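To see why retention multiplies storage so quickly, it helps to count the images a GFS rotation keeps. The following back-of-the-envelope sketch uses retention counts and an incremental size chosen purely for illustration; they are assumptions, not the schedule behind the 25-times figure, which depends on the actual rotation plan.

```python
from dataclasses import dataclass


@dataclass
class GfsPlan:
    daily_incrementals: int      # "son" copies retained
    weekly_fulls: int            # "father" copies retained
    monthly_fulls: int           # "grandfather" copies retained
    yearly_fulls: int            # annual copies kept for long-term retention
    incremental_fraction: float  # size of one incremental relative to a full


def storage_multiple(plan: GfsPlan) -> float:
    """Return retained backup storage as a multiple of the primary data size."""
    fulls = plan.weekly_fulls + plan.monthly_fulls + plan.yearly_fulls
    incrementals = plan.daily_incrementals * plan.incremental_fraction
    return fulls + incrementals


# Assumed example: 6 dailies, 4 weeklies, 12 monthlies, 7 yearlies, 5% daily change.
plan = GfsPlan(daily_incrementals=6, weekly_fulls=4, monthly_fulls=12,
               yearly_fulls=7, incremental_fraction=0.05)
print(f"Retained backups: {storage_multiple(plan):.1f}x the primary data")
```

Even with modest incrementals, the retained full images dominate, which is why removing duplicate blocks across those fulls has such a large effect on disk consumption.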

To help organizations deal with massive data growth and meet more aggressive retention and recovery service level agreements (SLAs), Simpana lets IT enhance data protection policies with deduplication to meet recovery SLAs from disk without utilizing excessive disk capacity. What's more, Simpana deduplication does not require specialized storage or appliances, and can be implemented using commodity disk and off-the-shelf servers. CommVault Simpana can also extend the deduplication tier to offline media for more efficient and less costly long-term retention. Backup images that have been created using a policy with deduplication can be stored on disk to support near-term restoration and then moved to tape for long-term storage without re-hydration of the data.

We began our assessment of Simpana's backup and restore capabilities using the File System iDataAgent (iDA). This served as a baseline for analyzing functionality and performance with the Virtual Server Agent.

Data compression has long been a staple hardware function on tape drives to reduce the quantity of data written to tape. Simpana provides an option within iDA policies to apply software compression at either the client system or at the media server.

In our tests of Windows-based clients, the Simpana File System iDA typically compressed data at a 4-to-1 ratio, which meant the volume of data transferred to the media server and then to the storage library was just 25% of the original data set. Even more impressive were the storage utilization gains provided by adding data deduplication to a D2D backup combined with the ability to pass storage gains on to tape media.

Using policies invoking data compression on clients and data deduplication on the media server, we measured 20-to-1 reductions in the amount of storage required for file-system backups. We were able to back up 309.5GB of file data using just 15.3GB of storage capacity for unique data and pointers. What's more, there was no penalty when restoring jobs characterized by a 20-to-1 data deduplication ratio: Simpana proceeded to reconstruct and rewrite the original files back to disk at upwards of 300MB per second (480GB per hour).
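The reduction figures quoted above follow directly from the measured sizes. A minimal arithmetic check, with function names chosen for this sketch:

```python
def reduction_ratio(original_gb: float, stored_gb: float) -> float:
    """Return the N in an 'N-to-1' storage reduction."""
    return original_gb / stored_gb


def stored_fraction(ratio: float) -> float:
    """Return the share of the original data that remains after reduction."""
    return 1.0 / ratio


# File-system backup: 309.5GB protected in 15.3GB of unique data and pointers.
print(f"{reduction_ratio(309.5, 15.3):.1f}-to-1")  # prints 20.2-to-1

# 4-to-1 client compression leaves a quarter of the data to transfer.
print(f"{stored_fraction(4.0):.0%}")  # prints 25%
```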

We created a deduplication-enabled policy for D2D backups and associated that policy with File System iDA policies that applied data deduplication at the media server and data compression at the client, a strategy that garnered a 20-to-1 reduction in storage utilization. In addition to a default library, D2DLibrary2, we created a special store for the signatures of each unique block of data that was backed up, along with pointers to each instance of that block in any stored backup. We also designated our tape library as a secondary silo for transferring deduplicated backups to tape for off-site storage.

For long-term retention and off-site storage, deduplication-enabled backups created with a File System iDA can be assigned a special silo tape library to which deduplicated backup images can be directly migrated without re-hydrating (re-expanding) the data. This gives off-site tape storage the same utilization ratios as local disk-based libraries with deduplicated File System iDA backups.

Making virtual universal

From a data-content perspective, each VM system disk in our virtual operating environment contained 4GB to 5GB of highly redundant data in the form of common OS and application files, along with about 7GB of "empty" free space. As a result, backup images from our test environment would present multiple layers of duplicate data.

Each VM was provisioned with a 12GB logical system disk, which was stored as a vmdk file within an ESX datastore and shared with the Simpana proxy server. Moreover, each VM utilized about 70% of its system disk, which means 30% of each vmdk file was blank.

Backup and restore procedures are simplified with physical servers, since each system is isolated by its physical host. In a virtual operating environment, the CPU processing load on one VM impacts the host server, and that in turn impacts processing on every other VM on that host. The IT strategy of running independent backup jobs in parallel may not scale for multiple VMs running on a single host server. What's more, a backup process must address more than just the logical persona of a VM as an instance of a system running a particular OS.

A complete data protection process must also address the physical persona of a VM as an application running on a host server. As an analog to bare-metal restore for physical servers, complete data protection for a VM requires the ability to restore a VM as a functioning application on a virtual server.

The Simpana VSA supports three backup modes for VMware environments. With Disk Level, VSA backs up the entire VM disk image and provides the capability to recover the full VM or, for Windows-based guests, individual files inside the VM. With Volume Level, VSA provides full VM protection for Windows-based guests, without the VCB overhead of additional cache space, as well as granular file-level recovery. Finally, with File Level, VSA backs up only the files inside the VM, which is useful when combined with Simpana's content-indexing capabilities.

When creating backup and restore policies for VSA, IT administrators can establish credentials with servers running ESX Server or vCenter Server. The latter option extends access to all ESX servers and objects, such as templates as well as VMs, under the control of the vCenter Server. Once a connection with an appropriate server is established, Simpana provides one-button VM discovery. The process identifies and configures all VMs that run in the configured scope as sub-client contents inside VSA.
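Conceptually, discovery amounts to comparing the server's current VM inventory with the contents of a subclient and adding whatever is missing. The sketch below illustrates that idea only; the function name and the VM names are hypothetical, and it does not call any Simpana or VMware interface.

```python
def auto_discover(inventory, subclient_contents):
    """Add VMs that appear in the inventory but not in the subclient; return them."""
    known = set(subclient_contents)
    new_vms = [vm for vm in inventory if vm not in known]
    subclient_contents.extend(new_vms)
    return new_vms


# Hypothetical names standing in for what a vCenter or ESX query would return.
subclient = ["vm01", "vm02"]
inventory = ["vm01", "vm02", "vm03", "vm04"]
added = auto_discover(inventory, subclient)
print("Newly protected VMs:", added)
```

Running the same comparison again finds nothing new, which is what makes it safe to repeat discovery on a schedule.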

We configured a Virtual Server iDA linked to our D2D storage policy that accessed our vCenter Server to gain account credentials for the VMs. We then set up a default sub-client policy that selected eight VMs for backup.

Most importantly, in the SAN mode tested by openBench Labs, the exchange of data between the ESX server and the Simpana proxy server was done entirely over a SAN: There was no LAN traffic. As a result, a VSA backup has virtually no impact on production processing with respect to the VM or its host. The proxy server copies a snapshot of the VM's data files to a local directory and then backs up that local directory. Once the proxy server finishes copying all of the VM's files to its local directory, the proxy server dismounts the snapshot files and the ESX Server removes the snapshot from the VM. From the VM's perspective, processing was interrupted only for the few seconds it took to create and later remove the snapshot.
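The ordering of the proxy-based workflow can be outlined as nested setup and teardown steps. This is a schematic sketch: the functions are stand-ins that only record step names, not calls into VCB or Simpana, and placing the final backup of the local directory after snapshot removal is an assumption made for this illustration.

```python
from contextlib import contextmanager


@contextmanager
def vm_snapshot(vm_name, log):
    """Stand-in for snapshot creation; guarantees removal even if a step fails."""
    log.append(f"create snapshot of {vm_name}")
    try:
        yield f"{vm_name}-snapshot"
    finally:
        log.append(f"remove snapshot of {vm_name}")


@contextmanager
def mounted_on_proxy(snapshot, log):
    """Stand-in for mounting the snapshot files on the proxy over the SAN."""
    log.append(f"mount {snapshot} on proxy")
    try:
        yield
    finally:
        log.append(f"dismount {snapshot} from proxy")


def backup_vm(vm_name):
    log = []
    with vm_snapshot(vm_name, log) as snapshot:
        with mounted_on_proxy(snapshot, log):
            log.append(f"copy {snapshot} files to proxy local directory")
    # The VM is released at this point; the remaining work happens on the proxy.
    log.append("back up proxy local directory to media server")
    return log


for step in backup_vm("vm01"):
    print(step)
```

Structuring the snapshot and mount as context managers reflects the key property of the process: the VM's snapshot is always released as soon as the copy completes, regardless of how long the subsequent backup takes.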

The isolation of the ESX server from the Simpana proxy server running the Virtual Server iDA is highly significant for the data deduplication stage of our testing. Even under the most favorable conditions, data deduplication is very resource-intensive in terms of memory, disk, and CPU. When dealing with foreign resource objects such as ESX vmdk files, an already resource-intensive process becomes even more so. In a virtual operating environment, the deduplication process must be configured to take a very fine-grained approach, dividing large virtual disk files into a large number of small-block chunks, which must be compared with each other.
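The effect of chunk granularity can be demonstrated with synthetic disk images. The sketch below assumes simple fixed-size chunking and builds eight small stand-in "virtual disks" that share common content, contain a small VM-specific region, and end in zero-filled free space; the sizes and helper names are invented for illustration and are not drawn from the test environment.

```python
import hashlib

KB = 1024


def pseudo_random(seed: str, size: int) -> bytes:
    """Deterministic filler bytes so the example is repeatable."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(f"{seed}:{counter}".encode()).digest()
        counter += 1
    return bytes(out[:size])


COMMON_OS = pseudo_random("os", 256 * KB)  # content shared by every VM


def make_image(vm_id: int) -> bytes:
    """Shared content with a 4KB VM-specific patch, followed by empty space."""
    patch = pseudo_random(f"vm{vm_id}", 4 * KB)
    used = COMMON_OS[:128 * KB] + patch + COMMON_OS[132 * KB:]
    return used + bytes(256 * KB)  # zero-filled "free space"


def dedup_counts(images, block_size: int):
    """Return (total blocks, unique blocks) under fixed-size chunking."""
    seen = set()
    total = 0
    for image in images:
        for offset in range(0, len(image), block_size):
            seen.add(hashlib.sha256(image[offset:offset + block_size]).digest())
            total += 1
    return total, len(seen)


images = [make_image(i) for i in range(8)]
for block_size in (64 * KB, 4 * KB):
    total, unique = dedup_counts(images, block_size)
    print(f"{block_size // KB}KB blocks: {total} total, {unique} unique, "
          f"{total / unique:.1f}-to-1")
```

With large blocks, a small VM-specific change makes the entire surrounding block unique; with small blocks, only the changed region is stored again. The price is many more signatures to compute and compare, which is the source of the added resource load.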

In our final testing, we added data compression and data deduplication to a VSA policy. In these tests, data compression was 2-to-1. With data compression and block deduplication set to a 32KB block size, our data deduplication storage policy provided an initial overall data reduction ratio of 8-to-1 in stored data. In particular, we utilized only 19.1GB to hold backup images for 151.5GB of VM virtual disk data.

Achieving this level of storage savings with VSA reduced I/O throughput by 90%. Nonetheless, the added processing exhibited no measurable impact on either the VMs or the ESX server. More importantly, there was no penalty on job restorations, which proceeded to reconstruct and write the original VMware or Windows files back to disk at upwards of 300MB per second (480GB per hour). Through flexible data compression and deduplication options, we were able to configure end-to-end backup processes that maximized data throughput and minimized data storage for all types of data.

Backing up eight VMs in parallel with the Simpana VSA, total throughput on simultaneous reads and writes reached 500MBps. At the same time, CPU processing on each VM never exceeded 5% and memory usage never exceeded 10% on any VM.

Jack Fegreus is CTO of openBench Labs.


UNDER EXAMINATION: Unified backup software for virtual and physical environments

WHAT WE TESTED: CommVault Simpana 8.0


Dell 1900 PowerEdge server
-- Quad-core Xeon CPU
-- 4GB RAM
-- Windows Server 2008
-- Simpana 8.0
-- VMware Consolidated Backup (VCB)

Dell 1900 PowerEdge server
-- (2) quad-core Xeon CPUs
-- 8GB RAM

VMware ESX Server
-- (8) VM application servers
-- Windows Server 2003
-- SQL Server
-- IIS

Xiotech Emprise 5000 System
-- (2) 4Gbps Fibre Channel ports
-- (2) Managed Resource Controllers
-- MPIO support for Windows and VMware
-- (2) DataPacs

Spectra Logic T50e tape library
-- (2) LTO-4 tape drives
-- (2) 4Gbps FC ports


• Simpana guides IT administrators through the creation of simple, narrowly focused policies that are linked to form a complete end-to-end policy for data protection.
• The Simpana Virtual Server iDataAgent configures a proxy server that automatically initiates, mounts, and backs up VM snapshots to minimize backup overhead on VMs running on VMware ESX.
• From a single full VM backup, administrators can restore full VMs or individual files, either as a logical system or as an ESX application in VMFS.
• iDataAgents can compress backup data at the client or the media server to reduce LAN traffic, avoid I/O bottlenecks, and improve storage utilization of devices without hardware compression.
• Software compression reduced backup data traffic and storage requirements by a factor of 4-to-1 in tests using Windows file data, and 2-to-1 in tests of VMware VMs.
• Simpana software supports deduplication policies that allow administrators to create special data stores in a disk-based library. These stores contain signatures of unique data blocks or objects along with reference pointers to the unique data, which means backup jobs add only new data or pointers to existing data.
• Block-level data deduplication immediately generated reductions in storage requirements for file-based backups of 20-to-1, and 8-to-1 for multiple VMs hosted on a VMware ESX server.

This article was originally published on January 06, 2010