Automating D2D backup of virtual machines

With NetBackup 6.5, IT is able to address the issues of data growth and storage resource utilization in a virtual operating environment via policy-driven data protection processes that reduce operator errors and promote automation.

By Jack Fegreus

In a virtual operating environment, IT administrators are immediately confronted with the question: What should be backed up? Should administrators concentrate their efforts on the logically exposed virtual machines (VMs) running important business applications? Or should they focus on the applications and files that create those logical VMs?

At the heart of this problem is the fact that a VM has two distinct personas. First, there is the IT-centric persona of a virtual operating environment application that needs to run on a virtual operating environment. Second, there is the logical line-of-business persona of a VM as a standard computer system.

To resolve that dichotomy, Symantec's NetBackup integrates deep into VMware infrastructure to leverage the VMware Consolidated Backup (VCB) API, the vStorage API, and vCenter Server. Using NetBackup, IT administrators have the ability to dynamically restore the backup set of a VM running a version of Windows Server to either its native ESX Server state—using vmdk and other VMFS files—or as a logical Windows system with NTFS-formatted files.

More importantly, all NetBackup data protection processes fit perfectly into any IT service management initiative. The goal of service management is to automate the standard tasks of systems and storage administrators by building upon classic quality control (QC) practices for process management. To give IT a vital edge in Service Level Agreement (SLA) compliance, NetBackup provides a completely hardware-independent policy-based backup and restore framework to implement backup as an end-to-end automated process, which extends to the NetBackup PureDisk data deduplication environment. NetBackup even provides for quality control and process improvement via reporting tools that allow IT administrators to define and monitor service level compliance for SLAs entered into with line of business managers.

Triple play: Backup, store, dedupe
For a better perspective on the ability of NetBackup to enhance critical virtual operating environment and IT service management initiatives, openBench Labs set up a data protection test scenario using VMware Virtual Infrastructure. We focused our tests on eight hosted VMs that were configured as application servers running Windows Server 2003 along with SQL Server and IIS.

IT backup loads are dependent on a number of factors, including data retention requirements and the nature of the data in terms of compressibility and redundancy. This makes a virtual operating environment the perfect microcosm to examine all factors impacting backup load processing. Of particular importance is the growing practice of IT to use multiple VMs to establish the system availability and scalability that is characteristic of a large data center. This practice of utilizing multiple VMs, each dedicated to running a particular application, generates a prodigious amount of duplicate data.

To support all of NetBackup's data protection services, including data deduplication via NetBackup PureDisk, openBench Labs configured three servers. The first server ran Windows Server 2003 and functioned as the NetBackup Enterprise Master Server. The Master Server maintains the NetBackup catalog of internal databases, handles the creation of data protection policies, manages device and media selection, and can also be utilized as a media server.

Given the focus surrounding IT service management, it is important to note that NetBackup requires system/storage administrators to create policies in order to initiate and manage all data protection processes. From choosing media servers to enforcing life-cycle constraints on backup files, even ad hoc unscheduled backup and restore actions require an administrator to invoke a NetBackup policy.

To simplify the creation of a backup policy—or at least jump start the initial definition of a backup policy—NetBackup provides a wizard, which applies common defaults as it creates a policy. NetBackup offers IT administrators a powerful combination of policies and templates that allow them to quickly assign a storage lifecycle for backup data—from creation to expiration—on all storage resources across physical locations.

Under the NetBackup framework, when a data protection process needs to access a storage device, the enterprise master server assigns a media server to handle the task. The enterprise master server uses the selection of media servers to optimize storage resource utilization. To load balance and scale data protection, NetBackup 6.5 enables enterprise master servers to dynamically allocate the control of storage devices based on media server OS factors, such as CPU usage, memory usage, I/O load, and the number of active jobs. Our second server, which featured a quad-core CPU, was set up as a media server and also functioned as the VCB proxy server for backups of our VMware environment.
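The load-based selection described above can be illustrated with a minimal Python sketch. It is not NetBackup's actual algorithm: the `MediaServerLoad` fields, the equal-weight `load_score` formula, and the server names are all hypothetical, chosen only to show how the four factors named in the text (CPU usage, memory usage, I/O load, and active jobs) might be combined to pick a media server.

```python
from dataclasses import dataclass


@dataclass
class MediaServerLoad:
    """Hypothetical snapshot of one media server's load (not a NetBackup type)."""
    name: str
    cpu_pct: float     # CPU utilization, 0-100
    mem_pct: float     # memory utilization, 0-100
    io_pct: float      # I/O load as a share of capacity, 0-100
    active_jobs: int
    max_jobs: int


def load_score(server: MediaServerLoad) -> float:
    """Assumed equal-weight average of the four factors; lower means less loaded."""
    job_pct = 100.0 * server.active_jobs / server.max_jobs
    return (server.cpu_pct + server.mem_pct + server.io_pct + job_pct) / 4.0


def pick_media_server(servers):
    """Return the least-loaded server that still has a free job slot."""
    candidates = [s for s in servers if s.active_jobs < s.max_jobs]
    if not candidates:
        raise RuntimeError("no media server has a free job slot")
    return min(candidates, key=load_score)


servers = [
    MediaServerLoad("media-a", cpu_pct=80, mem_pct=60, io_pct=70, active_jobs=6, max_jobs=8),
    MediaServerLoad("media-b", cpu_pct=30, mem_pct=40, io_pct=20, active_jobs=2, max_jobs=8),
]
chosen = pick_media_server(servers)
print(chosen.name, load_score(chosen))
```

In this made-up example the lightly loaded `media-b` is selected; a real scheduler would also weigh which servers can reach the requested storage device.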

On our third NetBackup data protection server, we ran NetBackup PureDisk 6.5.2 in order to provide our test bed with a data deduplication service. NetBackup PureDisk runs its data deduplication process on the client system rather than the server. Client-side deduplication has benefits for both physical and virtual server backups that occur over a LAN because it lowers both network traffic and backup storage requirements.

With NetBackup, IT has two powerful data deduplication options for VM backups in a VMware environment. IT can either run a PureDisk client on a VM or run the deduplication process in the NetBackup media server with the NetBackup PureDisk Deduplication Option (PDDO). The latter option is particularly powerful when the media server also serves as the VCB proxy server.

From an IT operations perspective, backup administrators continue to work with the resource virtualization and policy-driven operations of NetBackup, while PDDO transparently adds a data deduplication service. What's more, PDDO adds a copy of the backup metadata sent to PureDisk to the NetBackup Catalog, which enables administrators to restore backup images processed with PDDO very quickly.

Data deduplication
For each VM in our test environment, we provisioned a 12GB logical system disk and a 25GB logical work disk. On each VM, the system disk was configured as a vmdk file stored within an ESX datastore.

The work disk on each VM was configured as a Raw Device Mapping (RDM) volume, which is a logical disk volume formatted by the VM's operating system as a native file system. That makes an RDM volume directly accessible by other physical, as well as virtual, systems that are running the same OS. The RDM volume remains manageable through VMFS by means of an optional vmdk mapping file created by the ESX server. While other backup packages require virtual compatibility to back up RDM volumes, NetBackup 6.5.4 is able to handle any RDM volume, whether or not it has been configured with a vmdk mapping file.

We kept all files for our eight VMs in a single ESX datastore. Within that datastore, the ESX server created a folder for each of the eight VMs. In particular, the folder for oblVM2 contained a 12GB vmdk file for the complete system disk and a 51MB vmdk file for the map of the RDM work disk. These and all of the files related to the state and configuration of oblVM2 need to be backed up in order to restore oblVM2 as a running VM. To satisfy normal business continuity practices, we also needed to be able to restore all of the files associated with oblVM2 and its two disks, WinSys and oblVM2_Work, in their native NTFS format.

From a data-content perspective, each VM system disk in our virtual operating environment contained 4GB to 5GB of highly redundant data in the form of common OS and application files. On the other hand, each work disk contained 5GB to 6GB of relatively unique structured and unstructured work data. As a result, backup images from our test environment would present multiple layers of duplicate data.

Each system disk was represented by a physical 12GB vmdk file, which contained about 7GB of "empty" free space. In addition, the data in the used space of both the system and work disks contained a wide spectrum of data redundancy. That makes NetBackup deduplication an important tool to improve storage utilization of VM backups, especially image-level backups.
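A quick back-of-the-envelope calculation, sketched here in Python, shows how much of an image-level backup of the system disks is empty space. The figures come from the text (eight VMs, a 12GB vmdk each, about 7GB free per vmdk); the variable names are ours, and the result is an approximation because the free-space figure is itself approximate.

```python
VM_COUNT = 8             # VMs hosted on the ESX server
SYSTEM_VMDK_GB = 12      # physical size of each system-disk vmdk file
FREE_GB_PER_VMDK = 7     # approximate "empty" space inside each vmdk

raw_image_gb = VM_COUNT * SYSTEM_VMDK_GB   # what an image-level backup must read
empty_gb = VM_COUNT * FREE_GB_PER_VMDK     # free space carried along in the images
used_gb = raw_image_gb - empty_gb          # data actually in use
empty_share = 100.0 * empty_gb / raw_image_gb

print(f"raw images: {raw_image_gb}GB, empty: {empty_gb}GB, used: {used_gb}GB")
print(f"empty space is about {empty_share:.0f}% of the system-disk images")
```

Under these assumptions, more than half of the 96GB of system-disk images is free space, before any redundancy among the OS and application files is taken into account.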

Storage options
Storage infrastructure supporting our virtual operating environment test bed included iSCSI and Fibre Channel (FC) disk and tape devices. To provide FC SAN storage, we set up a Xiotech Emprise 5000 array with two 4Gbps FC controllers. The Emprise 5000 supports active-active MPIO for both VMware ESX Server and Windows Server. This gives the array a significant advantage in throughput for applications that depend on streaming data, such as a VCB backup.

The linchpin in a VCB configuration is a Windows-based server that shares access to the VMFS datastores used to hold the vmdk files associated with VMs. Dubbed the VCB proxy server, this server uses a logical LUN driver to copy VM snapshot images from the VMFS datastore to a local directory on the server. The files in that local directory are the files that get backed up and archived. As a result, I/O throughput for that local directory has to be at least as fast as reading from the ESX datastore and writing to the backup device in order to avoid creating a bottleneck.

We also provided each NetBackup media server with SAN access to a Spectra Logic T50e tape library. The T50e was configured with two LTO4 drives, two 4Gbps FC controllers, and 50 tape cartridge slots. NetBackup simplifies managing the cartridge inventory and balancing I/O throughput for the T50e by managing all of the library's standard features. NetBackup is able to inventory the tape library and apply lifecycle management policies to the media. Moreover, through NetBackup's ability to load-balance multiple media servers and share tape drives among media servers, IT administrators have multiple mechanisms to balance the throughput loads on high-end multi-drive libraries.

VM backup automation
What makes storage infrastructure so important for a virtual operating environment is the fact that the CPU processing load on one VM impacts the host server, and that in turn impacts processing on every other VM running on that host. As a result, the standard IT strategy of running independent streaming backup jobs in parallel may not scale for multiple VMs that are running on a single host server.

VMware recognized the issue of backups impacting other VMs and introduced an API called VMware Consolidated Backup (VCB) for off-host backup. The VCB API allows backup applications to move a snapshot copy of a VM to an alternate host for processing backup operations in order to eliminate the impact of backup operations on the primary VM host server. To enhance seamless integration with VCB, NetBackup 6.5 indexes the contents of a virtual machine during a VM image-level backup, also known as a VMDK-level backup.

For IT, these NetBackup capabilities are manifested in the backup and restore policy dubbed FlashBackup-Windows. Using the FlashBackup-Windows policy type, administrators are able to configure a backup policy that supports a full VM backup, which backs up all of the files associated with a VM, including the VMware system files. By backing up all VM files resident in an ESX datastore, administrators are able to restore a complete VM.

In addition, an IT administrator can also restore a VM in its logical OS-specific format from the same backup image. Using the drill-down capabilities of the NetBackup Restore module, an administrator can even search for and restore an individual file within a VM.

From a single backup image created with NetBackup's FlashBackup-Windows policy type, we were able to restore a VM to either its logical persona of a Windows 2003 Server with files in native NTFS format or to its physical persona as an ESX application in VMFS format. We could further control the restoration process by either recovering the VM to its original ESX host or staging the recovery on a local drive or the VCB proxy server.

To provide agent-less off-host backup, NetBackup integrates with VCB to make a media server a VCB proxy server and keep all data traffic on a SAN. The key VCB component for minimizing ESX server involvement and keeping data traffic on a SAN is a VLUN driver, which the proxy server uses to mount and access a VMFS-formatted datastore. NetBackup also leverages VCB integration to establish account credentials with multiple servers running ESX Server, vSphere, or vCenter Server. By establishing credentials with vCenter Server, NetBackup gains access to all ESX servers that the vCenter Server manages.

The VM backup process starts with the proxy server sending the host ESX Server a VMsnap command to initiate a snapshot. This creates a point-in-time copy of a virtual disk on a VM. In particular, the ESX Server freezes the vmdk file associated with the VM drive.

Next, the ESX server sends a list of disk blocks for the frozen vmdk file to the proxy server. The proxy server then uses the block list with the VLUN driver to read the VM snapshot. As a result, a VCB-based backup has minimal impact on production processing. The proxy server copies the frozen vmdk file to a local directory and NetBackup backs up that local directory without involving either the VM or the ESX host.

Once the proxy server finishes copying all of the frozen vmdk files to its local directory, the proxy server dismounts the vmdk files and the ESX Server removes the snapshot from the VM. From the perspective of the VM, processing was interrupted for only a few seconds at two points: when the ESX Server executed the snapshot, and when the ESX Server removed the snapshot and consolidated any changes that were made while the snapshot existed.
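The sequence of steps in a VCB-style backup can be sketched as a small Python simulation. Nothing here calls a real VMware or NetBackup interface: `FakeEsxHost`, `vcb_style_backup`, and the in-memory `storage_unit` dictionary are hypothetical stand-ins that only model the order of operations described above (snapshot, block list, copy to the proxy, snapshot removal, backup of the local copy).

```python
class FakeEsxHost:
    """In-memory stand-in for an ESX host; this is not a VMware API."""

    def __init__(self, vmdk_blocks):
        self.vmdk_blocks = vmdk_blocks   # VM name -> list of block contents (bytes)
        self.snapshots = set()           # VMs that currently have a snapshot
        self.log = []

    def create_snapshot(self, vm):
        """Freeze the vmdk and return the list of block indexes to read."""
        self.snapshots.add(vm)
        self.log.append(f"snapshot created: {vm}")
        return list(range(len(self.vmdk_blocks[vm])))

    def read_block(self, vm, index):
        return self.vmdk_blocks[vm][index]

    def remove_snapshot(self, vm):
        self.snapshots.discard(vm)
        self.log.append(f"snapshot removed: {vm}")


def vcb_style_backup(host, vm, storage_unit):
    """Copy a frozen vmdk to the proxy, release the snapshot, then back up the copy."""
    block_list = host.create_snapshot(vm)
    try:
        # The proxy reads the frozen blocks into its local directory (a list here).
        local_copy = [host.read_block(vm, i) for i in block_list]
    finally:
        # The snapshot is removed even if the copy fails, so the VM is not left frozen.
        host.remove_snapshot(vm)
    # The backup runs against the local copy only; the VM and host are not involved.
    storage_unit[vm] = b"".join(local_copy)
    return len(storage_unit[vm])


host = FakeEsxHost({"oblVM2": [b"ab", b"cd"]})
storage_unit = {}
size = vcb_style_backup(host, "oblVM2", storage_unit)
print(size, host.log)
```

The `try`/`finally` is a design choice of this sketch: it reflects the goal of keeping the snapshot window short regardless of what happens during the copy.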

More importantly, when system administrators employ NetBackup, they never have to configure an external JavaScript file. Normally, all of the directives for VCB are hard coded in JavaScript files, which are edited outside of the backup application. Backup operators must run their backup package in conjunction with the JavaScript files to execute a backup of a VM. Worse yet, since the JavaScript is external to the backup application, there is no way for an IT administrator to know in advance whether the JavaScript is correctly configured without running a complete backup.

Through tight integration of VCB with NetBackup, administrators can configure end-to-end backup processes working entirely within the NetBackup GUI. There is no need to use the standard JavaScript files that are provided with VCB and by competitive backup packages. We easily set up credentials for account authorization within NetBackup for vCenter Server and then utilized the NetBackup GUI to establish VM backup policies.

Thanks to Symantec's integration efforts, all interaction between NetBackup and VCB is established and managed internally through NetBackup policies, which provide a rich set of options to fine tune the process. What's more, when an IT administrator creates or edits any policy, including policies for backing up VMs using VCB, NetBackup interactively validates that the new policy is able to access client files and create and copy snapshots. As a result, NetBackup takes all of the guesswork out of the process of configuring a backup for a VM.

Backup, DR, D2D convergence
Media servers play an important role in the way that NetBackup creates a unified framework of end-to-end data protection processes in order to address the convergence of backup and disaster recovery. An important characteristic of that convergence is the use of disk storage as both a backup and a disaster recovery medium. Old-line strategies for data backup all shared one simple directive: Back up everything to tape.

Today's best practices, however, call for the implementation of new data protection approaches that utilize both disk-to-disk (D2D) and disk-to-tape (D2T) backup. NetBackup refers to all physical storage targets for backup images as a storage unit. Storage units can be a large variety of devices, including tape libraries, disk directories or disk pools, which can be assigned to one or more media servers. For D2D backups, NetBackup permits an unlimited number of disk storage units, which can be consolidated and virtualized into a smaller number of storage groups for simplified management.

Continuing the policy-management paradigm, NetBackup 6.5 introduced a new type of policy template called a storage lifecycle policy. A storage lifecycle policy can be applied to any number of backup policies, which significantly improves automation by creating a template that states where every copy of a backup image should be stored and for how long. A single policy can be created that specifies the lifecycle of backup data, including where all duplicate copies should be stored and when they should be expired.
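As a rough illustration of what a storage lifecycle policy captures, the following Python sketch models a policy as a list of copies, each with a destination and a retention period. The class names, storage unit names, and retention values are invented for the example and are not NetBackup's own data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CopyRule:
    """One copy of a backup image: where it lives and for how long (hypothetical)."""
    storage_unit: str
    retention_days: int


@dataclass(frozen=True)
class LifecyclePolicy:
    """A named template that backup policies could share (hypothetical)."""
    name: str
    copies: tuple

    def expirations(self, backup_date):
        """Map each storage unit to the date its copy would expire."""
        return {
            rule.storage_unit: backup_date + timedelta(days=rule.retention_days)
            for rule in self.copies
        }


policy = LifecyclePolicy(
    name="vm-backups",
    copies=(
        CopyRule("san-disk-pool", retention_days=7),     # fast D2D landing zone
        CopyRule("puredisk-pool", retention_days=90),    # deduplicated disk copy
        CopyRule("lto4-library", retention_days=365),    # long-term tape copy
    ),
)
expiry = policy.expirations(date(2009, 10, 1))
for unit, when in expiry.items():
    print(unit, when.isoformat())
```

Because the template is separate from any one backup policy, the same set of destinations and retention periods can be reused without being re-entered.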

To further refine the ability for IT to address the pain points associated with long-term backup retention, a NetBackup disk or storage pool can be assigned a staging policy. The staging policy provides for the automatic migration of backups through a storage hierarchy by creating a process in which a backup is written to a storage unit and later duplicated to a second storage unit. In a staging process, backup sets retain their original characteristics. When a backup operator restores a backup image, NetBackup automatically follows the pointers to the actual location.

In addition to ensuring consistency, IT can use storage policy options to tune backup performance, especially for a VMware environment. A key storage policy parameter for performance sets the maximum number of concurrent backup jobs for a storage device. For disk storage, this parameter is analogous to disk queue length. For tape libraries, this parameter defaults to the number of tape drives in a library.

IT can use storage policies with staging to optimize both the performance and the costs associated with data protection processes. For mission-critical systems that require a minimal backup window, backup processing can be accelerated with multiple streaming D2D backup jobs that employ high-speed storage devices.

In our tests, we streamed VM backup jobs on fast SAN-based storage devices at 650MBps with one media server. Once the initial backup is complete, a staging policy can automatically move the backup images to less expensive media, such as tape or low-cost high-capacity disks, as a background task.

NetBackup will automatically attempt to decompose a backup process into the maximum number of concurrent jobs allowed by a storage policy. We set the maximum concurrent job parameter for each disk storage unit to support at least ten concurrent jobs. As a result, NetBackup was able to decompose any D2D backup that included all of the VMs running on our host ESX server into eight parallel backup processes.
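The relationship between the number of clients and the concurrent-job limit can be sketched in a few lines of Python. The helper names are ours, and the VM names other than oblVM2 are assumed for illustration; the point is simply that the number of parallel jobs is capped by whichever is smaller, the client count or the storage unit's limit.

```python
def parallel_jobs(client_count, max_concurrent_jobs):
    """Number of jobs that can run at once for a given storage unit limit."""
    if client_count < 0 or max_concurrent_jobs < 1:
        raise ValueError("client_count must be >= 0 and max_concurrent_jobs >= 1")
    return min(client_count, max_concurrent_jobs)


def schedule_waves(clients, max_concurrent_jobs):
    """Group clients into successive waves no larger than the job limit."""
    return [
        clients[i:i + max_concurrent_jobs]
        for i in range(0, len(clients), max_concurrent_jobs)
    ]


vms = [f"oblVM{n}" for n in range(1, 9)]   # eight VMs; names assumed

print(parallel_jobs(len(vms), 10))         # limit of ten, as in our test setup
print(schedule_waves(vms, 3))              # a tighter, hypothetical limit of three
```

With a limit of ten, all eight VM backups run in a single wave; with a hypothetical limit of three, the same work would take three waves.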

In performing D2D backups of our virtual operating environment, in which the Emprise 5000 array provided all of the logical disks used by the ESX server and the media server, we ran full backups of our eight VMs at roughly 250MBps. In that test, total media server throughput was roughly 500MBps, as we needed to simultaneously read and write data to the same storage system. By adding a second array, we were able to split reads and writes across the two arrays and increase backup throughput to 650MBps, which meant we were pushing 1.3GBps of full-duplex I/O. What's more, the two phases of a VCB backup could be distinguished by monitoring the FC switch as the media server first copied eight VM snapshots in parallel to a local directory and then backed up that directory to a NetBackup storage unit.

There is a similar parameter that enables NetBackup to decompose a single client backup into multiple streams. By setting the maximum data streams per client, IT administrators enable NetBackup to decompose a single-client job that involves multiple volumes or directories into multiple parallel processes just as easily as a job from multiple clients.

This parameter is particularly important in combination with backup staging with NetBackup 6.5 in a virtual operating environment. The staging process is treated as a backup process and the VM backup images are readily identifiable by their originating client VM. As a result, a single staging process can be decomposed just as if there were multiple processes originating from multiple clients.

More importantly, these settings are made with respect to the distinct capabilities of each media server. This enables an IT administrator to provide NetBackup with all the information it needs to automatically tune and leverage the capabilities of a media server based on the real-time system load. What's more, the NetBackup master server can now load balance across media servers that are themselves automatically tuned. As a result, each media server will be utilized optimally and the entire NetBackup environment will scale to the maximum amount possible.

Dealing with data redundancy
A crucial technology facilitating the move to disk-based data protection schemes is efficient and reliable data deduplication. As backup needs increase, backup processes stress servers with more than high I/O bandwidth demands. From 1TB of primary disk storage, a traditional grandfather-father-son (GFS) retention plan for daily incremental and weekly full backups consumes about 25TB of archival storage. To deal with the storage provisioning issues of a D2D backup, IT can use the NetBackup PureDisk Deduplication Option (PDDO) to substantially reduce both storage and management requirements for storing data on disk instead of tape.

During a backup, the PureDisk deduplication engine on each media server segments all of the files in the backup image and replaces any duplicate segments with pointers to a single instance of the data. NetBackup then transfers a unique set of segments and metadata to the PureDisk server, which reduces LAN traffic to help optimize overall throughput.
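The idea of replacing duplicate segments with pointers can be shown with a small Python sketch. It is a generic illustration, not PureDisk's implementation: the fixed segment size, the use of SHA-256 fingerprints, and the dictionary that stands in for the storage pool are all assumptions made for the example.

```python
import hashlib

SEGMENT_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def deduplicate(data, store, segment_size=SEGMENT_SIZE):
    """Split data into segments and store only segments not already in the pool.

    Returns the list of fingerprints needed to rebuild the data (the pointers)
    and the number of bytes that actually had to be sent to the pool.
    """
    recipe = []
    new_bytes = 0
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = segment   # first time this segment is seen
            new_bytes += len(segment)
        recipe.append(fingerprint)         # duplicates cost only a pointer
    return recipe, new_bytes


def restore(recipe, store):
    """Reassemble the original data by following the fingerprints."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


store = {}
image_a = b"AAAABBBBAAAACCCC"
image_b = b"AAAABBBBDDDDCCCC"   # differs from image_a in one segment

recipe_a, sent_a = deduplicate(image_a, store)
recipe_b, sent_b = deduplicate(image_b, store)
print(sent_a, sent_b, restore(recipe_b, store) == image_b)
```

The second image shares most of its segments with the first, so only its one new segment is transferred, which mirrors how later VM backups send far less data than the first.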

The architecture of PDDO enhances the efficiency of both restore and backup operations on data from a deduplication storage target. In particular, PDDO leverages the NetBackup catalog to cache segment metadata in order to optimize deduplication and speed the restoration of data. More importantly, data segmentation and deduplication entirely changes the notion of a full backup. With inline data deduplication an integral part of the backup process, only the changed segments of a file are now transferred in a backup job.

In our testing scenario of a VMware host, our first objective with NetBackup was to minimize the time needed to run a backup of the eight VMs hosted on our ESX server. To meet this objective, we ran standard D2D backups of the eight VMs in parallel over our 8Gbps SAN at upwards of 650MBps—roughly 2.25TB per hour. While that was ideal for minimizing the backup window, it left our backup images stored on relatively expensive high-performance storage.
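The conversion from a sustained transfer rate to an hourly volume is simple arithmetic, sketched below in Python. The function name is ours, and the result depends on whether a terabyte is counted as 1,000,000MB (decimal) or 1,048,576MB (binary), which is why the code shows both.

```python
def tb_per_hour(mb_per_second, mb_per_tb=1_000_000):
    """Convert a sustained MBps rate into terabytes moved per hour."""
    return mb_per_second * 3600 / mb_per_tb


backup_mbps = 650

decimal_tb = tb_per_hour(backup_mbps)               # 1TB = 1,000,000MB
binary_tb = tb_per_hour(backup_mbps, 1024 * 1024)   # 1TB = 1,048,576MB

print(f"{decimal_tb:.2f}TB per hour (decimal)")
print(f"{binary_tb:.2f}TB per hour (binary)")
```

Either convention lands close to the rounded figure quoted above.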

To alleviate that pain point, we extended the NetBackup storage policy for the SAN-based storage unit with a staging policy. The staging policy scheduled an automatic, resource-intense, data-deduplication process at an off-peak time on our media server to move a significantly smaller copy of the data to a cost-effective LAN-based PureDisk storage pool on a regular basis.

From the perspective of a NetBackup administrator, we were running a staging job that used the load balancing of NetBackup to launch four duplication processes: one process for each of the four CPU cores in our server. From the perspective of a PureDisk administrator, however, we were running four separate PDDO jobs with data deduplication.

As with the PureDisk OS-based agent, data deduplication rapidly improved with use. When we began running PDDO, the initial media server cache hit percentage was low. As we processed more backup files in staging operations, the cache hit percentage rose dramatically and our data storage savings ratio grew to 25-to-1. On a restore process, local metadata accelerated reassembly of files as the restoration of a deduplicated backup image ran at 65MBps.

More importantly, the data deduplication rate for each PDDO process was directly related to the caching of metadata in the NetBackup catalog. On the first VM backup process, the media server cache hit rate was 0% and that produced a data reduction ratio of 3.6-to-1 as we transferred 5.88GB of data to the PureDisk server for a backup job that scanned 21.16GB of raw data. After running a number of VM backup jobs, the cache hit rate rose dramatically. As cache hits began to exceed 90%, the data deduplication ratio approached 25-to-1.
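The reduction ratios quoted here follow directly from the amount of data scanned and the amount transferred, as this short Python sketch shows. The function names are ours; the input figures are the ones reported for the first VM backup job.

```python
def reduction_ratio(scanned_gb, transferred_gb):
    """Data reduction ratio, expressed as N in 'N-to-1'."""
    return scanned_gb / transferred_gb


def savings_pct(ratio):
    """Share of the scanned data that did not have to be stored or sent."""
    return 100.0 * (1.0 - 1.0 / ratio)


first_job = reduction_ratio(scanned_gb=21.16, transferred_gb=5.88)

print(f"first job: {first_job:.1f}-to-1, {savings_pct(first_job):.0f}% saved")
print(f"steady state: 25-to-1, {savings_pct(25):.0f}% saved")
```

Seen as percentages, the jump from 3.6-to-1 to 25-to-1 means going from storing a bit more than a quarter of the scanned data to storing about one twenty-fifth of it.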

Average aggregate I/O throughput for staging jobs with four sub-processes on our media server was 125MBps—the throughput limit for the Gigabit Ethernet connection between the NetBackup media server and the PureDisk server. That level of throughput validates the strategy of directly backing up the VMs on an ESX host as a D2D process, for which aggregate throughput reached 1,300MBps, and then staging deduplication of the data to a PureDisk pool.
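The 125MBps ceiling is what a 1Gbps link works out to in bytes, and a short Python sketch makes the effect on staging time concrete. The calculation ignores protocol overhead, and applying it to the 21.16GB and 5.88GB figures from the first VM job is our own illustration rather than a measured result.

```python
GIGABIT_ETHERNET_MBITS_PER_S = 1000


def link_limit_mbps(mbits_per_second):
    """Theoretical ceiling in megabytes per second (8 bits per byte, no overhead)."""
    return mbits_per_second / 8


def transfer_seconds(gigabytes, mb_per_second):
    """Time to move a payload at a sustained rate, using 1GB = 1000MB."""
    return gigabytes * 1000 / mb_per_second


limit = link_limit_mbps(GIGABIT_ETHERNET_MBITS_PER_S)
raw_seconds = transfer_seconds(21.16, limit)       # sending every scanned byte
dedup_seconds = transfer_seconds(5.88, limit)      # sending only unique segments

print(f"link ceiling: {limit:.0f}MBps")
print(f"raw: {raw_seconds:.0f}s, deduplicated: {dedup_seconds:.0f}s")
```

Since the LAN link is the bottleneck, every byte removed by deduplication before transfer shortens the staging job proportionally.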

What's more, media server caching also significantly benefited our ability to quickly restore VM backup images as either VM applications or logical servers in NTFS format. When we restored VMs from backups stored on our NetBackup PureDisk server, locally stored metadata enabled NetBackup to set up the directory structure of the desired VM before the data transfer process from NetBackup PureDisk began. We were able to find files more quickly because we could browse a VM backup image locally. In our test configuration, typical throughput when restoring deduplicated backup images was around 65MBps.

By load balancing data protection processes across media servers, NetBackup is able to provide process scalability limited only by hardware capabilities. In particular, IT can increase backup and deduplication throughput by adding media servers or increase throughput when restoring deduplicated data by adding PureDisk servers.

Jack Fegreus is CTO of openBench Labs.


UNDER EXAMINATION: Backup software for virtual operating environments

WHAT WE TESTED: Symantec NetBackup 6.5.4, NetBackup PureDisk 6.5.2


Dell 1900 PowerEdge servers
-- Quad-core Xeon CPU
-- 4GB RAM
-- Windows Server 2003
-- NetBackup v6.5.4
-- PureDisk Deduplication Option (PDDO)
-- VMware Consolidated Backup (VCB)

Dell 1900 PowerEdge server
-- Quad-core Xeon CPU
-- 4GB RAM
-- Symantec PureDisk OS (Linux-based)

HP ProLiant DL580 server
-- Quad-processor Xeon CPU
-- 8GB RAM
-- VMware ESX Server
-- (8) VM application servers
-- Windows Server 2003
-- SQL Server
-- IIS

HP ProLiant DL360 server
-- Windows Server 2003
-- VMware vCenter Server

Xiotech Emprise 5000
-- (2) 4Gbps FC ports
-- (2) Managed Resource Controllers
-- MPIO support for Windows and VMware
-- (2) DataPacs

Spectra Logic T50e tape library
-- (2) LTO-4 tape drives
-- (2) 4Gbps FC ports


-- All NetBackup data protection processes are policy driven to reduce operator errors and promote automation, and can be configured using the Backup Policy Configuration Wizard.

-- NetBackup's configuration of federated media servers provides failover and load balancing using metrics such as memory and CPU consumption.

-- NetBackup integrates with VMware Consolidated Backup on a proxy server to automatically create snapshots on an ESX host, mount a local copy of the snapshot, and back up the local copy to keep VMs 100% available to users.

-- From a single full VM backup, restore full VMs or individual files as a logical system or as an ESX application in VMFS format.

-- Minimize D2D storage with the NetBackup PureDisk Deduplication Option, which integrates with the NetBackup catalog to store unique data once in a PureDisk storage pool on D2D backups.

-- Data deduplication with PureDisk produced data deduplication rates of 25-to-1 on VM image backups over time.

This article was originally published on October 01, 2009