Streaming Multiple VM Backups To Minimize RTO

To alleviate the friction between business processes and IT infrastructure inflexibility, savvy CIOs are utilizing virtual machines (VMs) in a Virtual Operating Environment (VOE), such as that created by VMware, to provide the reliability, availability and scalability of racks of servers, while reducing capital and operating expenses. Nonetheless, the road to nirvana has its complications. Virtualization dramatically increases I/O loads on servers, and the expansion of government regulations on risk management highlights the danger that the failure of a single host server will cascade to multiple VMs running multiple applications.

In this review, openBench Labs focused on the server I/O problems associated with the implementation of an end-to-end backup and recovery process within a VOE.

With servers hosting eight or more VMs, a Fibre Channel HBA can no longer be regarded as a simple commodity product. As the number of VMs sharing the HBAs in a host server increases, SAN fan-out becomes a server issue as well as a switch issue. In a VOE server, HBAs have to play the role of virtual switches for virtual fabrics created by virtual HBAs assigned via N_Port ID Virtualization (NPIV). To deal with this issue, Brocade HBAs employ a high-performance ASIC that supports 500,000 IOPS per port and incorporates an eight-lane PCIe Gen 2.0 interface for 40Gbps of internal server throughput. This level of performance is particularly important for sustaining the high SAN I/O throughput that end-to-end backup and recovery processes demand in a VOE.
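
As a rough, back-of-the-envelope check of the PCIe figure, the sketch below compares usable PCIe 2.0 bandwidth with the load that a dual-port 8Gbps HBA, such as the Brocade 825, can place on it. It assumes the published PCIe 2.0 signaling rate of 5 GT/s per lane with 8b/10b encoding and a nominal 800MBps of payload per direction for each 8Gbps FC port; these constants are assumptions, not openBench Labs measurements. On those assumptions, 40Gbps is the raw signaling rate of eight lanes, and roughly 32Gbps (4,000MBps) per direction remains after encoding.

```python
# Back-of-the-envelope PCIe headroom check (assumed published rates, not measurements).
PCIE_GEN2_GT_PER_LANE = 5.0   # PCIe 2.0 signaling rate per lane, GT/s
PCIE_GEN2_ENCODING = 8 / 10   # 8b/10b encoding: 8 data bits per 10 line bits
FC8_NOMINAL_MBPS = 800        # assumed nominal 8Gbps FC payload per port, per direction


def pcie_gen2_raw_gbps(lanes: int) -> float:
    """Raw PCIe 2.0 signaling rate per direction, in Gbps."""
    return PCIE_GEN2_GT_PER_LANE * lanes


def pcie_gen2_usable_mbps(lanes: int) -> float:
    """PCIe 2.0 bandwidth per direction after 8b/10b encoding, in MBps (decimal)."""
    return pcie_gen2_raw_gbps(lanes) * PCIE_GEN2_ENCODING * 1000 / 8


def fc_load_mbps(ports: int) -> float:
    """Per-direction payload when every 8Gbps FC port streams at its nominal rate."""
    return ports * FC8_NOMINAL_MBPS


if __name__ == "__main__":
    bus = pcie_gen2_usable_mbps(8)
    load = fc_load_mbps(2)  # dual-port HBA, e.g. a Brocade 825
    print(f"PCIe 2.0 x8 raw: {pcie_gen2_raw_gbps(8):.0f}Gbps per direction")
    print(f"PCIe 2.0 x8 usable: {bus:.0f}MBps per direction")
    print(f"Two 8Gbps FC ports: {load:.0f}MBps per direction")
    print(f"Headroom factor: {bus / load:.1f}x")
```

On these assumptions, the eight-lane bus leaves ample headroom even with both ports of a dual-port HBA streaming at line rate in each direction.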

To optimize resource utilization, sites typically run eight or more VMs on host servers that utilize multi-core processors. Dense VM configurations put significant stress on I/O throughput for host servers, which must virtualize all of the SAN hardware for multiple VMs.

A major issue for VOE backup is the significant I/O overhead incurred in a process that utilizes VMware Consolidated Backup (VCB). In a VCB-based backup, all data must be read and written twice: once to a local directory on a server dubbed the VCB proxy server, and then again to the backup media. With data being simultaneously read and written in both phases of a VCB-based backup, achieving optimal efficiency requires a VCB proxy server that can provide a very high level of I/O throughput.
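
The doubling of I/O described above reduces to simple arithmetic. The minimal sketch below models a VCB-based backup as two phases, each of which reads and writes the full data set; the class, the function names, and the 400GB data set are hypothetical illustrations, not values from our test bed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhaseIO:
    """Data read and written by one phase of a VCB-based backup, in GB."""
    name: str
    read_gb: float
    written_gb: float


def vcb_backup_phases(vm_data_gb: float) -> List[PhaseIO]:
    """Hypothetical model of the two phases of a VCB-based backup."""
    return [
        # Phase 1: the VCB proxy reads VM snapshots from the shared datastore
        # and writes them to a local directory.
        PhaseIO("proxy copy", read_gb=vm_data_gb, written_gb=vm_data_gb),
        # Phase 2: the backup application reads that local directory
        # and writes the data to the backup media.
        PhaseIO("backup to media", read_gb=vm_data_gb, written_gb=vm_data_gb),
    ]


def total_io_gb(phases: List[PhaseIO]) -> float:
    """Total data moved (reads plus writes) across all phases."""
    return sum(p.read_gb + p.written_gb for p in phases)


if __name__ == "__main__":
    phases = vcb_backup_phases(400.0)  # hypothetical 400GB of VM data
    for p in phases:
        print(f"{p.name}: read {p.read_gb:.0f}GB, wrote {p.written_gb:.0f}GB")
    print(f"Total I/O for 400GB of VM data: {total_io_gb(phases):.0f}GB")
```

In this model every gigabyte of VM data turns into four gigabytes of reads and writes that the proxy server must handle, and both phases are full duplex.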

What’s more, VCB can move all data over a SAN using just the HBAs installed in the VCB proxy server. That puts a premium on the ability to reach high I/O throughput levels without the need for manual tuning and intervention by system and storage administrators. With operating costs dominating capital costs for storage resources, any solution that requires significant manual configuration or tuning efforts cannot be cost effective.

Within that context, openBench Labs examined the Brocade 815 and 825 HBAs, which are also available directly through HP as the HP 82B PCIe single and dual port HBAs.

To provide a baseline for our VOE backup testing, we first installed three single-port 8Gbps HBAs — a Brocade 815, a QLogic QLE2560, and an Emulex LPe12000 — on a quad-core HP ProLiant DL360 server running Windows Server 2003 and the Intel Iometer benchmark. This server would later assume the role of our VCB proxy server. At the center of our 8Gbps test fabric, we configured a Brocade 300 switch.

To provide a complete VOE test environment, openBench Labs utilized three servers, along with a Sepaton S2100-ES2 virtual tape library (VTL) with two 4Gbps Fibre Channel ports, and a Xiotech Emprise 5000 storage system with two 4Gbps FC ports. We hosted eight VMs running Windows Server 2003 on a quad-processor HP ProLiant DL580 server running VMware ESX Server 3.5, and managed our VOE from an HP ProLiant DL360 server running VMware vCenter Server (Virtual Center) on Windows Server 2003.

To handle the end-to-end backup process, we installed Veritas NetBackup with VCB on a second quad-core HP ProLiant DL360 server running Windows Server 2003. VCB installs a virtual LUN driver on a Windows server, dubbed the VCB proxy. In a backup, VCB directs the ESX host to create a snapshot for each logical volume of a VM. The Windows server uses the virtual LUN driver to copy the snapshots into a local directory. As a result, the backup application is able to back up that local directory containing the copied snapshots and avoid any processing impact on either the VMs or the ESX server. This is why the VCB proxy server’s SAN connection is so important.

As the capabilities of devices advance exponentially, the number of applications able to leverage all of those capabilities becomes much smaller. The 8Gbps Brocade 815 and 825 are an example of this trend. These HBAs support the highest levels of performance for both bandwidth- and IOPS-intensive applications. The vast majority of standard business applications, however, fall into only one of those camps.

Virtualization and backup are everyday applications that require the highest level of I/O bandwidth; however, their modest transaction-processing needs fall well short of demanding peak IOPS. Moreover, while current multi-core CPUs are very cost effective at hosting VMs, they simply cannot deliver the low I/O latency needed to exploit the ability of the Brocade ASIC to handle 500,000 IOPS.
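
The latency argument follows from Little's law, which ties together outstanding requests, throughput, and latency: outstanding I/Os = IOPS × latency. The sketch below applies it; the queue depth of 32 and the 500µs service time are hypothetical values chosen for illustration, not openBench Labs measurements.

```python
def required_latency_us(target_iops: float, outstanding_ios: int) -> float:
    """Little's law: latency = outstanding I/Os / IOPS, returned in microseconds."""
    return outstanding_ios / target_iops * 1_000_000


def achievable_iops(latency_us: float, outstanding_ios: int) -> float:
    """Little's law: IOPS = outstanding I/Os / latency."""
    return outstanding_ios / (latency_us / 1_000_000)


if __name__ == "__main__":
    queue_depth = 32  # hypothetical number of outstanding I/Os
    print(f"Latency needed for 500,000 IOPS at QD{queue_depth}: "
          f"{required_latency_us(500_000, queue_depth):.0f}us per I/O")
    print(f"IOPS at a hypothetical 500us per I/O: "
          f"{achievable_iops(500, queue_depth):,.0f}")
```

On these assumptions, every I/O would have to complete end to end in about 64µs to reach 500,000 IOPS, so host-side latency, not the HBA, sets the practical IOPS ceiling.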

For backup, the important application-centric metric is full-duplex (simultaneous read and write) throughput. An 8Gbps HBA must be able to read data at 8Gbps and write data at 8Gbps at the same time. Equally important is the balance between read and write throughput: in a backup, the controlling factor is the slowest rate of the slowest device.

Using Iometer to generate multiple sequential read and write streams, we measured near wire-speed full-duplex throughput only with the Brocade 8Gbps HBA. Total I/O throughput averaged 1,568MBps, with sustained reads at 786MBps and sustained writes at 782MBps.
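
To put "near wire speed" in numbers, the sketch below expresses the measured Brocade 815 rates as a percentage of the nominal 8Gbps Fibre Channel payload rate. It assumes a nominal 800MBps per direction; that constant is a commonly quoted figure for 8Gbps FC, not a value from our tests.

```python
FC8_NOMINAL_MBPS = 800  # assumed nominal 8Gbps FC payload rate per direction


def wire_speed_pct(measured_mbps: float, nominal_mbps: float = FC8_NOMINAL_MBPS) -> float:
    """Measured throughput as a percentage of the nominal line rate."""
    return 100.0 * measured_mbps / nominal_mbps


# Iometer results for the Brocade 815, in MBps
brocade_815 = {"read": 786, "write": 782}

for direction, mbps in brocade_815.items():
    print(f"{direction}: {mbps}MBps = {wire_speed_pct(mbps):.1f}% of nominal")
```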

The read and write I/O rates of the QLogic QLE2560 were in balance; however, total throughput was roughly 13% lower than the level measured with the Brocade 815 HBA: reads ran at 688MBps and writes at 676MBps. Those results pegged potential backup throughput at about 675MBps, just slightly higher than the 650MBps per direction that our VOE test hardware could sustain.
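
The projection rule used here, that a backup can run no faster than the slower of its read and write streams, reduces to a minimum over the measured rates. The sketch below applies it to the Iometer figures reported for the Brocade 815 and QLogic QLE2560; the function names are illustrative, and the Emulex HBA is omitted because its separate read and write rates are not given in this review.

```python
# Iometer sequential results (MBps) reported in this review
MEASURED = {
    "Brocade 815": {"read": 786, "write": 782},
    "QLogic QLE2560": {"read": 688, "write": 676},
}


def aggregate_mbps(rates: dict) -> int:
    """Total full-duplex throughput: reads plus writes."""
    return rates["read"] + rates["write"]


def projected_backup_mbps(*stage_rates: float) -> float:
    """A backup moves data no faster than its slowest stage or direction."""
    return min(stage_rates)


for hba, rates in MEASURED.items():
    print(f"{hba}: aggregate {aggregate_mbps(rates)}MBps, "
          f"projected backup {projected_backup_mbps(rates['read'], rates['write'])}MBps")
```

Passing additional stage rates, such as the sustained rate of a storage array or VTL, extends the same rule to the whole data path.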

When we tested the Emulex LPe12000, we found a significant imbalance between read and write throughput, which was more problematic than the difference in aggregate throughput. As a result, our benchmark projected that the throughput of our backup application in a configuration employing a single LPe12000 HBA would be limited to about 450MBps. Given those results, openBench Labs set out to test how well our benchmarks projected actual performance in two standard backup scenarios.

In our first throughput test, we backed up the eight VMs on the host ESX server to a disk storage pool on our backup server, with NetBackup running all VM backups in parallel. Of particular interest was the first phase of the backup process, in which the HP DL360 server acts as the VCB proxy: it reads each VM's files on a shared ESX datastore and simultaneously writes the snapshot files to a local directory. This process consistently ran at 500MBps using either the Brocade 815 or the QLogic QLE2560 HBA.

In our second test of full-duplex throughput, we backed up the storage pool created in our first test to a VTL configured with eight logical tape drives. For maximum throughput, NetBackup split that process into eight I/O streams. As our benchmark projected, the Brocade 815 HBA easily sustained an aggregate full-duplex throughput of 1300MBps, which was the I/O streaming limit of our test hardware. Given the results of our application-centric benchmark and application testing, the 8Gbps Brocade HBAs will help IT meet the SLAs associated with business processes in multiple application-centric environments, including VOEs, Web 2.0 streaming of rich media, and backup and recovery applications.
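
This review does not describe how NetBackup allocates images to streams. As a hypothetical illustration only, the sketch below shows one common way to spread backup images across a fixed number of logical tape drives: a greedy longest-first assignment that always hands the next-largest image to the least-loaded drive. The image names and sizes are invented for the example.

```python
import heapq
from typing import Dict, List


def assign_to_drives(image_sizes_gb: Dict[str, float], drives: int) -> List[List[str]]:
    """Greedy longest-first assignment of backup images to drives (hypothetical)."""
    heap = [(0.0, d) for d in range(drives)]  # (queued GB, drive index)
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(drives)]
    for name, size in sorted(image_sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        load, d = heapq.heappop(heap)  # least-loaded drive so far
        plan[d].append(name)
        heapq.heappush(heap, (load + size, d))
    return plan


if __name__ == "__main__":
    # Invented image sizes for eight VMs, in GB
    sizes = [40, 35, 30, 30, 25, 20, 20, 15]
    images = {f"vm{i}": size for i, size in enumerate(sizes, start=1)}
    for drive, names in enumerate(assign_to_drives(images, drives=8)):
        print(f"drive {drive}: {names}")
```

When images do not outnumber drives, each drive receives at most one image; the balancing rule only matters once images outnumber drives.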

Jack Fegreus is CTO of openBench Labs.


UNDER EXAMINATION: 8Gbps HBA full-duplex throughput


Brocade 815 8Gbps HBA and driver v1.1.0.6

Emulex LPe12000 8Gbps HBA and driver v5.2.10.7

QLogic QLE2560 8Gbps HBA and driver v9.1.7.18


(3) HP ProLiant servers

Dell 1900 PowerEdge servers

— HP DL360
— Quad-core Xeon CPU
— Windows Server 2003
— VMware Consolidated Backup (VCB)
— Veritas NetBackup v6.5.3

— HP DL580
— Quad-processor Xeon CPU
— VMware ESX Server

— (8) VM application servers
— Windows Server 2003
— SQL Server

— HP DL360
— Windows Server 2003
— VMware vCenter

— Brocade 300 8Gbps switch

— Texas Memory Systems RamSan 400
— (4) dual-port 4Gbps HBAs
— 32GB RAM

— Xiotech Emprise 5000 storage system
— (2) 4Gbps ports
— (2) DataPacs

— Sepaton S2100-ES2 virtual tape library (VTL)
— (2) 4Gbps ports
— (8) Logical tape drives configured


— Iometer benchmarks peg full-duplex streaming I/O at 1,568MBps
— VOE D2VTL backup application throughput for NetBackup measured at 1,300MBps
— VOE D2D backup application throughput for VCB proxy process measured at 500MBps

Running full backups of eight VMs in parallel, our Windows-based HP server used the VCB driver to read a shared ESX datastore and copy VM snapshot files to a local directory. With the Brocade 815 HBA installed in our proxy server, we measured full-duplex throughput at roughly 500MBps. Using the Emulex LPe12000 HBA with the Emulex driver, we consistently measured full-duplex throughput at around 360MBps.

When we backed up the VM backup image files created in our first test to a VTL with eight logical drives, the Brocade 815 HBA sustained full-duplex throughput of 1300MBps. In this process, we read data from the Emprise 5000 storage system at 650MBps and wrote backup images to the VTL at 650MBps in perfect balance.