Delivering iSCSI storage by policy

iQstor enhances the value proposition of its iSCSI disk array with policy-driven dynamic storage configurations.

By Jack Fegreus

The virtualization of systems through a Virtual Operating Environment (VOE) is now a top priority for many IT organizations. However, it is impossible to maximize the benefits of a VOE, such as VMware Virtual Infrastructure (VI), without first implementing SAN-based storage. Especially at SMB sites, too often there is no SAN for harnessing system and storage virtualization synergies. By leveraging existing Ethernet infrastructure and not burdening IT administrators with new infrastructure to manage, IT can immediately realize all the benefits of a SAN with iQstor's iQ2850 disk array.

Further simplifying SAN implementation, the VMware ESX file system (VMFS) handles logical disks belonging to VMs in a way that is analogous to CD-ROM image files. In this way, VMFS encapsulates a VM's files within a logical disk and eliminates the burden of ensuring exclusive ownership of logical disks in order to protect the integrity of the VM's file system. More importantly, advanced VI features, such as VMotion, are designed specifically to leverage shared storage to provide a VM with mobility for load balancing and disaster recovery. That explains why strong growth in server virtualization is spurring strong adoption of iSCSI as the least complicated way to migrate from direct-attached storage (DAS) to a SAN.

To maximize the value proposition of the iQ2850, iQstor utilizes enterprise-class Seagate Barracuda ES.2 SATA drives. What distinguishes these drives is that they are available with either SAS or SATA interface electronics, with all other components identical. With the same recording platters and the same firmware to reduce rotational vibration, the reliability of Barracuda ES.2 SATA and SAS drives is the same.

The iQ2850 also has fully redundant, hot-swappable components. In particular, iQstor puts a Fibre Channel interface on each of the SATA drives and connects them to one of two iSCSI controllers via one of two 4Gbps FC arbitrated loops, which allows the controllers to address up to 240 drives. As a result, the iQ2850 can support significant storage expansion without addition of JBOD expansion units, which for archival applications, such as disk-to-disk backup, is a cost-effective way to meet I/O traffic patterns and requirements.

Given the quality of the drives and the system's FC arbitrated loop architecture, we were not surprised to see our sequential I/O benchmarks on VMs reduced to a race to reach wire speed—about 112MBps—for iSCSI connections. With one active I/O process on one VM, I/O throughput for sequential reads using a VDisk from our RAID 1+0 storage pool was pegged at 80MBps. That throughput rate was nearly twice that of a VDisk from our RAID-5 pool, which was measured at 45MBps. With battery-backed cache in the controllers and local UPS devices for systems, testing with a typical write-back caching configuration put write performance on a par with read performance for VDisks from both pools.

More importantly, that sequential I/O performance was for a single process on a single VM. With two I/O processes utilizing two distinct drives, throughput for standard 8KB reads rose to 105MBps, and for large-block (64KB) reads throughput rose to 160MBps using our RAID 1+0 storage pool. At that point, however, the structure of the RAID arrays in the storage pool becomes subsidiary to the ESX server's ability to leverage the eight iSCSI connections provided by the iQ2850. With the iQ2850 iSCSI storage system, I/O scalability is much more likely to be a server scalability issue than a storage system issue.

The iQ2850 compounds its SAN value proposition with integrated software for policy-driven automation of storage management functions. The disk array's Web-based management features include volume manager-based storage virtualization, storage provisioning, capacity expansion, data migration, snapshots, mirroring, and remote replication.

With the iQ2850, IT administrators can deliver immediate storage services to support application-based SLAs. Using the iQstor System Manager web interface, IT administrators create a hierarchy of storage objects. The hierarchy starts with disks, then moves to RAID arrays, then builds storage pools. For host servers, storage pools are the source of logical VDisks: target volumes, mirrors, and snapshots. Within that hierarchy, storage pools and VDisks get the lion's share of attention from IT administrators.

The game changer for IT operations, however, is the manner in which iQstor creates and leverages dynamic storage configurations, such as storage pool expansion. IT administrators are able to invoke simple policies to automate common management tasks. That automation ensures that those common tasks will be done consistently and correctly. What's more, complex issues such as planning and allocating disk capacity for each storage tier can be made dynamic and self-resolving.

By defining a storage pool as a virtual space of storage blocks with a common underlying RAID level, iQstor System Manager is able to present IT administrators with a simple option to automate storage pool expansion. The automation policy checks for a user-defined minimum threshold of free blocks: if there are fewer free blocks than the threshold mandates, iQstor System Manager automatically configures an appropriate new RAID array and adds it to the storage pool. That matters most at sites that use snapshots to improve application availability, where a processing disruption caused by the inability to allocate a snapshot is not a pretty picture.
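The threshold check described above can be pictured with a minimal Python sketch. It is an illustration only, not iQstor's implementation: the names `StoragePool` and `enforce_min_free` are hypothetical, and a spare array is reduced to nothing more than the number of blocks it would contribute.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Hypothetical model of a pool: blocks that share one RAID level."""
    raid_level: str
    free_blocks: int
    arrays: list = field(default_factory=list)  # block counts of added arrays


def enforce_min_free(pool, min_free_blocks, spare_array_blocks):
    """Add spare arrays until the pool meets the free-block threshold.

    spare_array_blocks is a list of block counts, one per array that could
    be built from spare disks (an assumption made for this sketch).
    Returns the number of arrays added to the pool.
    """
    added = 0
    while pool.free_blocks < min_free_blocks and spare_array_blocks:
        blocks = spare_array_blocks.pop(0)  # consume the next spare array
        pool.arrays.append(blocks)
        pool.free_blocks += blocks
        added += 1
    return added
```

In this sketch the policy stops as soon as the threshold is met, so a pool is never grown further than the rule requires, and it also stops quietly when the spares run out.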

Testing: One, two, three
With server virtualization being a major driver in iSCSI adoption, openBench Labs set up an iSCSI test scenario for the iQ2850 that centered on a typical Virtual Operating Environment. What's more, for a storage system, such as iQstor's iSCSI iQ2850, full support of a VOE tests all of the system's dimensions of functionality and performance.

Our VOE featured a VMware ESX server hosting eight VMs on a quad-processor HP DL580 server. We also employed a quad-core Dell PowerEdge server running Windows Server 2003 to manage the virtual infrastructure with vCenter Server (aka Virtual Center) and VMware Consolidated Backup (VCB) to create and share snapshots of VMs during a backup. To manage the actual end-to-end backup process we used NetBackup 6.5. For iSCSI connectivity on our servers, we used an embedded 1Gbps NIC on the Dell server and installed a dual-port iSCSI HBA, the QLogic QLA4052, on the HP server, dubbed VMzilla.

With ESX hosting eight I/O-intensive VMs, we chose a dual-port iSCSI HBA to leverage support for hardware iSCSI HBA Multipath I/O (MPIO). In particular, ESX compounded our ability to leverage the eight iQ2850 iSCSI ports by providing round-robin connection load balancing and active/passive path failover. As a result, when I/O processing involved multiple disks or multiple VMs, we were able to automatically maximize both the number of I/O requests that could be directed to our iQ2850 and the volume of data that could be returned to our VMs.

Nonetheless, I/O throughput is only a small part of a VOE scenario. Besides being a critical business support issue in a VOE, backup is a major technical support issue for IT. Best IT practices call for classic file-level backups of VMs to be augmented with image-level backups that can be moved among VOE servers to enhance business continuity. VMware Consolidated Backup (VCB) provides the framework needed to support both image-level and file-level backup operations; however, VCB also requires the sharing of logical disks on a SAN. To provide backup in our VOE test scenario, we installed VCB with Symantec NetBackup 6.5.3 on our Dell PowerEdge 1900 server. The VCB package includes a VLUN driver that is used to mount ESX snapshots of VM disk volumes.

To make this backup scheme work, the VCB software must be installed on a Windows Server host — dubbed the "proxy server" — that has SAN access to the datastore volume on the ESX server. The proxy server must be able to mount the ESX datastore and access all of the vmdk files associated with the disk volumes belonging to the VM being backed up.

With both the proxy server and the ESX server having access to the datastore and vmdk files via the SAN, the backup window, as seen from the viewpoint of the VM, only lasts the few seconds needed for the ESX Server to take and then remove a VMFS snapshot of the VM's vmdk file. During a backup, the proxy server requests a VMFS snapshot from the ESX Server. The ESX server then creates a point-in-time copy of a VM's disk files; the vmdk file is frozen in a state that reflects the VM at the instant of the snapshot; and all new data for a VM is written to a special file dubbed a "delta disk file."

Next, the ESX Server creates a snap ID and a block list of the frozen vmdk file. The snap ID and block list are sent to the VCB proxy server, which mounts the vmdk file as a read-only drive via the VLUN driver. In this way, the backup process accesses and moves data over the storage network rather than the LAN. At the end of a backup, the proxy server dismounts the vmdk file, while the ESX Server removes the snapshot from the VM and consolidates any delta disk data into the vmdk file.

Through a number of advanced SAN Service features, including Managed Snapshot Services and Volume Copy Services, the iQ2850 enhances both the functionality and the performance of a VMware ESX Server environment. To test these advanced features, we set up a number of VDisks using iQstor System Manager and exported them to our VOE using vCenter Server.

A VDisk is created by allocating block space from an iQ2850 virtualized storage pool with no regard for the storage arrays that make up that pool. This provides for a very dynamic storage environment. Among the many potential automated tasks that a system or storage administrator can leverage is the dynamic expansion of VDisks and storage pools via automated array creation from spare disk volumes.

Using iQstor System Manager, an administrator can invoke comprehensive VDisk mirroring functions provided by the iQstor Volume Copy Services (VCS). VCS creates a duplicate physical copy of a VDisk, which functions as an independent VDisk. Without affecting the performance of the original production volume, IT administrators can assign the mirror as a read-only volume to any other host.

In this way, iQstor mirrors provide an exact physical duplicate of source data for a number of IT functions and applications, including backup, decision support, and application testing and development, without incurring a costly production interruption. To help support such scenarios, VCS provides for splitting mirrors and then re-synchronizing the volume copy with any changes made to the original volume. Among our VDisks, we included a mirrored 500GB datastore that would be used to encapsulate a number of VMs.

When a VDisk is created, iQstor System Manager automatically follows a policy that prevents all iSCSI initiators from being able to access the target by masking the VDisk LUN. For a host system to gain access, an IT administrator must first grant access using the LUN Masking menu in the iQstor System Manager GUI. The administrator has the option to allow either open access for all systems or limited access to a specific list of hosts. Having created a VDisk for use as an ESX datastore, we next provided iSCSI access to two servers: our ESX server, and our backup server running Windows Server 2003, VCB, and NetBackup.
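The masking policy amounts to deny-by-default with two ways to open access. A minimal Python sketch of that logic follows; it models only the rule as described here, not the iQstor software, and the class name `LunMask` and the initiator names used with it are hypothetical.

```python
class LunMask:
    """Hypothetical access rule for one VDisk LUN: masked until granted."""

    def __init__(self):
        self.open_access = False   # a new VDisk is hidden from every initiator
        self.allowed = set()       # explicit list of permitted initiators

    def allow_all(self):
        """Open access for all systems."""
        self.open_access = True

    def grant(self, initiator):
        """Limited access: add one host initiator to the permitted list."""
        self.allowed.add(initiator)

    def can_access(self, initiator):
        return self.open_access or initiator in self.allowed
```

For a datastore shared between an ESX server and a backup proxy, the limited-access form would carry exactly two entries, one per server.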

After creating our datastore, we began adding VMs into it. There are two ways for IT administrators to configure logical disk storage on a VM that involves ESX and VMFS. The most common method is to represent a VM disk drive using a vmdk file, which is analogous to an ISO CD-ROM image and is stored in an ESX datastore. We used this configuration for the OS drive of each VM. For application data, we employed raw device mapping (RDM) scenarios for all dedicated logical drives.

An RDM volume is physically formatted by the OS of the VM with a native file system such as NTFS or ext3. As a result, the volume can be easily accessed by other virtual or physical systems. For VMs, the host ESX Server actually handles the RDM volume through a VMFS mapping file that acts as a proxy for a raw device. The mapping file contains metadata that is used to manage and redirect all disk accesses directed at the physical device. The RDM file effectively acts as a symbolic link from VMFS to the raw LUN in order to balance manageability via VMFS with raw device access via the VM's OS.

In a production VOE underpinned by a feature-rich SAN, IT administrators implement RDM-configured disks in order to offload the overhead processing of storage features from the ESX Server and hosted VMs. By using RDM drives, critical storage-oriented applications, such as snapshots and disk mirrors, can be implemented on a storage system such as the iQ2850.

What's more, there are two compatibility modes for RDM drives: virtual and physical. On creation of an RDM drive, the default mode is for physical drive compatibility. ESX does not virtualize the volume: ESX simply passes through all information about the device and all SCSI commands to the device. As a result, ESX cannot create a snapshot of an RDM drive in physical compatibility mode and that excludes physical mode RDM drives from VCB backup.

On the other hand, using virtual compatibility mode, ESX virtualizes many of the volume's physical characteristics. Among the physical features, ESX encodes the volume size in the RDM file. As a result, ESX is able to create software snapshots that include RDM devices that have been created in virtual compatibility mode. That device virtualization, however, inhibits the VM's OS from recognizing physical changes, such as the automatic resizing of a VDisk by the iQ2850 based on the utilization of file space. To make the VM aware of the new volume size, an IT administrator will have to power down the VM, remove the RDM, and then add it back into the VM's configuration.

In testing the iQ2850, openBench Labs created a number of VDisks that would be used as RDM drives in a VM via iSCSI access using both ports on the QLogic iSCSI HBA installed in the ESX server. In all cases, ESX 3.5 recognized that we had multiple connections to the same VDisk on the iQ2850 and established MPIO failover in active/standby mode. We modified the default configuration to active/active for basic load balancing by automating active path connections in a round-robin scheme.
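Round-robin path selection with failover can be pictured with a short Python sketch. It only illustrates the idea and is not how ESX or the QLogic HBA implements MPIO; the class name `RoundRobinPaths` and the path labels are hypothetical.

```python
class RoundRobinPaths:
    """Hypothetical selector: rotate over healthy paths to one VDisk."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._next = 0  # index of the path to try first on the next request

    def mark_failed(self, path):
        self.failed.add(path)

    def mark_restored(self, path):
        self.failed.discard(path)

    def pick(self):
        """Return the next healthy path, skipping any that have failed."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise RuntimeError("no healthy path to the VDisk")
```

With both paths healthy, requests alternate between them; when one path is marked failed, every request falls through to the survivor, which is the failover behavior.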

We then used iQstor's Managed Snapshot Services (MSS) to provide non-disruptive point-in-time images of these VDisks to ensure data availability by reducing recovery time and providing more recovery points. Rather than create a full copy or clone of a disk volume, MSS generates a copy-on-write differential snapshot that improves space efficiency by only storing changes to the volume after the snapshot. Nonetheless, those needing a full clone can easily create one using iQstor's Volume Copy Services.

By creating differential snapshots using copy-on-write, iQstor is able to create snapshots that typically require about 10% to 20% of the space of a full copy. Using iQstor's MSS, administrators can easily verify functionality, reliability and accuracy of modifications to production data without impacting production processes. Since snapshot volumes are a special form of VDisk, IT administrators using iQstor's MSS have the ability to access data from any point-in-time without having to interrupt production activities. What's more, administrators can create writable volumes of real data for application testing.
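To make the space savings concrete, here is a minimal Python sketch of the general copy-on-write technique. It is a textbook illustration under simplifying assumptions (an in-memory list of blocks, read-only snapshots) and says nothing about how MSS is actually built; the class and method names are hypothetical.

```python
class CowVolume:
    """Hypothetical block volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        """Start an empty snapshot; it fills only as blocks are overwritten."""
        snap = {}  # block index -> contents at snapshot time
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Preserve the old contents once per snapshot, before the first
        # overwrite of that block; later writes cost nothing extra.
        for snap in self.snapshots:
            if index not in snap:
                snap[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap, index):
        """Unchanged blocks are read straight from the live volume."""
        if index in snap:
            return snap[index]
        return self.blocks[index]
```

The snapshot consumes space in proportion to the number of distinct blocks changed since it was taken, which is why a differential snapshot can be much smaller than a full copy.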

Nonetheless, the most critical use of snapshots is to provide immediate recoverability when production data has been accidentally corrupted. As a result, snapshots play a critical role with respect to the recovery point objective (RPO) for critical applications in any disaster recovery plan. In addition, the efficiency of MSS complements the way in which critical applications are deployed on VMs.

Few applications have constant-state processing profiles. Mission-critical applications tend to have distinct predictable periods of high-level usage. During those periods, many sites now use advanced VOE capabilities to migrate a VM running such an application to an appropriate ESX server for that specific high-processing time period. With MSS, during that period of extended processing, automated snapshots can be set to run as quickly as every 60 seconds, which should meet just about any RPO.

Jack Fegreus is CTO of openBench Labs.




WHAT WE TESTED: iQstor 2850 Storage System


(3) Dell PowerEdge servers
-- Windows Server 2003 
-- VMware ESX Server 3.5
-- VMware Consolidated Backup
-- Veritas NetBackup 6.5.3


-- The iQstor System Manager GUI presents administrators with a uniform logical representation of physical storage resources and services to simplify operations for storage provisioning and logical volume management.
-- The iQstor 2850 provides Multipath I/O (MPIO) support for clients capable of supporting active/active or active/passive controller failover, as well as load balancing for multiple LUNs and multiple hosts.
-- iQstor's Managed Snapshot Services (MSS) provides a DR solution for remote or branch offices, with quick and easy verification and re-use of critical data through non-disruptive snapshots that require only a fraction of the space of a full copy.

To maximize potential I/O performance differences in the iQ2850 storage hierarchy for our test VOE, openBench Labs created two storage pools: one optimized for high I/O throughput, and one optimized for maximum storage capacity.

To create a performance pool, we used ten drives in a RAID 1+0 configuration. When we chose a 1+0 RAID level, iQstor System Manager first created five arrays as RAID-1 mirrored pairs and then created a new RAID-0 array by striping data across the five mirrors to provide 4.6TB of storage. We also created a 3.7TB capacity pool by using five drives to configure a RAID-5 array and assigning that array to a new pool.
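The pool sizes follow from standard RAID arithmetic, sketched below in Python. The article does not state the per-drive capacity, so `drive_tb` is left as a parameter and the function names are hypothetical. Working backward from the reported figures, 4.6TB across five mirrored pairs and 3.7TB across four data drives both imply roughly 0.92TB of usable space per drive.

```python
def raid10_usable_tb(drives, drive_tb):
    """RAID 1+0: drives are mirrored in pairs, then striped.

    Half of the raw capacity holds mirror copies.
    """
    if drives < 4 or drives % 2:
        raise ValueError("RAID 1+0 needs an even number of drives, at least 4")
    return (drives // 2) * drive_tb


def raid5_usable_tb(drives, drive_tb):
    """RAID-5: one drive's worth of capacity is consumed by parity."""
    if drives < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return (drives - 1) * drive_tb
```

The comparison shows the trade-off between the two pools: ten drives in RAID 1+0 yield five drives of usable space, while five drives in RAID-5 yield four.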

Once a storage pool is defined by a back-end RAID array, iQstor System Manager provides for automated provisioning of that pool. This is done by setting a minimum threshold for free space in the pool.

On the iQ2850, we created two 25GB VDisks, VM1_RDM1 and VM1_RDM2, and exposed them to both ports of the ESX server's iSCSI HBA. We mounted VM1_RDM1 in virtual compatibility mode to maximize the functionality of this drive with that of vmdk-based virtual disks within the VOE. As a result, we were able to include this drive in VCB-based backup. We mounted VM1_RDM2 in physical compatibility mode to maximize low-level physical compatibility with physical hosts on the SAN.

In both cases, the iQ2850 was able to monitor file structure and correctly measure VDisk usage by the VMs. As a result, we were able to utilize all of the advanced SAN services and automation provided by the iQstor System Manager software.

When creating snapshots with iQstor's System Manager, administrators are presented with a number of critical options. In addition to normal one-time snapshots, an automated series of snapshots can be established at intervals as small as one minute. In addition, a maximum number of snapshots can be set to limit the total amount of space utilized. When that maximum number is reached, the oldest snapshot is deleted when a new snapshot is created.
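The maximum-count rule behaves like a fixed-length queue. A minimal Python sketch follows; the class name `SnapshotSchedule` and the snapshot labels are hypothetical, and the sketch models only the retention rule, not the scheduling or the iQstor software itself.

```python
from collections import deque


class SnapshotSchedule:
    """Hypothetical retention rule: keep at most max_snapshots snapshots."""

    def __init__(self, max_snapshots):
        if max_snapshots < 1:
            raise ValueError("max_snapshots must be at least 1")
        self.max_snapshots = max_snapshots
        self.snapshots = deque()  # oldest on the left, newest on the right

    def take(self, label):
        """Record a new snapshot; return the label deleted to make room."""
        deleted = None
        if len(self.snapshots) == self.max_snapshots:
            deleted = self.snapshots.popleft()  # drop the oldest snapshot
        self.snapshots.append(label)
        return deleted
```

With a one-minute interval and a maximum of N snapshots, this rule keeps a rolling window covering roughly the last N minutes of recovery points.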

We set up normal and repeating snapshots on VDisks associated with RDM drives assigned to VMs running on the ESX server. We were then able to make the snapshot VDisks available to the Windows backup server for mounting in a similar manner to VCB. When we mounted the 26GB original volume and the 5.5GB snapshot, both volumes appeared to Windows Server 2003 on the backup server as two 26GB volumes representing the current and the snapshot state of the VM's iQperf1 drive. In terms of utilized space, the current version contained 3.5GB more data than the snapshot.

This article was originally published on June 02, 2009