A radical approach to better storage in virtual environments

By Jack Fegreus, openBench Labs

As companies struggle to achieve maximum efficiency, the top-of-mind issue for all corporate decision makers is how to reduce the cost of IT operations. Universally, the leading solutions center on resource utilization, consolidation, and virtualization. Yet these same strategies amplify the impact of a wide range of storage costs, from failed disk drives to insufficient performance to excessive administrative overhead. As resources are consolidated and virtualized, the number of physical devices underpinning an IT infrastructure dwindles, and the risk of a catastrophic failure rises.

Xiotech's ISE technology radically alters the reliability and performance of disk-based storage systems. Approaching disk drives not as a JBOD collection but as a grid of storage surfaces requires a radical rethinking of firmware for both disk drives and controllers. In making that change, Xiotech has improved the internal operating environment of its Emprise systems, enabling controllers to go beyond simply accessing data and actively manage component reliability.

More importantly, Xiotech's fundamental change in the underlying technology of storage systems transforms the notion of a basic storage building block for IT and OEM users: In the world of Disk 2.0, fundamental storage building blocks are application-centric rather than connection-centric devices. Sophisticated characteristics of DataPacs, not the simple electronic specifications of a bus, define storage building blocks in a way that aligns storage provisioning with service level agreements (SLAs).

The Emprise 5000 storage system builds on ISE technology to eliminate the need for maintenance intervention by IT administrators and to provide near-linear scaling of application throughput metrics as the number of storage systems increases. Using Emprise 5000 systems, IT administrators are able to cost-effectively meet and support SLAs for multiple application-centric environments, including virtual operating environments (VOEs) for server and desktop infrastructure.

For OEMs and IT departments alike, building application-centric storage solutions starts with a storage foundation building block that fits a scale-out paradigm. To provide extended customer value, however, a storage solution must go beyond meeting specific application metrics and provide greater agility and efficiency by leveraging the price and performance of modular components to scale any required application metrics through the addition of more units. By increasing cache and I/O processing power as storage units are added, the Emprise 5000 system fits into the new paradigm for a storage foundation building block.

A VOE, such as that created by VMware Virtual Infrastructure, provides IT with the ability to rapidly provision resources, commission and decommission applications, and non-disruptively migrate applications and data among multiple virtual servers to handle changing service-level requirements. What's more, IT's success in deploying servers as virtual machines (VMs) has now turned attention on Virtual Desktop Infrastructure (VDI) as an alternative strategy for providing desktop systems to information workers.

Heart of ISE
Many IT decision makers continue attempts to resolve data growth using SAN-based storage arrays provisioned with low-cost commodity disk drives. While that strategy makes it easier for IT to add and replace disk drives, it also perpetuates reliability problems and administrator-related operating expenses. To eliminate rather than mitigate storage problems, Xiotech rejected the idea of building low-cost storage devices using JBODs controlled via SCSI commands and introduced a radically different construct: ISE technology.

The heart of ISE (pronounced "ice") technology is a sealed, multi-drive DataPac containing specially matched Seagate Fibre Channel drives. The standard firmware used on off-the-shelf commercial disks has been replaced with firmware that exposes detailed information about internal disk structures. ISE leverages this information to access data more precisely, boosting I/O performance by roughly 25%. From a bottom-line perspective, however, the most powerful impact of ISE comes in the form of autonomic, self-healing storage that reduces service requirements.

In a traditional storage subsystem, the drives, drive enclosures and the system controllers are all manufactured independently. That scheme leaves controller and drive firmware to handle all of the compatibility issues that must be addressed to ensure device interoperation. Not only does this create significant processing overhead, it reduces the useful knowledge about the components to a lowest common denominator: the standard SCSI control set.

Relieved of the burden of device compatibility issues, ISE tightly integrates the firmware on its Managed Reliability Controllers (MRCs) with the special firmware used exclusively by all of the drives in a DataPac. Over an internal point-to-point switched network, and not a traditional arbitrated loop, MRCs are able to leverage advanced drive telemetry and exploit detailed knowledge about the internal structure of all DataPac components. What's more, ISE architecture moves I/O processing and cache circuitry into the MRC.

A highlight of the integration between MRCs and DataPacs is the striping of data at the level of an individual drive head. Through such precise access to data, ISE technology significantly reduces data exposure on a drive. Only the surfaces of affected heads with allocated space, not an entire drive, will ever need to be rebuilt. What's more, precise knowledge about underlying components allows an ISE to reduce the rate at which DataPac components fail, repair many component failures in-situ, and minimize the impact of failures that cannot be repaired. The remedial reconditioning that MRCs are able to implement extends to such capabilities as remanufacturing disks through head sparing and depopulation, reformatting low-level track data, and even rewriting servo and data tracks.
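To make the difference in rebuild scope concrete, here is a toy Python model. The drive geometry (eight recording surfaces) and every capacity figure are hypothetical illustration values, not Seagate or Xiotech specifications, and the assumption that a conventional array rebuilds the full drive is a simplification. The sketch only contrasts rebuilding a whole drive with rebuilding the allocated space on the surfaces of failed heads.

```python
# Toy model of rebuild scope; all capacities are hypothetical (GB).
SURFACE_CAPACITY_GB = 75                          # hypothetical capacity of one surface
ALLOCATED_GB = [60, 60, 55, 50, 40, 30, 0, 0]     # hypothetical allocation per surface


def drive_level_rebuild_gb(allocated):
    """Assumed conventional behavior: a failed drive is rebuilt end to end."""
    return SURFACE_CAPACITY_GB * len(allocated)


def surface_level_rebuild_gb(allocated, failed_heads):
    """Head-level striping: only allocated space on affected surfaces is rebuilt."""
    return sum(allocated[h] for h in failed_heads)


whole = drive_level_rebuild_gb(ALLOCATED_GB)              # whole drive
one_head = surface_level_rebuild_gb(ALLOCATED_GB, [2])    # one busy surface
idle_head = surface_level_rebuild_gb(ALLOCATED_GB, [7])   # surface with nothing allocated
```

In this toy case a single head failure touches well under a tenth of the data a full-drive rebuild would move, and a failure on an unallocated surface requires no rebuild at all.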

Using the Emprise 5000 Web interface, administrators provision storage in a simple four-step process: name the logical drive, choose a level of write caching, pick a storage pool and RAID level, and map the logical drive to a host.

ISE technology transforms the notion of "RAID level" into a characteristic of a logical volume that IT administrators assign at the time that the logical volume is created. This eliminates the need for IT administrators to create storage pools for one or more levels of RAID redundancy in order to allocate logical drives. Also gone is the first stumbling block to better resource utilization: There is no need for IT administrators to pre-allocate disk drives for fixed RAID-level storage pools. Within Xiotech's ISE architecture, DataPacs function as flexible RAID storage pools, from which logical drives are provisioned and assigned a RAID level for data redundancy on an ad hoc basis.
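The shift from pool-level to volume-level RAID can be expressed as a small data model. The Python sketch below is a hypothetical illustration of the four provisioning steps, not Xiotech's management API; the class names, volume names, and host names (`esx01`, `vcb-proxy`) are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class RaidLevel(Enum):
    RAID1 = "RAID-1"
    RAID5 = "RAID-5"


class WriteCache(Enum):
    WRITE_BACK = "write-back"
    WRITE_THROUGH = "write-through"


@dataclass(frozen=True)
class LogicalDrive:
    """Hypothetical model: RAID level is a per-volume attribute, not a pool property."""
    name: str           # step 1: name the logical drive
    cache: WriteCache   # step 2: choose write caching
    pool: int           # step 3a: DataPac acting as the storage pool
    raid: RaidLevel     # step 3b: RAID level chosen per volume
    hosts: tuple = ()   # step 4: hosts the volume is mapped to


# Two volumes with different RAID levels carved from the same DataPac pool;
# no drives were pre-allocated to a fixed RAID-level pool.
vm_data = LogicalDrive("vm01-data", WriteCache.WRITE_BACK, 2, RaidLevel.RAID5,
                       ("esx01", "vcb-proxy"))
critical = LogicalDrive("finance-db", WriteCache.WRITE_BACK, 2, RaidLevel.RAID1,
                        ("esx01",))
```

Because the RAID level lives on the volume record, the same pool can serve both volumes without any up-front partitioning of drives.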

What's more, the ISE separates the function of the two internal MRCs from that of the two external Fibre Channel ports. The two FC ports balance FC frame traffic to optimize flow of I/O packets on the SAN fabric. Then the MRCs balance I/O requests to maximize I/O throughput for the DataPacs.

In effect, Xiotech's ISE technology treats a sealed DataPac as a virtual super disk and makes a DataPac the base configurable unit, which slashes operating costs by taking the execution of low-level device-management tasks out of the hands of administrators. This heal-in-place technology also allows ISE-based systems, such as the Emprise 5000, to reach reliability levels that are impossible for standard storage arrays. Most importantly for IT and OEM users of the Emprise 5000, Xiotech is able to provide a five-year warranty that eliminates storage service renewal costs over the system's five-year lifespan.

With a DataPac representing the Emprise 5000 system's base configurable storage unit and functioning as a storage pool, Xiotech offers multiple DataPac configurations for different I/O usage scenarios. Balanced DataPacs, which openBench Labs used in all tests, are tuned for sequential file access. Using Balanced DataPacs, IT administrators can configure logical disks that stripe data in either a RAID-5 or RAID-1 pattern for redundancy. In addition to Balanced DataPacs, an ISE that will be deployed to support high-end database-driven applications can be provisioned with High-Performance and Performance DataPacs, which are tuned for fast random access. When storage capacity is the dominant concern, Capacity DataPacs feature 8TB of usable storage.

The edge-driven VOE SAN
In the drive to optimize resource utilization and minimize the cost of IT operations, many sites now run eight or more server virtual machines (VMs) on each host server. More importantly, dense VM configurations put significant stress on the I/O throughput of a VOE host server, which represents all of the hardware resources for multiple VMs.

Elevated I/O stress extends to VOE support servers, such as those used to run backup processes. These servers are also required to deliver higher I/O throughput loads via fewer physical connections. This creates a new edge-driven SAN fabric topology, which is evolving out of server consolidation and virtualization initiatives. In turn, edge-driven SAN topology is also impacting application-centric requirements for centralized storage resources.

Traditional core-driven SAN fabrics are characterized by a small number of storage devices with connections that fan out to a large number of physical servers. Each server requires a relatively modest I/O stream, and the likelihood of many servers accessing data simultaneously is low. From the perspective of business applications running on VMs, application-centric I/O requirements are the same in a VOE. What changes dramatically, however, are the I/O requirements of IT administrative functions. During backup, in particular, a VOE host server must handle I/O for multiple hosted VMs executing in parallel.

Running the Emprise 5000 Web-based management console, we mapped logical disks to both the ESX server and the VCB proxy server and granted both servers access rights to them. In particular, we created 25GB target volumes in storage pool 2 of the Emprise 5000 and mapped these devices to both the ESX server and the Dell server running VCB. The ESX server used these volumes to provision each VM with a fully virtualized RDM volume. As a result, the ESX server was able to create a snapshot of each volume, and VCB was able to include it in a backup.

To provide a VOE test environment running eight VMs, openBench Labs utilized two servers, a 4Gbps FC SAN fabric, and a Xiotech Emprise 5000 system with two Balanced DataPacs, each with a raw storage capacity of 4,352GB. To handle the backup process, we installed Veritas NetBackup with VMware Consolidated Backup (VCB) on a quad-core Dell PowerEdge 1900 server running Windows Server 2008 R2.

To maximize I/O traffic and optimally leverage the two 4Gbps MRCs in the Emprise 5000 system, we also set up the Xiotech MPIO driver on our backup server. We hosted the eight VMs running Windows Server 2003 on a second Dell PowerEdge 1900 running VMware ESX Server and managed that VOE with VMware vCenter Server.

We provisioned all of the storage for our backup server from the first DataPac in our Xiotech Emprise 5000 system. To meet the storage needs on our VOE host server—datastores and Raw Device Mapping (RDM) volumes—we provisioned logical volumes from the second DataPac.

In particular, openBench Labs created a 1TB logical disk and exported it to the ESX Server for use as a datastore. From that datastore, openBench Labs provisioned each VM with a logical system disk. We then provided each VM with dedicated storage for working data by provisioning the ESX host server with 25GB logical disks, which were assigned to VMs as RDM volumes.

To let ESX Server handle an RDM volume for administrative tasks, a VMFS mapping file, in the form of a vmdk file, acts as a proxy that redirects all access to the raw device. The RDM file effectively acts as a symbolic link from VMFS to the raw LUN, letting IT balance the manageability of VMFS against raw device access by the VM's OS. We created all of our RDM volumes in virtual compatibility mode, which virtualizes much of the volume's physical characteristics in the vmdk file and allows ESX to create a snapshot and VCB to include the drive in a backup.

VCB installs a virtual LUN driver on the Windows server, dubbed the VCB proxy, that enables the server to mount and copy snapshot files from a VMFS-formatted datastore resident on a shared logical disk. Once the Windows server copies the disk snapshots into a local directory, the backup application backs up that local directory. All backup I/O takes place on the VCB proxy server, and all data is transferred over the SAN fabric. That is why the VCB proxy server's SAN connection is so important.

From the perspective of IT operations, VCB scenarios can be very time-consuming processes when measured in wall-clock time. At issue is the prodigious I/O overhead inherent in a VCB operation, which requires all of the data that is being backed up to be written twice: once to the local directory on the VCB proxy and then again to the backup media. To optimize the efficiency of an end-to-end VCB process, storage devices with high I/O throughput and VCB proxy servers that can capitalize on that throughput are essential. Equally essential is the ability to reach high I/O throughput levels without the need for significant manual tuning and intervention by system and storage administrators.
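The I/O overhead of the two-stage VCB flow can be tallied per stage. This Python sketch is a simplified accounting under stated assumptions: it ignores compression, deduplication, and metadata, and the 200GB data set is a hypothetical figure chosen only for illustration.

```python
def vcb_backup_io(data_gb: float) -> dict:
    """Simplified per-stage I/O on the VCB proxy for backing up data_gb of VM data."""
    return {
        "read_snapshot_from_datastore": data_gb,  # proxy mounts and reads the VM snapshot
        "write_to_local_directory": data_gb,      # first write: local staging copy
        "read_from_local_directory": data_gb,     # backup application reads the copy
        "write_to_backup_media": data_gb,         # second write: backup target
    }


io = vcb_backup_io(200.0)  # hypothetical 200GB of VM data
total_written = io["write_to_local_directory"] + io["write_to_backup_media"]
total_moved = sum(io.values())
```

Every byte is written twice and also read twice, so the proxy moves four times the protected data volume, which is why storage throughput dominates VCB wall-clock time.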

With operating costs continuing to dwarf capital costs for storage, any high I/O throughput solution that requires significant manual configuration or tuning by administrators will not be cost effective. Within that context, openBench Labs examined the Xiotech Emprise 5000 system in terms of both its throughput, which reached upwards of 650MBps for reads or writes alone and 1GBps for combined reads and writes, and its ability to deliver that throughput with minimal intervention by IT.

Scale-out test benchmarks
In our VOE test scenario, we employed two Balanced DataPacs. We dedicated one DataPac to provide primary storage for the ESX host server, which would be shared with the Windows VCB proxy server for access to VM snapshots. We dedicated the other DataPac to provide primary storage for the Windows server.

Using the Emprise management console, we monitored an Iometer benchmark that streamed large-block (256KB) reads and writes to two volumes. With the benchmark running on Windows Server 2008 R2, the Xiotech MPIO driver balanced SAN fabric traffic by routing 50% of the read and write I/Os to each FC port on the Emprise 5000. Internally, the Emprise 5000 then directed all read requests to MRC1 and all write requests to MRC2 in order to maintain full-duplex throughput at just over 1GBps.
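The two-level balancing observed in this test can be mimicked with a toy model. The Python sketch below is an illustration of the observed traffic pattern only, not the actual Xiotech MPIO driver or MRC firmware logic; the port labels `FC-A` and `FC-B` are hypothetical, and per-operation-type round-robin is an assumed policy that reproduces the 50/50 split.

```python
from collections import Counter
from itertools import cycle


def route(ops, ports=("FC-A", "FC-B")):
    """Toy model: reads and writes are each round-robined across the two FC
    ports (MPIO layer); reads then go to MRC1 and writes to MRC2 (inside the ISE)."""
    next_port = {"read": cycle(ports), "write": cycle(ports)}
    port_load, mrc_load = Counter(), Counter()
    for op in ops:
        port_load[(next(next_port[op]), op)] += 1
        mrc_load["MRC1" if op == "read" else "MRC2"] += 1
    return port_load, mrc_load


ports, mrcs = route(["read", "write"] * 500)   # 500 reads and 500 writes, interleaved
```

Each port carries half of the reads and half of the writes, while each controller sees a single I/O direction, which is the combination that let the array sustain full-duplex traffic.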

To set bounds on the I/O throughput we could expect in our VOE, we ran sequential I/O tests with Iometer on our backup server, which was provisioned with a dual-port 4Gbps QLogic HBA. With dual 4Gbps ports on both the server HBA and the Emprise array, and Xiotech MPIO drivers that could leverage those ports, we had the theoretical potential to push both reads and writes at 800MBps.

We began our benchmark tests with logical drives configured for both RAID-1 and RAID-5. We also examined volumes with both write-back and write-through cache policies. Only when performing writes with a write-through cache policy did we encounter any significant performance difference between RAID-1 and RAID-5 volumes.

With most sites implementing write-back caching and employing UPS devices to ensure stable power, openBench Labs ran all subsequent VOE tests using a write-back caching policy on RAID-5 volumes. Using RAID-5 for data redundancy on logical volumes requires 25% extra capacity as opposed to 100% extra capacity for fully mirrored RAID-1 volumes. Given the reliability and autonomic self-healing of ISE technology, only the most sensitive mission-critical corporate data will warrant RAID-1 redundancy on an Emprise 5000.
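The capacity comparison reduces to simple ratios. The 25% figure is consistent with a RAID-5 stripe of four data units plus one parity unit; that 4+1 geometry is an assumption inferred from the percentage, not a documented Emprise setting. The Python sketch uses exact fractions to avoid floating-point noise.

```python
from fractions import Fraction


def extra_capacity(data_units: int, redundancy_units: int) -> Fraction:
    """Raw capacity needed beyond usable capacity, as a fraction of usable."""
    return Fraction(redundancy_units, data_units)


def usable_fraction(data_units: int, redundancy_units: int) -> Fraction:
    """Share of raw capacity left for data."""
    return Fraction(data_units, data_units + redundancy_units)


raid5 = extra_capacity(4, 1)   # assumed 4+1 stripe: one parity unit per four data units
raid1 = extra_capacity(1, 1)   # full mirror: one copy per data unit
print(f"RAID-5: {float(raid5):.0%} extra, {float(usable_fraction(4, 1)):.0%} usable")
print(f"RAID-1: {float(raid1):.0%} extra, {float(usable_fraction(1, 1)):.0%} usable")
```

Under that assumption, RAID-5 leaves 80% of raw capacity usable versus 50% for RAID-1.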

1GBps bi-directional streaming

Testing sequential reads and writes to one logical disk by a single process, we pegged typical application I/O (8KB blocks) at about 265MBps. At larger block sizes, throughput scaled to just over 500MBps on reads and just over 550MBps on writes. With multiple I/O processes and logical volumes, we scaled I/O throughput to upwards of 85% of theoretical limits.

In our sequential I/O tests, throughput for large-block reads, which are used by backup, data mining, and online analytical processing (OLAP) applications, exceeded 500MBps. With multiple worker processes, we were able to scale read I/O throughput to 675MBps, roughly 85% of the theoretical wire-speed limit for two 4Gbps Fibre Channel ports. To help support that level of throughput, the Xiotech and QLogic drivers on our server pushed symmetrical data traffic through the two SAN ports. In fact, when we monitored an Iometer benchmark that streamed large-block (256KB) reads and writes to two volumes, the Emprise 5000 directed all read requests to MRC1 and all write requests to MRC2, and maintained full-duplex throughput at just over 1GBps.
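The 800MBps ceiling and the 85% figure follow from standard 4Gb Fibre Channel arithmetic. The Python sketch below assumes the commonly quoted 400MBps per direction per 4Gb port; the 4.25 Gbaud line rate with 8b/10b encoding is the standard 4GFC physical layer, and framing overhead accounts for the gap between 425MBps and 400MBps.

```python
# 4Gb Fibre Channel signals at 4.25 Gbaud with 8b/10b encoding (8 data bits per
# 10 line bits); after framing overhead, ~400MBps per direction is the usual rule.
LINE_RATE_BAUD = 4.25e9
ENCODED_MBPS = LINE_RATE_BAUD * 8 / 10 / 8 / 1e6   # MB/s before framing overhead
NOMINAL_MBPS_PER_PORT = 400                        # per direction, per 4Gb port


def wire_limit_mbps(ports: int) -> int:
    """Per-direction ceiling for a multipath link built from 4Gb FC ports."""
    return ports * NOMINAL_MBPS_PER_PORT


def pct_of_wire(measured_mbps: float, ports: int) -> float:
    """Measured throughput as a percentage of the per-direction ceiling."""
    return 100.0 * measured_mbps / wire_limit_mbps(ports)


read_limit = wire_limit_mbps(2)      # ceiling for reads (or, separately, for writes)
read_eff = pct_of_wire(675, 2)       # measured multi-process read throughput
duplex_limit = 2 * read_limit        # reads and writes flowing simultaneously
```

The measured 675MBps works out to 84.4% of the 800MBps per-direction limit. Because FC is full duplex, the combined read-plus-write ceiling is 1,600MBps, which puts the observed 1GBps bi-directional figure in context.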

With respect to the backup application scenario in our VOE, NetBackup reads backup images using 128KB I/Os during a restore and writes backup images using 64KB blocks. As a result, the ability of the Emprise 5000 system to read and write data in large blocks at near-wire speed on our SAN fabric provided our application-centric backup scenarios with a robust performance envelope.

In our test scenarios, we imposed three configuration rules. First, all logical volumes were configured with RAID-5 redundancy to maximize storage utilization. Second, all logical drives used exclusively by the backup server were provisioned from storage pool 1 (DataPac 1). Third, all logical drives used by VMs on the ESX server were provisioned from storage pool 2 (DataPac 2).

In our first test, we performed a disk-to-disk (D2D) backup of the eight VMs in parallel on the host ESX server to a disk storage pool on our NetBackup server. In the first phase of this test, the backup server mounted the ESX server's disks (DataPac 2) and copied snapshot data from the eight VMs in parallel to a local directory (DataPac 1) on the backup server. In the second phase, NetBackup read the eight local VM snapshots (DataPac 1) and wrote eight backup images to a NetBackup storage pool (DataPac 1) in parallel.

During both phases of our backup test, just as in our Iometer test, all I/O packets were evenly split across both HBA ports on the server and Emprise 5000. Furthermore, the Emprise 5000 internally divided read and write processing on separate MRCs. As a result, our VOE backup ran at 1GBps in the first phase and 500MBps in the second phase of our test.

In the second scenario, we backed up multiple logical disks on our server to a Sepaton VTL configured to emulate eight LTO-4 tape drives. This scenario was a pure read test for DataPac 1. With eight virtual tape drives available, NetBackup automatically split the backup process into eight sub-process streams. Matching the results of our Iometer tests with multiple processes, the Emprise 5000 system delivered 650MBps of data to the Sepaton VTL.

Exchanging IOPS
While numerous server applications, such as data warehousing, business analytics, and online analytical processing, employ large data blocks to stream I/O, desktop applications use small I/O blocks (8KB is the default I/O size for most Windows-based applications) for synchronous reads and writes of small files. As a result, peak streaming throughput for a typical knowledge worker only needs to be about 5MBps, with 90% of the I/O being reads. Even for a power user, 7MBps is a satisfactory level of throughput.

For desktop virtualization, the issue is aggregating I/O support for multiple virtual desktops. In such a scenario, a stream of read requests from multiple desktop users behaves like a stream of asynchronous reads. Using Iometer with a single volume from the Emprise 5000, openBench Labs benchmarked asynchronous 8KB read and write throughput at around 250MBps. At 5MBps per user, that level of performance means the Emprise 5000 system can support I/O requests from 50 virtual desktop users simultaneously. Getting 50 users to request data simultaneously, however, requires a much larger user population. Queuing theory projects that a population of 20 virtual desktop users is required to generate a continuous stream of read requests; in other words, a continuous 5MBps stream of data can support 20 desktop VMs. As a result, our single Emprise 5000 DataPac volume should support 1,000 VM desktops.
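The sizing argument is two multiplications, sketched here in Python. The 5MBps per-user rate and the 20-to-1 population ratio are taken as inputs from the text rather than derived; the power-user line applies the same 20-to-1 ratio at 7MBps, which is an extrapolation not measured in these tests.

```python
def vdi_capacity(volume_mbps: float, per_user_mbps: float = 5.0,
                 population_per_active_user: int = 20) -> tuple:
    """Return (simultaneous users, total desktop VMs) for a given volume throughput."""
    simultaneous = volume_mbps / per_user_mbps
    return simultaneous, simultaneous * population_per_active_user


active, desktops = vdi_capacity(250)   # knowledge workers at 5MBps each
power_active, power_desktops = vdi_capacity(250, per_user_mbps=7.0)  # extrapolated
```

At 5MBps per user the single volume yields 50 simultaneous users and 1,000 desktops; at the 7MBps power-user rate the same arithmetic yields roughly 35 simultaneous users and about 714 desktops.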

In addition to streaming throughput, there is also a need to satisfy small random I/O requests. On the server side, applications built on Oracle or SQL Server must be able to handle large numbers of I/O operations that transfer small amounts of data using small block sizes from a multitude of dispersed locations on a disk. Commercial applications that rely on transaction processing (TP) include such staples as SAP and Microsoft Exchange, which uses a JET b-tree database structure as a mailbox repository for 4KB email messages.

More importantly, TP applications seldom exhibit steady-state characteristics. Typical TP loads average about 1,000 IOPS and experience heavy processing spikes that rise to around 10,000 IOPS. That variability makes systems running TP applications among the most difficult for IT to consolidate and among the best candidates for virtualization.

To test random access throughput for database-driven applications, openBench Labs used Iometer on two 25GB test volumes. I/O streams involved 80% reads and 20% writes using 4KB and 8KB requests. With IOPS rates doubling for both 4KB and 8KB I/O in lock step, the performance profile of our Emprise 5000 was more in line with that of a RAM disk array.

Although our Balanced DataPacs are optimized for sequential operations, the premium that transaction-processing applications place on minimizing I/O latency makes a Balanced DataPac a good choice for mid-range TP applications: the ISE technology that stripes data at the level of drive heads also provides more precise and quicker access to data, which accelerates TP operations. To test the viability of deploying TP on an Emprise 5000 system with Balanced DataPacs, we ran random access tests with Iometer using two logical drives, one from each of our DataPacs.

In our random access tests, we generated I/O streams that were populated with a mix of read and write requests in an 80-to-20 ratio. This was done using 4KB transactions, which are used by Microsoft Exchange, and 8KB transactions, which characterize I/O access with Oracle and SQL Server. With a queue depth of 100 outstanding operations and one logical RAID-5 volume, we measured around 5,800 IOPS using 4KB I/O requests. Average access time for I/O requests during this test was less than 20ms. Using 8KB I/O requests, we averaged a sustained IOPS load of just over 5,500.
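The reported response time can be cross-checked with Little's Law, which relates the number of outstanding I/Os, throughput, and mean response time. This Python sketch assumes Iometer held the queue at its configured 100 outstanding I/Os in steady state.

```python
def mean_response_ms(outstanding_ios: int, iops: float) -> float:
    """Little's Law (L = lambda * W) solved for W, the mean response time in ms."""
    return 1000.0 * outstanding_ios / iops


QUEUE_DEPTH = 100                                # outstanding I/Os in the test setup
resp_4k = mean_response_ms(QUEUE_DEPTH, 5_800)   # 4KB requests
resp_8k = mean_response_ms(QUEUE_DEPTH, 5_500)   # 8KB requests
```

The results, about 17.2ms for 4KB and 18.2ms for 8KB requests, agree with the measured average access time of less than 20ms.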

Adding a second volume from the other DataPac, we virtually doubled IOPS performance with the same I/O response time. Most importantly, IOPS performance for 4KB and 8KB requests scaled in near lockstep. On storage based on mechanical disks, IOPS performance for 4KB requests is typically about 15% greater than for 8KB requests; identical IOPS for 4KB and 8KB requests have previously been seen only with solid-state RAM disks.

With small I/O transfers, the difference in IOPS performance is directly attributable to differences in data access time. Greatly improved data access time, however, is a major component of the ISE value proposition, and our benchmark clearly validated it: using the ISE-based Emprise 5000, the differential between 4KB and 8KB requests was less than 4%.

What's more, the magnitude of the Emprise 5000's IOPS performance, like that of its sequential throughput, points to significant scalability for server applications. For Microsoft Exchange Server, our 4KB performance level of 11,500 IOPS should support upwards of 8,000 mailboxes.

Similar scaling is equally probable for a VDI initiative. Just as peak streaming throughput for a desktop user is pegged at 5MBps, peak IOPS for a typical desktop user occurs when loading an application, which requires a level of 200 IOPS. Within the context of a VM desktop, our single DataPac volume, which sustained 5,500 IOPS, should be capable of sustaining 50 simultaneous users loading an application. Once again, applying a queuing theory multiple of 20 to 1, in terms of IOPS load, our single DataPac volume should handle 1,000 desktop VMs. So whether we assess the Emprise 5000 from a streaming data or IOPS performance perspective, the storage system should meet the I/O requirements of 1,000 VM desktops.

Jack Fegreus is the CTO of openBench Labs.


UNDER EXAMINATION: Scale-out architecture storage server


Xiotech Emprise 5000
--  (2) Balanced ISE DataPacs
--  (2) 4Gbps FC ports
--  (2) Managed Reliability Controllers
--  Active-Active MPIO
--  Web Management GUI


(2) Dell PowerEdge Servers
--  (2) QLogic QLE2462 4Gbps HBAs
--  Windows Server 2008 R2
--  NetBackup 6.5.4
--  VMware VCB
--  VMware ESX Server 3.5
--  (8) VMs running Windows Server 2003 R2
Sepaton S2100-ES2 VTL


--  Advanced firmware stripes data at the disk head level, which eliminates the need for IT administrators to create disk-centric storage pools, while MPIO front and back ends balance I/O across active-active controllers.

--  Iometer Streaming I/O Benchmark: Total I/O throughput simultaneously streaming 256KB reads and 256KB writes to two distinct volumes averaged 1GBps.

--  Iometer I/O Operations Benchmark: Random 4KB reads and writes (80/20 mix), which typifies Microsoft Exchange I/O, scaled from 6,500 IOPS using one volume to 12,100 IOPS using two volumes.

--  Storage capacity and performance can be easily increased by adding multiple Emprise 5000 systems. This adds local cache and processing power via active-active Managed Reliability Controllers, which locally manage I/O processes. 

This article was originally published on April 20, 2010