Compellent's DIR, SIR, and RIR

Virtualizing SAN storage at the disk-block level rather than at the RAID-volume-partition level enables Compellent to implement a snapshot mechanism based entirely on metadata pointers, which provides a fast way to clone disk volumes with minimal I/O processing overhead. A radical thin-replication approach also improves disaster recovery.

By Jack Fegreus

January 11, 2008 - Given all of the technology advances that are driving down the cost of storage, the continued explosive growth in storage should not be a significant IT issue. Nonetheless, it is a major issue, and the reason is the cost of storage management. That's especially true when provisioning, deploying, or recovering a server.

According to a survey of IT executives, system and storage virtualization share the spotlight as the key strategies for cutting operational costs. These IT executives find themselves faced with managing large server farms that grew out of the need to isolate mission-critical business applications to ensure the performance and scalability of those applications. While that strategy succeeded in delivering application performance and scalability, it left IT struggling to deal with a new issue: resource optimization. At too many sites, over-provisioning of resources led to resource-utilization rates that hover between 10% and 20%.

Many IT decision-makers now see system virtualization as a silver bullet for driving up resource utilization rates without negatively impacting the reliability, availability, and serviceability (RAS) provided by their existing server farms. In the "Symantec State of the Data Center 2007" report, most IT decision-makers chose server virtualization, followed by consolidation, as the best strategies for cutting data-center costs. Moreover, by a 3:1 margin, sites implementing server virtualization were choosing to set up a VMware Virtual Infrastructure (VI) environment.

In setting up Replays for a new Windows Server 2003 OS boot volume for a VMware virtual machine, openBench Labs was able to independently assign any of the RAID levels and storage tiers for Data Progression on Replay metadata. As with the OS volume itself, we were able to ensure Replays were stored on the most cost-effective storage tier via automatic data migration based on rules that we had established for our test fabric.

For storage resources, separating logical functionality from the constraints of physical implementation starts with the adoption of a SAN. That starting point, however, often leads to a very complex environment as silos of technology burden, rather than relieve, resource management. To avoid that pitfall, Compellent markets its Storage Center system as a complete modular SAN solution that encompasses both Fibre Channel and iSCSI connectivity and not as a SAN component.

What at first glance appears to be a retro marketing strategy is actually driven by an advanced virtualization construct that results in a number of value propositions. The hallmark of Compellent’s Storage Center is a significant reduction in SAN TCO garnered through the automation of storage management tasks. To reach that level of automation, Storage Center restructures the way storage is virtualized. Traditional SAN software virtualizes storage based on partitions of physical RAID volumes, whereas Storage Center virtualizes storage based on disk blocks in a scheme dubbed Dynamic Block Architecture.

Replays can be created in a number of ways, including manually and automatically using a Replay template. For our Gold-VM-Image-W2K3 volume, a Replay was scheduled every morning at 4:00 AM. We also created a Replay before and after any changes to an OS volume to facilitate testing and verification.

Starting with the most basic functionality of a JBOD folder, Storage Center creates a virtual pagepool of disk blocks, generating a rich collection of metadata. Each logical disk block is associated with a collection of metadata tags that represent notions that are normally associated with file-level and volume-level storage constructs.

File-oriented metadata includes notions like data type and timestamps for events, such as data block creation, last access, and last modification. Volume-oriented metadata includes the type of disk drive, the associated disk tier, the underlying RAID level, and the corresponding logical volume. The result of this level of virtualization creates a powerful synergy within a VMware VI environment that is capable of leveraging storage virtualization within a SAN.
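
To make the idea of per-block metadata concrete, here is a minimal Python sketch. The field names, the tier numbering, and the 30-day demotion rule are illustrative assumptions only; the article does not describe Compellent's actual metadata schema or migration rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BlockMetadata:
    """Hypothetical metadata tags attached to one logical disk block."""
    volume: str              # volume-oriented: owning logical volume
    raid_level: str          # volume-oriented: underlying RAID level
    tier: int                # volume-oriented: 1 = fastest tier, higher = cheaper
    created: datetime        # file-oriented: block creation time
    last_access: datetime    # file-oriented: last read or write
    last_modified: datetime  # file-oriented: last write


def pick_tier(meta: BlockMetadata, now: datetime,
              cold_after_days: int = 30, lowest_tier: int = 3) -> int:
    """Assumed rule: demote a block one tier once it has not been accessed
    for cold_after_days, never going below the cheapest tier."""
    if now - meta.last_access > timedelta(days=cold_after_days):
        return min(meta.tier + 1, lowest_tier)
    return meta.tier
```

Because every block carries its own timestamps and placement tags, a rule such as `pick_tier` can be evaluated block by block instead of for a whole RAID volume.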

Starting with a Replay of our Windows Server 2003 OS boot volume, Gold-VM-Image-W2K3, we easily generated two new View Volumes: vm-fileserver01 and vm-webserver01, from which to boot a new file server and a new Web server. Since we were working within Storage Center, we could apply this process to any OS for use on either a physical or a virtual server.

Commercial operating systems, such as Windows and Linux, assume exclusive ownership of their storage volumes. As a result, neither Windows nor Linux incorporates a distributed file-locking mechanism (DLM) in its file system. Without a DLM, virtualization of volume ownership is the only means of preventing the corruption of disk volumes through inadvertent volume sharing.

On the other hand, the file system for VMware ESX server (VMFS) has a built-in mechanism to handle distributed file locking. What's more, VMFS avoids the massive overhead typically incurred by a DLM by treating each disk volume as a single file image. When an operating system in a VM mounts a disk, ESX opens a disk-image file; VMFS locks that image file; and the VM's OS gains the exclusive ownership to all of the files contained in the disk volume image.

When leveraging virtual servers in server consolidation projects, a SAN based on Dynamic Block Architecture can also generate considerably more cost-avoidance savings. What makes these savings possible are a number of advanced features that center on Compellent's Data Instant Replay, a Storage Center application that introduces the construct of a Replay. Like a typical snapshot, a Replay represents a data volume at a particular point-in-time; however, for a Replay, that point-in-time is a virtual point-in-time.

Using the Server Instant Replay wizard, which is invoked by the "Create Boot From SAN Copy" menu item, we were guided through all the steps of creating a new boot image from a Replay of our Gold-VM-Image-W2K3 volume. In particular, we were guided through the process of mapping our new volume, LUN26-NewDatastore, to the VMware server ESX1.

The three dominant snapshot technologies are copy-on-write, redirect-on-write, and split mirror. In each of these schemes, data must be written for the snapshot as soon as the snapshot is created: Whether or not the snapshot is used is irrelevant. In contrast, only a minimum amount of metadata is required when a Replay is created: Actual data is never written until the Replay is mapped to a server as a logical volume and put into use.

We imported the new View Volume, LUN26-NewDatastore, into our VMware server, ESX1, as a Raw Device Mapping (RDM) volume. RDM volumes help simplify cloning, since VMware does not need to re-signature an RDM volume, whereas a re-signature is required for a VMFS-formatted datastore. When we browsed the new volume, all of the original boot files were present and intact.

The most prevalent snapshot technology is copy-on-write, which is used by the Linux Logical Volume Manager. When a copy-on-write snapshot is created, metadata is written to the snapshot about the location of the original data. The snapshot then tracks writes that change data blocks belonging to the original volume. Before a write can change a block, the original block is copied to the snapshot. As a result, the first write to each block of the original volume after the snapshot is taken requires two writes: The first write preserves the original data by copying it to the snapshot, and the second write updates the original data.
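
The double-write behavior can be illustrated with a toy model. This is a simplified sketch, not the Linux LVM implementation; the class and method names are invented for illustration, and a "block" is just a Python value.

```python
class CowVolume:
    """Toy copy-on-write volume: the snapshot stores preserved originals."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data of the original volume
        self.snapshot = None        # block index -> preserved original data
        self.physical_writes = 0    # counts simulated disk writes

    def create_snapshot(self):
        # Only bookkeeping is set up here; no block data is copied yet.
        self.snapshot = {}

    def write(self, index, data):
        if self.snapshot is not None and index not in self.snapshot:
            # Write 1: preserve the original block in the snapshot area.
            self.snapshot[index] = self.blocks[index]
            self.physical_writes += 1
        # Write 2 (or the only write): update the original volume in place.
        self.blocks[index] = data
        self.physical_writes += 1

    def read_snapshot(self, index):
        # Unchanged blocks are still read from the original volume.
        if self.snapshot is not None and index in self.snapshot:
            return self.snapshot[index]
        return self.blocks[index]
```

In this model the extra write is paid the first time a block is overwritten after the snapshot; later overwrites of the same block cost a single write because the original has already been preserved.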

Redirect-on-write, which is used by Network Appliance's filers, is similar to copy-on-write; however, this snapshot scheme does not incur the double-write overhead penalty. In the redirect-on-write scheme, new writes to the original volume are redirected to a new location set aside at the creation of the snapshot. Since the original data is not being overwritten, only one write is necessary.

For our DR scenario, openBench Labs used a VMware ESX server, ESX1, as the mission-critical server to support. The critical volume for testing was lun2-vm-storage1, which was a VMFS datastore volume that contained the boot disks of 11 VM systems. In this scenario, the remote VMware backup system was dubbed ESX2.

Although the double-write penalty is avoided, this scheme is complicated by the fact that the original volume serves as the logical snapshot, and the snapshot location now contains all of the original volume's updates. As a result, when a snapshot is deleted or automatically expires, there is a new overhead penalty: the data in the snapshot location must be reconciled back into the original volume. Moreover, that process grows in complexity as the number of snapshots increases and the working data set becomes more fragmented.
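
A toy model of redirect-on-write, following the description above, shows where the cost moves. This is a simplified sketch with invented names, not Network Appliance's implementation.

```python
class RowVolume:
    """Toy redirect-on-write volume with a single snapshot."""

    def __init__(self, blocks):
        self.original = list(blocks)  # frozen at snapshot time; doubles as the snapshot
        self.redirected = {}          # block index -> new data in the set-aside area
        self.has_snapshot = False
        self.physical_writes = 0      # counts simulated disk writes

    def create_snapshot(self):
        self.has_snapshot = True

    def write(self, index, data):
        if self.has_snapshot:
            self.redirected[index] = data  # one write, sent to the new location
        else:
            self.original[index] = data
        self.physical_writes += 1

    def read(self, index):
        # Current data comes from the redirect area when a block has changed.
        return self.redirected.get(index, self.original[index])

    def delete_snapshot(self):
        """Reconcile redirected data back into the original volume."""
        merged = len(self.redirected)
        for index, data in self.redirected.items():
            self.original[index] = data
            self.physical_writes += 1
        self.redirected.clear()
        self.has_snapshot = False
        return merged
```

Each update costs one write while the snapshot exists, but `delete_snapshot` has to write every redirected block back, which is the reconciliation penalty described above.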

Employed by EMC's Symmetrix arrays, the split-mirror scheme creates a physical clone of a volume. The entire contents of the original volume are copied onto a synchronized mirror volume. Storage administrators can make clones instantaneously available by "splitting" a mirror. This snapshot method requires as much storage space as the original data and imposes the overhead of writing data synchronously to the original volume and its mirror copy.

Our critical VMware volume was replicated along with its Replays to a remote Storage Center as the volume Repl of LUN02-vm-storage1. Running Server Instant Replay, we easily recovered a new View Volume, lun02-vm-storage1, which we then mapped to the remote backup server ESX2.

In contrast, Compellent's Data Instant Replay leverages the logical block implementation and the volume-oriented metadata of Dynamic Block Architecture to provide the benefits of all three traditional snapshot techniques, while avoiding their limitations. In particular, a Replay preserves only pointers to blocks that have changed since a prior Replay. As a result, the amount of storage utilized and the required level of I/O processing are both minimal, and there is no limit on the number of point-in-time copies that can be handled. More importantly, an automated Replay can be scheduled via a Replay template as frequently as is necessary.

Replays first freeze the data blocks for a point in time as read-only and establish metadata pointers to that data, which is similar to a copy-on-write snapshot. Because the original data blocks are frozen as read-only, however, there is no need to copy those blocks when data is altered. That task is also handled by pointers. As a result, Replays do not incur the copy-on-write overhead.

Through the use of pointers, Replays logically redirect data updates to the original volume data much like the redirect-on-write scheme. Nonetheless, by using logical pointers rather than creating physical block regions for the updates, there is no file fragmentation. What's more, Replays can be deleted or expired without having to resynchronize the data, as is required with a redirect-on-write snapshot.

Finally, since Replays contain only pointers to data, converting a Replay into what Compellent dubs a "View Volume," which can be mounted and utilized by a SAN client system, is essentially instantaneous. That makes the process of creating and mounting a View Volume at least as fast as the process of breaking and mounting a split-mirror snapshot. Moreover, there is no need to pause I/O processing before invoking the creation of a View Volume, as there is when breaking a mirror.
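
The pointer-based behavior described in the preceding paragraphs can be sketched as follows. This is a conceptual model only: `PagePool`, `Replay`, `Volume`, and `create_view_volume` are hypothetical names, and the article does not document Storage Center's internal data structures.

```python
class PagePool:
    """Shared pool of pages; volumes and Replays hold only page ids."""

    def __init__(self):
        self.pages = []

    def allocate(self, data):
        self.pages.append(data)
        return len(self.pages) - 1


class Replay:
    """Frozen set of pointers for blocks changed since the parent Replay."""

    def __init__(self, pointers, parent):
        self.pointers = dict(pointers)
        self.parent = parent

    def lookup(self, block):
        replay = self
        while replay is not None:  # walk back through older Replays
            if block in replay.pointers:
                return replay.pointers[block]
            replay = replay.parent
        return None


class Volume:
    def __init__(self, pool, base=None):
        self.pool = pool
        self.base = base   # most recent Replay beneath the active data
        self.active = {}   # block -> page id written since that Replay

    def write(self, block, data):
        # One write: a new page plus a pointer update. Frozen pages are never copied.
        self.active[block] = self.pool.allocate(data)

    def read(self, block):
        page = self.active.get(block)
        if page is None and self.base is not None:
            page = self.base.lookup(block)
        return None if page is None else self.pool.pages[page]

    def create_replay(self):
        # Freeze the current pointers; no block data is read or written.
        self.base = Replay(self.active, self.base)
        self.active = {}
        return self.base


def create_view_volume(pool, replay):
    """A View Volume starts out as nothing more than a reference to a Replay."""
    return Volume(pool, base=replay)
```

In this model, creating a Replay or a View Volume allocates no pages at all; storage is consumed only when a mapped volume is actually written, and each Replay records pointers only for the blocks that changed since the previous one.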

The process of creating a View Volume from a Replay also applies to a very important case in server consolidation projects, for virtual and physical servers alike. By providing for centralized server booting via SAN-based disk volumes, IT can cut capital expenses by eliminating the need for any internal server disks. This opens the door to a number of potential hard and soft cost savings.

The savings start with the ability to implement low-cost diskless or blade servers, which do not require RAID-enabled host bus adapters (HBAs) or high-end power supplies. This helps to lower the power, cooling, and space needed for servers significantly. In addition, server maintenance contracts can be downgraded, since the OS, applications and data are now all independent of the physical server. Furthermore, labor-intensive storage and system administration tasks are simplified with all boot images physically separated from the servers and managed from a single centralized SAN console.

A disaster-recovery plan is all about how to resume processing within an acceptable amount of time—the recovery time objective (RTO)—with an acceptable amount of restored data—the recovery point objective (RPO). Costs escalate as the RTO window grows smaller and the target RPO grows larger. Using the Compellent Replay applications dramatically lowers RTO while meeting demanding RPO levels with much simpler technology. As a result, using Replays rather than snapshots will also significantly lower IT labor costs.

Server Instant Replay integrates with Storage Center and Data Instant Replay and helps automate the complex process of cloning an OS volume. The Server Instant Replay wizard carefully guides administrators through the process of creating a new boot volume from an existing OS boot volume, with particular attention paid to mapping that new volume in a way that allows a server to boot from the volume over the SAN. In a VMware VI environment, Server Instant Replay can be used to extend the capabilities of the basic VI Client or enhance the template functionality of Virtual Center.

Use of the Server Instant Replay wizard not only reduces the amount of time required to deploy and provision a server OS, but it also ensures the task is performed in the same way every time it is performed. Moreover, a server administrator can carry out the entire task of provisioning a server with a new boot volume quickly and accurately without any assistance from a storage administrator. Whether deploying one or multiple servers, the use of Server Instant Replay saves significant labor costs when compared to typical local setup or boot-from-SAN processes. What's more, streamlining server provisioning and recovery cuts both server and storage management time and increases the productivity of server and storage administrators.

Another important role for the Server Instant Replay application is in a disaster recovery (DR) scenario. Just as global operations, governmental regulations on records retention, and the new focus on e-discovery in civil litigation have changed the nature of backup, so too have these forces altered the fundamental constructs of recovery.

The old notion of data recovery from the previous day's backup is in no way sufficient to satisfy a number of regulatory requirements. For that reason, IT must now plan for the recovery of applications along two dimensions: the time that can elapse before the application is back online, dubbed the recovery time objective (RTO), and the amount of data that must be recovered, dubbed the recovery point objective (RPO). Moreover, IT recovery costs rise as the time window of the RTO shrinks and as the acceptable amount of data that must be recovered for the RPO grows larger.
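
As a rough illustration of how the spacing of recovery points bounds what can be recovered, the sketch below computes the largest gap between consecutive recovery points and finds the newest point available at a given failure time. The function names and sample times are hypothetical; the daily 4:00 AM schedule mirrors the one used for the Gold-VM-Image-W2K3 volume.

```python
from datetime import datetime, timedelta


def worst_case_loss_window(recovery_points):
    """Largest gap between consecutive recovery points: data written during
    such a gap is at risk if a failure occurs just before the next point."""
    ordered = sorted(recovery_points)
    gaps = (later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return max(gaps, default=timedelta(0))


def latest_point_before(recovery_points, failure_time):
    """Newest recovery point that is not later than the failure."""
    candidates = [p for p in recovery_points if p <= failure_time]
    return max(candidates) if candidates else None


# Daily Replays at 4:00 AM versus hourly Replays on one morning.
daily = [datetime(2008, 1, day, 4, 0) for day in (8, 9, 10)]
hourly = [datetime(2008, 1, 10, hour, 0) for hour in range(4, 8)]
```

With the daily schedule, up to 24 hours of changes fall between recovery points; with the hourly schedule, the gap shrinks to one hour.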

In dealing with the dynamics of DR, the importance of Server Instant Replay again arises out of its integration with the core Storage Center functions and other Storage Center applications. In a DR scenario, the key application is Remote Instant Replay.

Thin Replication

In a process dubbed Thin Replication, Remote Instant Replay replicates Replays on remote systems. The process can be set up to use either synchronous or asynchronous communications. Independently of that synchronization choice, Thin Replication follows the tenets of Dynamic Block Architecture by sending only written data and excluding any allocated but unused space on a volume. Following the initial synchronization process, only changed data is transmitted.
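
The two phases described above can be sketched in a few lines of Python. This is a conceptual model, not Compellent's replication protocol: a volume is represented as a dictionary of written blocks, and both function names are hypothetical.

```python
def initial_sync(allocated_blocks, written):
    """Initial synchronization: send only blocks that hold written data,
    skipping allocated-but-unused space on the volume."""
    return {i: written[i] for i in range(allocated_blocks) if i in written}


def incremental_sync(previous, current):
    """After the initial sync: send only blocks that are new or changed."""
    return {i: data for i, data in current.items() if previous.get(i) != data}


# A 1,000-block volume in which only two blocks have ever been written.
written = {0: "boot", 7: "data"}
first_transfer = initial_sync(1000, written)

# Later, one block changes and one new block is written.
current = {0: "boot", 7: "data-v2", 9: "new"}
second_transfer = incremental_sync(written, current)
```

In this example, the first transfer carries 2 blocks rather than 1,000, and the second carries only the 2 blocks that differ from what the remote system already holds.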

As a result, the costs of DR contingency planning can be more easily controlled. Thin Replication creates a copy of the replicating system's actual data along with an unlimited number of Replays on the remote connection. There is no need to provision standby systems with configurations that are identical to critical production systems. What's more, Thin Replication helps optimize both dimensions of a DR strategy via low-overhead Replays that do double duty as RPO and RTO checkpoints.

The unlimited granularity of Replays enables more recovery points. Sites can then use that granularity to extend the use of asynchronous replication and still meet a demanding RPO. That allows them to avoid the I/O burden of a two-phase commit, which occurs with synchronous replication. Under the Remote Instant Replay scheme, a Replay created by Data Instant Replay on the replicating system is sent intact to the remote connection. These Replays now serve as re-synchronization checkpoints to reduce the amount of data that must be transferred from the local system to the remote system in the event of a communication failure. In addition, the low impact of Thin Replication combined with asynchronous communications provides IT with the flexibility to use Replays as multiple recovery-point locations.
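
The role of Replays as re-synchronization checkpoints can be sketched as follows. The data layout (an ordered list of Replay ids with changed-block counts) and the function name are assumptions made for illustration, not a description of Remote Instant Replay's actual protocol.

```python
def replays_to_resend(local_replays, remote_replay_ids):
    """Return the local Replays created after the newest Replay that the
    remote system already holds.

    local_replays: list of (replay_id, changed_block_count), oldest first.
    remote_replay_ids: set of Replay ids present on the remote system.
    """
    last_common = -1
    for position, (replay_id, _) in enumerate(local_replays):
        if replay_id in remote_replay_ids:
            last_common = position
    return local_replays[last_common + 1:]


# After a link failure, the remote system holds r1 and r2 but not r3 or r4.
local = [("r1", 500), ("r2", 20), ("r3", 35), ("r4", 10)]
pending = replays_to_resend(local, {"r1", "r2"})
blocks_to_send = sum(count for _, count in pending)
```

Here only the 45 blocks recorded in r3 and r4 need to cross the link, instead of a full resynchronization of the volume.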

The Replay checkpoints copied to the remote connection system also serve as remote recovery points if the data must be recovered on that system after a disaster. By combining Server Instant Replay with Remote Instant Replay, Compellent can dramatically lower the RTO without the cost and complexity of traditional snapshot schemes. In a traditional snapshot scenario, IT must replicate point-of-failure logs along with snapshots. Then, in the event of a disaster, IT must run those logs against the recovered snapshots as part of the recovery process in order to meet a high RPO.

On the other hand, Server Instant Replay works as fast and efficiently on replicated Replays as it does on local Replays. As a result, Server Instant Replay can be used to recover a volume in a DR scenario in a matter of minutes from any previous Replay checkpoint. In addition, just as on a local server, there is no need to break a mirror or pause I/O processing to invoke the process of creating a View Volume and mapping it to a server. That means IT is now free to test its DR plans as frequently as is necessary to meet any unique business continuity constraints.

The bottom line for disaster recovery is that it's a matter of when, not if. Whether the problem comes about through a fast-spreading Internet Warhol worm, a natural disaster, or an external event, disasters will happen, and the impact on business continuity must be minimized in the most cost-effective manner.

More importantly, the group of tasks associated with recovering a server in a DR scenario is a microcosm of the group of tasks that occur regularly in a large-scale Virtual Infrastructure environment. In particular, running an application on a dedicated VM is ranked as an IT best practice for enhancing reliability and availability. Tactically, that strategy requires IT to utilize OS templates when provisioning a new VM. Via Compellent's Server Instant Replay wizard, administrators can automate that provisioning process to make it repeatable and cost-effective.

Jack Fegreus is CTO of openBench Labs. He can be reached at jack.fegreus@openbench.com.


OpenBench Labs scenario


Block-based storage virtualization


Compellent Storage Center with Data Instant Replay, Server Instant Replay, and Remote Instant Replay

  • SAN software certified for ESX Server 3.x
  • Simplify system administration through automation
  • Replays work with pointers rather than copies of data
  • Automate boot-from-SAN setup with Server Instant Replay
  • Provide an efficient DR solution with Remote Instant Replay


  • VMware ESX Server V3
  • Windows 2003 Server SP2


  • Replays utilize far less storage space than snapshots by preserving only pointers to blocks that have changed since a prior Replay.
  • A wizard cuts configuration time for an OS volume to be booted over the SAN by guiding administrators through recovering or replicating a boot volume and then mapping the new volume to a server.
  • Remote Instant Replay copies only the replicating system’s actual data along with an unlimited number of Replay checkpoints to minimize data loss and ensure a fast recovery time.
  • Thin Replication improves both RPO and RTO in disaster recovery.

This article was originally published on January 01, 2008