By Jack Fegreus
Neither Moore's Law (processing price/performance doubles every 18 months) nor Shugart's Law (magnetic storage price per bit halves every 18 months) shows any sign of being repealed in the foreseeable future. Meanwhile, IT budgets for capital and operational expenses are experiencing modest 3% overall growth, leaving IT administrators to leverage Moore's and Shugart's Laws to gain operational cost savings.
Storage is often a key resource targeted for cost cutting. To successfully wring out cost savings, it's imperative that application workloads be separated from infrastructure resources through virtualization. That's the only way to minimize the risk of negatively impacting mission-critical applications, while optimizing the underlying infrastructure.
For storage, the process of separating a logical representation of a resource from its physical implementation starts with the adoption of a SAN, which is necessary to turn storage into a self-managing resource that scales without disruption. A SAN is far from sufficient, however, to guarantee success. Given that the Gartner IT consulting firm estimates that storage is 3x to 5x more costly per gigabyte to manage than to acquire, the easy way to garner the required cost savings is to reduce management costs by making significant changes in the way that storage is managed.
This is particularly important when dealing with structured and semi-structured data on which applications such as ERP and e-mail are built. These key production systems have inherent scalability and performance limitations that require both optimally configured storage and optimally organized data. Automating either optimization process has the potential to generate substantial management savings.
Getting to that level of automated storage management has required running a sophisticated and costly information lifecycle management (ILM) application on top of the SAN. To eliminate the need for separate ILM software and create a SAN environment with a high level of automated storage management, Compellent has radically restructured the way storage is virtualized in a SAN. While traditional SAN software virtualizes storage based on disk partitions, Compellent's Storage Center virtualizes storage based on disk blocks.
Unlike most intelligent SAN storage arrays, the Storage Center is sold as a complete modular SAN solution and not as a single component. The reason for this is rooted in the product's value proposition. The Storage Center significantly reduces SAN total cost of ownership by obliterating a surprising number of storage management issues and tasks. To achieve this goal, Compellent seized upon the notion of virtualizing the most fundamental element of storage: the data block.
All SAN virtualization software presents host systems with virtualized logical disks. With traditional SAN software, that process starts with a SAN administrator combining physical disks into RAID volumes, partitioning those volumes, and then virtualizing the partitions into logical disks. Driving all of the rich functionality of the Compellent software, however, is the sophisticated construct of a Dynamic Block Architecture. Using the Storage Center, virtualization does not begin at the level of a partition, but at the level of a logical block.
All of the disk blocks associated with all of the drives within a SAN are abstracted into a logical space of storage blocks, which can be larger than the physical space. Compellent accomplishes this through the aid of a rich collection of metadata. Each logical disk block is associated with a collection of tags that represent notions that are normally associated with file-level and volume-level data constructs.
- File-oriented metadata for disk blocks includes such notions as data type, along with time stamps for creation, last access, and modification.
- Volume-oriented metadata for disk blocks includes constructs such as the type and tier of disk drive, RAID level, corresponding logical volume, and frequency of access.
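The per-block metadata described above can be pictured as a small tagged record. The following is a minimal sketch only; the field names and defaults are illustrative assumptions, not Compellent's actual on-disk format.

```python
from dataclasses import dataclass, field
import time

@dataclass
class LogicalBlock:
    # File-oriented tags (names are hypothetical)
    data_type: str = "unknown"
    created: float = field(default_factory=time.time)
    last_access: float = field(default_factory=time.time)
    modified: float = field(default_factory=time.time)
    # Volume-oriented tags (names are hypothetical)
    drive_tier: int = 1          # e.g. 1 = fast Fibre Channel, 3 = SATA
    raid_level: str = "RAID-5"   # an emulated attribute, not a physical format
    volume_id: int = 0           # owning logical volume
    access_count: int = 0        # feeds frequency-of-access policies

    def touch(self) -> None:
        """Record a read: update access time and frequency counter."""
        self.last_access = time.time()
        self.access_count += 1

blk = LogicalBlock(data_type="db-record", volume_id=7)
blk.touch()
```

Because every block carries its own tags, policies such as tiered migration can be evaluated block by block rather than volume by volume.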
Another powerful aspect of Compellent's implementation of logical blocks and volume-oriented metadata is the virtualization of the notion of RAID level. Within a Storage Center environment, RAID level is just a mathematical abstraction that relates to data security and availability. No longer does RAID level relate to a physical disk-formatting task.
With a Compellent SAN, the system manager no longer needs to make any decisions about physically formatting RAID levels for disk volumes. That alone takes an important task in storage management off of the table. What's more, the virtualization of RAID levels directly resolves another important storage management issue: the need to balance I/O requests. The Storage Center automatically spreads and balances I/O requests across all disks in a physical tier independently of the RAID metadata classification. As a result, I/O performance scales optimally as disk drives are added.
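The balancing idea above can be sketched in a few lines: a logical block's physical placement depends only on its tier, never on its RAID tag, so every disk in the tier shares the load and adding a disk widens the stripe. The function and names below are illustrative assumptions, not the product's actual placement algorithm.

```python
def place_block(logical_block_id: int, tier_disks: list[str]) -> str:
    """Map a logical block onto one of the tier's physical disks.

    Placement ignores the block's RAID metadata entirely; only the
    tier's disk population matters, so I/O spreads across all spindles.
    """
    return tier_disks[logical_block_id % len(tier_disks)]

tier1 = ["disk0", "disk1", "disk2", "disk3"]
placements = [place_block(b, tier1) for b in range(8)]
# Each of the four disks receives an equal share of the eight blocks;
# growing tier1 to five disks automatically widens the distribution.
```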
SCREEN 1: When we ran a file I/O benchmark on a logical RAID volume, I/O requests were distributed by the SAN software evenly across all of the disks in the enclosure.
Optimization of real-time performance is important to support virtualization and server I/O on any SAN controller. The fine-grained Dynamic Block Architecture and extended functionality of the Storage Center, which includes the ability to transparently migrate data across tiers of disk drives, only amplifies the importance of performance. On the other hand, the complexity associated with the implementation of such levels of device and data abstraction has the potential to introduce a significant amount of overhead, which can turn response time to sludge.
To resolve this issue, Compellent chose to use an application-specific operating system (ASOS) in place of a traditional real-time operating system (RTOS) on its storage controllers. Like a normal operating system, an RTOS runs independently beneath an application. In contrast, the source code of an ASOS is explicitly linked with the source code of an application, in this case the Compellent Storage Center, to form a single executable image. This creates a precisely tuned environment for running the Storage Center software. What's more, it transforms the SAN into an appliance.
Currently, the principal hardware components that can be used in building a comprehensive Compellent SAN solution are:
- Intel-based servers acting as Storage Center Controllers;
- QLogic QLA2342 Fibre Channel host bus adapters (HBAs) featuring automatic multi-path fail-over support;
- QLogic QLA4050C iSCSI HBAs primarily intended to extend SAN connectivity to remote locations;
- Brocade and McData Fibre Channel switches; and
- Disk drive enclosures populated with either Fibre Channel or Serial ATA (SATA) drives.
Through new software releases, this architecture can expand and scale with the addition of new hardware. Additions planned for near-term introduction include a new fast tier for Fibre Channel drives that connect via 4Gbps HBAs and a new middle tier for SATA-II drives that connect at 3Gbps.
Using well-defined collections of hardware, a Storage Center SAN is able to scale in performance, total capacity, and system availability with minimal impact on management overhead. Storage enclosures can be added in any increment to increase capacity, and I/O is automatically balanced over larger storage pools.
For high availability, QLogic Fibre Channel HBAs provide multi-path fail-over support between the controllers and disk enclosures. In addition, Storage Center controllers have on-board battery backup to protect data in the event of a power loss. For sites that need even more in high-availability support, Storage Center controllers can be clustered to provide for fail-over at the controller level.
Storage Center Core provides the foundation features for a Compellent SAN: Dynamic Block Architecture sets up block-level data management; advanced virtualization enables managing physical disks as a single pool; data caching supports multi-threaded read-ahead operations and mirrored writes; servers can boot from the SAN; and logical volumes can be copied, mirrored, and migrated without impacting users.
While all of the necessary components to set up a functional SAN are included in the Storage Center Core, the advanced features needed to change the storage management paradigm are licensed as optional applications. Of these options, three are critical to dramatically reducing operational overhead costs related to system and storage management: Dynamic Capacity, Data Progression, and Data Instant Replay.
Storage Center advanced virtualization assigns logical disk blocks to logical drives only when data bits are written to the drive. The Dynamic Capacity application builds upon block-level virtualization and expands the real capacity of a disk folder automatically as drives are added. This allows SAN administrators to define logical volumes that have larger capacities than the real physical capacity available in a folder. This is especially useful whenever a host operating system does not readily support the expansion of a disk volume's capacity.
By providing for the allocation of more storage than is physically installed—dubbed "thin provisioning"—and consuming physical disk resources only when data bits are written, Dynamic Capacity eliminates a number of the issues that complicate capacity planning. One of the most important of these issues involves a tradeoff between current capital expenses and future administrative expenses.
A number of applications and operating systems require significant management intervention when their underlying storage volumes must be modified. As a result, a decision must be made before implementation: is it more cost-effective to acquire up front all of the disk capacity that will likely ever be needed, thereby avoiding the additional management tasks of modifying the storage architecture later? Complicating this tradeoff is the fact that disks are a rapidly deflating commodity (Shugart's Law equates an 18-month delay with a net cost savings of 50%), while storage management is an inflating labor cost.
Dynamic Capacity obviates the need to make that tradeoff through its support of thin provisioning. The Compellent Storage Center works exclusively with used rather than allocated space. As a result, there is no penalty for allocating more disk blocks than needed to a logical drive within the environment. Dynamic Capacity complements that capability by automatically extending disk tiers, from which logical disk space is made available, as physical drives are added to a controller. In our testing, we allocated a logical volume to a server running Windows 2003 Server that was larger than all of our physical disk space and observed no impact on SAN performance.
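The allocate-on-write behavior described above can be sketched as a thin volume that advertises more logical blocks than it physically owns and consumes a physical block only on the first write to a logical address. This is a minimal sketch under those assumptions; the class and method names are hypothetical, not Compellent's API.

```python
class ThinVolume:
    """Sketch of thin provisioning: logical size may exceed physical."""

    def __init__(self, logical_blocks: int, physical_pool: list[int]):
        self.logical_blocks = logical_blocks   # advertised (logical) size
        self.pool = physical_pool              # free physical block ids
        self.mapping: dict[int, int] = {}      # logical -> physical

    def write(self, lba: int) -> int:
        if lba >= self.logical_blocks:
            raise IndexError("beyond advertised capacity")
        if lba not in self.mapping:            # allocate only on first write
            if not self.pool:
                raise RuntimeError("physical pool exhausted: add drives")
            self.mapping[lba] = self.pool.pop()
        return self.mapping[lba]

    @property
    def used(self) -> int:
        """Physical blocks actually consumed, not blocks allocated."""
        return len(self.mapping)

# A 1,000-block logical volume backed by only 100 physical blocks.
vol = ThinVolume(1000, list(range(100)))
vol.write(0); vol.write(999); vol.write(0)   # the rewrite consumes nothing new
```

Because only `used` space counts, over-allocating the logical volume costs nothing until data actually lands, which mirrors the behavior we observed in testing.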
The Compellent SAN's single most powerful feature for reducing storage management costs is Data Progression, which transforms the SAN into an ILM appliance. Data Progression provides the support needed for automated tiered storage.
With Data Progression, the SAN administrator defines policies about the frequency with which data is accessed. Then the storage controller tracks data access patterns on a real-time basis and transparently migrates logical data blocks between storage tiers according to those policies.
Data Progression automates a popular strategy to maintain a cost-effective storage infrastructure by optimizing the placement of data on devices based on the frequency with which that data is accessed and the performance characteristics—hence cost—of the storage device. In this scenario, only the most frequently accessed data files are retained on the highest-performing devices.
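The policy loop described in the two paragraphs above can be sketched as a periodic pass over block metadata: blocks idle beyond a policy window demote to the SATA tier, while recently accessed blocks sit on the fast tier. The threshold, tier numbers, and dictionary layout are illustrative assumptions, not the product's actual policy engine.

```python
import time

def progress(blocks: list[dict], now: float, demote_after: float = 30 * 86400) -> list[dict]:
    """Assign each block to tier 1 (fast) or tier 3 (SATA) by access age.

    A real engine would track access in real time and migrate data
    transparently; this pass only re-labels the metadata.
    """
    for blk in blocks:
        idle = now - blk["last_access"]
        blk["tier"] = 3 if idle > demote_after else 1
    return blocks

now = time.time()
blocks = [
    {"id": 1, "last_access": now - 86400, "tier": 3},        # hot: accessed yesterday
    {"id": 2, "last_access": now - 90 * 86400, "tier": 1},   # cold: idle 90 days
]
progress(blocks, now)
# Block 1 promotes to the fast tier; block 2 demotes to SATA.
```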
Unfortunately, that scheme becomes exceedingly complex for mission-critical applications that use either structured or semi-structured data. Data-retention regulations, such as Sarbanes-Oxley, require saving significantly greater amounts of historical data; however, database software is devoid of storage granularity, which compounds the overhead for both storage and database administrators. For a database, the smallest addressable storage component is a table. Without ILM functionality, record access activity is a meaningless statistic in terms of direct data location, as a database administrator (DBA) is limited in storage optimization to placing tables on logical drives.
The only way to optimize the location of historical records is to restructure the tables within the database. Typically that involves creating new instances of tables for storing historic data that are different from the production-instance tables. With that restructuring, tables can be placed on different logical disks with different underlying physical characteristics.
None of that manual intervention is necessary, however, on a Compellent SAN with Data Progression. Since disk blocks are as virtual as the logical drives that contain them, the Compellent controller can freely place infrequently accessed data blocks, and the records that those blocks represent, on the most cost-effective storage devices without changing the way logical drives are presented to the host operating system (OS). More importantly, block migration is completely transparent to both the OS and applications. By contrast, traditional ILM software requires application-specific modules that embed stubs in an application's data files to redirect the application to any new data location.
Data Instant Replay
Compellent's analog to traditional snapshots is dubbed "Data Instant Replay." Read-only copies of data, called replays, provide for extremely fast recovery from business interruptions. Storage Center's architecture allows for the creation of an unlimited number of replays, which can be scheduled for automatic creation at specific intervals or created on demand. Without Data Progression licensed, all replay data resides within only one RAID level and disk tier.
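A replay can be pictured as a frozen copy of the logical-to-physical block map: taking a replay captures the mapping at that instant, and because later writes redirect to fresh physical blocks, the replay's view never changes. The sketch below illustrates that redirect-on-write idea under stated assumptions; the class and names are hypothetical, not Compellent's implementation.

```python
class ReplayVolume:
    """Sketch of read-only replays over a redirect-on-write block map."""

    def __init__(self):
        self.mapping: dict[int, int] = {}    # logical -> physical
        self.next_phys = 0                   # next free physical block
        self.replays: list[dict[int, int]] = []

    def write(self, lba: int) -> None:
        self.mapping[lba] = self.next_phys   # redirect, never overwrite in place
        self.next_phys += 1

    def take_replay(self) -> dict[int, int]:
        snap = dict(self.mapping)            # point-in-time, read-only copy
        self.replays.append(snap)
        return snap

vol = ReplayVolume()
vol.write(0)
snap = vol.take_replay()
vol.write(0)                                 # new data lands on a fresh block
# snap still maps LBA 0 to the original physical block, so recovery
# amounts to reading through the frozen map.
```

Since a replay is only a copy of the map, creating one is nearly instantaneous and an unlimited number can be retained, consistent with the scheduling behavior described above.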
Given the extensive functionality and virtualization that make disk blocks logical rather than physical entities, I/O performance should hold top-of-mind attention for any IT decision-maker assessing a Compellent SAN. To get a handle on performance, we ran our streaming I/O benchmark, oblDisk, on two logical RAID-5 disks.
The first logical disk, oblTest1, was created from a pool of Fibre Channel drives spinning at 10,000 rpm. The second logical disk, oblTest2, was created from a pool of SATA drives. Next, we exported both logical drives to an Intel-based server that was running Red Hat Linux.
Our oblDisk I/O benchmark delivered throughput results consistent with previous tests of 10,000rpm Fibre Channel and SATA logical drives created from logical partitions of physical arrays with traditional SAN software. On reads, I/O consistently peaked around 72MBps using the volume created in Tier 1 with 10,000rpm Fibre Channel drives and 28MBps using the Tier 3 volume on SATA drives. Write throughput was 42MBps (Fibre Channel) and 14MBps (SATA).
In our throughput test, we found no significant overhead penalty on system performance using logical drives created from virtualized disk blocks rather than from a virtualized physical array partition, which allowed us to freely benefit from the ILM capabilities of the Compellent SAN. In particular, we were able to set policies for logical disk blocks that associated their location in storage tiers based on the frequency of access and the frequency at which the system would automatically re-align disk blocks within the storage tiers.
The savings in hard storage costs alone with automated tiered progression at the fine-grained level of storage blocks can be significant. While Fibre Channel drives can offer a 3:1 advantage in throughput performance, SATA can provide better than an 8:1 advantage in cost. These savings are particularly compelling for mission-critical applications that are built on structured or semi-structured data files. For these applications, the savings are entirely transparent and involve no internal manipulation of file structures. In many cases, the cost of an administrator manually restructuring the appropriate database tables would likely exceed the savings in hardware costs.
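Back-of-the-envelope arithmetic shows why that cost ratio matters. The sketch below uses the 8:1 cost advantage cited above; the absolute prices, capacity, and hot-data fraction are hypothetical placeholders, not measured figures.

```python
# Hypothetical per-gigabyte prices preserving the 8:1 ratio (units arbitrary).
fc_cost_per_gb, sata_cost_per_gb = 8.0, 1.0

def blended_cost(total_gb: float, hot_fraction: float) -> float:
    """Hardware cost when only the hot fraction stays on Fibre Channel."""
    return (total_gb * hot_fraction * fc_cost_per_gb
            + total_gb * (1 - hot_fraction) * sata_cost_per_gb)

all_fc = blended_cost(10_000, 1.0)    # everything on Fibre Channel: 80,000
tiered = blended_cost(10_000, 0.25)   # only 25% of data is hot: 27,500
# Tiering cuts the hardware bill by roughly two-thirds in this scenario,
# before counting any of the management savings from automation.
```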
openBench Labs Scenario
Fully automated SAN management
WHAT WE TESTED
Compellent Storage Center SAN
- 10,000rpm Fibre Channel drives
- SATA drives (FC connected)
- Dynamic Capacity option
- Data Progression option
- Data Instant Replay option
HOW WE TESTED
- Two Intel Xeon-based servers
  - Windows Server 2003
  - Red Hat Linux 8
- oblDisk v3.0
- Compellent virtualizes storage at the disk block level versus traditional SAN virtualization at the array partition level.
- Dynamic Block Architecture tags logical blocks with metadata that includes last access time and RAID characteristics to emulate.
- Logical blocks are served from tiered storage pools based on drive characteristics, such as interface (SATA or Fibre Channel) and rotational speed (15,000rpm or 10,000rpm).
- Storage Center commits physical resources to logical blocks only when they are used, not when they are allocated.
- Dynamic Capacity provides thin provisioning by automatically expanding disk pools to support allocated logical blocks.
- Data Progression provides automated tiered storage by migrating data blocks across pools based on access policies.
- Using our oblDisk benchmark, I/O throughput was comparable to tests run on traditional SANs.
Jack Fegreus is technology director at Strategic Communications and a regular contributor to InfoStor.