Strategies for tiered storage

Users can virtualize storage tiers to simplify management, reduce "SAN creep," and dramatically cut the costs of provisioning and migrating storage. However, there are some caveats.

By Dave Vellante

Tiered storage is the allocation of different classes of data to various types of storage media, with the goal of improving storage efficiency and reducing total cost of ownership (TCO). Storage categories are typically assigned according to each application's service-level requirements for availability, performance, and retention, as well as frequency of use and other factors. Tiered storage can be complex. Because the volume of electronically stored data is large and growing, best practice is to use policies and software to automate the ongoing matching of specific data to appropriate device characteristics.

Tiered storage can take many forms and often emerges naturally as storage infrastructure grows. Tiers can be created within an array (using different capacity or performance disk drives), by allocating cache to different data, and/or by using physically separate storage arrays with different characteristics.

Tiered storage promises up to 50% savings on lifetime storage costs, making it an attractive alternative to provisioning capacity indiscriminately. Key business drivers and end-user considerations for tiered storage include the following:

  • "SAN creep" has created islands of incompatible storage, with no easy way to share data between servers and disk arrays;
  • Mergers and acquisitions have led to heterogeneous SAN infrastructures, further compounding complexity;
  • Expensive Tier-1 infrastructure has sometimes proven cost-prohibitive and has led many companies to develop "Tier-1 avoidance" strategies; and
  • Migration and provisioning are complex, often requiring applications to be shut down to move data or provision new capacity, and can cost more than $50,000 for each array migrated.

End-user research conducted by Wikibon.org indicates that for every dollar spent on hardware and software, an additional 50 cents is spent on array migration and provisioning over the life of an array. Implementing tiered storage can reduce this figure to less than 10 cents per dollar.
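The Wikibon figures translate into simple arithmetic. The sketch below applies them to a hypothetical $1 million hardware and software spend; the spend figure and the function name `lifetime_cost` are illustrative assumptions, and 0.10 is used as an upper bound for the "less than 10 cents" case.

```python
def lifetime_cost(hw_sw_spend: float, overhead_per_dollar: float) -> float:
    """Hardware/software spend plus migration and provisioning overhead over the array's life."""
    return hw_sw_spend * (1 + overhead_per_dollar)


spend = 1_000_000.0                      # hypothetical hardware + software spend
untiered = lifetime_cost(spend, 0.50)    # about 50 cents of overhead per dollar
tiered = lifetime_cost(spend, 0.10)      # "less than 10 cents"; 0.10 is an upper bound
overhead_avoided = untiered - tiered

print(f"without tiering:  ${untiered:,.0f}")
print(f"with tiering:     ${tiered:,.0f} or less")
print(f"overhead avoided: at least ${overhead_avoided:,.0f}")
```

On those assumptions, migration and provisioning overhead falls from $500,000 to at most $100,000 over the life of the arrays.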

In the mainframe world, automated tiered storage has been a reality for decades, so why has it proven so elusive a target for the rest of the IT shop? The answer, of course, lies in the heterogeneous nature of storage outside the mainframe world. The diversity of storage hardware, applications, technologies, and architectures has created a challenge that so far has remained difficult to automate despite the efforts of several vendors. As a consequence, tiered storage as an active strategy has only about 10% to 15% penetration in the marketplace, although, depending on one's definition, it could be argued that every company does some form of tiering.

One possible strategy is to migrate disk storage into a single architecture (e.g., go with all Tier 1), but this is an expensive proposition that has proven impractical for the vast majority of users. Data format is only one challenge (e.g., block versus file), and not even the toughest problem. The real issue blocking tiered storage adoption is creating an effective and "automatable" policy-based categorization system across the IT environment driven by the data access needs of each application and group of users.

To support this, some companies are simplifying policies by reducing the number of tiers that need to be managed, clearly communicating these guidelines, and turning to virtualization of both front-end server resources and back-end storage assets. One of the key benefits of virtualization is that applications retain their view of storage while that view can be remapped, dynamically, to any physical location at any time. Data can be moved seamlessly without the application being aware of the change. However, virtualization brings its own set of challenges, including implementation complexities and performance issues for many applications (discussed below).
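A toy model can make the mapping idea concrete. In the sketch below, which uses hypothetical names (`VirtualizationLayer`, `PhysicalExtent`, the array labels) and is not any vendor's implementation, the application always addresses a volume by its logical name, while the layer is free to remap that name to a different array. A real product would also copy the data and keep in-flight writes consistent before switching; only the final remap step is modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalExtent:
    array: str    # hypothetical array label
    lun: int
    offset: int


@dataclass
class VirtualizationLayer:
    """Maps logical volume names (the application's view) to physical extents."""
    mapping: dict[str, PhysicalExtent] = field(default_factory=dict)

    def provision(self, volume: str, extent: PhysicalExtent) -> None:
        self.mapping[volume] = extent

    def resolve(self, volume: str) -> PhysicalExtent:
        # Each I/O is redirected through the current mapping, so the
        # application never needs to know where the data physically lives.
        return self.mapping[volume]

    def migrate(self, volume: str, target: PhysicalExtent) -> PhysicalExtent:
        # Only the final remap is modeled; a real product would first copy
        # the data and keep in-flight writes consistent.
        previous = self.mapping[volume]
        self.mapping[volume] = target
        return previous


layer = VirtualizationLayer()
layer.provision("payroll_db", PhysicalExtent("tier1-array", lun=4, offset=0))
# Demote the volume to a Tier-2 array; the application keeps using "payroll_db".
old = layer.migrate("payroll_db", PhysicalExtent("tier2-array", lun=17, offset=0))
print(old.array, "->", layer.resolve("payroll_db").array)
```

The design point is that the logical name is the only thing the application holds, so moving data between tiers becomes a change to the mapping rather than a change to the application.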

Users not considering virtualization are addressing tiered storage by focusing on large, rapidly growing pools of homogeneous data, such as e-mail systems and software development data, and building islands of tiered storage around those pools. This can still result in significant savings and is much easier to implement than virtualization; however, the ongoing management costs can be steep. The trade-off is that these pools must be bridged manually, and because the applications must be aware of any change, they have to be interrupted.

Virtualization begins to build these bridges in an automated fashion and appears to be the best solution for many large companies going forward. However, as an alternative, several vendors offer "in-the-box" tiering where higher-capacity, lower-cost devices can reside in the same array as more-expensive, higher-performance, lower-capacity drives. While this is the simplest form of tiering, users are sometimes reluctant to take this approach, especially if it requires adding capacity to more expensive Tier-1 storage platforms.

Importantly, while much of the discussion of tiered storage centers on expensive, high-performance Tier-1 and more cost-effective Tier-2 midrange solutions, more than half of the world's data resides on Tier-3 systems, using either very low-cost disk or tape. This is a substantial issue for users and must be considered in tiered storage strategies, as records management and retention policies are increasingly injected into the storage administrator's daily workflow.

What to do
Many companies interested in addressing the problems of SAN sprawl, out-of-control storage growth, and onerous migration costs are turning to tiered storage and initiating the following actions:

  • Clearly define recovery point objective (RPO) and recovery time objective (RTO) and let these be the drivers for data placement (versus a line-of-business head demanding Tier-1 service with no clear justification);
  • Communicate these requirements to the business and allow IT to allocate storage using these policy guidelines;
  • Simplify tiers, where the most demanding applications are placed on Tier 1 (based on service levels), default everything else to Tier 2, and migrate to Tier 3 based on records management and retention policies driven by legal and compliance concerns;
  • Virtualize front-end and back-end resources in parallel, providing a services layer outside of the storage arrays, and increasingly rely on less-expensive arrays to reduce hardware costs and exposure to expensive storage software licenses. Virtualize everything possible on Tier 2, and try to virtualize as much Tier-1 storage as possible;
  • Simplify storage management software and procedures, reducing existing storage management suites, if possible, to a single suite; and
  • Carefully test the reliability and performance implications of virtualization on an application group basis, and roll out deployments over a reasonable timeframe.

Taken together, these actions come as close as possible to creating a single SAN environment on which to implement tiered storage strategies.
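The simplified three-tier policy described in these actions can be expressed as a small placement function. This is a minimal sketch under stated assumptions: the `DataSet` fields, the `retention_only` flag, and the Tier-1 RPO/RTO thresholds are hypothetical, and each organization would substitute its own service levels and records-management rules.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    TIER1 = 1
    TIER2 = 2
    TIER3 = 3


@dataclass
class DataSet:
    name: str
    rpo_minutes: float     # recovery point objective: tolerable data loss
    rto_minutes: float     # recovery time objective: tolerable downtime
    retention_only: bool   # hypothetical flag: kept only for records/compliance


# Hypothetical Tier-1 service-level thresholds.
TIER1_MAX_RPO_MINUTES = 5
TIER1_MAX_RTO_MINUTES = 60


def place(ds: DataSet) -> Tier:
    # Records-management and retention policy sends data to Tier 3.
    if ds.retention_only:
        return Tier.TIER3
    # Only demanding, justified service levels earn Tier 1.
    if ds.rpo_minutes <= TIER1_MAX_RPO_MINUTES and ds.rto_minutes <= TIER1_MAX_RTO_MINUTES:
        return Tier.TIER1
    # Everything else defaults to Tier 2.
    return Tier.TIER2


datasets = [
    DataSet("order_entry", rpo_minutes=0, rto_minutes=15, retention_only=False),
    DataSet("dev_builds", rpo_minutes=1440, rto_minutes=480, retention_only=False),
    DataSet("email_archive_2003", rpo_minutes=1440, rto_minutes=2880, retention_only=True),
]
for ds in datasets:
    print(f"{ds.name}: {place(ds).name}")
```

Because RPO and RTO drive the decision, a line-of-business request for Tier 1 succeeds only if its stated objectives actually meet the Tier-1 thresholds.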

To be sure, these strategies are evolving and bring certain risks, namely virtualization complexity, performance concerns, and availability issues (e.g., placing a Tier-1-class array behind a midrange virtualization appliance). And while these approaches appear to dramatically simplify the IT environment and promise lower hardware, software, and migration costs, they require a major commitment to the vendor(s) providing the virtualization technologies. Users are managing these risks with extreme caution, implementing conservative rollout plans and initiating metadata management strategies that do not rely solely on metadata produced within vendor product sets but also capture the processes of the organization as a whole. Despite the challenges and potential for lock-in, the promise of up to 50% cost savings makes tiered storage an attractive goal.

Technology considerations/choices
Several key technologies make up an effective tiered-storage implementation: software and processes to ensure effective classification, software to non-disruptively transfer data sets, and hardware to ensure integrity.

The most difficult technical challenge is ensuring data integrity, especially when something goes wrong. Application performance is also a critical factor. Many organizations are turning to server and storage virtualization simultaneously to address their growth and cost challenges. The goal is to create a much more flexible, responsive, and cost-effective infrastructure.

For small networks, network-based appliances will work fine, but the end user has to handle the integration. Many companies are choosing to do just this, by virtualizing storage arrays behind an appliance. A key issue here remains the degree to which companies can successfully adopt this strategy across all storage domains. For example, if the virtualization appliance has lower availability than the devices attached behind it, what are the ramifications? Also, if an appliance cannot meet an application's response-time requirements because of the overhead of virtualizing the arrays, that application simply cannot be included in the virtualization strategy. Furthermore, such appliances often limit the number of LUNs supported, which will require installing multiple appliances, increasing complexity and expense. Notably, storage across appliances cannot be virtualized in a single logical pool.

On larger networks, there may be architectural advantages to having the storage controller handle all data integrity and data movement, provided the controller can virtualize arrays both internal and external to the box. This allows installed assets and less-expensive arrays to be attached. As thin provisioning (the ability to logically over-allocate storage for an application but physically provision only what's required today) is adopted, this approach becomes even more attractive. However, it has several integration issues of its own, including the disruption involved in establishing the architecture and migrating existing data and infrastructure into the new system.
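Thin provisioning can be sketched as a pool that hands out logical capacity freely but consumes physical blocks only on first write. The `ThinPool` and `ThinVolume` classes below are a hypothetical illustration of that behavior, not a controller's actual allocator; the sketch also reports the oversubscription ratio that administrators would need to watch, since writes fail once the physical pool is exhausted.

```python
from dataclasses import dataclass, field


@dataclass
class ThinVolume:
    logical_blocks: int                      # size advertised to the application
    written: set[int] = field(default_factory=set)


class ThinPool:
    """Hypothetical thin-provisioned pool: physical blocks are consumed on first write."""

    def __init__(self, physical_blocks: int) -> None:
        self.physical_blocks = physical_blocks
        self.used = 0
        self.volumes: dict[str, ThinVolume] = {}

    def create_volume(self, name: str, logical_blocks: int) -> None:
        # No physical capacity is reserved when the volume is created.
        self.volumes[name] = ThinVolume(logical_blocks)

    def write(self, name: str, block: int) -> None:
        vol = self.volumes[name]
        if not 0 <= block < vol.logical_blocks:
            raise IndexError("block is outside the volume's logical size")
        if block in vol.written:
            return  # overwrite of a block that is already physically backed
        if self.used >= self.physical_blocks:
            raise RuntimeError("pool exhausted: add physical capacity")
        vol.written.add(block)
        self.used += 1

    def oversubscription(self) -> float:
        logical = sum(v.logical_blocks for v in self.volumes.values())
        return logical / self.physical_blocks


pool = ThinPool(physical_blocks=1_000)
pool.create_volume("crm", logical_blocks=2_000)
pool.create_volume("mail", logical_blocks=2_000)
for block in range(300):
    pool.write("crm", block)
pool.write("crm", 0)  # rewriting a block consumes no new space
print(pool.used, pool.oversubscription())  # 300 4.0
```

Here 4,000 logical blocks are promised against 1,000 physical ones, yet only 300 are actually consumed, which is the gap thin provisioning exploits.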

As an alternative to virtualization (or as a complementary node in a virtualized environment), a growing number of vendors offer in-the-box tiered storage solutions that combine, for example, lower-cost SATA drives with higher-performance Fibre Channel drives. This is a simple approach; however, it does not allow the integration of heterogeneous storage arrays. The development of clustered controllers with significantly broader scope may make these approaches even more attractive; however, such technologies are just beginning to emerge.

In terms of technology integration, users should not try to find nirvana (e.g., a single heterogeneous solution across data centers), especially since block- and file-based storage remain largely separate, further segmenting storage strategies. Instead, users should focus on data classification, policies, automation, and the integration of technologies that reduce migration and provisioning headaches.

David Vellante is a co-founder of The Wikibon Project, an open community of practitioners, consultants, and researchers dedicated to improving technology adoption. He can be reached at david.vellante@wikibon.org.

This article was originally published on November 16, 2007