The key is to start by classifying data to determine the business value of information.
By James E. Geis
Today, many IT organizations are looking for strategies to tier or classify storage in order to realize the best value from their storage investment while minimally impacting storage management and operational practices. Storage management is a significant initiative because it requires integrating multiple (and possibly heterogeneous) storage tiers with backup, data replication, disaster recovery, security, and archival practices, all with information life-cycle management (ILM) policies in mind. Adding to the challenge, companies want to increase availability and performance, ensure data protection, and meet compliance requirements for all of their information classes.
When it comes to choosing technology, storage and systems administrators know that one-size-fits-all storage is not cost-effective for the obvious reason that high-end, high-performance storage is not practical for older or less frequently accessed information; the ROI and TCO just don’t add up. The goal is to design a storage hierarchy, or tiered storage infrastructure, that best meets your company’s information requirements.
Compliance regulations and the need for readily available archived, or reference, data have been the primary drivers for companies considering lower-cost storage alternatives. Most compliance laws include provisions for the availability, authenticity, accessibility, security, and recoverability of information, all of which rest upon the underlying storage infrastructure, policies, and operational practices. This has prompted storage administrators to investigate solutions for keeping information for longer periods of time while maintaining some service or operating level agreements for access.
At most, three or four storage tiers are practical. The tiers can range from high-performance, high-availability enterprise-class storage for transactional applications and instantaneous response, to lower-performance, slower-access storage for occasionally or infrequently accessed information. Each tier must be evaluated according to several performance factors; the performance of the storage should then be matched to the business value of the information it contains as well as the information’s availability requirements. Ultimately, all tiers of storage must fit into an overall information management framework (see figure).
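The tier hierarchy described above can be sketched as a simple lookup. Everything here is illustrative: the tier names, media types, and access-frequency thresholds are assumptions for the sketch, not vendor specifications or figures from the article.

```python
# Illustrative three-tier hierarchy; media types and thresholds are
# assumptions, not vendor specifications.
TIERS = {
    "tier1": {"media": "enterprise-class disk", "use": "transactional, instant response"},
    "tier2": {"media": "midrange/low-cost disk", "use": "reference data, occasional access"},
    "tier3": {"media": "tape or archive platform", "use": "long-term retention, rare access"},
}

def tier_for(accesses_per_day: float) -> str:
    """Map an observed access frequency to a tier (cutoffs are illustrative)."""
    if accesses_per_day >= 100:
        return "tier1"
    if accesses_per_day >= 1:
        return "tier2"
    return "tier3"
```

In practice the cutoffs would come from measured usage patterns and the SLAs attached to each information class, not from fixed constants.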
Building an information management framework
When designing a storage infrastructure, all decisions should be based on policies developed for the treatment of the information. Many factors influence those policies and the overall information management strategy, but typically, operational, compliance, and cost issues are at the top of the list. Companies must determine policies for how information is created, how it is accessed and by whom, how long it must be retained (no matter in which tier), and when it can be deleted. It is also important to consider how operational practices are enhanced or hindered by the storage tiers. It’s not enough to simply throw storage at a problem; administrators must determine how the tiers will benefit the overall information delivery, application, or service.
Managing the storage infrastructure becomes a greater challenge as the amount of information to be stored increases. While vendors are hammering out standards, end users still lack a single tool, or suite of tools, to manage all facets of information storage. Administrators are frustrated with using multiple tools and want “the one” that will allow them to manage the entire storage environment (including disk and tape) as well as integrate with ILM requirements.
Moving information between tiers is probably the most challenging piece of the information management puzzle, as the puzzle contains many interdependencies. Also complicating any solution is the fact that many frame-dependent snap/clone or data-replication technologies only work with homogeneous platforms. Making all these technologies and processes work together requires that all tiers be connected to each other, that the servers be able to access the storage through the same protocol, and that the application or operating system has the intelligence to segregate storage by class (i.e., performance, block versus file, etc.). This is where some intelligence on the metadata associated with the information (i.e., location, age, value, etc.) will need to be integrated into the primary application accessing the information to track its location.
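The metadata the paragraph above mentions (location, age, value) could be carried as a small record that the primary application consults to track where its information lives. The field names and class below are hypothetical, sketched only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class InfoMetadata:
    """Hypothetical metadata record; all field names are illustrative."""
    object_id: str       # identifier the application uses for the information
    tier: str            # current storage tier, e.g. "tier1"
    location: str        # frame, volume, or path where the object currently lives
    created: date        # creation date, used to derive age
    business_value: str  # classification label, e.g. "business vital"

    def age_days(self, today: date) -> int:
        """Age of the information, an input to tier-movement decisions."""
        return (today - self.created).days
```

An application (or an ILM tool acting on its behalf) would update `tier` and `location` whenever data moves, so access paths stay valid across heterogeneous platforms.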
When reviewing performance statistics for information as they relate to storage, it is important to understand the usage pattern of the information. Obviously, more frequently accessed or transaction-oriented information requires the highest level of service, while archived information or reference data (information that must be stored for record-keeping or compliance reasons), can be housed on lower-performing storage for the long-term. An organization must determine how its data is used and its business value in order to create policies for treating the information and to build the appropriate storage tiers. This is where ILM must be thought out, on a conceptual level, in order to build the storage environment effectively.
Additionally, an organization must determine the operational, compliance, and financial impact if the information is not available. What is the cost of downtime or decreased productivity if end users or customers cannot access information? Also, how is the metric for that data’s value determined? As part of the information classification process, you must know which operational activities are necessary when information becomes unavailable and how they impact administrative functions; that is, who has to do what to resume business functions? What can be automated and what personnel are involved? Might regulatory penalties (fines or other actions) be imposed for information not being accessible?
Companies need ways to define how their information should be categorized and classified not only for storage, but also for security and business continuity purposes. The information classes below have been defined by the Data Management Forum of the Storage Networking Industry Association (SNIA) in relation to ILM:
- Business vital
- Business important
- Important productive
- Not important
To determine how information should be categorized, the following evaluations should be made:
Usage pattern of the information:
- On demand or request
Information availability requirement:
- Defined time frame
- Extended time frames
- Not defined or unnecessary
Financial impact of information unavailability:
- Significant and immediate
- Significant long- and/or short-term
- Potential long-term
Operational impact of information unavailability:
- Significant and immediate
- Significant over time
- Probable over time
- Possible over time
Compliance impact of information unavailability:
- Definite and significant
- Potential or possible
- None expected
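The evaluation criteria above can be folded into a simple scoring rubric that maps impact answers to one of the SNIA-style classes. The weights and class thresholds below are illustrative assumptions for the sketch; SNIA does not define numeric values.

```python
# Hypothetical scoring rubric built from the impact criteria above.
# Weights and thresholds are illustrative assumptions, not SNIA values.
IMPACT_WEIGHTS = {
    "significant and immediate": 3,
    "significant over time": 2,
    "potential long-term": 1,
    "none expected": 0,
}

def classify(financial: str, operational: str, compliance: str) -> str:
    """Combine the three impact answers into an information class."""
    score = sum(IMPACT_WEIGHTS.get(answer, 0)
                for answer in (financial, operational, compliance))
    if score >= 7:
        return "business vital"
    if score >= 4:
        return "business important"
    if score >= 2:
        return "important productive"
    return "not important"
```

A real classification exercise would also weigh usage pattern and availability requirements, but even a coarse rubric like this forces the value question to be answered explicitly for each information type.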
While information classification is a necessary step toward matching storage tiers to specific business goals, security and other facets of the information need to be integrated into the classification model as well.
Evaluation of storage tiers
A parallel set of questions needs to be applied when you are designing a tiered storage infrastructure to help ensure that you make the right decisions to meet the availability, recoverability, and manageability needs for each type of data being stored.
Scalability: How easily can storage be added or removed without interruption? Is the storage designed with modular or replaceable components to easily increase, decrease, physically move, logically partition, and configure RAID alternatives?
Backup: How easily do all the tiers fit into your existing backup-and-restore infrastructure and processes? How can the storage be backed up with minimal impact on the application or on the performance of the storage?
Recoverability: How convenient is it to recover information from any one tier to another? Can all the tiers be easily recovered? Can the different tiers interact with each other for disaster recovery or frame-to-frame or server-to-server replication?
Interoperability: How efficiently can the various tiers interact with each other and through what transmission protocol, operating system, or application?
Replication: How complicated is it to replicate information within tiers, in or out of a frame, to other tiers or platforms locally or remotely? What tools (existing or additional) are required and how do they interact with the application and information?
Performance: How well does each tier perform and what performance characteristics are dictated for each tier by the class of the information being stored? What are the performance requirements of the application or users? What are the service level agreements (SLAs) or operating level agreements (OLAs) associated not only with the storage, but also with the application or service that uses the storage? What characteristics are inherent to the storage device that protect data and allow you to optimize storage performance with tools or internal data-movement mechanisms?
Operations: What level of operational interaction is required to provision and manage the various storage tiers? Will you need additional tools or processes to operate the storage? What additional training, licensing, or skills will be necessary to operate the storage? What level of operational intervention is necessary to manage, replace parts, monitor, and track utilization and allocation? Basically, how easy is the storage to use?
Management: What tools are available to manage, monitor, report, or control the entire storage infrastructure? Do your tools fit into your existing storage resource management or enterprise management frameworks? Does the storage platform allow for in-band or out-of-band management? Can the storage be managed with agents or agent-less software? What other management protocols can be used? (End users are not thrilled with existing management tools, as reflected by the fact that some shops are still using white boards and spreadsheets to track their storage.)
Availability: What availability metrics are in place in relation to storage performance and SLAs? What metrics are necessary or defined for an information class that has been associated with the storage?
Data loss: What level of data loss is acceptable from transaction or media failure that correlates to availability? What self-healing or built-in mechanism supports data integrity and protection? How does the tolerable data loss integrate with the recovery point objective (RPO) and recovery time objective (RTO) assigned to the information?
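Checking whether a tier satisfies an information class's recovery objectives reduces to a comparison of the tier's achievable RPO/RTO against the required ones. The function and any figures passed to it are illustrative.

```python
def meets_recovery_objectives(tier_rpo_min: float, tier_rto_min: float,
                              required_rpo_min: float, required_rto_min: float) -> bool:
    """A tier qualifies only if it can recover at least as tightly as required.

    All values are in minutes; the figures used are illustrative assumptions.
    RPO bounds tolerable data loss; RTO bounds tolerable downtime.
    """
    return (tier_rpo_min <= required_rpo_min
            and tier_rto_min <= required_rto_min)
```

For example, a replicated disk tier with a 5-minute RPO and 30-minute RTO satisfies a class requiring 15/60, while a nightly-backup tier with a 24-hour RPO does not.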
Integrity: What is the fault tolerance and expected lifespan of the media? Tape and some disk types are more prone to failure and are not ideal media for production or transaction-oriented information. Some experts state that tape media, stored under ideal environmental conditions, can last up to 30 years, but that figure does not account for how often the tape is used; each pass across the drive heads degrades the tape’s integrity.
Policy management: What policies are necessary to determine how storage is allocated, provisioned, managed, and matched to information value, backup, recovery, replication, and archiving?
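Placement policies of the kind described above are often expressed as simple rules over the information's metadata, such as time since last access. The thresholds and tier names in this sketch are assumptions, not recommendations.

```python
from datetime import date

def placement_policy(last_access: date, today: date) -> str:
    """Hypothetical age-based placement rule; thresholds are illustrative."""
    idle_days = (today - last_access).days
    if idle_days <= 30:
        return "tier1"   # keep actively used data on high-end storage
    if idle_days <= 365:
        return "tier2"   # move cooling data to midrange storage
    return "tier3"       # archive cold data to low-cost, long-term storage
```

A production policy would also consult the information class, retention period, and deletion eligibility before moving anything, so that compliance rules override simple age-based demotion.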
Building a tiered storage infrastructure is not an overwhelming exercise as long as you start with the end goal in mind: Keep important, transactional information on high-end storage and less-important, less frequently accessed information on lower-performing, cheaper storage for the long-term.
Match the appropriate cost of the tier to the business value of the information. It is possible to design each tier so that service levels, backup and recovery, business continuity, security, and networking can intersect and interact with minimal impact on operations.
Plan it out before you reactively make purchases. Also, determine the business value of information to be stored, understand end-user expectations, and make investments that will stand the test of time.
James E. Geis is director of storage solutions at the Forsythe Solutions Group Inc. (www.forsythe.com) in Skokie, IL.
Virtualization is the future of storage architectures
Storage virtualization is an important technology that will eventually change the way we use, manage, and tier storage. While this concept is still in its infancy, many vendors are offering multiple alternatives, ranging from fabric-based intelligence to multiple protocols within a single storage frame.
End users are realizing that a Fibre Channel SAN may not be the ideal storage platform since other protocols and technologies, such as iSCSI, IP SANs, and in some instances, NAS, are viable alternatives for high-performance applications; it all depends on the performance requirements and service levels. It is predicted that new Internet-based protocols will experience widespread adoption over the next several years.
Whether you aggregate storage inside a frame, behind a switch/director, or through a router, users will use the type of storage that best suits the application, is available at the best price point, and provides the best management alternatives. Once virtualization matures and vendors provide stable solutions, users will be willing to put their production data on these platforms.