Implementing tiered hardware is actually the last step to take.
By James E. Geis
Taking the right steps in the right order can make the difference between a tiered storage architecture that truly meets your business needs and a lot of unnecessary frustration. Although the precise order for some of these steps will vary from one organization to another, the following is a good working plan for most businesses. It adheres to the information management hierarchy of addressing policy decisions first, followed by management, operations, and technology.
1. Institute a policy program for the treatment of information.
Documented policies are your best safeguard against litigation, be it actual or threatened. They provide clear and concise guidelines as opposed to vague and ambiguous directives. Now that any and all digital information can be admissible in a court of law, it is paramount that all policies not only be documented, but also supported by all levels of management, communicated to the entire organization, monitored for real-time adherence, and constantly reviewed and updated with assigned owners for each process step. Policies must also be auditable for a retroactive review.
Legal ramifications aside, today’s good business practices require establishing, documenting, and communicating policy as to how each class of information should be treated during its useful life. Storage policy needs to address how information should be stored and on what medium; how it is protected through backup and/or replication for disaster-recovery preparedness; how long it must be retained; where, when, and by whom it should be accessed; how easy it needs to be to index, search, and retrieve; and finally, when it can be deleted.
2. Establish rules for compliance-related information.
See Step 1! Recent history has shown that the courts are more lenient on businesses that have a documented compliance policy for information and can prove that they communicated, enforced, monitored, and audited it, and that they have a de facto standard for information management in keeping with the policy. The courts have charged gross negligence in situations where there was no clear direction or policy. Tracking and maintaining information for the long term is not a challenge merely for the storage administrator, but for the entire organization and every information “owner.”
3. Determine long-term retention and migration strategies for each class of information.
Different compliance drivers influence this decision for each industry. The Health Insurance Portability and Accountability Act (HIPAA), for example, dictates that medical records (e.g., communications, images, documentation) be retained for two years past any patient’s death. For pediatric patients, that could mean more than 90 years, so the industry is serious about the 100-year archive. Once you determine how long the information must be retained, decide how it will be stored and protected for the long term, and make technology decisions that provide scalable upgrade paths as well as data-migration capabilities.
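The retention arithmetic behind the 100-year archive is simple to sketch. The snippet below assumes a hypothetical "two years past date of death" rule purely for illustration; actual retention periods must come from your legal and compliance teams.

```python
from datetime import date

# Hypothetical retention rule, for illustration only: records must be
# kept until two years after the patient's death.
RETENTION_YEARS_PAST_DEATH = 2

def retention_expiry(date_of_death: date) -> date:
    """Return the earliest date a record may be deleted."""
    return date_of_death.replace(
        year=date_of_death.year + RETENTION_YEARS_PAST_DEATH)

# A record created in 2006 for a patient who lives until mid-2096 must
# survive on some storage medium until 2098.
created = date(2006, 1, 1)
expiry = retention_expiry(date(2096, 6, 15))
print((expiry - created).days // 365)  # → 92
```

Any storage platform chosen today will not exist in 92 years, which is why the migration strategy matters as much as the retention period itself.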
4. Define a precise process for knowing what information needs to be produced in litigation, when, and how quickly. (Be sure to involve your legal department in this process.)
Digital information (as well as a host of other types of information) is subject to litigation and can be demanded by subpoena in a legal proceeding. The information archiving market was born of this fact. Many high-profile corporations have received hefty fines simply due to the inability to produce “evidence” in a timely fashion.
Having a defined process for storing, indexing, and searching all types of information greatly enhances the ability to respond. Many organizations rely on manual search, which sometimes includes restoring large amounts of data and searching with cumbersome command-line tools (awk, grep, etc.) or the native application’s “find” function. However, it is important to match capabilities to requirements: How easy is it to find a specific piece of information, and when might you need to produce it? Native application software sometimes enables the creation of a metadata repository with index and search capabilities for finding any type of information. Organizations must determine whether the capital expenditure for an automated document, message, or digital information management system is greater or less than the cost of the staff time spent performing those tasks to meet legal discovery requirements. Finally, how do the physical and logical architectures of the storage infrastructure enhance your ability to retrieve information on demand and at the right time?
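The difference between manual search and a metadata repository can be sketched in a few lines. This is a minimal illustration of the idea, not any product's actual schema: file names, sizes, and timestamps are captured once into a small database so that discovery becomes an indexed query rather than a restore-and-grep exercise.

```python
import os
import sqlite3

def build_index(root: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Walk a directory tree and record basic metadata for each file."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT, name TEXT, mtime REAL, size INTEGER)")
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            conn.execute("INSERT INTO files VALUES (?, ?, ?, ?)",
                         (full, name, st.st_mtime, st.st_size))
    conn.commit()
    return conn

def find(conn: sqlite3.Connection, pattern: str) -> list:
    # One indexed query replaces restoring tapes and grepping the result.
    return [row[0] for row in
            conn.execute("SELECT path FROM files WHERE name LIKE ?",
                         (pattern,))]
```

A real discovery system would also index content and retain the metadata independently of the files themselves, but even this level of cataloging changes the response time from days to minutes.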
5. Understand what is stored where, when, and why.
This may seem obvious, but sometimes it is not. Now that there is more scrutiny over information because of the potential compliance impact, it is even more important to know what information is stored where, and why. What information is stored on desktops or laptops? When should it be stored in a central location accessible to the appropriate parties? For example, should electronic mailboxes be centralized so they can be indexed, searched, and easily retrieved if necessary?
6. Define the application requirements for performance and availability.
Different storage tiers provide not only different performance characteristics and recoverability options, but also various protection schemes and promptness in delivering information. Dynamic, highly transaction-oriented information requires that service levels be met by enterprise-class storage. Generally, the more frequently information is updated, the more stringent the recovery and protection requirements. Static, reference, or fixed-content information typically does not require that same performance, unless the application using the information is a data warehouse or a graphical, media-based, or seismic application (one that must read large datasets quickly to process large amounts of data). Whether the application is primarily read, write, or read/write also shapes storage requirements.
7. Determine the data-protection, restore, disaster-recovery, and business continuity requirements for each information class.
Typically, applications that are revenue-generating, customer-facing, compliance-related, or Web-enabled require a high level of data protection and the fastest recovery to meet tight recovery point objectives (RPO) and recovery time objectives (RTO). In addition, the interdependencies of the applications must be clearly outlined to ensure effective restoration of operations after disaster situations.
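Matching an application's objectives to a tier's protection scheme is essentially a filter over two numbers. The sketch below illustrates the comparison; the tier names and all RPO/RTO figures are invented placeholders, not vendor specifications.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    rpo_minutes: int  # worst-case data loss the tier's protection allows
    rto_minutes: int  # worst-case time to restore service

# Illustrative tiers with made-up protection characteristics.
TIERS = [
    Tier("synchronous-replication", rpo_minutes=0, rto_minutes=15),
    Tier("async-replication", rpo_minutes=30, rto_minutes=60),
    Tier("nightly-backup", rpo_minutes=24 * 60, rto_minutes=8 * 60),
]

def tiers_meeting(rpo_req: int, rto_req: int) -> list:
    """Return the tiers whose protection satisfies both objectives."""
    return [t.name for t in TIERS
            if t.rpo_minutes <= rpo_req and t.rto_minutes <= rto_req]

# A revenue-generating app with a 15-minute RPO and a 1-hour RTO:
print(tiers_meeting(15, 60))  # → ['synchronous-replication']
```

The interdependency analysis the step calls for sits on top of this: an application only recovers as fast as the slowest system it depends on.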
8. Delineate the transition from dynamic or transactional to static or reference information.
At some point, most information becomes stale; in some cases, the second after it is written to disk. It may not lose its value or importance, but it becomes static and read-only. The “shelf life” of information varies by application and data type. For example, once you click “send” on an electronic message, it becomes reference information; each reply is appended as dynamic information, but the original message remains static. Files sent with the original message share the same fate. This life cycle holds true for most semi-structured or unstructured information: the original is set, and any changes or updates are tracked in subsequent versions.
Most reference information does not need the performance of high-end, enterprise-class storage. But many compliance requirements dictate that information be stored for the long term and kept easily accessible. This is one of the main reasons for a tiered storage infrastructure. Each storage platform has its relative pros and cons for integration, performance, availability, and extensibility. SANs, NAS, clustered storage, storage routers, and virtualization extend flexibility.
9. Examine storage utilization versus allocation with SRM tools.
Every application owner, end user, or business unit is going to want more and more storage. Storage resource management (SRM) tools let you determine the difference between what storage is allocated and what is actually utilized. Generally, these tools can sift through all types of storage, file systems, and databases, weed through the physical and logical configurations, and determine how storage is allocated and to which applications.
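The core comparison an SRM tool performs can be illustrated in a few lines. The volume names and numbers below are invented for the example; a real tool would collect them from arrays, file systems, and databases.

```python
# Hypothetical inventory of volumes: what was allocated vs. what is used.
volumes = {
    "erp-data":   {"allocated_gb": 500, "used_gb": 180},
    "mail-store": {"allocated_gb": 300, "used_gb": 285},
    "home-dirs":  {"allocated_gb": 200, "used_gb": 40},
}

def utilization_report(vols: dict) -> dict:
    """Return percent utilization per volume."""
    return {name: round(v["used_gb"] / v["allocated_gb"] * 100, 1)
            for name, v in vols.items()}

for name, pct in utilization_report(volumes).items():
    print(f"{name}: {pct}% utilized")
# home-dirs at 20% is a candidate for reclamation or thin allocation.
```

Low-utilization volumes are where reclaimable capacity hides; chronically near-full volumes signal where the next allocation request will come from.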
10. Review structured database distribution.
Databases are moving toward a more distributed, or “grid,” architecture. Some newer releases allow you to segment table spaces, as well as other database elements, so information can be stored and delivered from separate servers and different storage platforms. Database archiving is becoming more mainstream and the major database software vendors, as well as some niche players, are answering the demand by integrating this option. Databases provide a method to structure the data so it can be easily retrieved. By distributing the elements of a database that have different performance or relevance requirements, you can take advantage of these newer features. NAS is becoming a popular platform for databases; the more IP-centric storage infrastructures become, the more important it will be to distribute the appropriate content to the appropriate medium at the appropriate location.
11. Categorize semi-structured and unstructured information.
This may seem like an overwhelming task, but when you look at the available tools it isn’t so difficult.
Electronic message archiving, for example, allows you to offload old or infrequently accessed messages to another server with a different tier of storage and allows less-frequent backups for static information.
This also allows you to keep closer tabs on a commonly misused medium. Document management software provides the same features and functions. Most archiving software has built-in features that de-duplicate messages, attachments, and files, as well as hooks into ERP, SAP, or other structured databases, making it possible to move older data to different tiers of storage.
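The de-duplication that archiving products perform on attachments amounts to single-instance storage keyed by a content digest. The sketch below is a minimal illustration of that idea, not any vendor's implementation.

```python
import hashlib

class AttachmentStore:
    """Store identical content once; track references by digest."""

    def __init__(self):
        self._blobs = {}  # digest -> content
        self._refs = {}   # digest -> reference count

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = content  # first copy is the only copy
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def unique_bytes(self) -> int:
        return sum(len(b) for b in self._blobs.values())

store = AttachmentStore()
# The same 1 MB attachment mailed to 50 recipients is stored once.
for _ in range(50):
    store.put(b"x" * 1_000_000)
print(store.unique_bytes())  # → 1000000
```

Fifty mailboxes reference the attachment, but only one copy consumes tier capacity, which is precisely what makes archiving to a cheaper tier economical.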
12. Determine which information classes can be addressed most easily.
This task will be different for every organization as each has different information management challenges. Many software suites allow this to be easily performed. Digital information archiving, e-mail, document management systems, backup software, and instant messaging solutions are numerous and accessible. Start where your organization has the most risk. This is likely to be the medium used to track financial records or public and internal communications (e-mail).
13. Analyze and prepare for the impact to the network.
As storage protocols converge on the IP network, it’s important to understand the impact on network capacity. iSCSI and FCIP are becoming more mainstream and NAS is becoming enterprise-class. Some analysts believe iSCSI will overtake Fibre Channel SANs as the storage standard for enterprises. Backup, restore, recovery, replication, and archiving are merging and storage virtualization is becoming the hotbed of networked storage convergence.
14. Understand the simplicity and the complexity of APIs and storage integration.
Many newer storage platforms, such as content-addressed storage (CAS), were built to interact with the middleware applications that access storage. CAS is conceptually simple: information is written once and read back by its content address. Other properties (e.g., error correction, de-duplication, replication, backup, disaster recovery) are currently the responsibility of the storage platform, not the application. Look for an application’s ability to use an API that best utilizes the storage and increases flexibility when it comes to replication, disaster recovery, backup, restore, etc.
15. Designate how data will be moved between tiers.
This is probably the most difficult task, as the application using the information is generally not aware of the storage unless there is an API. Most likely, applications will need to take on the additional role of understanding the storage options rather than relying on the operating system, network, or storage frame. They will have to be able to move information around the storage infrastructure, maintain metadata to know what is stored where, and follow a policy-based approach to moving and tracking the information. In the meantime, matching information classes to designated tiers will have to suffice.
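The policy-based approach described above reduces to three parts: a policy mapping age (or class) to a tier, a metadata catalog recording what lives where, and a mover that reconciles the two. The class thresholds and tier names below are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Illustrative policy: information ages down the tiers.
POLICY = [
    (timedelta(days=30), "tier-1-enterprise"),   # dynamic, transactional
    (timedelta(days=365), "tier-2-midrange"),    # recent reference
    (timedelta.max, "tier-3-archive"),           # long-term retention
]

def target_tier(last_modified: datetime, now: datetime) -> str:
    age = now - last_modified
    for max_age, tier in POLICY:
        if age <= max_age:
            return tier
    return POLICY[-1][1]

# Metadata catalog: the record of what is stored where.
catalog = {}

def apply_policy(objects: dict, now: datetime) -> list:
    """Compare each object's current tier to policy; return needed moves."""
    moves = []
    for obj_id, last_modified in objects.items():
        dest = target_tier(last_modified, now)
        if catalog.get(obj_id) != dest:
            moves.append((obj_id, catalog.get(obj_id), dest))
            catalog[obj_id] = dest
    return moves
```

Until applications carry this logic themselves through an API, the catalog-and-mover pattern is exactly the stopgap the step describes: classes matched to designated tiers, with metadata tracking every move.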
Taking a methodical approach, with the business units and IT management in sync, ensures that information management decisions are policy-, management-, and operations-driven rather than technology-driven.
The result will be an effective, cost-efficient, flexible tiered storage architecture capable of supporting your organization’s short- and long-term information management goals, even in the face of rapidly evolving information delivery requirements.
James E. Geis is director of storage solutions at Forsythe Solutions Group (www.forsythe.com). Geis developed Forsythe’s information management framework, the road map Forsythe uses for information and storage consulting engagements. He manages Forsythe’s professional services practice focused on information policy, information life-cycle management, tiered storage, operational backup and recovery, and data replication and archiving.