Application-centric storage management

Managing data according to a priority hierarchy eases storage management hassles and decreases labor costs.


The speed at which data can be retrieved and provided to the appropriate point depends on how well your storage complex is configured and on the importance and priority of the application. An application-centric storage management approach enables a company to manage storage more efficiently.

End users have certain service-level expectations related to response speed and performance. In a credit card transaction, for example, you do not expect to wait in a checkout line for several minutes until the card-approval code is transmitted back to the merchant from the card-approval facility. The same is true at a gas pump or ATM. Contrast this with accessing your system at work and requesting a report of certain sales activities that occurred over the past quarter. You don't expect an instant response because you are probably conditioned to wait several minutes for such a report to complete.

These examples emphasize that applications lie in a hierarchy, from business-critical applications that need an instant response down to those accessed occasionally, where response time is not the most important factor. The data managed by these applications therefore falls into the same hierarchy, and the storage devices where that data resides should be managed according to that hierarchy of importance. This ensures that storage holding important data is not treated the same as all other storage.
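To make the idea of a priority hierarchy concrete, here is a minimal sketch that assigns each application to a storage service tier based on its business priority. The priority levels, tier names, and the `assign_tiers` helper are all hypothetical; a real product would derive priorities from policy and tiers from the hardware actually installed.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical priority levels; a higher value means more business-critical."""
    OCCASIONAL = 1         # e.g. a quarterly sales report
    ROUTINE = 2
    BUSINESS_CRITICAL = 3  # e.g. credit card authorization


# Hypothetical service tiers; real tier names depend on the hardware you own.
TIER_FOR_PRIORITY = {
    Priority.BUSINESS_CRITICAL: "tier1-cached-mirrored",
    Priority.ROUTINE: "tier2-raid5",
    Priority.OCCASIONAL: "tier3-high-capacity",
}


@dataclass
class Application:
    name: str
    priority: Priority


def assign_tiers(apps):
    """Map each application to a storage tier, listing the most critical first."""
    ordered = sorted(apps, key=lambda app: app.priority, reverse=True)
    return {app.name: TIER_FOR_PRIORITY[app.priority] for app in ordered}


apps = [
    Application("quarterly-sales-report", Priority.OCCASIONAL),
    Application("card-authorization", Priority.BUSINESS_CRITICAL),
]
print(assign_tiers(apps))
```

The point of the ordering is that the most important application is considered first when storage resources are allocated, rather than all data being handled as equal.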

While it is common practice for applications to share storage resources, little thought is given to the fact that the system treats all data on a common storage resource the same. With islands of storage embedded in or directly attached to a particular server, and that server dedicated to one application, this "all data is equal" paradigm may not be a particular problem.

However, organizations are discovering that while they have, for instance, 7TB of storage capacity, they may have only 5TB of data and still run short of space on some servers. The problem is that the capacity a given server needs is attached to some other server it can't get to, creating islands of wasted capacity and poor return on investment. Using the "all data is equal" paradigm in this case causes you to expend resources managing storage that does not improve the performance of important applications.
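The arithmetic behind "islands of wasted capacity" can be shown with a small sketch. It assumes a hypothetical split of the 7TB of capacity and 5TB of data across two servers; the server names and the `capacity_report` helper are illustrative only.

```python
def capacity_report(servers):
    """servers maps a server name to (attached_tb, needed_tb).

    Returns (stranded_tb, shortfall_tb, pooled_headroom_tb):
      stranded_tb: free space sitting on servers that do not need it
      shortfall_tb: capacity some servers need but cannot reach
      pooled_headroom_tb: spare capacity if all storage were consolidated
    """
    stranded = sum(max(attached - needed, 0) for attached, needed in servers.values())
    shortfall = sum(max(needed - attached, 0) for attached, needed in servers.values())
    pooled = sum(a for a, _ in servers.values()) - sum(n for _, n in servers.values())
    return stranded, shortfall, pooled


# Hypothetical split of 7TB of capacity and 5TB of data across two servers.
servers = {"web": (4.0, 1.5), "db": (3.0, 3.5)}
print(capacity_report(servers))  # (2.5, 0.5, 2.0)
```

Even though the pool as a whole has 2TB to spare, the database server is 0.5TB short because the free space is attached to the web server. Consolidation turns the stranded 2.5TB into usable headroom.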

Perhaps the most obvious solution is to put that storage where any application can access it. This trend, called storage consolidation, will drive the need to manage data in the context of its relative importance to the business. All storage hardware vendors offer software that provides a broad range of capabilities for managing hardware in an isolated context: you use a vendor's software to manage that vendor's brand of storage hardware, and you manage the storage as though it were the important entity. However, these storage utilities do not recognize a data hierarchy and cannot understand an application association. The value of such solutions is rapidly diminishing as IT executives realize they need something more in their maze of different storage technologies from different vendors and the myriad applications storing data.

Organizations require an automated, self-configuring, self-deploying solution that recognizes applications and their data boundaries, their relative importance to the business, how database management systems are constructed, and how applications use them. This software must then manage the storage environment with minimum human intervention.

Application-centric storage management

Application discovery, followed by the discovery of all assets belonging to that application, is the first step in application-centric storage management. From there, the software should perform a number of other functions (see checklist), including the following:

Logical-to-physical mapping: An application consists of executables and data, and the relationship between these assets can be complicated. For example, an application may have its executables on one server and its data in several other locations, linked only by a file system. The software must be able to create discrete inventories outside of a logical file-system view and feed them to the processes used to manage the storage in question. It must also be able to provide a file-system-independent analysis of the problem, which requires logical-to-physical mapping.
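A minimal sketch of logical-to-physical mapping follows. It assumes the inventory has already been discovered and is available as three plain lookup tables (file to file system, file system to volume, volume to physical disks); the table contents and the `physical_footprint` function are hypothetical stand-ins for what a real tool would collect from the operating system, volume manager, and storage array.

```python
def physical_footprint(app_files, file_to_fs, fs_to_volume, volume_to_disks):
    """Resolve an application's files to the physical disks that hold them.

    Each mapping is a plain dict standing in for discovered inventory data.
    Returns ({file path: sorted list of disks}, sorted union of all disks).
    """
    per_file = {}
    for path in app_files:
        volume = fs_to_volume[file_to_fs[path]]
        per_file[path] = sorted(volume_to_disks[volume])
    all_disks = sorted({disk for disks in per_file.values() for disk in disks})
    return per_file, all_disks


# Hypothetical inventory: executables in one place, data in two others.
file_to_fs = {"/opt/orders/bin/orders": "fs_app",
              "/data1/orders.dbf": "fs_data1",
              "/data2/orders_idx.dbf": "fs_data2"}
fs_to_volume = {"fs_app": "vol1", "fs_data1": "vol2", "fs_data2": "vol3"}
volume_to_disks = {"vol1": ["disk0"], "vol2": ["disk3", "disk4"], "vol3": ["disk4", "disk5"]}

per_file, disks = physical_footprint(list(file_to_fs), file_to_fs, fs_to_volume, volume_to_disks)
print(disks)  # ['disk0', 'disk3', 'disk4', 'disk5']
```

Note that disk4 backs two different volumes: a fact that is invisible from the file-system view alone and only emerges once the logical objects are mapped to physical devices.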

File-system-based view: To further clarify this point, you must be able to see a file-system-based view of an application, as well as a combined view for a storage subsystem that shows all file systems. You should be able to see I/O rates as the file systems and storage subsystems see them.
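The two views described above can be derived from the same raw measurements. This sketch assumes a hypothetical sample format of (application, file system, storage subsystem, IOPS) and simply sums the samples two ways: per application and file system, and per subsystem and file system.

```python
from collections import defaultdict


def _plain(view):
    """Convert nested defaultdicts to plain dicts for printing and comparison."""
    return {key: dict(inner) for key, inner in view.items()}


def io_views(samples):
    """Roll I/O samples up into an application view and a subsystem view.

    samples: iterable of (application, file_system, subsystem, iops) tuples.
    Returns (app_view, subsystem_view), where app_view[app][fs] and
    subsystem_view[subsystem][fs] hold summed IOPS.
    """
    app_view = defaultdict(lambda: defaultdict(int))
    subsystem_view = defaultdict(lambda: defaultdict(int))
    for app, fs, subsystem, iops in samples:
        app_view[app][fs] += iops
        subsystem_view[subsystem][fs] += iops
    return _plain(app_view), _plain(subsystem_view)


samples = [
    ("orders", "fs_data1", "array_A", 400),
    ("orders", "fs_data2", "array_A", 150),
    ("payroll", "fs_pay", "array_A", 90),
    ("orders", "fs_data1", "array_A", 100),
]
app_view, subsystem_view = io_views(samples)
print(app_view["orders"])         # {'fs_data1': 500, 'fs_data2': 150}
print(subsystem_view["array_A"])  # {'fs_data1': 500, 'fs_data2': 150, 'fs_pay': 90}
```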

Evaluate target application: You need to evaluate performance, capacity, trends (growth or shrinkage), efficiency, recoverability, and availability, all within the discrete boundary of the target application. It's also advantageous to detect less-important applications that consume resources and thus affect the target application. Starting with the target application, you have to know the files, storage, cache, backup, and network segments that form the discrete unit. Within each area (storage, for example), you need to know, at a minimum, exactly what capability the storage has (how intelligent it is), what service levels it can support, its disk geometry, and the architecture of its cache management services (if applicable).

Graphic depiction of resources: You should be able to see a graphic depiction of applications and resources. This graphic should highlight contention areas, such as those that result from I/O collisions between applications with similar usage and access patterns. It should color-code the assets belonging to each application and pinpoint their locations.

The highlighted area then becomes a drill-down point that takes you from the top down to the root cause. When the software determines the root cause of a problem, it must then provide a choice of solutions, which could be as simple as relocating a file or as complex as restructuring the way a database or its indexes are constructed.
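The contention areas such a graphic would highlight can be found with a simple rule. The sketch below assumes per-disk, per-application IOPS measurements in a hypothetical `{disk: {application: iops}}` format and an arbitrary threshold; it flags disks shared by two or more active applications whose combined load exceeds the threshold, busiest first, as candidate drill-down points.

```python
def contention_hotspots(disk_io, threshold_iops):
    """Flag disks where two or more applications compete for I/O.

    disk_io: {disk: {application: iops}}.
    Returns [(disk, total_iops, applications)] for shared disks whose
    combined load exceeds threshold_iops, busiest first.
    """
    hotspots = []
    for disk, per_app in disk_io.items():
        active = sorted(app for app, iops in per_app.items() if iops > 0)
        total = sum(per_app.values())
        if len(active) >= 2 and total > threshold_iops:
            hotspots.append((disk, total, active))
    return sorted(hotspots, key=lambda h: h[1], reverse=True)


disk_io = {
    "disk3": {"orders": 420, "reporting": 310},
    "disk4": {"orders": 600},
    "disk5": {"payroll": 90, "reporting": 80},
}
print(contention_hotspots(disk_io, threshold_iops=500))
# [('disk3', 730, ['orders', 'reporting'])]
```

disk4 is busy but serves only one application, so it is a capacity or performance question rather than a collision; disk5 is shared but lightly loaded.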

Automated decision-making: The software must then offer a mechanism to implement these changes, either by human selection or by automated decision-making. For example, the software might be authorized to relocate data files on its own to balance a system but not to make invasive changes such as reorganizing a tablespace. Responses to the 20% of events that cause 80% of the problems can be successfully automated. The remaining events are problems the software has not seen before; when a problem is new, the software must learn to recognize it the next time it appears and take the appropriate steps to resolve it.
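The authorization split described above can be sketched as a small policy gate. This is an illustration under simplifying assumptions: problems are identified by a hypothetical signature string, "learning" is modeled as recording the fix for a signature once it has been resolved, and the action names are invented for the example.

```python
class Remediator:
    """Policy-gated automation: known problems map to known fixes, only
    pre-authorized fixes run unattended, and new problems are escalated."""

    def __init__(self, auto_approved_actions):
        self.auto_approved = set(auto_approved_actions)
        self.playbook = {}  # problem signature -> remediation action

    def learn(self, signature, action):
        """Record the fix for a problem so it is recognized next time."""
        self.playbook[signature] = action

    def handle(self, signature):
        action = self.playbook.get(signature)
        if action is None:
            return ("escalate", None)       # never seen: a human must diagnose it
        if action in self.auto_approved:
            return ("execute", action)      # authorized to act alone
        return ("await_approval", action)   # invasive: needs human selection


# Hypothetical policy: moving files is allowed, reorganizing tablespaces is not.
bot = Remediator(auto_approved_actions={"relocate_file"})
bot.learn("hot_disk", "relocate_file")
bot.learn("fragmented_tablespace", "reorganize_tablespace")
print(bot.handle("hot_disk"))               # ('execute', 'relocate_file')
print(bot.handle("fragmented_tablespace"))  # ('await_approval', 'reorganize_tablespace')
print(bot.handle("switch_port_saturated"))  # ('escalate', None)
```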

Understands application interaction: Interaction among applications is more problematic; it is the leading inhibitor to efficient use of very-high-density disk platters. As density increases, so does the likelihood of collision between I/O requests from different applications sharing a platter. Also, not all real estate on the platter is equal. Inner tracks hold less data than outer tracks, so in each revolution the read/write head passes over less data on the inside tracks than on the outside tracks. Files needing higher performance would benefit from outside-track placement. Stale data, or data with low-access requirements, would benefit from inside-track placement.
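One way to act on this is a greedy placement that fills the fastest (outermost) zones with the most frequently accessed files first. The sketch assumes the platter is described as a hypothetical list of zones ordered outermost first, each with a capacity in GB, and that access rates per file are already known; the file names and numbers are illustrative.

```python
def place_by_zone(files, zones):
    """Greedy zone placement: hottest files go to the outermost zone with room.

    files: [(name, size_gb, accesses_per_hour)]
    zones: [(zone_name, capacity_gb)], ordered outermost (fastest) first.
    Returns {file name: zone name}; raises ValueError if a file does not fit.
    """
    free = [[zone_name, capacity] for zone_name, capacity in zones]
    placement = {}
    for name, size, _rate in sorted(files, key=lambda f: f[2], reverse=True):
        for zone in free:
            if zone[1] >= size:
                placement[name] = zone[0]
                zone[1] -= size
                break
        else:
            raise ValueError(f"no zone has room for {name}")
    return placement


zones = [("outer", 100), ("middle", 100), ("inner", 100)]
files = [("archive", 80, 1), ("orders", 60, 500), ("index", 30, 900), ("logs", 50, 20)]
print(place_by_zone(files, zones))
```

Here the index and order files land on the outer zone, the logs in the middle, and the rarely touched archive on the inner tracks. First-fit is a simplification; a real placement engine would also weigh file growth and the cost of moving data.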

With this approach, the entire platter can be used without performance penalties. A dense platter could be populated with high-activity files whose access requirements peak at different times during the day. If applications are shift-sensitive but still require high performance, then a dense platter behind a high-performance controller and cache could alternately service each application during its busy time. This type of data organization reduces the need for discrete, high-performance storage enclosures while optimizing the enclosures you already own. To understand this interaction, the product has to see and know about all the applications present.
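Whether shift-sensitive applications can share a dense platter comes down to whether their busy periods overlap. This sketch assumes each application's peak is a single same-day window in whole hours (end exclusive); windows that wrap past midnight are not handled, and the application names are hypothetical.

```python
from itertools import combinations


def windows_overlap(a, b):
    """a, b: (start_hour, end_hour) busy windows with end exclusive, same day."""
    return a[0] < b[1] and b[0] < a[1]


def can_share_platter(busy_windows):
    """True if no two applications are busy at the same time.

    busy_windows: {application: (start_hour, end_hour)}.
    """
    return not any(
        windows_overlap(w1, w2)
        for w1, w2 in combinations(busy_windows.values(), 2)
    )


shifts = {"order-entry": (8, 17), "batch-billing": (18, 23)}
print(can_share_platter(shifts))                                   # True
print(can_share_platter({**shifts, "month-end-close": (16, 20)}))  # False
```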

Intelligent prediction: Providing services as described above also requires intelligent prediction, which understands the impact of making a change at the target as well as the residual impact at the source. In some cases, you want to improve the performance of a disk but do not want to move one large tablespace; you would rather move a combination of smaller files to achieve the same result. In that case, the residual effect would be the most important to you.
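The trade-off between moving one large tablespace and several smaller files can be framed as a small search problem. The sketch below assumes that I/O load is additive and moves with the file, that the goal is to shed a given number of IOPS from the source disk (the residual effect) without exceeding the target disk's headroom, and that moving fewer gigabytes is better. It uses exhaustive search, which is only practical for a short candidate list; all names and figures are hypothetical.

```python
from itertools import combinations


def cheapest_move_set(files, iops_to_shed, target_headroom_iops):
    """Choose which files to move off a hot disk.

    files: {name: (size_gb, iops)} on the source disk.
    Finds the set that sheds at least iops_to_shed from the source while
    adding no more than target_headroom_iops to the target, moving the
    fewest GB. Returns (sorted names, gb_moved, iops_moved) or None.
    """
    best = None
    names = list(files)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            size = sum(files[n][0] for n in combo)
            iops = sum(files[n][1] for n in combo)
            fits_target = iops <= target_headroom_iops
            if iops >= iops_to_shed and fits_target and (best is None or size < best[1]):
                best = (sorted(combo), size, iops)
    return best


files = {"big_tablespace": (200, 900), "temp": (10, 300), "idx_a": (15, 250), "log": (5, 200)}
print(cheapest_move_set(files, iops_to_shed=700, target_headroom_iops=800))
# (['idx_a', 'log', 'temp'], 30, 750)
```

Moving the large tablespace would relieve the source but overload the target, so the search settles on three small files totaling 30GB instead.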

Capacity planning: Capacity planning may seem more obvious, but subtleties exist. Trending the growth of data over several days or weeks can be easy, but seasonality may be a problem. The storage management software should tell you whether the application has seasonality and what growth burst it experiences during those seasons. Tablespace pre-allocation presents a similar problem. A tablespace may consume 200GB of disk space while the data within it consumes only 60GB. You need to know about the internal growth of that data so that the appropriate tablespace-expansion process takes place at the proper time. If this service is highly reliable, you will be less likely to over-allocate in the beginning, which makes more efficient use of that space possible.
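Using the 200GB tablespace holding 60GB of data from the paragraph above, this sketch shows why seasonality matters for timing an expansion. It assumes a hypothetical baseline growth of 10GB per month and a set of 12 monthly multipliers; the seasonal profile is invented for illustration.

```python
def months_until_full(allocated_gb, used_gb, monthly_growth_gb, seasonal_factors,
                      start_month=0, horizon_months=60):
    """Project when data inside a pre-allocated tablespace outgrows it.

    seasonal_factors: 12 multipliers, one per calendar month (index 0 = January),
    applied to the baseline monthly growth. Returns the number of months until
    used space exceeds the allocation, or None if not within the horizon.
    """
    used = used_gb
    for m in range(horizon_months):
        used += monthly_growth_gb * seasonal_factors[(start_month + m) % 12]
        if used > allocated_gb:
            return m + 1
    return None


flat = [1.0] * 12
holiday_peak = [1.0] * 10 + [3.0, 3.0]  # hypothetical November/December surge
print(months_until_full(200, 60, 10, flat))          # 15
print(months_until_full(200, 60, 10, holiday_peak))  # 12
```

A simple linear trend says the tablespace lasts 15 months; accounting for a year-end burst brings that forward to 12, which is the difference between expanding on time and running out during the busiest season.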

The reverse is also true. If you have a 200GB tablespace that is nearing its allocation boundary (in other words, is almost full), do you reorganize that tablespace or expand it? Perhaps reorganization would reclaim 20% or 30% of the space through defragmentation, and expansion would not be required.
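The reorganize-or-expand question can be reduced to a simple rule, sketched here under stated assumptions: an estimate of the fraction of used space that defragmentation would reclaim is available, and a hypothetical policy says the tablespace should be no more than 80% full afterward.

```python
def reorganize_or_expand(allocated_gb, used_gb, reclaimable_fraction,
                         target_utilization=0.8):
    """Decide between reorganizing a nearly full tablespace and expanding it.

    reclaimable_fraction: estimated share of used space that reorganization
    would free, e.g. 0.2 to 0.3. target_utilization is a hypothetical
    policy threshold.
    """
    used_after_reorg = used_gb * (1 - reclaimable_fraction)
    if used_after_reorg / allocated_gb <= target_utilization:
        return "reorganize"
    return "expand"


print(reorganize_or_expand(200, 195, 0.25))  # reorganize (about 73% full afterward)
print(reorganize_or_expand(200, 195, 0.05))  # expand (about 93% full afterward)
```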

Follow problems throughout hierarchy: Application management is complicated by new implementations of technology, such as storage area networks, that provide discrete networks dedicated to storage. This network architecture provides the fastest path to storage consolidation, but it also places additional assets in the data path that have to be monitored and managed. Once you have solved contention problems at one level, they move up to the next level. For example, you may solve a problem on one volume only to see it move to another volume, or out of the storage enclosure and into a switch. Therefore, the solution must be able to follow a problem through the system hierarchy until it reaches the highest and fastest part of a system.
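Following a problem up the hierarchy means checking every component in the data path, not just the one that was fixed. This sketch assumes a hypothetical topology where each component points to the next one up the path toward the host, and a measured utilization (busy fraction) per component; the component names and threshold are illustrative.

```python
def trace_upward(start, parent, utilization, threshold=0.85):
    """Walk from a storage component toward the host and report hot spots.

    parent: {component: next component up the data path, or None at the top}.
    utilization: {component: busy fraction 0.0 to 1.0}.
    Returns components on the path at or above threshold, lowest level first.
    """
    hot = []
    node = start
    while node is not None:
        if utilization.get(node, 0.0) >= threshold:
            hot.append(node)
        node = parent.get(node)
    return hot


# Hypothetical path: volume -> enclosure -> switch -> host bus adapter.
parent = {"vol7": "enclosure2", "enclosure2": "switch1", "switch1": "hba0", "hba0": None}
utilization = {"vol7": 0.95, "enclosure2": 0.60, "switch1": 0.90, "hba0": 0.50}
print(trace_upward("vol7", parent, utilization))  # ['vol7', 'switch1']
utilization["vol7"] = 0.40  # after relocating files off the volume
print(trace_upward("vol7", parent, utilization))  # ['switch1']
```

Relieving the volume does not end the investigation: the switch on the same path is still saturated and becomes the next thing to address.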

By implementing application-centric storage management, companies can optimize storage resources by understanding the relationship between the application and the storage resource layer. Such detailed storage resource information provides a breakthrough ability to optimize an application's performance, allowing heavily accessed information to receive sufficient network resources and optimal placement on disk. This approach gives IT managers a way to manage all data, whether that information is as simple as the last book someone purchased or as involved as a recap of an investor's history of stock purchases.

Chris Gahagan is vice president and general manager, Recovery and Storage Business Unit, at BMC Software (www.bmc.com) in Houston, TX.

Application-centric storage management should...

Provide logical-to-physical mapping.
Provide a file-system-based view.
Evaluate the target application.
Provide a graphic depiction of application resources.
Provide automated decision-making and problem recognition.
Understand application interaction.
Offer intelligent prediction to assist in making purchasing decisions.
Ease capacity planning.
Follow problems throughout the hierarchy.

This article was originally published on April 01, 2001