The real state of SRM, part 1

In the first part of a three-part series on storage resource management, we look at what SRM tools are good for, and where improvements are required.

By John Echaniz and Justin Schnauder

Storage administrators have come to know the benefits and shortcomings of storage resource management (SRM) tools all too well. From array-based software supplied by OEMs to third-party tools, SRM’s limitations leave many storage administrators struggling to control their environments with a hodgepodge of mismatched vendor applications and homegrown spreadsheets. In this series of articles, we will look at what SRM tools do well (and what they don’t do well) and present a gap analysis that will serve as a road map for users in the market for SRM products.

What SRM does well

Storage vendors have approached SRM tools in different ways. Some vendors focus primarily on reporting, whereas others approach SRM as a complete management framework covering reporting, monitoring, provisioning, configuration management, workflow, and automation. SRM tools are now commonly understood as centralized storage management applications, through which entire environments can be maintained.

In the past, vendors focused on one technology or another (e.g., SAN, NAS, their own storage arrays, etc.), and it wasn’t until relatively recently that they started to introduce cross-technology functionality, although it’s still limited primarily to the most basic functional elements rather than in-depth configuration and management.

At this point in their development, leading SRM tools share the ability to provide valuable statistics on used and free capacity in hosts, storage arrays, and NAS devices. The depth of this data varies by product, but overall the ability to gather and display this information is fairly standard. Typical data points include, but are not limited to, storage array capacity/utilization, NAS device information, host configuration details, and fabric and zone details.

An SRM tool’s ability to gather such data points provides storage managers with a baseline understanding of their storage infrastructures. One of the primary benefits of building such SRM infrastructures is the presentation of the storage environment through a “single pane of glass”; once it discovers the components within a storage network, SRM software can facilitate easier decision-making by gathering relevant configuration data in one view. Whether troubleshooting a problem, investigating firmware or patch-level compliance, tracking storage utilization, or planning for future capacity needs, SRM tools assemble a wealth of information that can help save a storage administrator’s skin in times of trouble, but only if the SRM rollout is well-maintained.

With this basic data, SRM tools are generally useful for basic storage provisioning and zoning tasks from a centralized console. Although the majority of SRM users still prefer the switch vendors’ interfaces for zoning configuration, SRM tools provide an easy-to-use GUI for zone and fabric management. And allocation of storage to servers (and the subsequent de-allocation) is generally not a difficult task once the storage arrays and servers are fully discovered and managed. But there are limitations to this functionality, which will be discussed later.

The reporting capabilities of SRM products are generally quite useful for data-center management. Fully deployed and maintained SRM systems can provide very detailed information regarding storage usage. Their APIs also make this reporting data available for use by other systems, which is critically important in providing business meaning to storage infrastructure data. The leading SRM tools are able to feed business intelligence systems, which can correlate resource utilization data with topics of interest in the boardroom (e.g., who is using how much storage, how well, and where). By facilitating such data correlation beyond SRM, data-center managers are able to enable chargeback and accurate capacity planning.
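The chargeback scenario described above can be sketched in a few lines. The following is a minimal illustration, not any vendor’s actual API: it assumes SRM reporting data has been exported as per-host allocation records (the field names `host`, `business_unit`, and `allocated_gb` are hypothetical) and rolls them up by business unit, the kind of correlation a downstream business intelligence system would perform.

```python
# Hypothetical chargeback roll-up over records exported from an SRM
# reporting API. The record schema here is an illustrative assumption,
# not a real vendor format.
from collections import defaultdict

def chargeback_by_unit(records):
    """Sum allocated capacity per business unit from exported SRM rows."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["business_unit"]] += rec["allocated_gb"]
    return dict(totals)

rows = [
    {"host": "db01", "business_unit": "finance", "allocated_gb": 500.0},
    {"host": "web01", "business_unit": "ecommerce", "allocated_gb": 120.0},
    {"host": "db02", "business_unit": "finance", "allocated_gb": 250.0},
]
print(chargeback_by_unit(rows))  # {'finance': 750.0, 'ecommerce': 120.0}
```

The same aggregation, keyed on cost center or application instead of business unit, supports the capacity-planning reports mentioned above.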

One powerful yet often-ignored value of SRM is the ability to integrate monitoring of storage devices with common IT monitoring applications. The leading SRM tools, including EMC ControlCenter, Hitachi Data Systems’ HiCommand Storage Services Manager, IBM’s TotalStorage Productivity Center, and Symantec’s CommandCentral Storage, provide the ability to integrate into industry standard management frameworks via SNMP. This functionality helps to bring the storage network into the visibility of a traditional network operations center (NOC), allowing on-call personnel to act quickly in the event of storage environment disruption.

In addition to providing data on component failures, SRM tools can also provide detailed information on the objects affected by potential outages. SNMP traps, combined with basic SRM data views, can help NOC personnel to identify the cascading impact of an outage, thereby allowing them to take direct action to mitigate potential negative effects.
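The cascading-impact lookup described above amounts to a topology query. The sketch below is purely illustrative, under the assumption that the SRM database can be queried for a map from components to the hosts whose storage paths traverse them; the component names and map structure are invented for the example.

```python
# Illustrative impact lookup of the kind SRM data views enable: given a
# failed component reported via an SNMP trap, list the hosts whose
# storage paths it carries. The topology map here is hypothetical.
topology = {
    "switch-1:port-4": ["host-db01", "host-db02"],
    "array-A:fa-7b": ["host-db01", "host-web03"],
}

def affected_hosts(failed_component):
    """Return hosts whose paths traverse the failed component."""
    return topology.get(failed_component, [])

print(affected_hosts("switch-1:port-4"))  # ['host-db01', 'host-db02']
```

In practice, NOC personnel would see this list in the SRM console rather than compute it by hand, but the underlying data relationship is the same.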

However, it is important to proceed with caution when embarking upon monitoring framework integration. Some devices lack network connectivity and must forward alerts through a central SRM monitoring station, whereas a growing number of storage devices are network-connected and can send alerts directly to a monitoring framework; it is important to choose one alerting method to avoid duplication. Also, default alert settings within SRM tools can lead to alert storms that can literally bring the monitoring framework to its knees (not to mention those receiving the alerts). A more structured approach is generally recommended: initially disable all alerts, then enable them gradually based on requirements.

What SRM doesn’t do well

As many IT managers have discovered, there are areas in which SRM tools do not excel. It’s true that storage vendors have gone to great lengths to provide rich and balanced data using the proper mix of agents. SRM vendors such as EMC, Hewlett-Packard, IBM, and Symantec have developed a wide range of agents to support storage arrays, NAS devices, SAN switches, all major operating systems, and specialized agents for databases and mail applications. However, there are still gaps in these products.

One serious weakness common to most SRM tools is the inability to report properly on clustered hosts, i.e., multiple hosts with access to the same storage devices. Clustered storage is typically counted multiple times, because each host in a cluster reports the allocations of the shared array devices. Some SRM products have been enhanced to alleviate this issue, but the workarounds typically involve cumbersome processes or odd report views that cannot easily be rolled up into a single report.
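The double-counting problem is easy to demonstrate. The sketch below uses an invented report layout (host names, LUN IDs, and sizes are all hypothetical) to show why summing per-host allocations overstates capacity in a cluster, and how de-duplicating on the array device ID corrects it.

```python
# Illustrative fix for double-counted cluster storage: each clustered
# host reports the same shared LUNs, so a per-host sum overstates the
# total. Keying on the array device ID counts each LUN exactly once.
def naive_total(host_reports):
    """Per-host sum: counts a shared LUN once per cluster node."""
    return sum(lun["size_gb"] for luns in host_reports.values() for lun in luns)

def deduplicated_total(host_reports):
    """Count each LUN once, keyed by its array device ID."""
    seen = {}
    for luns in host_reports.values():
        for lun in luns:
            seen[lun["lun_id"]] = lun["size_gb"]
    return sum(seen.values())

reports = {
    "node-a": [{"lun_id": "0A1F", "size_gb": 100}, {"lun_id": "0A20", "size_gb": 50}],
    "node-b": [{"lun_id": "0A1F", "size_gb": 100}, {"lun_id": "0A20", "size_gb": 50}],
}
print(naive_total(reports))         # 300: shared LUNs counted twice
print(deduplicated_total(reports))  # 150: each LUN counted once
```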

Provisioning: Good, not great

SRM vendors have implemented configuration management and provisioning capabilities in varying levels of functionality. However, SRM tools have not been written to support a large enterprise’s requirement for batching tasks. They often treat provisioning as a relatively simplistic serial process that supports neither multiple concurrent administrators nor the batching of similar tasks, such as zone changes. As such, the batching functionality currently offered cannot be used by large enterprises that have strict change control measures and processes with defined change windows.

Small to medium-sized organizations often have a less stringent process for storage environment changes and provisioning, often allowing changes throughout the business day. Larger enterprises generally avoid change during normal working hours, and with the business day now extending globally across time zones, they permit change only during defined windows, quite often restricted to weekends. The majority of SRM products with provisioning capabilities are not able to break up similar tasks and batch them accordingly to reduce change impact.

Many large IT shops have created script-driven processes that automate almost all of the provisioning process, such that the provisioning steps are all executed overnight or during weekend change windows. Such scripted solutions often allow for storage reservation by day, which prevents administrators from stepping on each other by reserving free volumes in real-time. SRM tools simply have not kept up with the need for automation, possibly because of the more pressing need for multi-vendor support. As SRM matures, vendors will need to provide more intelligent automation; IT shops that have developed their own automated provisioning methods are reluctant to abandon them in favor of increased manual activity through GUIs, even when presented with the many advantages SRM offers.
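The storage-reservation technique described above can be sketched as a simple ledger. Everything in this example is hypothetical (class and volume names, the date-based expiry policy); it only illustrates the mechanism by which scripted provisioning prevents administrators from claiming the same free volumes.

```python
# Sketch of a day-based storage reservation ledger of the kind many IT
# shops script around their provisioning process: an administrator
# reserves free volumes for a change window so colleagues cannot claim
# them in the meantime. All names and policies here are illustrative.
import datetime

class ReservationLedger:
    def __init__(self):
        self._held = {}  # volume id -> (admin, change-window date)

    def reserve(self, volume_id, admin, date):
        """Claim a volume for a change window; refuse if already held."""
        if volume_id in self._held:
            return False
        self._held[volume_id] = (admin, date)
        return True

    def release_expired(self, today):
        """Free reservations whose change window has passed."""
        self._held = {v: (a, d) for v, (a, d) in self._held.items() if d >= today}

ledger = ReservationLedger()
print(ledger.reserve("vol-042", "alice", datetime.date(2007, 6, 2)))  # True
print(ledger.reserve("vol-042", "bob", datetime.date(2007, 6, 2)))    # False
```

A nightly job would call `release_expired` so that volumes reserved for past change windows return to the free pool automatically.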

Stale data

Another major SRM concern is the difficulty involved with removing stale data from the application’s database. Many applications report on objects such as hosts, arrays, or switches until they are manually removed from the SRM tool. When a device stops reporting, whether because it was decommissioned or because an agent is missing, the data obtained during the last discovery remains in the system. This condition can be very misleading, particularly in storage allocation reports. Hosts that have been retired yet remain in the SRM database will appear to the administrator to still have storage assigned. If that same storage is subsequently provisioned to a new host, it will appear to be allocated to both hosts, which can lead to significant confusion when monthly storage reports are generated. Conversely, if retired hosts continue to appear as if they are consuming storage, managers may be led to believe that the environment is under-provisioned and feel compelled to purchase additional storage unnecessarily.

To help manage this situation, many SRM vendors are building new data fields into object reporting that indicate the most recent discovery dates for a given object. Using this information, customized reports can be created to list objects that are not updating the SRM tool with new information.
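A customized report of the kind described above reduces to a filter on the last-discovery timestamp. The sketch below assumes, purely for illustration, that the SRM tool exposes a last-discovered date per object; the object names and the seven-day threshold are invented.

```python
# Minimal sketch of a stale-object report: flag any object whose last
# successful discovery is older than a threshold. Assumes the SRM tool
# exposes a last-discovered timestamp per object; names are invented.
import datetime

def stale_objects(objects, today, max_age_days=7):
    """Return names of objects not rediscovered within max_age_days."""
    cutoff = today - datetime.timedelta(days=max_age_days)
    return [o["name"] for o in objects if o["last_discovered"] < cutoff]

inventory = [
    {"name": "array-01", "last_discovered": datetime.date(2007, 5, 30)},
    {"name": "host-17", "last_discovered": datetime.date(2007, 4, 2)},
]
print(stale_objects(inventory, today=datetime.date(2007, 6, 1)))  # ['host-17']
```

Objects surfaced by such a report are candidates for either agent repair or manual removal from the SRM database.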

Agents: Barriers to adoption

Agent reliance is one of the primary problems with SRM tools; agents present the most daunting obstacle to adoption. If executive management issues an edict that mandates agent deployment, and if agents are incorporated into standard server builds, then the associated pain can be lessened to some extent. That said, deploying agents on servers presents a big challenge. Many large corporations have decided to avoid SRM tools altogether, due in large part to the desire to avoid internal battles over agent deployment.

SRM’s true power is realized when host agent deployment is thorough. While host data is not required by SRM tools, viewing storage and fabric information alone will produce a limited view of the infrastructure, especially for performance and utilization data. So, if a corporation wants to use an SRM solution to take advantage of the comprehensive, end-to-end functionality it can provide, host agents are a must; however, deploying agents on thousands of hosts can take months, so this is no small commitment. Once deployed, upgrading or adding secondary agents can often be automated through the SRM infrastructure, given proper agent code levels on each host.

Implementing and maintaining SRM products requires a great deal of effort. Although deployment efforts can be streamlined over time, maintaining code levels for deployed agents, infrastructure upgrades, new functionality, and hot fixes will continue to require dedicated attention. Ever-changing device-compatibility requirements mean that driver, firmware, microcode, and operating system versions must be strictly maintained to ensure seamless SRM system function; without that discipline, full discovery of devices or hosts may be limited. Although this diligence is beneficial in many respects, IT managers normally prefer to direct such version management according to the needs of their businesses, not the needs of an SRM tool. For example, many companies have been eager to implement virtualization at the host, storage, and file levels. Many organizations went forward with these initiatives, but SRM products have been slow to support them completely, creating “blind spots” in SRM infrastructures.

With all the negatives associated with agents, some concerns about them have been overblown due to problems with early versions. As SRM tools were in the early adoption phase, agents often used excessive memory and processor resources, causing headaches for system administrators with on-call support responsibilities. As a result, many administrators have sworn off agents. While these problems were legitimate in the earlier stages of SRM tools, current host-based agents perform their tasks without consuming excessive host resources. Generally, this maturing of technology has removed agents as a barrier to SRM adoption, although there are still a few instances where some specialized agents for databases continue to be burdensome on hosts and administrators.

Also, major upgrades to SRM tools have required wholesale agent upgrades across an enterprise. Such upgrades may require manual visits to thousands of servers in a large environment, the prospect of which has served as a significant de-motivator for SRM adoption. It is important to consider, however, that such upgrades are relatively rare and will become rarer as SRM products mature. Additionally, the process of SRM agent upgrades can be performed on a large scale in an orderly fashion, given adequate planning, clear instructions, unwavering management commitment, and time.

Despite these advances, vendors still persist in releasing upgrades to the SRM infrastructure that require agent restarts, especially during major release upgrades. This is particularly intrusive and disruptive when an SRM upgrade requires the restart of thousands of host agents. SRM vendors should reconsider their methods for large-scale deployment and automation of agents. More effort needs to be applied to making the agents intelligently aware of devices, facilitating the loading of additional agent functionality where relevant, rather than requiring manual intervention. At the same time, the SRM infrastructure needs to be more self-maintaining.

Part 2 in this series of articles focuses on SRM challenges, including multi-vendor support, homegrown versus vendor tools, and the trend toward SRM suites.

John Echaniz is director of client solutions with Novus Consulting Group (NovusCG-www.novuscg.com), and Justin Schnauder is a technologist with NovusCG. David Askew, client technology executive at the company, also contributed to this article.

This article was originally published on June 01, 2007