The goal is to ensure complete and error-free backup-and-recovery operations.
By Liam Scanlan and Mark Silverman
Organizations spend millions of dollars on data backup systems and subsystems and millions more each year attempting to manage these systems to ensure the availability of mission-critical data. Yet despite these massive investments, the typical organization can only make an educated guess as to whether backup operations are actually protecting critical data, let alone meeting reliability and efficiency objectives. This is becoming a major liability for corporations that must satisfy increasingly risk-averse auditors, insurance companies, risk managers, and government agencies.
According to end-user surveys, the number-one challenge facing managers of backup operations is determining whether their backups succeeded. And if data is not backed up, it cannot be recovered.
This article defines the concept of "storage intelligence" and explores how it can help an organization establish, measure, and proactively manage backup operations against the required reliability and efficiency policies.
What is storage intelligence?
Storage intelligence is the collection, consolidation, analysis, and presentation of all of the information required to ensure data storage operations reliably and efficiently comply with enterprise policies. At the core of storage intelligence is the collection of "metadata," which is operational and system data from backup and other critical data storage systems.
Metadata helps to determine whether each backup job is occurring correctly and to identify how much data is being backed up. It also helps to identify reliability and performance issues and their root causes, provides an auditable track record of all backup operations, identifies and quantifies the resources used, and supplies the figures needed to bill for services.
The typical enterprise backs up numerous applications and files residing on a mix of operating systems. Enterprises also back up to multiple tape devices, using backup software from multiple vendors. Few vendors have developed open, interoperable products, and few interoperability standards exist.
The result is a complex network of heterogeneous backup technologies. Any one of these technologies alone is a management challenge. Combined, they produce an environment that quickly degrades without proactive day-to-day management. This is one of the key reasons why the radical decrease in hardware cost-per-gigabyte has not led to a corresponding decrease in management costs. In fact, for many organizations, it has led to even greater administrative costs while reducing the likelihood of being able to recover critical data.
Applying storage intelligence to backup
Applying storage intelligence principles to backup operations first requires the identification of the problems that need to be solved. The next step is to determine the metadata required to solve the problems, as well as the location of the metadata in the storage network. Initially identifying the problem will also help to determine how to consolidate, analyze, and present the metadata once it has been collected.
Determining and improving backup reliability
The Enterprise Storage Group estimates that 40% to 60% of all backups fail in network environments.
To understand the reliability of backup operations, storage administrators need to collect metadata detailing
- Each backup job attempted;
- Each protected resource ("client") for which a backup job was not attempted;
- Each backup failure that occurred at the server, client, path, or job level; and
- Error messages related to each failure.
It is also helpful to consolidate this information across all servers, by server, client, target, and/or groups, to analyze and identify trends.
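As a rough illustration of how the metadata listed above might be consolidated, the following Python sketch computes a success rate, lists protected clients for which no job was attempted, and counts failures by level and by server. The `BackupJob` record, its field names, and the sample values are hypothetical; real backup products expose this information in their own formats.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupJob:
    """One attempted backup job (hypothetical record layout)."""
    server: str
    client: str
    succeeded: bool
    failure_level: Optional[str] = None   # "server", "client", "path", or "job"
    error_message: Optional[str] = None


def reliability_summary(jobs, protected_clients):
    """Consolidate job records into basic reliability figures."""
    attempted_clients = {job.client for job in jobs}
    failures = [job for job in jobs if not job.succeeded]
    success_rate = (len(jobs) - len(failures)) / len(jobs) if jobs else 0.0
    return {
        "attempted": len(jobs),
        "failed": len(failures),
        "success_rate": success_rate,
        # Protected clients that never appear in the job records.
        "not_attempted": sorted(set(protected_clients) - attempted_clients),
        "failures_by_level": dict(Counter(j.failure_level for j in failures)),
        "failures_by_server": dict(Counter(j.server for j in failures)),
    }


# Made-up sample data for illustration only.
SAMPLE_JOBS = [
    BackupJob("srv1", "mail01", True),
    BackupJob("srv1", "db01", False, "client", "connection refused"),
    BackupJob("srv2", "web01", False, "path", "file not found"),
    BackupJob("srv2", "web02", True),
]
SAMPLE_SUMMARY = reliability_summary(
    SAMPLE_JOBS, ["mail01", "db01", "web01", "web02", "hr01"]
)
```

In this sample, `hr01` is a protected client with no job record at all, which is the kind of gap that a review of failed jobs alone would miss.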
Ultimately, storage administrators should do more than simply identify failed backups and run redundant backups. As backup errors are detected, the root causes should be identified, categorized, and corrected. For example, many recurrent backup errors result from either backup server or network configuration problems. To correct the errors, storage administrators need to review the history of failures and causes and then create reports of backup errors from the metadata. Generally, these reports will highlight information (e.g., failure patterns and trends) not immediately obvious in log files or activity logs. By eliminating systemic backup errors, the risk of encountering unrecoverable files will be significantly reduced.
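One simple way to surface recurrent errors is to group the failure history by error message and report the messages that affect more than one job. The sketch below assumes failures are available as `(client, error_message)` pairs and that normalizing case and whitespace is enough to match related messages; both are simplifying assumptions, and real error text usually needs product-specific categorization.

```python
from collections import defaultdict


def recurrent_failures(failures, min_count=2):
    """Group failures by normalized error message.

    failures: iterable of (client, error_message) tuples.
    Returns (message, [clients]) pairs seen at least min_count times,
    most frequent first.
    """
    clients_by_error = defaultdict(list)
    for client, message in failures:
        clients_by_error[message.strip().lower()].append(client)
    recurring = {
        message: clients
        for message, clients in clients_by_error.items()
        if len(clients) >= min_count
    }
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(recurring.items(), key=lambda item: (-len(item[1]), item[0]))
```

A message that recurs across several clients on the same server points toward a server or network configuration problem rather than a fault on any one client.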
Measuring backup resource utilization
An essential part of any backup strategy is an accurate assessment of backup resources, the source and quantity of backup demand, and the efficiency with which those resources are being used. Accurate collection and analysis of select metadata can help protect against the risk of misallocated and mismanaged backup resources, as well as against inefficient configuration of software, hardware, and networks. It can further help to predict future requirements, better manage backup windows (through load balancing on servers and identification of bottlenecks), and enhance the overall efficiency of backups.
The metadata to be collected should include
- Data volumes by backup server, client, and target;
- Duration of backup jobs;
- Resources utilized to complete each job; and
- "Owners" of clients and targets.
Application-specific data as well as additional media and network statistics can also be useful.
Even without additional analysis, this metadata will allow storage administrators to know exactly how much data is being backed up during specific periods of time.
This information is critical when planning backup strategies and determining future resource requirements. Yet surprisingly few organizations can pinpoint the exact amount of data being backed up at a given time, let alone over periods of time.
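The utilization metadata described in this section can be totaled with very little code once it has been collected. The sketch below sums data volume and job duration per backup server and derives an average throughput; the dictionary keys (`server`, `client`, `gigabytes`, `minutes`) are hypothetical field names chosen for illustration.

```python
from collections import defaultdict


def utilization_report(jobs):
    """Total volume and duration per backup server.

    jobs: iterable of dicts with keys "server", "client",
    "gigabytes", and "minutes" (hypothetical field names).
    """
    gb_by_server = defaultdict(float)
    minutes_by_server = defaultdict(float)
    for job in jobs:
        gb_by_server[job["server"]] += job["gigabytes"]
        minutes_by_server[job["server"]] += job["minutes"]

    report = {}
    for server, gigabytes in gb_by_server.items():
        minutes = minutes_by_server[server]
        report[server] = {
            "gigabytes": gigabytes,
            "minutes": minutes,
            # Average throughput; zero when no duration was recorded.
            "gb_per_hour": gigabytes / (minutes / 60) if minutes else 0.0,
        }
    return report
```

The same grouping could be keyed on client, target, or owner instead of server. Comparing throughput across servers is one way to spot the bottlenecks and load imbalances that affect backup windows.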
Maintaining auditable records
Identifying and managing risks and costs are constant challenges for IT departments. Maintaining an independently collected, auditable track record of all backup jobs attempted and the success rates of each job provides managers with the ability to prove compliance with audit and legal requirements. This removes speculation about whether the backup systems that were implemented to protect key corporate assets are actually being used for and are accomplishing their intended purpose.
Furthermore, the auditable metadata enables senior managers to identify the sources of demand for backup services, measure that demand, and allocate costs to it. Difficult budgetary decisions become less arbitrary and more closely tied to quantitative business requirements. In the past, a manager might plead for an increased budget based on uncontrolled growth in demand for backup resources. With a record of backup jobs tied to sources, managers can show quantitatively what is driving the increased demand for services. With quantitative analysis replacing educated guesswork, those approving the budget are better equipped to make policy decisions.
An auditable track record is not only helpful in justifying budgets and establishing backup policies. It is rapidly becoming a requirement as audit firms, government agencies, insurance companies, risk managers, etc., adopt stricter measures of corporate policy enforcement, risk management, and liability disclosure. Audit approvals, debt ratings, insurance premiums, etc., depend on proof of the existence of—and compliance with—data-protection policies.
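As one possible illustration of allocating cost to the sources of demand, the sketch below splits a total backup cost among owners in proportion to the volume each one backed up. Proportional-by-volume allocation is an assumption made for the example, not a model prescribed here; an organization might equally weight by job count, retention period, or media consumed.

```python
def allocate_cost(gb_by_owner, total_cost):
    """Split total_cost among owners in proportion to gigabytes backed up.

    gb_by_owner: dict mapping owner name to gigabytes (hypothetical input).
    """
    total_gb = sum(gb_by_owner.values())
    if total_gb == 0:
        # No recorded demand, so nothing to charge back.
        return {owner: 0.0 for owner in gb_by_owner}
    return {
        owner: total_cost * gigabytes / total_gb
        for owner, gigabytes in gb_by_owner.items()
    }
```

For example, if one department backed up 300 GB and another 100 GB against a total cost of 1,000, the first would be allocated 750 and the second 250.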
Implementing storage intelligence
Applying storage intelligence principles to backup operations is simple in concept:
- Define the problem to be solved;
- Identify the metadata necessary to solve the problem;
- Collect the metadata from one or more aggregation points (e.g., a backup server or tape library); and
- Analyze and present the metadata in a format that helps to pinpoint and resolve the problem.
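The last two steps above can be sketched as a small collect, analyze, and present pipeline. The comma-separated export format, its column names, and the sample rows are all hypothetical; each backup product stores its logs differently, so the `collect` step is where most real-world effort would go.

```python
import csv
import io

# Hypothetical export from one aggregation point (made-up data).
SAMPLE_LOG = """server,client,status,gigabytes
srv1,mail01,ok,42.5
srv1,db01,failed,0
srv2,web01,ok,10.0
"""


def collect(log_text):
    """Read raw job records from one aggregation point."""
    return list(csv.DictReader(io.StringIO(log_text)))


def analyze(records):
    """Reduce the records to the figures that answer the question asked."""
    failed = [r["client"] for r in records if r["status"] != "ok"]
    total_gb = sum(float(r["gigabytes"]) for r in records)
    return {"jobs": len(records), "failed_clients": failed, "total_gb": total_gb}


def present(summary):
    """Format the summary as a one-line report."""
    failed = ", ".join(summary["failed_clients"]) or "none"
    return (
        f"{summary['jobs']} jobs, {summary['total_gb']:.1f} GB backed up, "
        f"failed: {failed}"
    )


REPORT = present(analyze(collect(SAMPLE_LOG)))
```

Keeping collection separate from analysis means a second backup product can be supported by adding another collector without touching the reporting code.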
Unfortunately, the information vital to measuring performance of backup operations is often buried within vast log files or databases and is often mixed with massive amounts of non-critical metadata. Implementing a storage intelligence solution requires sifting and consolidation of vast quantities of backup metadata. Manually collecting and consolidating these records is a tedious and time-consuming task that can take hours for even a single backup server managing only a handful of clients and targets. In larger environments with multiple servers, operating systems, and storage architectures, manual collection and consolidation of metadata is often impractical.
However, this difficulty should not tempt storage administrators to leave data-protection efforts to chance. It merely means that unless the environment is simple, an automated process that collects, consolidates, analyzes, and presents all backup information is required.
There are three options: buy a complete solution that meets all or most backup requirements; internally develop such a solution; or create a series of scripts for critical aspects of an environment until the first two options can be implemented.
Storage intelligence applied to backup operations helps organizations establish, measure, and proactively manage backup operations against enterprise storage policies. Even a basic storage intelligence solution will substantially reduce storage administrative costs while reducing the risk of lost and unrecoverable data.
Liam Scanlan is vice president of product development, and Mark Silverman is president and CEO, at Bocada Inc. (www.bocada.com) in Bellevue, WA.