By David G. Hill
Data protection is a must-do-well responsibility for all IT organizations. Even a temporary loss of data availability for key applications can have severe consequences, ranging from lost revenue to reduced productivity, and the permanent loss of key data is unthinkable. As a result, IT organizations invest considerable money in data-protection technologies.
Although the data-protection industry is changing rapidly, IT organizations still depend on traditional backup/restore software processes as the backbone of their data-protection strategies. And the fact that a virtual tape library (VTL) may front-end a physical tape library does not alter the basic backup/restore processes.
Yet in a large number of organizations these processes are at best frayed and at worst broken. To determine how well-protected their data is, companies must answer two questions about the state of their backup/restore processes:
■ Can IT guarantee that all data that needs to be restored after a production data loss is backed up all the time?
■ Can IT guarantee that all data that needs to be restored can in fact be restored in a timely manner, consistent with the capabilities of the data-protection technologies in use?
If the answer to both questions is not an unequivocal “yes,” then the data-protection processes are not delivering the necessary level of service.
The blame game
Where does the fault lie if the answers to the two questions are unacceptable? Some storage professionals direct blame at tape technologies in general, yet we believe that blame is misplaced. Even if a VTL is used to complement a physical tape library, that does not solve the problem. Performance, reliability, and manageability issues may be topics for discussion (and a VTL may very well provide benefits in those areas), but the main source of the problem is further upstream. All tape (and complementary disk-based) technologies do is serve as targets for backups and sources for restores. They do not affect fundamental data-protection processes.
Can blame then be placed on the backup/restore software? Generally, the answer is “no,” because most backup/restore software is very robust. Issues such as performance and manageability may differentiate backup/restore software, but the process culprit is typically not the software.
The real culprit for poorly performing data-protection processes, primarily backup/restore, is complexity. One simple source of complexity is the never-ending growth of data. Once backup/restore processes have been tuned for performance, adding more data to a backup job necessarily makes it run longer. At the same time, requirements to keep applications up longer shrink the time available for backups. Consequently, delays caused by restarting a failed backup, or by network congestion that prevents a running job from completing in the allocated time, must be addressed as quickly as possible. The tradeoff between running an application that is not fully protected and taking unplanned application downtime while a backup job runs is unacceptable.
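The shrinking-window problem reduces to simple arithmetic: a job fits its window only if data volume divided by effective throughput is no greater than the window length. A minimal sketch of that check (all numbers are hypothetical):

```python
def backup_fits_window(data_gb: float, throughput_gbph: float,
                       window_hours: float) -> bool:
    """Return True if a full backup can complete within the window."""
    return data_gb / throughput_gbph <= window_hours

# Hypothetical: 1,200 GB at 150 GB/hour against an 8-hour window.
print(backup_fits_window(1200, 150, 8))        # True, but with zero slack
# With 20% data growth and the same window, the job no longer fits:
print(backup_fits_window(1200 * 1.2, 150, 8))  # False
```

The second call shows how data growth alone, with no change to the infrastructure, turns a job that just fits into one that fails, leaving no margin for restarts or congestion.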
Another cause of complexity is mixing heterogeneous products (more than one type of backup/restore software, operating system, storage hardware, etc.).
Risk management responsibility should be enough incentive to inspire enterprises to deal effectively with data-protection process complexity. Yet reluctance to change the historical ways of handling backup/restore, as well as cost pressures on IT organizations, may have dulled active responses to the complexity problem. After all, the base objective is to provide a satisfactory level of data protection while minimizing costs. As long as the data-protection processes can limp along at a minimal level of adequacy, many IT managers are willing to turn their time and attention to more-pressing matters.
The demand for compliance by government and other regulatory agencies adds extra incentive to optimize data-protection processes. The objective is to ensure data protection is performed completely and accurately.
The Sarbanes-Oxley Act requires publicly traded companies to monitor and verify the authenticity of financial records. CEOs and CFOs of such companies are held personally accountable to the extent that their personal freedom could be taken away.
Sarbanes-Oxley requires that there be no destruction, alteration, or falsification of financial records. Each company must establish and maintain an adequate internal control structure and procedures for financial reporting. That includes an assessment of the effectiveness of those procedures. Data protection performs a key role in the internal control structure. Compliance requires enterprises to be able to completely and accurately recover all required data in the event of a logical or physical failure. No loss of data that would invalidate the integrity of the financial reports is acceptable. That means that IT has to be able to give an unequivocal “yes” to the two questions asked earlier about data protection and restore capabilities.
Many organizations may feel that they are not subject to regulatory compliance or, if they are, that meeting the letter of the regulatory law is all they need to do. That is a dangerous misconception.
Enterprises need to put in place the self-discipline necessary to deliver the proper level of risk-management service, whether for litigation support, as part of a trade association's self-regulation, or simply as good business practice. For example, an enterprise should have a uniform policy for the retention (and destruction, as appropriate) of all e-mails. All e-mails should be available quickly, with completeness and accuracy that an outside auditor can attest to. Performing eDiscovery quickly at a judge's request is far better than responding to a discovery order by dredging up and examining all tapes, a time-consuming and expensive process. Failure to find information that should be there can be very expensive, as numerous firms have discovered.
Data protection used to focus only on internal needs, such as returning a down application to working status as quickly as possible. (Even though the customers who use the system may be external, the service level objectives, such as recovery point objective and recovery time objective, are internal.) With compliance, the focus is on presenting an image of the enterprise to the outside world. Since failure to comply implies an inefficiently or feloniously run organization, that ups the ante for companies to manage their data-protection infrastructure more effectively.
To address the problems associated with data-protection processes, many companies need additional software tools. Data protection management (DPM) is the name for a category of products that help manage data-protection environments. DPM products do not handle data protection per se, but they do enable better management of the data-protection processes and products, including backup/restore software, continuous data protection (CDP) appliances, and other elements of the IT infrastructure related to the data-protection “ecosystem.”
The word ecosystem implies that there are interrelationships among the various components of the IT infrastructure, including servers, networks, storage, applications, operating systems, file systems, and databases.
For example, if a network is congested and backup traffic cannot traverse the network in the allocated time, then a backup job may not be able to complete within the planned backup time window.
This example illustrates the need for IT managers to have both timely and actionable information. Information must be timely to prevent service-level-impacting events or, failing that, to minimize the damage of events that have already occurred. Actionable means that the problem can be alleviated, either on a one-time basis or permanently. DPM delivers the reporting, monitoring, and troubleshooting capabilities that IT needs to manage data-protection processes more effectively.
Issues that data-protection management products can address include the following:
■ Ensure completeness of data-protection coverage: For example, determining whether any servers have not been backed up successfully, and whether there are servers for which backup has not been attempted at all.
■ Carry out long-term backup-window problem analysis: Performing a pattern analysis (e.g., determining from historical information the slowest, fastest, and least-reliable components of the data-protection infrastructure) to see if there are systemic issues, such as recurring problems or bottlenecks, that need to be addressed.
■ Perform preventive maintenance through predictive analysis to avoid unnecessary service-level impacts: Using historical information for trend analysis to determine when elements of the data-protection environment will exceed a predetermined threshold, such as when tape media will run out.
■ Speed up response to real-time data-protection problems: Facilitating the troubleshooting process to identify and rectify potential or actual service-level-impacting events.
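The first capability, coverage completeness, amounts to set arithmetic over backup-job records and an asset inventory: servers that were attempted but never succeeded, and servers that never appear in the job logs at all. The sketch below is illustrative only; the field names and data are hypothetical, not drawn from any particular DPM product:

```python
from datetime import date

# Hypothetical job records a DPM tool might collect from
# heterogeneous backup servers.
jobs = [
    {"server": "db01",  "date": date(2007, 3, 1), "status": "success"},
    {"server": "db01",  "date": date(2007, 3, 2), "status": "success"},
    {"server": "web01", "date": date(2007, 3, 1), "status": "failed"},
]
# mail01 is in the asset inventory but has no job records at all.
inventory = {"db01", "web01", "mail01"}

def coverage_gaps(jobs, inventory, since):
    """Find servers with no successful backup since a cutoff date,
    and servers for which backup was never attempted."""
    attempted = {j["server"] for j in jobs}
    succeeded = {j["server"] for j in jobs
                 if j["status"] == "success" and j["date"] >= since}
    never_attempted = inventory - attempted
    no_recent_success = inventory - succeeded - never_attempted
    return sorted(no_recent_success), sorted(never_attempted)

failing, missing = coverage_gaps(jobs, inventory, date(2007, 3, 1))
print(failing)   # ['web01']  - attempted, but no recent success
print(missing)   # ['mail01'] - never backed up at all
```

The key point is the inventory: without an independent list of what *should* be protected, a report built only from job logs can never reveal the server that was never backed up at all.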
The vendors that focus on data-protection management are typically small, relatively new companies (see table), many of which have relationships with larger vendors. For example, Aptare has a strategic partnership with Hitachi Data Systems (HDS). Bocada’s partners include CA, EMC/Legato, Hewlett-Packard, IBM/Tivoli, Network Appliance, Sun, and Symantec. Illuminator is a NetApp Advantage Partner. Servergraph counts IBM/Tivoli and Symantec among its partners. Tek-Tools has technology partnerships with Brocade, CA, EMC, and NetApp. And WysDM claims EMC among its channel partners, and NetApp, Oracle, and Sun as technology partners.
If the need for data-protection management has been so critical for so long, why was it ignored before the DPM start-ups entered the market? One answer is that IT organizations expected their backup/restore software vendors to do the job. Although those vendors provide help in homogeneous environments, they typically don't address heterogeneous environments.
A second answer is that backups have traditionally been viewed as application silos rather than as an overall service function. A third is that until recently the pressures of continued data growth, and the resulting complexity, had not reached a critical stage. Whatever the reason, these are now excuses that no longer hold.
Monitoring and reporting is the first step in achieving efficient data-protection management, and making use of DPM capabilities should now be a priority for most storage administrators.
David G. Hill is a principal with The Mesabi Group LLC (www.mesabigroup.com), a consulting firm specializing in storage, storage management, and related IT infrastructure issues. A version of this article originally appeared in the Pund-IT Review newsletter (www.pund-it.com).