Take a new look at data protection

A wide variety of technologies exist to help you achieve the appropriate degree, or level, of data protection.

By David Hill

Data protection is at the top of any list of storage management issues and is a cornerstone of an organization’s risk management. Data protection mitigates the risks associated with data loss or damage on either a temporary or permanent basis.

To deliver satisfactory levels of data protection, enterprises have to understand the overall “data-protection infrastructure portfolio” into which individual data-protection technologies fit. Otherwise, what appear to be individually sound decisions may not lead to the necessary levels of data protection.

Without the right framework, enterprises cannot know where to place their longer-term bets on data-protection technology or how much to stake on each. Because those bets are long term, the framework also has to take into account the changing world of data-protection technology.

The role of data protection in business continuity

Risk management is a key responsibility of any enterprise, and business continuity is a key function within risk management. Business continuity is the mitigation of risk caused by interruption to enterprise activities and processes.

A key task of any business continuity strategy is data protection and, conversely, a key aim of data protection is business continuity. Furthermore, a business continuity strategy and architecture can serve as a good framework into which to fit data-protection technologies and strategies. Such a framework is comprehensive: it ensures that the needs of parts of the infrastructure other than storage are taken into account, while fully recognizing the crucial role of storage.

Understanding why enterprises may not be receiving the level of data protection they think they are requires recognizing that business continuity is not only about disaster continuity but also about operational continuity: the ability to deal with day-to-day operational problems.

Both operational and disaster continuity require the proper level of both physical (storage device level) and logical (the data itself) data protection. A data item may be flawed although the disk is functioning perfectly; conversely, a disk may crash but the data may be preserved on a different disk.

Note that an event is considered a “disaster” only when data processing has to be moved from a primary to a secondary site and when that processing is carried out using a different set of computer hardware (including both servers and storage).

Operational continuity and disaster continuity need a different mix of data-protection technologies to achieve the required levels of data protection. Yet enterprises may not have a clear understanding of the differences between physical and logical data protection. A logical data-protection problem can affect a key application, whether or not the application crashes.

The target for an IT organization starts with four simple boxes (see table, above). Both operational continuity and disaster continuity have physical and logical components. Each box has to be considered individually, and all four boxes have to be considered collectively to devise a data-protection solution that meets an enterprise’s requirements.

Although it seems simple, filling in the matrix is not that easy. The first challenge is knowing when the levels of data protection are enough. The second challenge is understanding that the target is moving and knowing how that will affect what needs to go into the matrix to achieve the right levels of data protection.
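As a way to picture the matrix, the four boxes can be held in a small data structure and checked for gaps. This is a minimal Python sketch rather than anything from the article: the two dimensions come from the text, while the function names and the technologies assigned to each box are hypothetical illustrations.

```python
from itertools import product

# The two dimensions of the data-protection matrix described in the text.
CONTINUITY_TYPES = ("operational", "disaster")
PROTECTION_TYPES = ("physical", "logical")


def empty_matrix() -> dict[tuple[str, str], list[str]]:
    """Create the four boxes, each holding the technologies assigned to it."""
    return {box: [] for box in product(CONTINUITY_TYPES, PROTECTION_TYPES)}


def unfilled_boxes(matrix: dict[tuple[str, str], list[str]]) -> list[tuple[str, str]]:
    """Return the boxes that have no technology assigned yet."""
    return [box for box, techs in matrix.items() if not techs]


matrix = empty_matrix()
# Hypothetical assignments for a single application.
matrix[("operational", "physical")].append("RAID on primary storage")
matrix[("disaster", "physical")].append("remote mirroring")
matrix[("operational", "logical")].append("point-in-time snapshots")

print(unfilled_boxes(matrix))  # [('disaster', 'logical')]
```

Checking each box individually catches the gap; judging whether the four boxes together give enough protection still requires looking at them collectively.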

Data-protection objectives

The goal is to have data always available securely, with optimal performance, to authorized users anywhere via any connection on any device. Availability is certainly critical to achieving that goal, but from a data-protection perspective there are four objectives that have to be met:

  • Data preservation: data must be consistent and accurate all the time;
  • Data availability: for instance, the ability of I/O requests to reach a storage device;
  • Data responsiveness: the ability to deliver data to an authorized user according to measures of timeliness that are deemed appropriate for an application; and
  • Data confidentiality: data is available only to authorized users.

Note that data availability is not the same as data preservation. Not all preserved data needs to be immediately accessible: it may take a month to retrieve some historical records from the tape warehouse for discovery during a legal proceeding, and a month is adequate. Conversely, not all data that must be accessed quickly for business intelligence needs to be preserved; in some cases, financials can be quickly reconstituted from sales and other data if the financial spreadsheet is lost.

Job one in data protection is the preservation of digital assets. A recovery point objective (RPO) specifies how far back in time the point of recovery may lie, and therefore the acceptable level of data loss (in seconds, minutes, hours, or days). The RPO should be negotiated between users and the IT group. A recovery time objective (RTO) specifies the acceptable time it takes to restore an application to an operational state.
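To make the two objectives concrete, the sketch below checks a protection scheme against a negotiated RPO and RTO. It assumes periodic point-in-time copies that complete instantly, so the worst-case data loss equals the interval between copies; the function names and all the figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(copy_interval: timedelta) -> timedelta:
    """With periodic copies, a failure just before the next copy loses
    everything written since the previous one (assumes instant copies)."""
    return copy_interval


def meets_objectives(copy_interval: timedelta, restore_time: timedelta,
                     rpo: timedelta, rto: timedelta) -> dict[str, bool]:
    """Compare a scheme's worst-case loss and restore time with the RPO and RTO."""
    return {
        "rpo_met": worst_case_data_loss(copy_interval) <= rpo,
        "rto_met": restore_time <= rto,
    }


# Hypothetical figures: nightly copies, a 3-hour restore, 4-hour RPO and RTO.
print(meets_objectives(timedelta(hours=24), timedelta(hours=3),
                       rpo=timedelta(hours=4), rto=timedelta(hours=4)))
# {'rpo_met': False, 'rto_met': True}
```

The two checks are independent: a scheme can restore quickly enough yet still lose more data than users agreed to accept.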

Degrees of data protection

Data protection comes in degrees (and can also be thought of in terms of layers). The first degree of data protection can be provided on the primary copy itself. The primary copy may or may not have built-in data protection; if it does, that is the first line of defense for operational continuity. Built-in protection of the primary copy can mitigate events that threaten service levels, such as a single disk failure.

However, this level of data protection cannot provide disaster continuity, nor the diversification of risk that operational continuity requires. At least one add-on copy, a full copy of the data that is physically separate and distinct from the original, is necessary.

One additional degree of data protection means that one failure can be tolerated, because the data is still recoverable. Once that failure occurs, data protection is at zero degrees, meaning that no further failure can be accommodated without total and permanent data loss. That level of exposure is unacceptable.

That is why additional degrees of data protection are necessary (see table, below). The question is: how many? The minimum number of layers is two; if one failure occurs, the degrees of protection drop to one. Because technology is not perfect, having only one remaining degree to fall back on is not advisable, so three degrees of data protection is probably a practical minimum. Each additional layer adds expense, but the extra protection that one or more further layers provide may still justify it.
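The arithmetic of degrees can be sketched as a counter that each failure decrements. In this minimal Python sketch, each protection layer is assumed to absorb exactly one failure, the threshold of three follows the article's suggested minimum, and the class and layer names are hypothetical.

```python
from dataclasses import dataclass, field

RECOMMENDED_MINIMUM = 3  # the article's "probably a minimum" of three degrees


@dataclass
class ProtectionState:
    """Remaining degrees of protection: how many more failures can be
    absorbed without total and permanent data loss."""
    layers: list[str] = field(default_factory=list)

    @property
    def degrees(self) -> int:
        # Assumption: each surviving layer absorbs exactly one failure.
        return len(self.layers)

    def fail(self, layer: str) -> None:
        """Record that a failure has consumed one protection layer."""
        self.layers.remove(layer)

    def status(self) -> str:
        if self.degrees == 0:
            return "exposed: the next failure means permanent data loss"
        if self.degrees < RECOMMENDED_MINIMUM:
            return f"below recommended minimum ({self.degrees} degree(s) left)"
        return f"ok ({self.degrees} degrees)"


# Hypothetical layers for one application.
state = ProtectionState(["RAID on primary copy", "local snapshot copy", "remote replica"])
print(state.status())  # ok (3 degrees)
state.fail("RAID on primary copy")
print(state.status())  # below recommended minimum (2 degree(s) left)
```

Starting from three degrees, a single failure already drops the application below the suggested minimum, which is the argument for budgeting more than two layers.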

IT administrators should map out the degrees of data protection for each application, splitting them between higher-availability and lower-availability degrees. Once the higher-availability degrees are exhausted, availability depends on the lower-availability options. The term “lower availability” is not pejorative; it simply reflects the relative difference in how quickly different technologies can restore information.
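A per-application map of degrees can be sketched as a list in which each degree is tagged as higher or lower availability. Recovery draws on the first surviving higher-availability degree and falls back to lower-availability degrees only once those are exhausted. The class, function, and layer names below are hypothetical.

```python
from typing import NamedTuple, Optional


class Degree(NamedTuple):
    """One degree of protection for an application (names are hypothetical)."""
    name: str
    high_availability: bool  # True: restores quickly; False: "lower availability"


def next_recovery_source(degrees: list[Degree], failed: set[str]) -> Optional[Degree]:
    """Return the surviving degree to recover from, trying higher-availability
    degrees first; None means every degree is gone."""
    # sorted() is stable, so the administrator's order is kept within each group.
    ordered = sorted(degrees, key=lambda d: not d.high_availability)
    for degree in ordered:
        if degree.name not in failed:
            return degree
    return None


# Hypothetical map for one application.
app_degrees = [
    Degree("remote tape vault", high_availability=False),
    Degree("local mirror", high_availability=True),
    Degree("remote replica", high_availability=True),
]
print(next_recovery_source(app_degrees, {"local mirror"}).name)
# remote replica
print(next_recovery_source(app_degrees, {"local mirror", "remote replica"}).name)
# remote tape vault
```

Writing the map down this way makes it obvious how many failures an application can absorb before recovery times shift to the slower, lower-availability options.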

Critical to understanding information life-cycle management (ILM) is the fact that every piece of data becomes fixed (i.e., read-only) at some point in its life cycle. Active changeable data is still being created or changed, so viewing it at different times would show that it has not stayed the same. At some point, change ends. Even an online transaction processing (OLTP) system that updates customer records creates data that must be “frozen” after a certain period of time. An e-mail is fixed upon capture, because replies do not change the original e-mail itself. In most cases, a large percentage of an organization’s data is fixed.

IT organizations generally understand the concept of the bifurcation of production data into two separate and distinct classes: active changeable data and fixed-content data. The storage of fixed content is often referred to as “active archiving.”
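The split between active changeable data and fixed content can be sketched as a simple classification rule. This is an illustration under assumed policies, not a method from the article: the set of record types fixed on capture and the 90-day freeze period are hypothetical, and time since last modification is only a heuristic stand-in for knowing when change has actually ended.

```python
from datetime import datetime, timedelta

# Hypothetical policy: record types that are fixed as soon as they are captured.
FIXED_ON_CAPTURE = {"email"}
# Hypothetical freeze period after which other records are treated as fixed.
FREEZE_AFTER = timedelta(days=90)


def classify(record_type: str, last_modified: datetime, now: datetime) -> str:
    """Assign a record to active changeable data or fixed content (active archive)."""
    if record_type in FIXED_ON_CAPTURE:
        return "fixed"
    if now - last_modified >= FREEZE_AFTER:
        return "fixed"
    return "changeable"


now = datetime(2005, 4, 1)
print(classify("email", now, now))                               # fixed
print(classify("oltp_customer", now - timedelta(days=10), now))   # changeable
print(classify("oltp_customer", now - timedelta(days=400), now))  # fixed
```

Once records are sorted this way, each class can be given its own data-protection strategy.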

ILM fundamentally divides the storage infrastructure into active changeable data and active archive data. The addition of an active archive for fixed-content information changes the data-protection category matrix (see table, above). The reason is that some of the data-protection strategies are different for active archived information than for active changeable information.

Of course, an application may have both active changeable and active archive data. That might mean that the RPO and RTO for each data type would be different. For example, active transactions in an OLTP application may require different (and probably more stringent) RPO and RTO than closed transactions that are retained for business intelligence purposes.
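Setting objectives per data class rather than per application can be sketched as follows. The objectives, the single nightly scheme, and all names are hypothetical; as in any simple periodic-copy model, worst-case data loss is assumed to equal the copy interval. The example shows how one scheme that suits archived data can miss the tighter RPO of active changeable data in the same application.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rpo: timedelta  # maximum acceptable data loss
    rto: timedelta  # maximum acceptable time to restore


@dataclass(frozen=True)
class Scheme:
    copy_interval: timedelta  # worst-case loss, assuming instant periodic copies
    restore_time: timedelta


def gaps(objectives: dict[str, Objectives], schemes: dict[str, Scheme]) -> list[str]:
    """List the data classes whose assigned scheme misses its RPO or RTO."""
    missing = []
    for data_class, obj in objectives.items():
        scheme = schemes[data_class]
        if scheme.copy_interval > obj.rpo or scheme.restore_time > obj.rto:
            missing.append(data_class)
    return missing


# Hypothetical objectives for one OLTP application, split by ILM data class.
objectives = {
    "active changeable": Objectives(rpo=timedelta(minutes=15), rto=timedelta(hours=1)),
    "active archive": Objectives(rpo=timedelta(hours=24), rto=timedelta(hours=12)),
}
# One nightly scheme applied to both classes.
nightly = Scheme(copy_interval=timedelta(hours=24), restore_time=timedelta(hours=6))
print(gaps(objectives, {c: nightly for c in objectives}))  # ['active changeable']
```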

The finer granularity that is expressed in the doubling of cells in the matrix requires more work on the part of IT administrators, but it leads to more-effective data-protection strategies.

There are a variety of technologies that can fit into an ILM version of the data-protection framework (see table, above). No single taxonomy for data protection seems suitable. Data-protection technologies are not always purely for one task or function; there may be a lot of blending, blurring, and variations in the data-protection functionality that any individual product may contain. The focus here is therefore on overall technologies and not specific products. Individual products can then be evaluated in terms of how they fit one or more of the data-protection needs.

The table to the right lists data-protection technologies in various categories.

Business as usual for data protection is not an option for IT organizations. For each application, IT organizations have to figure out their data-protection objectives; necessary degrees of data protection; and the data-protection technologies that can best be used to fill each box in the data-protection category framework.

That is a real challenge, but the rewards (attaining the proper level of data protection, wisely using scarce budget dollars, and getting the best out of everyone who has a role in data protection) are worth the effort.

David Hill is a principal with Mesabi Group LLC, which is affiliated with Valley View Ventures. This article was excerpted from a longer report, Data Protection: Adapting to the Sea Change. For information on the full report, visit www.valleyviewventures.com (New Analysis, Mesabi Group).

This article was originally published on April 01, 2005