Disk-based data protection: Trends and forecasts

Posted on February 01, 2007


Enterprises are shifting to disk as the first line of data protection, and adoption of virtual tape libraries (VTLs) and continuous data protection (CDP) is on the rise.

By Stephanie Balaouras

According to interviews with enterprise end users, data growth, coupled with shrinking backup windows, is the main driver behind adoption of disk-based data protection. Because disk provides significantly faster backups and restores than tape, enterprises use disk as a staging area, or “cache,” before data is vaulted to tape. Disk is also used as the backup target for mission-critical applications that have a low tolerance for downtime and require almost immediate data restore. As a result, the adoption of disk-based backup and recovery is linked to the following key factors:

Annual growth of production data will continue at 20% to 30% for the next five years. This is due primarily to 1) the continued growth in traditional business processing applications such as enterprise resource planning (ERP) and customer relationship management (CRM), arising from new business and service offerings; 2) the creation of rich digital content such as images, sound, and video; and 3) increasing reliance on integrated communication (e.g., e-mail, IM, and voice) and collaboration applications.

The annual growth of messaging and collaboration data will outpace all other data sets. An increasing number of enterprises deploy unified communication systems where e-mail, IM, and voicemail services are supported in the same application. At the same time, more and more enterprises are using collaboration and content management applications such as Microsoft’s SharePoint and EMC’s Documentum to create, share, and manage rich digital content. The mix of database, messaging, and collaboration applications, and unmanaged, unstructured file data has a significant impact on the selection of disk-based data-protection technology.

60% of backup will be on disk by 2011. Forrester Research currently estimates that 29% of backups are to disk and 71% to tape. In two years, we estimate that this will shift to 43% disk and 57% tape. As disk prices continue to decline and advanced capabilities such as data de-duplication (which drastically improves capacity use) make disk more cost-effective, enterprises will shift more and more of their data protection to disk and keep data on disk longer before it is ultimately vaulted to tape.
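
To make the capacity effect of de-duplication concrete, here is a minimal Python sketch, assuming fixed-size 4KB chunks identified by SHA-256 hashes. The `dedupe` function, the chunk size, and the sample data are illustrative assumptions, not any vendor's method; commercial products use their own (often variable-size) chunking schemes.

```python
import hashlib
import random


def dedupe(streams, chunk_size=4096):
    """Split each backup stream into fixed-size chunks and keep one copy per unique chunk.

    Returns (logical_bytes, stored_bytes): the data as the backup application
    wrote it versus what actually lands on disk after de-duplication.
    """
    store = {}  # SHA-256 digest -> the single stored copy of that chunk
    logical = 0
    for data in streams:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            logical += len(chunk)
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    stored = sum(len(chunk) for chunk in store.values())
    return logical, stored


# Two nightly full backups of 100 chunks of data, with 10% of the chunks
# rewritten between them (random.Random.randbytes needs Python 3.9+).
rng = random.Random(0)
night1 = rng.randbytes(100 * 4096)
night2 = bytearray(night1)
night2[:10 * 4096] = rng.randbytes(10 * 4096)
logical, stored = dedupe([night1, bytes(night2)])
print(f"logical {logical} bytes, stored {stored} bytes, ratio {logical / stored:.2f}:1")
```

Because the second full backup shares 90% of its chunks with the first, only 110 of the 200 chunks written are stored, roughly a 1.8:1 reduction. Repeated full backups of slowly changing data are where this kind of saving accumulates.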

Choosing the ‘right’ technology

Enterprises have numerous disk-based data-protection technologies from which to choose. The “right” solution depends on multiple criteria, including 1) recovery-point objectives (RPOs) and recovery-time objectives (RTOs) for what needs to be protected (database, messaging, and collaboration applications, or unstructured file data); 2) the restore granularity requirements (i.e., image or object level); 3) the disruptiveness of the technology to the existing backup environment; 4) the manageability of the technology (Is it a stand-alone technology, or is it managed from within existing backup applications?); and 5) the cost of the technology, including any licensing costs and associated disk costs. It should also be noted that enterprises are likely to select several of these disk-based technologies for “tiered” data protection.

Our research has revealed a number of interesting trends:

Penetration of backup to conventional disk has already peaked. Forrester estimates that 67% of enterprises already perform backups to a conventional disk target. Forrester attributes this high adoption rate to the fact that traditional mainstream backup applications have long supported backup to disk targets, and early adopters were eager for any kind of solution that could help them meet their backup windows. However, Forrester believes that adoption of virtual tape libraries (VTLs) will increase at the expense of backup to conventional disk in the future, because VTLs are far less disruptive to introduce, have performance and flexibility advantages over conventional disk, and maximize existing investments in tape. VTLs also have the ability to morph into disk-only backup appliances in the future.

VTL adoption is poised for growth. Forrester estimates that 30% of enterprises have already adopted VTLs. [Editor’s note: In an InfoStor QuickVote reader survey, 37% of the respondents had already implemented VTLs, another 21% plan to do so this year, and 42% had no plans to implement VTLs (see figure).] Forrester estimates that the adoption of VTLs will increase by as much as 20% in five years. For those enterprises that continue to use a mix of disk and tape in their backup environment, a VTL is the preferred backup target due to its non-disruptiveness, flexibility, manageability, and ability to facilitate physical tape creation. There is typically a direct correlation between an enterprise’s reliance on tape and its adoption of, or intent to adopt, VTLs. Because the majority of enterprises express a desire to continue to use both disk and tape in their backup environments, VTLs are here to stay.

Rotating snapshots have the broadest appeal. Snapshots have come a long way since the days when they integrated with databases and backup applications via scripts. Today, snapshots are configured and managed via Web-based user interfaces from the storage vendors, or from mainstream backup applications, and are tightly integrated with database and messaging applications. Snapshots can be full clones or space-efficient snaps and can be used in rapid succession to offer multiple recovery points. While snapshots don’t provide the literal “any-point-in-time recovery” of continuous data protection (CDP), they are space-efficient and provide recovery at multiple, consistent points in time. Forrester believes enterprises will select snapshots as the primary protection and recovery method for databases, as well as messaging and collaboration applications. Forrester estimates adoption of snapshot technology at 37% today and expects this to increase to almost 60% by 2011.
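
The difference between a full clone and a space-efficient snapshot can be sketched with a toy copy-on-write volume in Python. The `Volume` class is hypothetical and real arrays implement this in very different ways: here each snapshot stores only the original contents of blocks overwritten after it was taken, so its size tracks the rate of change rather than the size of the volume.

```python
class Volume:
    """Toy block volume supporting space-efficient, copy-on-write snapshots."""

    def __init__(self, nblocks):
        self.blocks = [b"\0"] * nblocks
        # One dict per snapshot: block index -> contents at the moment of the snapshot,
        # saved only for blocks that have since been overwritten.
        self.snapshots = []

    def snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1  # snapshot id

    def write(self, index, data):
        # Copy-on-write: before the first overwrite of a block after a snapshot,
        # preserve the old contents in that snapshot.
        for saved in self.snapshots:
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read(self, snap_id, index):
        # Unchanged blocks come from the live volume; changed ones from the snapshot.
        return self.snapshots[snap_id].get(index, self.blocks[index])

    def snapshot_blocks(self, snap_id):
        return len(self.snapshots[snap_id])  # space consumed, in blocks


vol = Volume(1000)
vol.write(0, b"A")
s0 = vol.snapshot()
vol.write(0, b"B")
s1 = vol.snapshot()
vol.write(0, b"C")
vol.write(2, b"X")
print(vol.read(s0, 0), vol.read(s1, 0), vol.blocks[0])
print(vol.snapshot_blocks(s0), "of", len(vol.blocks), "blocks used by snapshot", s0)
```

Both snapshots remain consistent recovery points, yet each holds only two blocks; a full clone of the same volume would consume all 1,000 blocks per copy.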

“Near-CDP” offerings are similar to snapshots. There are a number of vendors that market “near-CDP” products that don’t technically meet the Storage Networking Industry Association (SNIA) definition of providing any-point-in-time recovery. (SNIA defines CDP as “a methodology that continuously captures or tracks data modifications and stores changes independent of the primary data, enabling recovery from any point in the past.”) However, near-CDP products do offer recovery from multiple points in time by using successive snapshots and replication. This approach reduces processing overhead on production servers, will meet most RTO and RPO requirements, and is increasingly a feature of mainstream backup applications. Thus, while it is not CDP as defined by SNIA, Forrester believes this is still an advantageous approach. For the purposes of this forecast, near-CDP is included in the forecast for snapshots.
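
A rough way to see why near-CDP can meet most RPO requirements without being true CDP: with successive snapshots, the best recovery point for a given moment is the newest snapshot taken at or before it, so the worst-case data loss equals the snapshot interval. The Python sketch below assumes a hypothetical four-hour snapshot schedule; the function name is illustrative.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_recovery_point(snapshot_times, target):
    """Return the newest snapshot taken at or before `target`, or None if there is none."""
    times = sorted(snapshot_times)
    i = bisect_right(times, target)
    return times[i - 1] if i else None


# Hypothetical schedule: a snapshot every four hours starting at midnight.
start = datetime(2007, 2, 1)
snaps = [start + timedelta(hours=4 * k) for k in range(6)]

wanted = datetime(2007, 2, 1, 10, 30)  # last good moment before the data was corrupted
point = latest_recovery_point(snaps, wanted)
print("near-CDP restores to", point, "losing", wanted - point)
# True CDP, by the SNIA definition, could restore to `wanted` itself.
```

Here the restore falls back to the 08:00 snapshot and loses two and a half hours of changes; shortening the interval narrows that gap at the cost of more snapshots to manage.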

CDP will be used to protect the most mission-critical data. There are several different types of CDP products that provide any-point-in-time recovery from disk. They differ in the methods they use to track changes continuously; their awareness of, and integration with, applications, databases, and file systems; and their recovery object granularity. As a rule of thumb, because CDP continuously tracks all changes to data, the additional storage requirements can be quite significant. For example, if you assume a 10% rate of change to a given data set and you want to recover to any point in time in the last two weeks, you’ll need additional storage capacity of as much as 1.5x the amount of data you’re trying to protect. For this reason, Forrester believes enterprises will use CDP offerings to protect just their most mission-critical databases and messaging applications. Forrester estimates that current CDP adoption is at approximately 20% and will increase to 35% by 2011.
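
The storage rule of thumb above can be approximated with simple arithmetic. This Python sketch assumes the 10% figure is a daily change rate and that every changed byte is journaled and kept for the whole retention window; the `overhead` parameter is a hypothetical allowance for journal metadata and indexing, which real products handle differently.

```python
def cdp_journal_capacity(protected_tb, daily_change_rate, retention_days, overhead=0.0):
    """Estimate extra disk (TB) for a CDP journal covering `retention_days` of changes.

    Assumes every changed byte is journaled and retained for the full window;
    `overhead` is a hypothetical fractional allowance for journal metadata.
    """
    return protected_tb * daily_change_rate * retention_days * (1 + overhead)


protected = 10.0  # TB of production data to protect
extra = cdp_journal_capacity(protected, daily_change_rate=0.10, retention_days=14)
print(f"{extra:.1f} TB of journal, or {extra / protected:.1f}x the protected data")
```

Under these assumptions, two weeks at 10% a day comes to 1.4x the protected capacity before any overhead, consistent with the "as much as 1.5x" estimate once metadata is allowed for.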

It’s important to note that CDP products may track data changes at several different levels: block, file, or through application-specific integration. Similarly, the most granular objects they can recover range from individual files to mailboxes or even individual messages.
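
Block-level tracking, the simplest of these approaches, can be sketched as a baseline copy plus a time-stamped log of every write; recovering to any point in time means replaying the log up to that moment. The `CdpJournal` class below is a hypothetical Python illustration, not how any particular product stores its journal; file-level or application-aware CDP would log files, mailboxes, or messages instead of blocks.

```python
from dataclasses import dataclass, field


@dataclass
class CdpJournal:
    """Toy block-level CDP: a baseline image plus a time-ordered log of every write."""

    baseline: dict  # block index -> contents when protection started
    log: list = field(default_factory=list)  # (timestamp, block index, contents)

    def record(self, ts, index, data):
        # Captured independently of the primary volume; assumes writes
        # arrive in non-decreasing timestamp order.
        self.log.append((ts, index, data))

    def recover(self, ts):
        """Rebuild the volume image as it stood at time `ts` (any point in the past)."""
        image = dict(self.baseline)
        for when, index, data in self.log:
            if when > ts:
                break
            image[index] = data
        return image


journal = CdpJournal(baseline={0: b"a", 1: b"b"})
journal.record(1.0, 0, b"A")
journal.record(2.0, 1, b"B")
journal.record(3.0, 0, b"corrupted")
print(journal.recover(2.5))
```

Because every write is kept, the image can be rebuilt for an instant between any two writes, which is what distinguishes CDP from snapshot-based near-CDP, and also why the journal grows with the rate of change.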

Usage scenarios

Different disk-based data-protection technologies are more or less appropriate for databases, messaging and collaboration applications, and unstructured file data, depending on each data type's RTO, RPO, and recovery granularity requirements. Based on our enterprise end-user interviews, Forrester currently estimates that:

Databases support the most mission-critical business processes. Business processing applications such as ERP, CRM, and supply chain management (SCM), which run on databases such as DB2, Oracle, SQL Server, and Sybase, typically support the most mission-critical business operations in an enterprise. Because these applications are so critical and complex, the right disk-based data-protection technology must eliminate the backup window, reduce downtime, and limit data loss. This means understanding key database elements such as logs, tablespaces, and database files, as well as ensuring the selection of a recovery point that represents a consistent, “restartable” database image. The two most appropriate solutions are rotating snapshots and CDP.

With snapshot technology, enterprises typically take at least one snapshot per day for snapshot-assisted backup to tape as well as instant restore. As snapshots have become much more space-efficient, many enterprises now implement a rotating snapshot strategy where multiple snapshots are taken throughout the day to offer several recovery points. Forrester expects that most enterprises will protect their databases with rotating snapshots and use CDP for only the most-critical applications due to the additional space requirements.
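
A rotating snapshot strategy of the kind described above might be scheduled as in this Python sketch. The class and policy values are assumptions for illustration: a snapshot every four hours, the six most recent retained on disk, and the first snapshot of each day selected for snapshot-assisted backup to tape.

```python
from collections import deque
from datetime import datetime, timedelta


class RotatingSnapshots:
    """Retain the `keep` newest snapshots; flag the first of each day for tape backup."""

    def __init__(self, keep):
        self.keep = keep
        self.retained = deque()  # snapshot timestamps still on disk, oldest first
        self.to_tape = []        # snapshots chosen for snapshot-assisted backup
        self._last_tape_day = None

    def take(self, ts):
        """Record a new snapshot at `ts`; return the snapshots rotated out."""
        self.retained.append(ts)
        if ts.date() != self._last_tape_day:
            self.to_tape.append(ts)
            self._last_tape_day = ts.date()
        expired = []
        while len(self.retained) > self.keep:
            expired.append(self.retained.popleft())  # oldest snapshot is deleted
        return expired


# Hypothetical policy: a snapshot every four hours, six kept (one day of recovery points).
policy = RotatingSnapshots(keep=6)
start = datetime(2007, 2, 1)
expired = []
for k in range(12):  # two days of snapshots
    expired += policy.take(start + timedelta(hours=4 * k))
print(len(policy.retained), "on disk;", len(expired), "rotated out;", len(policy.to_tape), "sent to tape")
```

After two days, the most recent day of recovery points stays on disk for instant restore, while one snapshot per day has been handed off for backup to tape.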

Messaging applications are becoming more critical and require object-level granularity. As companies rely on them more heavily, messaging applications such as Exchange are increasingly seen as business-critical and, in some cases, even mission-critical.

Not only must messaging applications be backed up regularly because of their criticality, but they also require object-level restore capabilities (i.e., the ability to restore an individual mailbox or e-mail). For this reason, most enterprises will use backup to a disk target (conventional disk or VTL) for these applications. Mainstream backup software applications have long supported object-level granularity (and will continue to make improvements in this ability), and disk provides much faster backup and restore than can be achieved with tape.

Backup to VTL will cannibalize the deployments of backup to conventional disk for those enterprises that will use both disk and tape for data protection. For the most mission-critical messaging applications, or for those that have become so large that backup windows cannot be met even with backup to a disk target, enterprises will use snapshots or near-CDP technology in the same way they use snapshots to protect databases: successive snapshots to offer multiple recovery points and snapshot-assisted backup to tape.

End-user file data. End-user file data is typically far less critical than messaging or database applications but still needs to be protected on a regular basis and requires the ability to perform file-level restores. To protect data that is stored on file servers, Web servers, and NAS filers, enterprises will likely deploy a mainstream backup software application with a backup to disk target (conventional disk or VTL). Once again, backup to VTL will cannibalize the deployments of backup to conventional disk for those enterprises that will use both disk and tape for protection.
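
The usage scenarios above can be condensed into a rough decision helper. This Python sketch is a deliberate simplification of Forrester's guidance; the function name and parameters are hypothetical, and mapping disk-only shops to conventional disk is an assumption. Real selection also weighs the other criteria discussed earlier, such as manageability, disruptiveness, and cost.

```python
def suggest_protection(workload, mission_critical=False, uses_tape=True, backup_window_met=True):
    """Map a workload to a first-line disk-based protection approach (simplified)."""
    # Enterprises keeping both disk and tape favor VTLs over conventional disk targets.
    disk_target = "backup to VTL" if uses_tape else "backup to conventional disk"
    if workload == "database":
        # Rotating snapshots for most databases; CDP only for the most critical.
        return "CDP" if mission_critical else "rotating snapshots"
    if workload == "messaging":
        # Object-level restore favors backup to disk, unless the application is
        # too critical or too large for its backup window.
        if mission_critical or not backup_window_met:
            return "rotating snapshots or near-CDP"
        return disk_target
    if workload == "file":
        return disk_target
    raise ValueError(f"unknown workload: {workload!r}")


for args in [("database",), ("messaging",), ("file",)]:
    print(args[0], "->", suggest_protection(*args))
```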

Market forecast

It’s important to note that Forrester’s data-protection market forecast focuses on how much data will be protected with some kind of disk-based data protection as a first line of defense. It does not forecast how much additional disk will be required to protect this data. How much additional capacity will be required depends on several factors. For CDP, it will depend on the rate of change and how far into the past the enterprise requires any-point-in-time recovery. For snapshots, it will depend on whether the enterprise uses space-efficient snapshots or full clones and how many snaps are taken daily. For traditional backup, it will depend on whether the enterprise continues to use a “grandfather-father-son” type of backup schedule or takes advantage of synthetic backups. In addition, advanced technologies such as data de-duplication could drastically reduce the amount of storage that’s actually required to protect a given application or data set.
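
To see why the backup schedule matters for capacity, the following Python sketch counts which backup sets a grandfather-father-son scheme keeps. The retention parameters are assumptions (the 7 most recent daily backups, the 4 most recent Sunday backups, and the 12 most recent first-of-month backups), one common variant among many.

```python
from datetime import date, timedelta


def gfs_retained(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the backup dates a grandfather-father-son rotation keeps.

    Sons: the `daily` most recent backups. Fathers: the `weekly` most recent
    Sunday backups. Grandfathers: the `monthly` most recent first-of-month backups.
    """
    dates = sorted(backup_dates)

    def newest(items, n):
        return items[-n:] if n > 0 else []

    sons = newest(dates, daily)
    fathers = newest([d for d in dates if d.weekday() == 6], weekly)  # Monday is 0, Sunday is 6
    grandfathers = newest([d for d in dates if d.day == 1], monthly)
    return sorted(set(sons) | set(fathers) | set(grandfathers))


year = [date(2006, 1, 1) + timedelta(days=i) for i in range(365)]
kept = gfs_retained(year)
print(len(kept), "backup sets retained out of", len(year))
```

For a year of daily backups this keeps 22 sets rather than 365. Whether each retained set is a full or an incremental, and whether fulls are synthesized from earlier backups rather than re-read from production, then determines how much disk the schedule actually consumes.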

North America has the most data to protect. In 2006, Forrester estimates the average capacity of North American enterprises as almost 59TB, significantly more than both Europe and the Asia-Pacific region. Large capacities, growth, and significant investments in existing backup technologies mean that North American enterprises will invest in all types of disk-based data-protection solutions, from VTLs (for their non-disruptiveness) to CDP (for any-point-in-time recovery), as they transition to disk for their first line of data protection.

European companies will adopt CDP at a slower rate, initially. Europe has the lowest penetration for all the disk-based data-protection solutions. Adoption will certainly increase, but it is likely to accelerate first with VTLs, followed by snapshots and, ultimately, CDP.

Companies in the Asia-Pacific region will adopt advanced disk-based data protection at a faster rate. VTLs are the most appealing for enterprises that have significant investment in their existing backup environment and tape. Enterprises in emerging Asia-Pacific markets that rely less on tape can forgo a VTL in favor of advanced disk-only data-protection offerings, such as CDP and snapshots.

Stephanie Balaouras is a senior analyst with Forrester Research (www.forrester.com).

