“…It takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”
-Lewis Carroll, Through the Looking Glass
By Mark Ferelli
It wasn’t too long ago that data was measured most frequently by the gigabyte, and terabyte-level installations were found only in the largest data centers. Terabytes still look like a lot: 10^12 bytes, or the equivalent of 50,000 trees made into printed paper pages.
Data growth has skewed all the old measures. The largest data centers measure their capacities in petabytes now. E-commerce and information-intensive business operations have added massive amounts of data in the years since 1993, when EMC announced the first 1TB disk array. And plenty more is coming.
IT managers have to deal not only with the problems of capacity growth and management, but also with data availability. Organizations are increasingly setting business operating goals of 100% uptime, where data is always online and accessible. The often-repeated mantra for business operation and continuity is: Shorten the time to data!
“Users rely on the availability of data to run their businesses 24×7,” says Ravi Thota, director of product marketing at Network Appliance, “and rapid backup and recovery is critical to ensure business operations, customer service, and employee productivity.”
Shortening the time to data is a non-trivial undertaking. It requires a high level of cooperation between storage hardware and storage management software. It also requires serious tactical thinking about the data. Some stored data is in fact mission-critical and must be available 24×7. But some data is only important enough to justify 12×7 availability, or perhaps even 8×5. Decisions on what is mission-critical, what is important, and what is useful but not crucial need to be made to avoid wasting disk space.
The low cost per megabyte of hard disks can tempt a storage administrator to waste disk space by treating all data as more or less equal. This temptation is especially strong with low-cost, high-capacity Serial ATA-based disk arrays. But data is neither created nor used equally. The value of data changes as a function of time, business priority, or application.
It is against this backdrop of capacity-management challenges, data growth, constant shifts in the value of data, and high-availability requirements that disk-to-disk (D2D) backup and recovery has matured from a concept into a commonplace IT tool (see figure).
In disk-based backup, the primary backup is written to a disk resource rather than a tape device. Due to the random-access nature of hard disks, this backup can be easily copied, migrated, or cloned. More important, though, is the ability to recover data quickly. Too many backup operations focus on the backup phase, while the business value of the backup lies in the fast and accurate recoverability of the data. The key metric here is the recovery time objective (RTO).
Topologies for D2D vary widely. At the most basic level, a disk array is inserted in the data path as a type of buffer or cache between primary disk storage and a tape archive. This is a basic extension of the common disk-to-tape data-protection strategy.
The application of journaling concepts to file systems forms the foundation of continuous data protection (CDP) as a D2D strategy. In a CDP solution, each write operation to a selected volume is captured and duplicated. The operation is recorded in a transaction log, which serves both as an audit tool and as a baseline to re-create an image of a volume as it existed at a given point in time.
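The journaling mechanism behind CDP can be sketched in a few lines. The following is a minimal illustration under assumed details (a block-level journal of timestamped writes), not any vendor’s implementation: every write is captured as a log record, and replaying the log against a baseline image re-creates the volume as it existed at any chosen moment.

```python
class CdpJournal:
    """Minimal CDP write-journal sketch (illustrative only, not a product API).

    Every write to the protected volume is duplicated into a transaction log
    as a (timestamp, offset, data) record. Replaying the log against a
    baseline image re-creates the volume at a given point in time.
    """

    def __init__(self, baseline: bytes):
        self.baseline = bytearray(baseline)
        self.log = []  # (timestamp, offset, data), appended in time order

    def capture_write(self, timestamp: float, offset: int, data: bytes):
        # Duplicate the write into the transaction log as it is applied.
        self.log.append((timestamp, offset, data))

    def image_at(self, when: float) -> bytes:
        # Re-create the volume image as of 'when' by replaying logged writes.
        image = bytearray(self.baseline)
        for ts, offset, data in self.log:
            if ts > when:
                break  # later writes did not yet exist at this point in time
            image[offset:offset + len(data)] = data
        return bytes(image)
```

The same log doubles as an audit trail: the records show exactly what was written, where, and when.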
Representative CDP vendors include Asempra, EMC, FalconStor, FilesX, Hewlett-Packard, IBM/Tivoli, InMage, Kashya, Lasso Logic, LiveVault, Mendocino, Microsoft, Mimosa Systems, Revivio, Storactive, Symantec, TimeSpring, and XOsoft. (Note: This list includes both CDP and “near-CDP” vendors.)
Virtual tape libraries (VTLs) are another D2D approach. VTLs emulate an installation’s tape libraries while actually backing up to disk. This approach means that backup scripts and software do not need to be changed to improve backup-and-restore times.
Examples of VTL vendors include ADIC, Bus-Tech, Copan, Diligent, EMC, FalconStor, IBM, MaXXan, Neartek, Overland Storage, Quantum, Sepaton, StorageTek, and Ultera.
The need for speed
The controversy over whether D2D backup and recovery can replace tape is the stuff that pitched political and marketing battles are made of. The battlefields include cost, comparative reliability of disk versus tape, and the need for speed in restoring data.
D2D proponents point out that tape is slower than disk and offers uneven reliability on restore. They also contend that disk, being a random access medium, shortens time to data. And in a business environment demanding virtually instant access to data, this speed advantage moves from being convenient to being necessary.
“Backup to disk is fast and reliable, and recovery of data from disk is near-instantaneous,” says Network Appliance’s Thota. “When backup data is available online in native file formats it can be used for business purposes such as reporting and testing. In addition, disk-based backup leverages software smarts to reduce media consumption and associated costs. Most importantly, backups on disk enable users to meet their service level agreements while minimizing the cost, complexity, and unreliability associated with backup and recovery.”
Still, widespread installation of tape-based solutions creates a budgetary challenge for D2D proponents pushing wholesale replacement of tape. Few data centers are prepared to write off an existing tape infrastructure investment, even with the promise of a significant improvement in total cost of ownership.
Optimizing storage hardware and software investments is best accomplished using disk and tape in tandem. Disk, with its speed and flexibility, makes sense for backups where rapid recovery is a priority. Tape, with cost and media removability advantages, can be leveraged for longer-term archival applications where the need for speed is less pronounced.
The need for speed affects both backup and recovery operations in the data center. The need for rapid backup is reflected in increasingly demanding timetables in global, Internet-connected IT environments. There was a time when IT administrators could easily schedule backups during a “third shift” outside of local business hours. This was the “backup window,” during which the system would be unavailable to users and customers.
However, the available time to run backup operations has not grown; on the contrary, it has often been reduced to the point of extinction. One of the often-touted advantages of introducing D2D into a backup-and-recovery infrastructure is the ability to reduce the backup window time requirement.
“Enterprises are experiencing exponential growth in data and are finding it more and more difficult to meet backup windows,” says Roger Archibald, vice president of marketing at Copan Systems. “Even more important are restores. When the time comes for a restore, how does the organization know that it has a good copy of data that can be restored quickly? Because at that point, it’s too late to discover that you have a bad backup.”
The need for speed in backup and recovery becomes even clearer when a data center puts an emphasis on recovery. In a world where natural or man-made disasters strike with little (if any) notice, time to data can affect a business’ survivability. Companies that go too long without recovering key data can quickly experience lost revenues and disgruntled customers. High-speed recoverability is an obvious advantage of integrating D2D into the backup infrastructure.
Bringing D2D aboard
Despite the number of turnkey or “plug-and-play” D2D solutions available, the integration of D2D backup/recovery into a data center requires planning and an understanding of the real IT requirements and potential financial benefits of D2D.
Bringing D2D aboard requires a vision of the data growth that a business is likely to experience. This is the foundation of capacity planning and the way to match the scalability of a D2D product to future capacity requirements.
Some D2D solutions are marketed as appliances, and storage managers need to know what the maximum capacity of the appliance is and whether the appliance is modular enough to scale easily as data volumes grow.
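The scaling question reduces to simple arithmetic. The sketch below uses purely hypothetical figures to project a backup data set’s growth over an appliance’s service life and check it against a vendor-stated maximum capacity:

```python
# Hypothetical capacity-planning sketch; all figures are invented
# for illustration, not drawn from any vendor's specifications.
current_tb = 8.0          # today's backup data set, in TB
annual_growth = 0.40      # assumed 40% year-over-year data growth
service_life_years = 4    # expected useful life of the appliance
appliance_max_tb = 32.0   # vendor-stated maximum capacity

# Compound the growth rate over the appliance's service life.
projected_tb = current_tb * (1 + annual_growth) ** service_life_years

print(f"Projected need after {service_life_years} years: {projected_tb:.1f} TB")
print("Fits within appliance maximum:", projected_tb <= appliance_max_tb)
```

If the projection exceeds the appliance’s maximum, modular scalability (or a larger model) becomes a hard requirement rather than a nice-to-have.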
Implementing D2D backup/recovery also demands a careful look at an enterprise’s current backup policies and practices. Adding D2D combats the problem of shrinking backup windows, and this alone might justify the investment, but adding D2D leads to a few questions:
- Will the D2D solution disrupt or streamline backup/restore operations?
- How will the D2D implementation contribute to overall uptime?
- What impact will it have on human assets?
A subset of these issues is the question of how a D2D solution integrates with the existing infrastructure. This is basically a question of ease of use and ease of installation. Addressing this issue demands that the user make “due diligence” a priority. A certain amount of added complexity is unavoidable, since D2D exists as a new stop along the data path. Fortunately, D2D solutions can handle multiple streams, automating much of the data handling.
In the area of D2D backup and recovery, many of the tried-and-true principles relating to all kinds of IT investments apply.
Justification of any backup-and-recovery solution springs from a basic axiom: Data-center management cannot guarantee the safety of enterprise data if it exists in only one place. Failure to maintain backup copies opens up the data center to the risk of outages, and the cost of these outages can be calculated using the equation:
(Frequency of outage) × (Duration) × (Hourly cost) = Lost profits
Working out the variables requires quantifying the elements of downtime, including lost revenues, reduced worker productivity, lost business, and the impact of the outage on the company’s market position.
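As a worked illustration of the equation above (the figures are invented to show the arithmetic, not drawn from any survey):

```python
# Hypothetical inputs to the outage-cost equation:
# (Frequency of outage) x (Duration) x (Hourly cost) = Lost profits
outages_per_year = 4        # frequency of outage
hours_per_outage = 3.0      # average duration of each outage
hourly_cost = 25_000.00     # lost revenue, productivity, etc., per hour

lost_profits = outages_per_year * hours_per_outage * hourly_cost
print(f"Estimated annual cost of outages: ${lost_profits:,.0f}")
# -> Estimated annual cost of outages: $300,000
```

That annual figure is what the D2D investment is weighed against.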
Against these figures, the D2D investment is measured. The elements to calculate include:
- Operating costs over the D2D product’s useful life;
- The impact on other investments, such as tape subsystems;
- Operating expenses such as floor space in the data center and power consumption;
- Human capital issues, such as the rise or drop in headcount, training costs, etc.; and
- The general economic impact of D2D on operations (such as the value of time saved in the backup process).
In some cases, the return on investment can be substantial. According to Diamond Lauffin, senior executive vice president at Nexsan Technologies: “Some users are completely eliminating their single-point-of-failure, linear-based [tape] systems with fully redundant, self-healing D2D systems, recognizing an immediate ROI of 15% to 25% savings…and in some cases much more.”
For medium and large businesses, a terabyte is no longer an imposing figure. But backing up and recovering that much data and more is an imposing task. The many approaches to D2D backup and recovery are gaining traction across a spectrum of industries, especially where faster backup-and-recovery speeds are important. Backup-and-recovery methodologies are evolving to a point where hard disks are used for short-term backup and recovery as well as a buffer between primary storage and a tape archive. Disk-to-disk backup and recovery is a viable solution that spreads across a variety of budgets and IT requirements.
Mark Ferelli is a journalist who specializes in storage. He can be reached at email@example.com.
D2D and remote offices
Remote offices pose a significant management challenge, one that cannot be neglected considering that as much as one-third of a corporation’s critical data can reside in remote locations.
There are two problems with trying to manage data protection locally: personnel, and the volume of data that must travel over a WAN.
It is difficult to cost-justify maintaining professional IT staff at each remote installation. IT personnel are typically headquartered at an enterprise’s data center, but much of the valuable data (and the problem of data protection) is decentralized.
Even when backing up disk to disk, most file sets are too large to permit whole files to travel across low-bandwidth WAN connections. Even if a user changes only a small part of a file, the entire file is considered new by a backup application and is backed up. For this reason, even running a baseline backup followed only by incrementals would consume excessive bandwidth when there are production runs to transmit. And nowhere is the need for speed more pressing than where WAN bandwidth and transmission times are limited.
Files at remote offices are frequently backed up multiple times, not only as application files, but also as attachments to e-mails or as copies resident in protected personal folders. This explains the ratio, sometimes as high as 8:1, between backup capacity and original file capacity requirements. Historically, most solutions to remote-office backup/recovery have adversely impacted bandwidth and productivity.
The lack of trained IT personnel and the ineffectiveness of whole-file backups over a WAN are serious enough for reasons of corporate governance, but they are even more cause for concern in a data-protection environment driven by internal and external data-retention and recovery regulations.
SEC, HIPAA, Sarbanes-Oxley, and other regulations have inspired regulated companies to train employees, establish processes, and implement data-protection solutions. But the solutions and personnel are typically in the centralized data center. The processes and policies are put in place by experts at headquarters, but implemented remotely by staff whose responsibilities are often not focused on data protection. In some cases, the procedures might be poorly followed, if followed at all. And these staffers, unsure which records are covered under the regulations, back up everything indiscriminately and redundantly. Therefore, while considerable inroads might be made in ensuring compliance in the data center, remote offices are the weak links in the compliance chain.
In remote offices, it is common for non-IT personnel to perform backup tasks for which they are untrained. They are trusted to change tapes and handle these important repositories, but recent events have shown that poor tape handling has led to everything from lost data to identity theft.
Tape-based data protection is not designed to curb the massive redundancy of data files transmitted over the WAN. Backups and restores are routinely “whole file” transactions and accumulate with each incremental backup event.
Disk-to-disk backup/recovery helps with the problem of recovery speeds, but is not always equipped to deal with the data duplication caused by “whole-file” backups.
The crippling, time-consuming redundancy can be addressed through technologies such as commonality factoring and content-addressable storage (CAS).
Avamar’s Axion product, for example, uses commonality factoring and eliminates redundancy across systems, effectively increasing bandwidth by up to 300 times, according to Ed Walsh, Avamar’s CEO-a significant benefit when it comes to protecting data in remote offices.
As data is sent to the Axion commonality-factoring engine, it is analyzed at the byte level. Byte streams are compared to other byte streams in search of duplicates. When a duplicate is found, instead of storing that data, a pointer is established back to the initial byte stream. In the case of Axion, the pointer is a 20-byte content address based on the contents of the information. And it is this very small pointer, not the data itself, that crosses the WAN to the D2D repository.
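The mechanism can be sketched generically. The snippet below assumes, for illustration only, that the 20-byte content address is a SHA-1 digest of the chunk contents (SHA-1 happens to produce exactly 20 bytes); it is not a description of Avamar’s actual engine.

```python
import hashlib

def backup_chunks(chunks, repository):
    """Sketch of commonality factoring with content addressing.

    'repository' maps a 20-byte content address to chunk data. For each
    chunk, only the address (pointer) crosses the WAN; the chunk data itself
    is sent only if the repository has never seen that content before.
    Returns the pointers plus the total bytes that would cross the WAN.
    """
    pointers, bytes_sent = [], 0
    for chunk in chunks:
        address = hashlib.sha1(chunk).digest()  # 20-byte content address
        if address not in repository:
            repository[address] = chunk  # new content: ship the data once
            bytes_sent += len(chunk)
        bytes_sent += len(address)  # the small pointer always crosses the WAN
        pointers.append(address)
    return pointers, bytes_sent
```

Backing up three 1,000-byte chunks where one is a duplicate sends only the two unique chunks plus three 20-byte pointers, which is where the bandwidth savings come from.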
File server environments, imaging applications, and even databases can be backed up and restored much more quickly, saving staff time and conserving disk resources by eliminating data duplication.
Commonality factoring can be implemented either on the client or on an appliance. A client-based implementation means that less physical data is sent across the network, a fast approach that is especially useful for wide-area backups. However, this implementation requires a total replacement of the existing backup application and can consume a large number of CPU cycles. The appliance-based approach leverages the current backup application and appears to that application as an NFS or CIFS mount point, which is backed up using the software’s disk-backup option. The downside to this approach is that the whole data set is transferred to the appliance before commonality factoring is done. This makes the appliance approach less useful for wide-area backup and focuses the advantages of commonality factoring on the data center.