Evaluating ROBO data protection

Several product and service options exist for ROBO-based data protection. After you figure out your key concerns, the choice becomes easier.

By Eric Burgener

September 12, 2008 -- Although remote office and branch office (ROBO) environments host a large percentage of a company’s critical data, they usually receive short shrift in terms of data protection. A company may have an excellent data-protection infrastructure in place for its data centers, but ROBO locations generally have very limited IT expertise, and many companies just don’t focus on solving data-protection problems in those locations.

For most distributed companies, a surprisingly large percentage of critical data resides in remote locations or on distributed resources, such as desktops and laptops, which are rarely -- if ever -- backed up. Storage administrators in centralized locations may not even be aware of the poor state of data protection in these remote locations or may not know how to address this issue cost-effectively. In the event of outages, organizations with unprotected critical data run the risk of application downtime, lost data, security breaches, or failure to comply with regulations at these remote sites.

There are a number of reasons why many organizations have not been able to implement effective data-protection plans for ROBO environments. This article addresses those reasons, identifies the requirements for effective ROBO data protection, and reviews the classes of solutions available today to provide it.

ROBO data: The poor stepchild?
ROBO data presents a different set of data-protection challenges than centralized data and has historically suffered much more limited coverage in terms of backup for several reasons. First, IT resources tend to be limited in ROBO locations, and sophisticated data-protection expertise may be in short supply or non-existent. Second, while companies may spend sufficiently on data protection for centralized locations, they typically allocate much less to ROBOs. A data-protection "vacuum" often results that limits many ROBO environments to "do it yourself" approaches that are rudimentary or even dangerous. The backup "solutions" that many ROBO locations have implemented themselves often do a poor job of meeting data-protection needs, a reality that is driven as much by their lack of data-protection sophistication as by their limited budgets. And third, in organizations with multiple ROBO locations, there are often inconsistent policies, schedules, and technologies across different regions or divisions, further compounding the ROBO data-protection issue.

Disaster recovery (DR) presents an even greater set of problems. The only DR strategy available to most ROBO locations, if they have one at all, is to regularly ship tapes to a remote site, typically managed by an outside vendor such as Iron Mountain. This type of physical transport is becoming increasingly risky: numerous recent news stories describe how the loss of unencrypted tapes has put the proprietary data of hundreds of thousands of customers at risk. Certain industries, such as financial services and health care, may have regulations that require DR strategies, but ROBOs that have cobbled together their own backup solution are unlikely to have a very good DR strategy in place.

Data-protection requirements

Due to the general lack of local IT resources and minimal budget, ROBOs tend to evaluate their data-protection options in terms of ease of use, low cost, and functionality -- in that order. Baseline requirements include an ability to manage data-protection tasks with minimal disruption to the business, including both backup and restores of everything from individual files to entire servers, across heterogeneous platforms. The best backups are those that occur without end users even knowing they are occurring, so the ability to support user-transparent backup scheduling, online backups, and backups of open files is important. Solutions that allow users to easily perform their own file-level restores for their own data help to reduce the load on IT administrative resources as well. Recovery granularity is often limited by the daily backup schedule, but any improvement in granularity due to the use of more innovative technologies is most welcome; improved recovery granularity, however, often comes with additional expense and complexity. Remote management allows more sophisticated data-protection resources at centralized locations to handle much of the backup administration, leading in most cases to more comprehensive coverage. Finally, deployment should be relatively non-disruptive and should not require a re-design of the existing infrastructure.

While the first order of business is handling local backup-and-restore requirements, organizations that are farther along in extending data protection to ROBOs will also want to consider DR and archiving. Centralizing data-protection planning and administration is the right strategy if the underlying data-protection solution supports it. If ROBO backups can be easily consolidated to centralized locations, then there is a good chance that this data can be included in the DR umbrella already established by a company for its data centers. The same holds true for archiving. In both these cases, it is generally more reliable to implement DR and archiving strategies around centralized data centers as opposed to trying to handle them at each remote location.

Data-protection options
If you haven't looked at some of the newer data-protection options specifically targeted at ROBO environments, you may be surprised. New technologies such as storage capacity optimization (SCO), WAN acceleration, and continuous data protection (CDP) are being used, along with more familiar technologies such as replication, to provide solutions that are well-suited to meet the data-protection requirements of ROBOs. In general, there are two classes of solutions: product and service.

Product options
Backup software: These products deploy backup agents that reside on the servers to be backed up, and perform file-based backup directly to a local (LAN-attached) backup server. Local backup servers can then transfer data across the WAN to more central locations for disaster recovery, archiving, or other long-term retention purposes. These products can be lighter-weight versions of enterprise backup software products or backup products that are specifically designed for the mid-market, but have no special features integrated that are specific to ROBO environments. They do, however, provide for local backup and recovery and an ability to comprehensively manage backup data throughout its lifecycle through centralized, multi-tiered management. Representative products in this space include Atempo's Time Navigator, BakBone's NetVault Backup, CA's ArcServe, CommVault's Galaxy, EMC's NetWorker, IBM's TSM, and Symantec's NetBackup.

****** ****** ****** ******

Solving administrative woes
For senior systems administrator Chris Stone at O'Reilly Media, a Sebastopol, CA-based technical publisher and creator of the first commercial Website, administrative woes drove the need to deploy a new data-protection solution. Stone’s existing backup product, purchased many years ago, required a lot of homegrown scripting and was becoming too difficult to maintain. Critical requirements in selecting a new solution to handle the company's ROBO environments were ease of administration (including remote administration), heterogeneous server support, and tape-to-tape duplication. BakBone's NetVault Backup fit the bill. "BakBone had the features we were looking for, and their price point and simple licensing policies were the icing on the cake," says Stone. "Instead of writing and maintaining scripts for each of our ROBO locations, I'm able to centrally configure policies to handle those issues, saving lots of time and leading to a more reliable overall solution."

****** ****** ****** ******

Backup software integrated with source-based SCO: These products integrate single instancing, data de-duplication, and other data-reduction technologies into the backup client, perform capacity optimization using backup client-based resources, and then send the reduced data stream across the network to the backup target. By integrating closely with the existing backup software, the capacity-optimized data can be tracked within the backup catalog and managed with a familiar set of backup management tools and utilities. These products promise the advantages of storage capacity optimization (SCO) technology in terms of WAN bandwidth and storage capacity savings with minimal disruption. These products are relatively new, however, and may not yet be fully integrated into the catalog of the existing backup software. This lack of backup catalog integration leads to requirements for separate hardware to use as a backup target, but all vendors are addressing this issue. End users should ensure they have enough resources on the backup clients to perform capacity optimization without impacting backup times or creating other performance issues. Products in this area include Symantec's PureDisk, which works with Symantec NetBackup as well as other major enterprise backup software, and EMC NetWorker's new 7.4 Backup Client, which has now been integrated with SCO technology obtained through EMC's acquisition of Avamar.
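
To make the source-based approach concrete, the following is a minimal sketch of client-side de-duplication, not any vendor's implementation. It assumes fixed-size chunks and SHA-256 fingerprints; the names `BackupTarget`, `chunk_digests`, and `backup` are hypothetical, and commercial products may chunk and index data quite differently.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks, chosen only for illustration


def chunk_digests(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and yield (fingerprint, chunk) pairs."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


class BackupTarget:
    """Hypothetical backup target that stores each unique chunk only once."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes
        self.catalog = {}  # backup name -> ordered list of fingerprints

    def has_chunk(self, digest):
        return digest in self.chunks

    def store(self, digest, chunk):
        self.chunks[digest] = chunk

    def restore(self, name):
        """Reassemble a backup from its list of fingerprints."""
        return b"".join(self.chunks[d] for d in self.catalog[name])


def backup(name, data, target):
    """Run on the client: send only chunks the target lacks; return bytes sent."""
    sent = 0
    recipe = []
    for digest, chunk in chunk_digests(data):
        if not target.has_chunk(digest):
            target.store(digest, chunk)  # stands in for a transfer across the WAN
            sent += len(chunk)
        recipe.append(digest)
    target.catalog[name] = recipe
    return sent


target = BackupTarget()
monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
tuesday = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # only the last chunk changed

sent_monday = backup("monday", monday, target)     # 12288 bytes cross the network
sent_tuesday = backup("tuesday", tuesday, target)  # 4096 bytes cross the network
```

In this toy run the second backup sends one third of the data yet restores in full, which is the WAN and capacity saving described above. The fingerprinting work happens on the client, which is why client resources matter.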

CDP-based solutions: CDP technology uses a backup source-based software component to send writes as they occur to a disk-based journal, allowing recoverable images from any previous point in time to be generated on-demand and used for recovery. In its "enterprise" form, it is used to meet stringent recovery granularity requirements, but CDP-based products for ROBO and laptop use are available as well.
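
The journal idea can be illustrated with a short sketch. This is a toy model under stated assumptions, not how any shipping CDP product stores data: the class `WriteJournal` and its methods are hypothetical, timestamps are plain integers, and writes are assumed to arrive in time order and to fit within the volume size.

```python
class WriteJournal:
    """Toy CDP journal: every write is recorded with the time it occurred."""

    def __init__(self):
        self.entries = []  # (timestamp, offset, data), appended in time order

    def record(self, timestamp, offset, data):
        """Capture a write as it happens on the protected system."""
        self.entries.append((timestamp, offset, bytes(data)))

    def image_at(self, timestamp, size):
        """Rebuild the volume as it looked at the requested point in time."""
        image = bytearray(size)
        for ts, offset, data in self.entries:
            if ts > timestamp:
                break  # ignore everything written after the recovery point
            image[offset:offset + len(data)] = data
        return bytes(image)


journal = WriteJournal()
journal.record(1, 0, b"good")
journal.record(2, 4, b"data")
journal.record(3, 0, b"oops")  # a later write that damages the first block

before_damage = journal.image_at(2, 8)  # b"gooddata"
latest = journal.image_at(3, 8)         # b"oopsdata"
```

Because the journal keeps the history of writes rather than only the latest state, a recovery image can be generated for the moment just before the damaging write.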

CDP sends writes across a network in real time as they occur, spreading backup out over the course of the day and requiring very little instantaneous bandwidth. This meshes nicely with ROBO requirements for low-bandwidth data-protection solutions, and it is CDP's low bandwidth usage that is of more interest to users than its recovery granularity capabilities. Some CDP products do not retain every write, discarding some of them from the journal to save space. This "near-CDP" approach can effectively create multiple recovery points per day using very little bandwidth, and as such is a good fit for ROBO environments. Representative products in this space include Atempo's Live Backup, eVault's Protect, FalconStor's CDP Virtual Appliance, FilesX's Xpress Restore (acquired by IBM in April 2008), IBM's Tivoli CDP for Files, SonicWall's CDP Appliance, and Unitrends' Rapid Recovery System. Asigra, BakBone, and ROBObak have CDP options that can be used in conjunction with their file-oriented backup protocols for better recovery granularity or WAN bandwidth utilization on select backup jobs.
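
A back-of-the-envelope calculation shows why spreading the transfer over the day lowers the bandwidth needed. The figures are hypothetical (10GB of changed data per day, an eight-hour nightly backup window), the function name `required_mbps` is made up for this sketch, and the arithmetic uses decimal units and ignores protocol overhead and bursts of write activity.

```python
def required_mbps(changed_gb, hours):
    """Average link speed (megabits/sec) needed to move changed_gb in the given hours."""
    megabits = changed_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    seconds = hours * 3600
    return megabits / seconds


nightly_window = required_mbps(10, 8)   # about 2.78 Mbps
spread_all_day = required_mbps(10, 24)  # about 0.93 Mbps
```

Under these assumptions, moving the same daily change volume continuously needs roughly a third of the sustained bandwidth of an eight-hour window.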

****** ****** ****** ******
Centralizing data
Trimble, a Sunnyvale, CA-based provider of advanced positioning solutions such as GPS technology, has more than 2,000 mobile employees, spread throughout 20 countries. Performing data-protection operations such as backup and restore on a regular basis was a challenge because employees were not predictably connected to a network. Trimble chose Atempo's Live Backup because it automatically backs up and centralizes data while employees are on the road and allows users to easily restore their own documents and systems at any time, anywhere -- without burdening IT. Atempo's CDP technology allowed Trimble to run backups transparently in the background, whenever employees were online. "Live Backup’s performance across WAN links is excellent, and with distributed host servers each site can not only support its own population locally, but also when they are on the road or at other sites," says Shawn Wilde, Trimble's CIO. "With Live Backup, I am confident that we have the ability to recover any laptop or desktop with minimum effort."

****** ****** ****** ******

Replication plus snapshot: Like CDP, replication sends writes in real time as they occur to a disk. Replication keeps a "source" disk and a "target" disk in sync, and the target disk may be local (LAN) or remote (across a WAN). Because the "backup" is in effect spread out over the entire day, replication, like CDP, uses little bandwidth. Unlike CDP, however, replication does not maintain a journal, so recovery is only available from the latest point. Replication by itself is not a data-protection solution because that latest point might be corrupt, so replication is often combined with snapshots that provide multiple recovery points if the latest point in time is corrupt. These solutions can maintain one or more disk-based snapshots at the "backup server" (the replication target), and can back up to tape directly from one of these snapshots, offloading the application servers from dealing with any backup load and removing backup windows as an issue. They can also provide a DR solution, distributing disk-based data to remote locations before backups are dumped to tape. Representative vendors in this space include DoubleTake (with its DoubleTake product, not the CDP-based TimeSpring product), and Iron Mountain Digital (LiveVault).
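
The reason replication needs snapshots can be shown with a small sketch. It is a simplified model, assuming a key-value "disk" and synchronous replication; the class `ReplicatedVolume` and its methods are hypothetical and do not represent any vendor's product.

```python
class ReplicatedVolume:
    """Toy model: a source kept in sync with a target, plus target-side snapshots."""

    def __init__(self):
        self.source = {}
        self.target = {}
        self.snapshots = []  # (label, frozen copy of the target at that moment)

    def write(self, key, value):
        """Apply a write to the source and replicate it, good or bad, to the target."""
        self.source[key] = value
        self.target[key] = value

    def take_snapshot(self, label):
        """Freeze the current target state as an extra recovery point."""
        self.snapshots.append((label, dict(self.target)))

    def restore_from(self, label):
        """Return the state captured by the named snapshot."""
        for snap_label, state in self.snapshots:
            if snap_label == label:
                return dict(state)
        raise KeyError(label)


volume = ReplicatedVolume()
volume.write("report.doc", "v1")
volume.take_snapshot("09:00")
volume.write("report.doc", "corrupt")  # corruption is replicated faithfully

latest_copy = volume.target["report.doc"]                # "corrupt"
recovered = volume.restore_from("09:00")["report.doc"]   # "v1"
```

The target alone holds only the corrupt version, while the 09:00 snapshot still yields the good one, which is the gap snapshots fill.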

WAN acceleration appliances: These products provide LAN-like performance to remote offices accessing centralized resources, and these performance and latency advantages translate well to data-protection tasks when backup client data sets are under several hundred gigabytes and there is no requirement to cache data locally. This approach allows companies to keep their existing backup software in place while improving backup-and-restore performance. Some of these vendors use SCO technology in their appliances and are moving in the direction of maintaining local data caches. Local data caches will help them handle larger backup jobs with excellent restore performance for data within the cache. By themselves, however, WAN acceleration appliances do not provide native backup and restore capabilities, so they should be viewed as a way to obtain higher performance out of an existing ROBO data-protection solution. Representative vendors in this space include BlueCoat, Cisco, Juniper Networks, and Riverbed.

Local SCO appliances: Entry-level SCO appliances can be deployed at ROBO locations when there is a requirement to back up and restore locally, and they allow companies to use existing backup software while gaining the benefits of reduced WAN bandwidth and storage capacity costs. Data is capacity-optimized locally by the appliance, offloading this work from the backup clients, and can then be replicated in its capacity-optimized form to one or more remote sites for DR purposes. Representative vendors in this space include Data Domain, FalconStor, and Quantum, all of which offer disk-based NAS backup targets with integrated SCO and replication technology.

Service options
Online backup services: Online backup services offload data-protection management to a service provider organization that provides the backup infrastructure. Service providers typically offer advanced data-protection options, such as compression, SCO, encryption, archiving, and DR to smaller organizations that otherwise could not afford them. Start-up can be relatively quick, since it’s generally only a matter of connecting a client’s systems to the service provider's backup infrastructure, and licensing options are generally simple (customers pay a certain amount per gigabyte of data per month).

Online backup services are available in different flavors. Vendors such as Arsenal Digital (acquired by IBM last year) primarily target small- to medium-sized enterprises, taking the backup headache off customers' hands while offering a secure, easy-to-use backup service that provides sophisticated options. Vendors such as Iron Mountain Digital blend continuous backup offerings (e.g., LiveVault) with legacy approaches to distributed site backup that combine tape shipment and off-site storage with pre-existing backup software already installed in ROBO locations. Other vendors, such as start-up Simply Continuous, not only offer a set of sophisticated backup and DR services, but also can serve as a cost-effective DR hosting site for applications in the event of a disaster. Other representative vendors include CISP, DataAssure, ElectroNerdz, and integraVault. Literally thousands of service providers offer backup services, many of them with expertise in vertical markets.

Many of the online backup services are based on technology from vendors such as Asigra and ROBObak, both of which have products specially designed for ROBO environments. ROBObak's product, for example, combines an agent-less architecture, which promotes simplicity of deployment and maintenance, with a set of advanced options such as local site data caching, compression, encryption, and data de-duplication. Large companies with thousands of servers to back up in distributed locations can buy these products and implement them directly, but smaller companies are more likely to get them as a re-branded version from a local service provider.

****** ****** ****** ******

Backing up regularly
ElectroNerdz, a Lakeland, FL-based technology solutions provider, had been using a tape-based backup approach to service its customers' data-protection requirements. "We used to ask our clients to back up their information on tape, then move it off-site. We found that the rate of getting them to actually do it was abysmal," says Bobby Kuzma, ElectroNerdz' vice president of professional services. By moving to an online backup solution from ROBObak, Kuzma got the centralized control to ensure backups were completed on a regular basis. In addition, he can offer customers advanced data-protection options such as source-based SCO, CDP, and encryption of in-flight files -- all without needing to install software on any clients' systems. "Our restores go as flawlessly as our backups," says Kuzma.

****** ****** ****** ******

Cloud-based backup services: Cloud-based backup offers an interesting twist on online backup services, using the Internet as the "storage device." When vendors such as Amazon (S3), EMC (Mozy), and Nirvanix (Storage Delivery Network) began offering Internet-based storage as an alternative to the more conventional approach of purchasing and managing your own storage infrastructure, it was a short leap to cloud-based backup services. Encouraging prospects to invest in their businesses rather than their storage infrastructures, these vendors tout easy start-up, unlimited scalability, low cost, and improved flexibility and control. Data from multiple ROBO locations can be centralized under a data-protection service that offers centralized tracking, billing, and management, achieving capabilities that would be very difficult for a small company to build on its own. Concerns with these types of services include data ownership, privacy, and security, but it is difficult to beat the simplicity of deployment and ongoing management these services offer for distributed locations. Vendors are just emerging now in this space, but note that many industry stalwarts such as EMC, IBM, and Symantec have products, technologies, and expertise that could allow them to offer cloud-based backup if they so choose.

It is clear a number of options exist for ROBO-based data protection. Determine your strategic requirements up-front before you look at solutions. Are you satisfied with your current enterprise solution, and looking to extend it to include ROBO data? Are you looking to move to a solution that is specifically optimized for ROBO data, with features such as SCO, encryption, and replication built in? Are you just looking to add DR capabilities as inexpensively as possible? Is minimizing bandwidth requirements your key consideration? Once you determine your key strategic concern, it becomes easier to choose from among the many options available.

Eric Burgener is a senior analyst and consultant with the Taneja Group research and consulting firm.

This article was originally published on September 12, 2008