Strategies for ROBO data protection

There are plenty of options available, including WAN acceleration, remote access, data deduplication, VTLs, disk backup targets, and remote replication technologies.

By Russ Fellows, Evaluator Group

Data protection is just as critical for small and mid-sized companies as it is for large enterprises. Traditionally, many options have been available to the Fortune 2000, thanks to their historically large IT budgets. However, any company with remote or branch offices requires a data protection solution that can support corporate policies at central and distributed sites. These environments are often referred to as remote offices and branch offices, or ROBO.

Like many natural events, the IT industry is cyclical. Originally, data was housed only in data centers; as IT technology progressed, information processing and storage moved to a distributed model, fuelled by the emergence of PCs. However, distributed computing created the need for distributed data protection. Most recently, today's economic realities have forced companies of all sizes to once again evaluate their IT environments and find ways to cut costs while maintaining service levels. IT departments now have more options than ever when choosing a data protection strategy for their remote environments. These include the emerging "cloud storage" and other service options, along with more traditional approaches.

Cloud-based backup strategies are usually not the best option for ROBO environments, because of the relatively large amounts of data, corporate data protection requirements, business availability requirements, or other concerns. Thus, another set of alternatives is needed to accommodate the needs of ROBO environments.

Create a strategy
Before technologies, products or vendors are considered, a ROBO data protection strategy should be created. This strategy should include input from all principal stakeholders, including the IT organization, the CIO and the line of business managers responsible for applications at remote sites. A ROBO data protection strategy does not need to be extensive, costly or painful, but it is critical to overall success. The guidelines presented in this article cover the primary aspects that should be considered when creating a ROBO strategy.

Another factor for success is to use independent consulting organizations that are not directly compensated by selling specific vendors' solutions. Relying on a vendor to provide recommendations will often result in solutions that utilize that vendor's products to the exclusion of other, potentially more suitable choices. An internal group of IT professionals is also required, both to provide insight into the existing IT environment and to work with the line of business managers to specify the business objectives in terms of IT service levels.

Specifically, the following information is required in order to formulate a ROBO data protection strategy:

• What business applications are required at remote offices?
• Total number of remote sites
• Amount of data required at remote sites
• Availability and expertise of IT staff at remote locations
• Existing infrastructure both in central and remote sites
• Network connectivity to remote offices 
• Budget available

Often a solution for a large enterprise with hundreds of branch offices, each with several hundred people, will not be appropriate for an enterprise with one or two remote offices with 20 to 50 people.   Each has unique requirements, which must be considered prior to creating an optimal solution.  As always, one size does not fit all, and readers should evaluate their needs in order to find the best solution for their particular requirements and environment. 

Data protection technologies
Traditionally, data protection has meant using tape to store data. Tape has been a popular choice for disaster recovery (DR) in particular, because tapes are easy to move off-site, which is a requirement for DR. While tape remains a good option for protecting large amounts of data off-site, other options have gained popularity recently.

One of the best ways to protect ROBO data is to avoid storing any data at a remote site to begin with.  If all data is maintained at primary locations that already have data protection, there is no need to add data protection for remote sites.  Many times this option is overlooked, although it is typically the best option for small ROBO sites.  There are two types of technologies designed to provide remote access to data: WAN acceleration and remote access. 

Extending LAN file protocols, such as CIFS and NFS, over a WAN or VPN connection to a central location is one method of delivering data to ROBO locations.  However, CIFS typically delivers very poor performance over WAN connections.  Accessing a file using CIFS requires many commands, each of which incurs a delay due to the latency of the network.  Without WAN acceleration, NAS protocols such as CIFS, NFS and others perform poorly over WAN or DSL connections, regardless of the distance. 
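The effect of latency on a chatty protocol can be illustrated with a rough calculation. The sketch below is a simplified model, not a measurement of CIFS: it assumes a file access consists of a number of sequential request/response round trips plus the time to push the payload through the link. The round-trip count, latencies and file size are hypothetical figures chosen only for illustration.

```python
def transfer_time_seconds(round_trips, rtt_ms, payload_bytes, link_bits_per_sec):
    """Rough estimate: latency cost of sequential round trips plus
    the time needed to send the payload at the link's raw speed."""
    latency_cost = round_trips * (rtt_ms / 1000.0)
    bandwidth_cost = (payload_bytes * 8) / link_bits_per_sec
    return latency_cost + bandwidth_cost


# Hypothetical example: a 1 MB file opened with 500 sequential round trips.
ROUND_TRIPS = 500          # assumed, not a measured CIFS figure
FILE_BYTES = 1_000_000

# LAN: 0.5 ms round trip, 1 Gbit/s link
lan = transfer_time_seconds(ROUND_TRIPS, 0.5, FILE_BYTES, 1_000_000_000)

# WAN: 80 ms round trip, T1 link (1.544 Mbit/s)
wan = transfer_time_seconds(ROUND_TRIPS, 80, FILE_BYTES, 1_544_000)

# Share of the WAN time that is pure waiting on round trips
wan_latency_share = (ROUND_TRIPS * 0.080) / wan

print(f"LAN: {lan:.3f} s, WAN: {wan:.1f} s, "
      f"latency share on WAN: {wan_latency_share:.0%}")
```

Under these assumptions, most of the WAN time is spent waiting on round trips rather than moving data, which is why adding bandwidth alone does little to help.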

A category of products designed specifically to improve remote NAS performance uses "WAN acceleration" technology. These products can improve remote access performance significantly. In particular, WAN acceleration products can improve CIFS performance by an order of magnitude, and will improve the performance of other common protocols significantly as well. By using WAN acceleration, all data can be maintained and protected at a central location, thus eliminating the need for data protection at ROBO locations. This option works well for small sites with limited on-site IT expertise, or a limited amount of data.

Another technology option for smaller ROBO environments is to utilize a remote access technology such as Windows Terminal Server (available on Windows Server 2003 and 2008), Citrix AppServer, or other technologies.  With remote access, the data is never actually transferred over the WAN connection; instead, only the graphical interface commands are transferred.  As with WAN acceleration, remote access products allow the data to remain at the central locations, thereby removing the need to protect data at the ROBO location.

For ROBO environments that are not able to access data remotely, more traditional data protection technologies are available. These include data backup applications, along with a tape or disk-based system on which the protected data resides. This category of products includes typical backup applications, along with tape drives, tape libraries, virtual tape libraries (VTLs) and disk-based backup systems. Another technology that works well in conjunction with this type of data protection is data deduplication. Data deduplication is often coupled with VTLs and disk backup systems, but may also be a standalone product, or coupled with the backup software.

The final piece of technology that must be considered is data security. Security typically, though not always, involves encryption. At a minimum, data should be protected during transit ("in flight"). Additionally, best practices, corporate policies and regulatory requirements may dictate that data is encrypted while stored ("at rest"). The topic of security requires a substantial review and thus will not be covered in this article. For more information on data encryption and data security, visit evaluator.com.

After understanding the business requirements, the infrastructure and other IT issues, it is then possible to develop a strategy for ROBO data protection.  A successful strategy will be optimized for your environment to meet application requirements, and outline the technologies to use.  However, a strategy should not specify particular vendors or products.  The table below provides an overview of when particular data protection technology approaches are most appropriate.

Evaluating ROBO data protection options: when to deploy each approach

Remote access
• Limited IT at ROBO
• Centralized database applications
• Large amounts of data
• Environments with moderate network connectivity (e.g., T1)

WAN optimization
• Limited IT at ROBO
• Applications require access to file data
• Moderate amounts of data
• Environments with moderate network connectivity

Back up data locally at ROBO
• On-site IT at ROBO
• Large amount of data
• Data used at ROBO sites is not used at central location
• Limited network connectivity (slow DSL or less)

Back up data to central site
• Limited IT at ROBO
• Moderate amounts of data (typically, less than 10TB can be protected daily)
• Network connectivity sufficient for amount of data protection (T1 = 625 MB/hour)

Hybrid approach
• Multiple applications and/or data sizes
• Large workforce at ROBO sites with diverse needs
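
Whether network connectivity is "sufficient for the amount of data protection" can be checked with simple arithmetic. The sketch below uses the T1 figure quoted above (625 MB/hour); the function names, the backup window and the optional data reduction ratio are illustrative assumptions, not values taken from any product.

```python
T1_MB_PER_HOUR = 625  # throughput figure quoted in the table above


def backup_hours(changed_data_mb, link_mb_per_hour=T1_MB_PER_HOUR,
                 reduction_ratio=1.0):
    """Hours needed to send a backup to the central site.

    reduction_ratio is a hypothetical data reduction factor
    (e.g., 2.0 means half as much data crosses the link).
    """
    if reduction_ratio < 1.0:
        raise ValueError("reduction_ratio must be at least 1.0")
    return changed_data_mb / reduction_ratio / link_mb_per_hour


def fits_window(changed_data_mb, window_hours, **kwargs):
    """True if the backup completes within the available window."""
    return backup_hours(changed_data_mb, **kwargs) <= window_hours


# 5,000 MB of daily changes over a T1 takes 8 hours.
print(backup_hours(5000))
# 10,000 MB does not fit an 8-hour window unless the data sent is halved.
print(fits_window(10000, 8), fits_window(10000, 8, reduction_ratio=2.0))
```

If the estimate does not fit the available window, the options are a faster link, less data on the wire, or backing up locally at the ROBO site.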

The most critical requirement is to gain an understanding of the business applications that are required at each ROBO location, along with an evaluation of the existing IT infrastructure at ROBO locations.  This information, combined with an understanding of the benefits of each technology outlined above, provides the pieces to create a remote office data protection strategy. 

A crucial component of data protection is the software, application or appliance used to ensure data protection policies are implemented. Historically, this role has been filled by a backup application. However, when using remote access or WAN acceleration solutions, data protection will typically occur at a corporate or central data center. For solutions that protect data at ROBO locations, it is important that the policies provide protection for the business applications, while complying with corporate policies and requirements.

ROBOs without a data center
Remote office environments that do not have a data center or a dedicated IT staff require careful consideration. In many cases, providing access to centrally held data or applications using remote access or WAN acceleration products is a good solution. With these scenarios, protecting data at the ROBO location is not necessary, since all data resides at a central location. This option is often one of the easiest and lowest cost options available, which explains its popularity.

Remote network (WAN) bandwidth and distance from the primary site are considerations when using remote access technologies. High bandwidth is not required, but moderate bandwidth is. Another important consideration is the delay, or latency, of the network used for remote access. A DSL connection, common in many small offices, may suffice for a few people, but will typically not support more than 10 to 20 users. A DSL network has higher latency than other WAN connections and limits upload speeds, both of which limit the effectiveness of remote access protocols. WAN optimization works well when accessing a relatively small amount of data; files up to a few megabytes can be accessed at reasonable speeds. However, for large data sets, or applications that are sensitive to delays, WAN acceleration will not provide sufficient improvement for this option to work well.

The other remote access option is to utilize remote access software.  Products such as Microsoft RDP (available with Windows Terminal Server) and Citrix XenApp are two of the most prevalent options available.  Both support hosting Windows applications remotely, with Citrix also supporting Unix and Linux applications.  Access to applications is supported over a LAN, WAN or VPN connection.  By only transmitting the visual interface of the application rather than application data, this option works well for transaction processing applications that are intolerant of data delays (e.g., Oracle, SAP, SQL Server and other database applications). Additionally, there is a reduced risk of data loss, since application data is not transmitted to remote sites or computers, only an image of the data.  Remote access products allow corporations to establish policies that restrict the transfer of data outside of corporate servers, further protecting data.

ROBOs with a data center
For remote offices that require local data, or that have a local data center with IT staff, a more traditional approach to data protection may work well. Typically, a solution for these environments will utilize backup software along with a tape-based system, VTL, or disk-based backup platform. The challenge with these solutions is meeting both service level requirements and corporate requirements for data protection. Corporate requirements for off-site data storage, encryption and the like apply at the ROBO location as well. Tape solutions provide off-site disaster recovery protection, but often do not meet recovery time objectives (RTO) or recovery point objectives (RPO). Other solutions that utilize disk, VTL or appliances are often able to meet RTO and RPO levels, but may require costly options for off-site DR capabilities.

In order to provide DR, off-site storage is required.  Both VTL and disk backup solutions utilize either movement to tape, or replication to a remote site for DR.  Using disk-to-disk-to-tape (D2D2T) with either a VTL or disk target will further complicate the ROBO data center, adding cost and complexity.  Replication will also add cost and complexity to the network environment.  Thus, a solution that delivers adequate RTO and RPO levels, while providing DR capabilities, will require a substantial investment in each ROBO data center, along with training and support by dedicated IT staff. 

For these solutions, it is important to reduce the amount of data transferred by using data deduplication technology.  Both VTLs and disk-based backup targets designed for ROBO deployments typically include replication and data security along with data deduplication.  In practice, there is little difference between a VTL and a disk-based backup target, other than how the backup application interacts with these devices.  VTLs emulate tape, and are often a better option for ROBO environments that are currently using a tape infrastructure.  For new deployments, disk-based backup targets are often a better fit, requiring less administration and providing more flexibility. 
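
The basic idea behind deduplication, sending only data the target has not already seen, can be sketched in a few lines. This is a minimal illustration using fixed-size chunks and SHA-256 fingerprints; it is an assumption made for clarity and does not describe how any particular VTL or disk backup product works. The function names and the 4 KB chunk size are hypothetical.

```python
import hashlib


def split_chunks(data: bytes, size: int = 4096):
    """Yield fixed-size chunks of the input (the last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def deduplicate(data: bytes, known_hashes, size: int = 4096):
    """Return (recipe, new_chunks).

    recipe     -- ordered list of chunk fingerprints describing the backup
    new_chunks -- only the chunks the target does not already hold
    """
    recipe, new_chunks = [], {}
    for chunk in split_chunks(data, size):
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if digest not in known_hashes and digest not in new_chunks:
            new_chunks[digest] = chunk
    return recipe, new_chunks


def restore(recipe, store) -> bytes:
    """Rebuild the original data from its recipe and the chunk store."""
    return b"".join(store[digest] for digest in recipe)


# The chunk store at the central site, keyed by fingerprint.
store = {}

monday = b"A" * 8192 + b"B" * 4096
recipe1, new1 = deduplicate(monday, store)
store.update(new1)            # 2 unique chunks cross the WAN

tuesday = monday + b"C" * 4096
recipe2, new2 = deduplicate(tuesday, store)
store.update(new2)            # only the 1 changed chunk crosses the WAN

assert restore(recipe2, store) == tuesday
```

In this toy example the second backup describes four chunks but transfers only one, which is the effect that makes replication over a limited WAN link practical.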

Hybrid ROBO environments
A remote office with a limited data center or applications that demand low latency access may require elements of both solution sets outlined.  In these cases, it may be appropriate to design a solution that incorporates aspects of the other alternatives, on a per-application or workgroup basis.  For example, one group may primarily utilize an SAP application, and another work group may create and edit local CAD drawings.  For this type of ROBO environment, deploying remote access technology for access to the SAP application may be the best option to overcome network delays for this application.  In order to support the local CAD application, backing up data to a disk target, which is then replicated to the primary data center, may be the best option to support this application.  Thus, a single ROBO site may need to utilize multiple products and technologies in order to provide an optimal data protection strategy. 
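
The per-application reasoning described above can be written down as a small set of rules. The sketch below is one possible encoding of the guidance in this article, under simplifying assumptions: the workload categories, field names and rule order are hypothetical, and a real strategy would weigh data volumes, network connectivity and budget as well.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    kind: str        # hypothetical labels: "central_database",
                     # "shared_files" or "local_files"
    onsite_it: bool  # is IT staff available at the ROBO site?


def recommend(workload: Workload) -> str:
    """Map one workload to an approach, following the article's guidance."""
    if workload.kind == "central_database":
        return "remote access"
    if workload.kind == "shared_files":
        return "WAN optimization"
    # Anything else is treated as data created and used locally.
    if workload.onsite_it:
        return "back up locally at ROBO"
    return "back up to central site"


def plan_site(workloads):
    """Return a {workload name: approach} plan for one ROBO site."""
    return {w.name: recommend(w) for w in workloads}


def is_hybrid(plan) -> bool:
    """A site is hybrid when its workloads need more than one approach."""
    return len(set(plan.values())) > 1


site = [
    Workload("SAP", "central_database", onsite_it=False),
    Workload("CAD drawings", "local_files", onsite_it=False),
]
plan = plan_site(site)
print(plan, "hybrid:", is_hybrid(plan))
```

For the SAP and CAD example, the rules produce two different approaches for the same site, which is exactly the hybrid case.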

Final considerations
The steps involved in planning and implementing data protection for your ROBO locations follow the basics of any project. First, create a solid strategy that encompasses business requirements, technology and budgetary constraints. Then identify vendor and product choices, and get input and validation from trusted third-party sources.

With the advent of data deduplication, remote connectivity, WAN acceleration, VTLs, disk backup targets and remote replication technologies, there have never been more options available.  By following the outline provided, IT staff coupled with advisory services from independent organizations will be able to create ROBO data protection solutions that meet the identified requirements, effectively and affordably.  

Russ Fellows is a managing partner with the Evaluator Group research and consulting firm.

This article was originally published on June 19, 2009