Data protection over WANs: The network

Posted on November 01, 2005


There are two alternatives to vaulting via trucks: the Internet and private line networks, each with advantages and disadvantages.

By George Hall

Information protection and archiving are becoming increasingly complex. Couple this with the fact that different constituencies within an organization have immediate needs for access and recovery of this information, and it’s clear that IT is under enormous pressure to meet data-protection windows and to ensure information archival and rapid recovery.

When the dot-com bubble burst in late 2000 and 2001, the storage service provider (SSP) market also fell significantly. However, over the past 18 months, the SSP market has started to make a comeback. SSPs have been most successful in information protection, archiving, and disaster recovery.

This article, the first in a three-part series, describes operational implementation of data protection over WANs and examines the issues associated with various approaches.

Before discussing the specifics of protecting and restoring information over the WAN, it is important to first describe in general terms the existing processes for information protection. These processes are largely based on “sneakernet,” i.e., collecting backup tapes from a customer’s premises and physically moving those boxes of tapes, via truck, to a remote location for safekeeping. In many cases this process is dictated by a business’s disaster-recovery (DR) plan, and in other cases it is driven by regulatory compliance requirements for records retention.

An alternative approach to providing information protection is to move the data over the wire (the WAN) to another facility. The remote facility may be an organization’s remote data center or a third-party SSP. This approach can offer the same level of protection as sneakernet, while adding efficiency and removing risk. However, there are some inherent challenges in protecting data over the WAN.

Many companies already have in place well-documented information protection, DR, and records retention policies (although they are subject to frequent regulatory changes). Existing information protection and archival practices today merge with the company’s DR and retention policies to produce an effective level of protection for business data. However, the methods of collecting archival data for storage range from highly centralized to highly decentralized, regardless of any networks that might be in place for internal communications. This range of network connectivity poses significant challenges for IT when it comes to protecting corporate information. The migration to network-based backup and restore could mean increased service levels for information protection and recovery, as well as improved DR practices and compliance policies.

There are two approaches to network-based information protection. In the first, a company that is large enough and has the need, the capital, and the people to implement it uses its own remote data centers for information protection. In the second, the company sends data to a third-party service provider. Each approach has two network options: the Internet or private line networks.

Option A: The Internet

The Internet has been used for a number of years, with varying degrees of success, as a means of migrating data from a company’s primary location to archival locations. An Internet-only transport mechanism for transmitting DR and archival data has some inherent advantages and disadvantages.

One disadvantage is that the Internet is inherently unreliable: it guarantees neither deterministic throughput nor continuous network availability. The Internet is not owned or managed by any one entity. It is a network of networks that sits on top of the Public Switched Telephone Network (PSTN).

The Internet today is like a vast global ocean that has calm spots and storms all over it all the time. These turbulent locations are constantly shifting. With respect to throughput from point “A” to point “B,” there is no guarantee that a fixed amount of Internet network capacity can be delivered to any one place all the time. Even if the Internet were deterministic and reliable, the base protocols that it uses (TCP/IP) create enough overhead in each transmission that, in some cases, the actual throughput from point “A” to point “B” is too low to be useful.
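One well-known way protocol behavior can cap throughput, independent of line speed, is the TCP window: a single connection can have at most one window of data in flight per round trip. The sketch below is an illustration only, not a measurement; the 64 KB window (the classic maximum without window scaling) and the round-trip times are assumed values.

```python
def tcp_window_limit_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP connection's throughput: window size / round-trip time."""
    if window_bytes <= 0 or rtt_ms <= 0:
        raise ValueError("window_bytes and rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000.0 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


# Assumed values: a 65,535-byte window and three illustrative round-trip times.
for rtt_ms in (10, 40, 80):
    limit = tcp_window_limit_mbps(65_535, rtt_ms)
    print(f"RTT {rtt_ms} ms: at most {limit:.1f} Mbit/s per connection")
```

Under these assumptions the ceiling falls as the round-trip time grows, no matter how fast the underlying circuit is, which is one reason long-distance transfers can disappoint.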

Notwithstanding the above, and setting aside security issues for the moment, the Internet can be used in a comprehensive network-centric DR and archival solution. The Internet can connect anyone, virtually anywhere, at any time, at a reasonable price.

However, IT professionals need to look at a number of criteria when they are investigating a network-centric approach. Disk mirroring and other real-time applications do not work well across the Internet. The key issues for consideration in the use of the Internet are data volumes over time and cost. The Internet is usually the less expensive option because most companies already have Internet connectivity. That existing capacity, however, must be viewed in the context of the enterprise’s daily operational requirements for the Internet. DR and archival applications can consume an enormous amount of network capacity. In other words, the movement of large files across the Internet may cause the enterprise’s LAN and WAN Internet access to slow to a crawl.

To determine the suitability of using the Internet for DR/archival applications, several factors need to be calculated, including the following:

  • Data volumes (initial and ongoing);
  • Timeliness (e.g., how many hours each day it will consume the Internet connection); and
  • Absolute capacity of the network (e.g., what is the most that could be sent?).
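As a rough illustration of how these three factors relate, the following sketch converts a data volume and a link speed into hours of transmission time, and a link speed and a time window into a maximum volume. The function names, the link speeds, the 20 GB daily volume, and the `usable_fraction` allowance for protocol overhead and competing traffic are all assumptions chosen for the example, not figures from any particular product or network.

```python
def transfer_hours(data_gb: float, link_mbps: float, usable_fraction: float = 0.7) -> float:
    """Hours needed to send data_gb (decimal gigabytes) over a link of link_mbps."""
    if data_gb < 0 or link_mbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("need data_gb >= 0, link_mbps > 0, 0 < usable_fraction <= 1")
    data_megabits = data_gb * 8 * 1000  # gigabytes -> megabits
    return data_megabits / (link_mbps * usable_fraction) / 3600


def max_gb_per_window(link_mbps: float, window_hours: float, usable_fraction: float = 0.7) -> float:
    """Most data (decimal gigabytes) that fits through the link in window_hours."""
    if link_mbps <= 0 or window_hours < 0 or not 0 < usable_fraction <= 1:
        raise ValueError("need link_mbps > 0, window_hours >= 0, 0 < usable_fraction <= 1")
    return link_mbps * usable_fraction * window_hours * 3600 / 8 / 1000


# Hypothetical case: 20 GB of daily changes over two assumed link speeds.
for link_mbps in (1.5, 45.0):
    hours = transfer_hours(20, link_mbps)
    ceiling = max_gb_per_window(link_mbps, 8)
    print(f"{link_mbps} Mbit/s: {hours:.1f} h for 20 GB; {ceiling:.1f} GB fits in an 8 h window")
```

If the hours required exceed the hours available each day, the connection is too small for the workload before any allowance is made for outages.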

When determining the suitability of Internet connectivity for DR/archival operations, assume no better than median Internet performance and availability, or plan for worst-case scenarios, and answer these questions:

  • Can we afford to miss a day if the Internet is down?
  • Are the daily transmission data volumes so high that if we miss a day we will never catch up?
  • Should we consider getting a separate Internet connection for our disaster-recovery/archival applications?
  • What are the timeliness requirements for backup and restore at the file level or the volume level?
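The first two questions can be framed as simple arithmetic: after an outage, the backlog can be cleared only out of whatever capacity is left over once each day's normal volume has been sent. The sketch below assumes a steady daily volume and a fixed daily capacity; the function name and the sample figures are hypothetical.

```python
import math
from typing import Optional


def catch_up_days(daily_gb: float, capacity_gb_per_day: float, missed_days: int = 1) -> Optional[int]:
    """Days needed to clear the backlog from missed_days of outage.

    Returns None when there is no spare capacity, i.e. the backlog is never cleared.
    """
    spare_gb_per_day = capacity_gb_per_day - daily_gb
    if spare_gb_per_day <= 0:
        return None
    backlog_gb = daily_gb * missed_days
    return math.ceil(backlog_gb / spare_gb_per_day)


# Hypothetical figures: 20 GB of new data per day, one missed day.
for capacity in (20, 25, 40):
    days = catch_up_days(20, capacity)
    outcome = "never catches up" if days is None else f"caught up in {days} day(s)"
    print(f"capacity {capacity} GB/day: {outcome}")
```

A result of None, or a recovery time longer than the business can tolerate, is an argument for a larger or separate connection.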

Option B: Private line networks

Private line services are dedicated, relatively high-speed telephone circuits that connect two or more points together in a network. They are not part of the Internet, and they are usually more expensive than an Internet connection; however, there are many advantages to using private line services for disaster recovery/archival applications. Private line networks are:

  • Deterministic;
  • More reliable;
  • Higher-capacity;
  • More secure;
  • Protocol-independent; and
  • Point-to-point, point-to-multipoint, or multipoint-to-point.

It might be advantageous to consider building dedicated connections between centralized DR/archival sites (see figure). This enables you to tune the speed of each connection to the DR/archival applications running between the sites, as well as to support a broad array of application-layer protocols used within each site. Simple TCP/IP connectivity over a private line is one approach, while channel extension (e.g., IBM ESCON and ESCON/e) over private line connections is another.
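Tuning the speed of each connection amounts to working backward from each site's daily volume and its available transmission window. The following is a minimal sketch of that sizing step; the site names, volumes, windows, and the `usable_fraction` allowance for protocol overhead are hypothetical values chosen for illustration.

```python
def required_mbps(daily_gb: float, window_hours: float, usable_fraction: float = 0.9) -> float:
    """Line speed (Mbit/s) needed to move daily_gb (decimal gigabytes) within window_hours."""
    if daily_gb < 0 or window_hours <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("need daily_gb >= 0, window_hours > 0, 0 < usable_fraction <= 1")
    data_megabits = daily_gb * 8 * 1000  # gigabytes -> megabits
    return data_megabits / (window_hours * 3600) / usable_fraction


# Hypothetical sites: (daily volume in GB, transmission window in hours).
sites = {
    "regional-office": (5, 8),
    "data-center": (200, 10),
}

for name, (daily_gb, window_hours) in sites.items():
    speed = required_mbps(daily_gb, window_hours)
    print(f"{name}: about {speed:.1f} Mbit/s for {daily_gb} GB in {window_hours} h")
```

The result is a minimum; in practice the circuit would be ordered at the next available service tier above it, with headroom for restores and growth.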


There are a number of ways to implement a solution, such as starting with a number of connections terminating at a single point.

In this scenario, the termination point would be the “keeper of all data.” The issue with this approach is what happens to the information at this site should there be a disaster.

Another network implementation strategy might involve single point-to-point connections between sites (see figure). The issue here becomes recovery (which will be covered in the third article in this series). Longer term, however, it will become much more cost-effective to establish a WAN infrastructure that, for instance, places disaster recovery/archive “centers” in local regions (which improves recovery operations). This architectural concept scales well and is flexible for disaster recovery/archive services in regions with a number of offices.

Each implementation is viable for sending data either to remote corporate facilities or to a service provider. Getting a private line into an SSP may require extra effort and can be costly; however, if your security requirements mandate a private line, it is worth the investment.

One of the major considerations in this context is the need for information security. The rash of incidents involving backup tapes that were misplaced, stolen, or lost from trucks has forced IT professionals to explore alternatives to existing DR/archiving solutions. Although moving all corporate data over the wire may not be practical (see “Information Classification” at www.ridgellc.com), moving the most critical data, securely, over the wire makes a good deal of sense. If done securely, it can alleviate a number of data-protection challenges. (Part two of this series will address security issues.)

Architecturally, these approaches can become burdensome to manage, expensive, and operationally risky. However, a network-centric DR/archival solution can offer a number of benefits compared to moving tapes via truck.

In this article we have described a couple of scenarios for implementing a range of network-based DR/archival solutions. These solutions can take a number of forms (from very small to very large, from centrally managed to geographically distributed), using combinations of the Internet and private line networks to provide maximum protection and scalable costs. The issues discussed can provide a basis for architecture, development, and costs that can support almost any application-layer disaster recovery/archival products under consideration.

George Hall is a technology partner with Ridge Partners LLC (www.ridgellc.com).

