Long-distance storage applications over IP networks must address the reliability of data transfers, application performance, and security.
By Gary Johnson
Many companies are considering extending storage over long distances (100 miles or more) across a WAN to meet business continuity and disaster-recovery needs. Locating recovery centers closer than that (in the same metro area, for example) may be too risky for some companies.
This series of articles examines the key issues involved in extending storage over WANs, describes how they can affect the performance of storage networks, and reviews the technologies that can be used to address these issues.
In the first two articles of this series (see July 2003 and August 2003, pp. 38 and 41, respectively) we looked at the three main issues involved in storage over distance: latency, data integrity, and bandwidth utilization. In the third article (September 2003, p. 42), we described how these issues can be resolved. In this final article we discuss the technologies that can help you develop a remote storage solution that's right for your needs, with a focus on IP-based approaches.
Some of the issues and problems—and potential technology solutions—associated with storage over distance are summarized in the Table.
Before you undertake a project to extend storage over a long distance, you should begin by asking some key questions:
- What replication/mirroring method is best for your situation: synchronous or asynchronous?
- What are the most critical data sets and applications for running your business?
- How much data are you willing to risk losing if there's a disaster?
- How will you design your network to use the least amount of bandwidth?
- What is your Recovery Time Objective (RTO), and can different applications have different RTOs?
- If you plan to use an IP network, how will you test it to ensure reliable data transfer and application performance, and what security measures will you need to consider (e.g., firewalls and encryption)?
IP network considerations
Although IP networks have become ubiquitous for most applications, until recently the exceptions have been business continuity (disk mirroring) and disaster-recovery (backup/restore) applications. These large, block-oriented, synchronous/semi-synchronous applications traditionally had to be tightly coupled directly to the processor owning the data.
Typically, there are three concerns regarding using IP networks for storage applications. The first has to do with reliable data transfer.
IP networks operate in a "send-and-forget" mode: at the IP layer, delivery is best-effort. That is, the sending device assumes the data will arrive at its destination, so it moves on to the next task without checking on the success or failure of what was sent previously. Error recovery in this case is usually an e-mail from the recipient asking that the data be re-sent. For storage over WANs, this is unacceptable because the integrity of the data transfer is tightly coupled to the application. The application has pre-set timers that tick away while it waits for a response from the storage device. If a timer expires, the application re-sends the data block. If network errors occur in large numbers, application performance drops dramatically. If errors on the path to a specific storage device persist, the application may flag the device as "dead" and stop attempting data transfers to it.
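The timer-and-retry behavior described above can be sketched in a few lines of Python. This is an illustration only: the function name, the default limit of three attempts, and the `send` callback (assumed to return True when an acknowledgment arrives before the timer expires) are hypothetical and are not taken from any particular application or storage product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WriteResult:
    acknowledged: bool        # True if the storage device confirmed the write
    attempts: int             # how many times the block was sent
    device_marked_dead: bool  # True if the application gave up on the device


def write_block_with_retries(
    send: Callable[[bytes], bool],
    block: bytes,
    max_attempts: int = 3,    # assumed limit; real applications differ
) -> WriteResult:
    """Send a block and re-send it each time the response timer expires.

    `send` is a hypothetical callback that returns True when the device
    acknowledges the block before the timer runs out, and False otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        if send(block):
            return WriteResult(True, attempt, False)
    # Every attempt timed out: flag the device and stop sending to it.
    return WriteResult(False, max_attempts, True)


# Example: the first two sends time out, the third is acknowledged.
outcomes = iter([False, False, True])
result = write_block_with_retries(lambda _block: next(outcomes), b"data")
print(result)
```

Each failed attempt costs the application a full timer interval, which is why a lossy path degrades performance long before a device is declared dead.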
A second concern regarding IP networks is application performance. Especially for synchronous applications, unreliable data transfer will impact performance, as will the number of hops and network congestion. Hops cause latency, and network congestion creates packet loss.
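To see why distance and hops matter for synchronous applications, consider a rough latency estimate. The sketch below assumes about 5 microseconds of one-way propagation delay per kilometer of optical fiber (a common approximation) and treats the per-hop delay as a placeholder that you would measure on your own network. It also assumes each synchronous write waits for one full round trip before the next can be issued, so the result is an upper bound for a single serial stream, not a prediction for any product.

```python
FIBER_DELAY_US_PER_KM = 5.0  # approximate one-way propagation delay in fiber


def estimate_rtt_ms(distance_km: float, hops: int = 0,
                    per_hop_ms: float = 0.0) -> float:
    """Estimate round-trip time from route length and per-hop delay.

    `per_hop_ms` is an assumed one-way forwarding/queuing delay per hop.
    """
    one_way_ms = distance_km * FIBER_DELAY_US_PER_KM / 1000.0 + hops * per_hop_ms
    return 2.0 * one_way_ms


def max_serial_sync_writes_per_sec(rtt_ms: float) -> float:
    """Upper bound on writes per second when each write waits one round trip."""
    return 1000.0 / rtt_ms


# Example: a 724 km (roughly 450 mile) route with no intermediate hops.
rtt = estimate_rtt_ms(724)
print(f"RTT ~ {rtt:.2f} ms, at most {max_serial_sync_writes_per_sec(rtt):.0f} serial writes/s")
```

Real fiber routes are usually longer than the straight-line distance, so measured figures will tend to be worse than this estimate.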
The third concern is security. Typically, a company extends storage over a WAN as part of a business continuity or disaster-recovery application. This means the data moving over the network is the data used to operate the business (or is the result of operating the business). This data is essentially intellectual property. In a private, point-to-point network, it is almost impossible to "hack" into the network and steal or view the data. In an IP network, hacking is a possibility.
Before using an IP network for long-distance storage applications, you should test the planned connections on both ends (as well as alternative paths) to determine the number of hops, latency, packet loss, line jitter, signaling rate, and resiliency. Tune the network until these measurements meet your baseline requirements. Some storage router vendors provide this as a service.
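As a minimal sketch of how such test results might be summarized, the function below takes a list of round-trip times in milliseconds, with None standing for a probe that was lost, and reports packet loss, average latency, and jitter. The function name and the jitter definition (mean absolute difference between consecutive successful probes) are assumptions chosen for illustration; measurement tools may define jitter differently.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize probe results; None marks a lost probe."""
    if not rtts_ms:
        raise ValueError("at least one probe result is required")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_ms = mean(received) if received else None
    # Jitter here: mean absolute change between consecutive successful probes.
    if len(received) > 1:
        jitter_ms = mean(abs(b - a) for a, b in zip(received, received[1:]))
    else:
        jitter_ms = 0.0
    return {"loss_pct": loss_pct, "avg_rtt_ms": avg_ms, "jitter_ms": jitter_ms}


# Example: four probes, one of them lost.
print(summarize_probes([10.0, 12.0, None, 11.0]))
```

Running the same summary against each alternative path gives a like-for-like basis for comparing them with your baseline requirements.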
Creating reliable data transfer across an IP network is largely the responsibility of the storage router, which may use technologies such as pipelining, store and forward, and CRC checking to ensure reliable data transfers. If the storage router detects a lost or corrupted data packet, it uses store-and-forward technology to re-send the affected data blocks, which relieves the application of spending server cycles on error recovery.
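The combination of CRC checking and store-and-forward re-sending can be illustrated with a short Python sketch. It is not how any particular storage router is implemented: the 4-byte CRC-32 trailer, the function names, the `link` callback (a stand-in for the WAN that returns whatever bytes arrive at the far side), and the limit on re-sends are all assumptions made for the example.

```python
import zlib
from typing import Callable


def frame(block: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the receiver can detect corruption."""
    return block + zlib.crc32(block).to_bytes(4, "big")


def verify(framed: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the block contents."""
    block, crc = framed[:-4], framed[-4:]
    return zlib.crc32(block).to_bytes(4, "big") == crc


def forward_block(block: bytes, link: Callable[[bytes], bytes],
                  max_resends: int = 5) -> int:
    """Hold the block until an intact copy crosses the link.

    Returns the number of transmissions needed. The router, not the
    application, performs the re-sends.
    """
    framed = frame(block)  # "store": keep a copy until delivery succeeds
    for transmissions in range(1, max_resends + 2):
        if verify(link(framed)):  # "forward", then check what arrived
            return transmissions
    raise IOError("block could not be delivered intact")


# Example: a hypothetical link that corrupts only the first transmission.
sent = {"count": 0}

def noisy_link(data: bytes) -> bytes:
    sent["count"] += 1
    if sent["count"] == 1:
        return bytes([data[0] ^ 0xFF]) + data[1:]
    return data

print(forward_block(b"storage block", noisy_link))
```

Because the retry loop lives in `forward_block`, the caller sees a single successful delivery, which mirrors the point that error recovery is taken off the server.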
Network performance is an area where multiple scenarios come into play. First, if you will be using a private IP network, you can establish a permanent virtual circuit that meets the bandwidth requirements of the application. You can even build in a burst rate that will compensate for some unexpected or infrequent bandwidth needs. But be careful: A burst rate is just that, a short period of time when your bandwidth use is allowed to exceed what you are paying for. If you design your network based on the burst-rate value, you may run into application performance problems.
Payload matching optimizes the payload utilization of each IP packet, thus increasing network performance.
If you are using a public IP network, you will need to establish a virtual private network connection through your Internet service provider that covers your base requirements. Establishing a service level agreement (SLA) with the ISP is also recommended.
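The caution about burst rates can be turned into a simple sizing check. In the sketch below, the function name, the category labels, and the example rates are hypothetical; the only rule it encodes is the one stated above, namely that sustained replication traffic should fit within the rate you pay for rather than the burst rate.

```python
def classify_circuit(sustained_mbps: float, committed_mbps: float,
                     burst_mbps: float) -> str:
    """Compare sustained replication traffic with a circuit's rates."""
    if burst_mbps < committed_mbps:
        raise ValueError("burst rate should not be below the committed rate")
    if sustained_mbps <= committed_mbps:
        return "fits committed rate"
    if sustained_mbps <= burst_mbps:
        return "depends on burst rate"   # risky: bursts are short-lived
    return "exceeds burst rate"


# Example with made-up figures: 60 Mbps sustained on a 45/90 Mbps circuit.
print(classify_circuit(60, committed_mbps=45, burst_mbps=90))
```

A result of "depends on burst rate" is the design mistake to avoid: the application would run well only during the short periods when bursting is allowed.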
To address security issues, many companies use encryption devices between the storage routers. Emerging IP standards will include additional security provisions, although they are not available today.
User case study
Recently, a power management company made the switch to an IP-based business continuity solution.
From the time the utility company began business, it had a robust disaster-recovery system. Its disk mirroring application, along with a storage router, mirrored data from the company's primary site to a remote backup site 450 miles away. While this system made it possible for the firm to recover up-to-the-minute data, the time needed to recover that data resulted in system downtime. The new challenge was that impending business initiatives would soon make that downtime unacceptable.
To improve its disaster-recovery system, the company wanted to be able to do bidirectional synchronous replication so it could have complete data copies at both its primary and remote locations. It would then have the ability to switch over to either site in the event of planned or unplanned outages, maintaining continuous data availability.
The company reviewed several options, including software-based replication. But it needed a solution that would support all the applications in its mixed Tru64 and Sun Solaris environment. The company also needed to be able to move large volumes of data reliably. The firm chose a Fibre Channel over IP approach. The data replication solution mirrors data and applications in synchronous or asynchronous bidirectional mode over Fibre Channel-based IP connections.
In addition to achieving its data availability objectives, the company also benefited from improved manageability. While most companies understand the necessity of remote sites for disaster recovery, few realize that these remote sites can be leveraged for day-to-day operations. This solution made it possible for geographically distant data centers to perform as a single entity, improving the protection, mobility, and availability of business-critical data.
Gary Johnson is vice president of solutions at CNT Corp. (www.cnt.com), in Minneapolis, MN. He can be reached at firstname.lastname@example.org.