Part I — Transport protocol considerations for IP SANs

This article, the first in a two-part series, looks at the requirements for transporting Fibre Channel SAN traffic over managed IP networks.

By Sandy Helton

The growth of Fibre Channel storage area networks (SANs) is creating a demand for technologies and products to interconnect SAN islands within metropolitan area networks (MANs), wide area networks (WANs), and global networks. Interconnected SANs enable the deployment of SAN applications that provide solutions for disaster recovery, backup, and resource consolidation. IT organizations have invested billions of dollars to build out their IP networks, and using these networks to carry storage traffic significantly leverages this investment. IP networks are becoming a popular way to extend SANs beyond the confines of the data center.

Managed IP networks are being deployed with service levels that offer enterprises significantly greater capabilities than the public Internet. IT managers look to service providers that offer high-bandwidth, managed IP networks backed by service level agreements (SLAs), which contractually guarantee network performance. These new networks meet the needs of storage applications, but to fully exploit their potential, alternatives to TCP-based transport mechanisms should be considered.

Part I in this two-part series of articles addresses the requirements for transporting Fibre Channel SAN traffic across managed IP infrastructures, while Part II (in an upcoming issue of InfoStor) will examine the pros and cons of using TCP versus User Datagram Protocol (UDP) for SAN traffic.

The storage networking applications that are run across MANs and WANs are typically mission-critical. The networks deployed for this purpose are either private or managed networks engineered to meet specific performance and reliability specifications. These networks are not subject to the same congestion that occurs on the public Internet. To satisfy the requirements of transporting mission-critical storage traffic across networks that could potentially span hundreds or thousands of kilometers, it is important to understand the characteristics of the protocols that are used across the network.

The Fibre Channel protocol presumes a virtually loss-free network to maximize performance and relies on the upper-layer SCSI protocol to initiate retransmissions in the event of packet loss. Managed IP networks have performance and reliability specifications similar to those of Fibre Channel networks and are thus well-suited for mission-critical storage applications. Public IP networks in MANs and WANs, however, are subject to significant bit errors and congestion, which contribute to high packet loss. These factors drive transport technology choices in the various network environments.
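A rough calculation shows why a protocol that defers loss recovery to upper-layer timeouts is viable only on a virtually loss-free network. The sketch below is illustrative: the frame size, loss rates, and SCSI timeout value are assumptions, not figures from the article.

```python
# Sketch: the cost of leaving loss recovery to upper-layer (SCSI) timeouts
# rather than a fast transport-level retransmit. All constants are
# illustrative assumptions.

FRAME_PAYLOAD = 2048   # bytes per Fibre Channel frame (typical max payload)
SCSI_TIMEOUT = 5.0     # assumed seconds before SCSI retries a failed command

def stall_per_gigabyte(loss_probability: float) -> float:
    """Expected seconds of timeout stall incurred while moving 1 GB."""
    frames = 1e9 / FRAME_PAYLOAD
    expected_losses = frames * loss_probability
    return expected_losses * SCSI_TIMEOUT

# On a clean managed network (assumed loss ~1e-8 per frame) stalls are
# negligible; on a congested public network (assumed loss ~1e-3) the
# timeout stalls dwarf the transfer time itself.
print(stall_per_gigabyte(1e-8))   # ~0.02 s of stall per GB
print(stall_per_gigabyte(1e-3))   # ~2,441 s of stall per GB
```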

The growing volume and strategic importance of enterprise data, combined with the requirements of data availability, fault tolerance, and recovery, have necessitated the build-out of corporate SANs on an enterprise-wide basis. Enterprise SANs are deployed across multiple remote locations. These SAN "islands" represent a growing requirement for high-speed reliable connectivity; storage applications such as remote data replication, mirroring, and backup for disaster recovery require the ability to interconnect these storage resources across MANs and WANs.

Fibre Channel networks can be extended across MANs and WANs using a variety of approaches. Native Fibre Channel over optical fiber is the most common method used today for MAN connections. This approach uses coarse wavelength-division multiplexing (CWDM) or dense wavelength-division multiplexing (DWDM) equipment to transmit the native Fibre Channel protocol across a dark-fiber infrastructure. Fibre Channel data can also be carried across SONET or ATM networks, IP-based networks such as the Internet, or hybrid networks that use both IP and SONET infrastructures. The approach used depends primarily on the application requirements of the SANs being connected.

System requirements
The storage networking applications that require SAN extensions impose the most significant constraints on the approach that will meet the business objectives. SANs are primarily used for mission-critical applications and therefore require access to a secure and reliable infrastructure. This is why most SAN extensions today operate across dedicated, well-engineered private networks.

Network latency is the next critical factor that will govern the type of SAN extension deployed.

The latency factor divides applications into one of two categories: synchronous or asynchronous operation. ATM-based SAN extension is typically not used for applications that require synchronous operation, due to the relatively low speed and high latency of these solutions.

Applications that require synchronous operation have traditionally deployed dark fiber-based optical SAN extension, but these solutions have been limited to relatively short distances (10km) due to the limitations of the Fibre Channel protocol. (A 10km limitation was imposed by the Fibre Channel standard, but some vendors are able to extend this to as much as 100km.)
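The distance sensitivity of synchronous operation can be sketched with simple arithmetic: each synchronous write must wait one round trip for the remote acknowledgment before the next dependent write can proceed. The figures below assume light propagates through optical fiber at roughly 5 microseconds per kilometer; they are illustrative, not vendor specifications.

```python
# Sketch: why round-trip latency pushes long-distance replication from
# synchronous to asynchronous operation. Assumes ~5 us/km propagation
# delay in fiber (refractive index ~1.5); illustrative only.

PROPAGATION_S_PER_KM = 5e-6

def max_serialized_writes_per_sec(distance_km: float) -> float:
    """Upper bound on strictly serialized synchronous writes:
    each write waits one round trip for the remote acknowledgment."""
    rtt = 2 * distance_km * PROPAGATION_S_PER_KM
    return 1.0 / rtt

print(max_serialized_writes_per_sec(10))    # 10 km MAN: ~10,000 writes/s
print(max_serialized_writes_per_sec(1000))  # 1,000 km WAN: ~100 writes/s
```

Even with infinitely fast equipment at both ends, propagation delay alone caps serialized synchronous throughput, which is why applications beyond metro distances generally operate asynchronously.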

Transporting Fibre Channel over an IP infrastructure via the Fibre Channel over IP (FCIP) standard is a new trend for storage networking. FCIP solutions can provide mission-critical reliability at performance levels greater than those of native Fibre Channel optical solutions. In addition, these new solutions leverage the existing infrastructure already in place for IP networking, allowing enterprises to quickly add significant capabilities to their SAN deployments.

It is important to understand that SAN extensions require managed bandwidth. SLAs guarantee Quality of Service (QoS) through a set of metrics. These guarantees can be delivered in various ways: through QoS mechanisms that give storage traffic appropriate priority in a mixed-traffic environment, through a dedicated link provisioned exclusively for the customer, or via a virtual pipe in an under-subscribed, high-bandwidth network. SLA metrics also include reliability and availability specifications.

The new generation of metro Ethernet carriers such as Cogent, Telseon, XO, Yipes, and others increasingly employ end-to-end optical networks over privately owned or leased dark fiber.

Using DWDM and other emerging optical technologies, they are able to offer highly reliable bandwidth at gigabit and higher rates. While the metro Ethernet carriers are well-known for providing managed bandwidth, many of the traditional carriers such as BellSouth, Genuity, Qwest, Sprint, SBC, and Verizon are moving in this direction.

Metro Ethernet SLAs
There are a wide variety of SLAs, but each usually includes specific guarantees for availability, latency, and reliability. The key is to understand each performance metric and how it relates to extending Fibre Channel SANs over IP.

Availability is the guaranteed network uptime, including allocation for scheduled upgrades and repairs for network failures. 99.99% availability is typical, although 99.999% ("five-nines") availability is offered by many carriers at an additional cost.
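The difference between these availability tiers is easier to weigh when translated into allowable downtime per year, as the short calculation below shows.

```python
# Sketch: translating SLA availability percentages into allowable
# downtime per year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum minutes of downtime per year permitted by an SLA."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

print(downtime_minutes_per_year(99.99))   # ~52.6 minutes/year
print(downtime_minutes_per_year(99.999))  # ~5.3 minutes/year ("five nines")
```

The tenfold reduction in permissible downtime is what carriers charge the premium for.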

Latency is the maximum time a packet takes in transit across a network. Latency has two components: distance and network processing (delays in intermediate network elements). Latency usually diminishes storage application performance, but the extent of that impact depends on the application. Latency well under 10ms is typical for MANs, and several managed bandwidth providers specify numbers as low as 0.5ms to 2ms for metro networks providing Ethernet (Layer 2) transport.
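The two latency components can be estimated separately, as in the sketch below. The propagation figure (~5 microseconds per kilometer of fiber) is a standard rule of thumb; the per-hop processing delay and hop count are assumptions for illustration, not carrier-specified values.

```python
# Sketch: estimating one-way MAN latency from its two components,
# propagation distance and per-hop processing delay. The per-hop delay
# and hop count below are illustrative assumptions.

PROPAGATION_S_PER_KM = 5e-6  # ~5 us/km in optical fiber

def one_way_latency_ms(distance_km: float, hops: int,
                       per_hop_delay_s: float = 20e-6) -> float:
    """One-way latency in milliseconds: propagation plus switching."""
    propagation = distance_km * PROPAGATION_S_PER_KM
    processing = hops * per_hop_delay_s
    return (propagation + processing) * 1000

# A 100 km metro path through 5 switches lands within the 0.5ms-2ms
# range quoted for Layer 2 metro Ethernet transport.
print(one_way_latency_ms(100, 5))  # ~0.6 ms
```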

Reliability is a measure of packet loss in the network due to congestion and the native bit-error rate (BER) of the physical medium. To accommodate strict SLAs, managed IP networks do not over-subscribe bandwidth (50% utilization is typical), and therefore, congestion is not a significant factor.

Managed IP networks have characteristics very similar to those of dedicated Fibre Channel networks. In fact, with a BER of 10⁻¹², Gigabit Ethernet (GbE) and Fibre Channel have identical formal error-rate specifications; bit errors are the primary component of packet loss in congestion-free networks.
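A quick calculation shows what that error rate means at the packet level. The sketch assumes bit errors are independent and that any errored bit causes the frame to be discarded; the 2,048-byte frame size is an illustrative assumption.

```python
# Sketch: converting a bit-error rate (BER) into an expected frame-loss
# probability, assuming independent bit errors and that any errored bit
# causes the frame to be discarded.

def packet_loss_probability(ber: float, frame_bytes: int) -> float:
    """Probability that a frame of the given size contains >= 1 bit error."""
    bits = frame_bytes * 8
    return 1 - (1 - ber) ** bits

# At the 10^-12 BER shared by GbE and Fibre Channel, a 2,048-byte frame
# is lost with probability on the order of 1.6e-8 -- effectively the
# loss-free network that the Fibre Channel protocol presumes.
print(packet_loss_probability(1e-12, 2048))
```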

Sandy Helton is executive vice president and chief technology officer at SAN Valley Systems (www.sanvalley.com) in Campbell, CA.

This article was originally published on April 01, 2002