Host bus adapters with large data buffers and hardware context cache can optimize performance in Fibre Channel SANs.
BY MICHAEL SMITH
With strong growth predicted for Fibre Channel storage area network (SAN) hardware and software over the next few years, interest is growing in host bus adapter (HBA) architectures that provide the server CPU offload and I/O processing needed for high-performance storage connectivity.
Two important aspects of HBA architecture are the data buffer and the hardware context cache. The benefits of a large data buffer are particularly evident in long-distance applications, at 2Gbps (or higher) data rates, and in systems with a heavily loaded PCI bus.
A hardware context cache is specialized SAN hardware that provides increased performance, improved server off-load, and greater scalability. The benefits of a hardware context cache can be particularly apparent in switched-fabric environments.
Need for data buffering
Applications such as remote backup and restore, campus-wide SANs, and the movement of high-bandwidth data (e.g., images, video, and large files) may require moving data over extended distances. Long-distance applications are becoming more common with the growth of SAN deployment and the use of repeaters and routers. While long-distance connections improve the utility of the Fibre Channel network, they create challenges for high-performance data movement that an HBA with data buffering can address. A key enabler of improved long-distance performance is the amount of buffer credit an HBA supports.
The move to higher data rates also drives the need for increased data buffering on Fibre Channel HBAs. Higher data rates such as the emerging 2Gbps place a greater burden on data buffering to maintain performance, even at connections of moderate distances.
Systems with heavily loaded PCI buses also benefit from HBAs with large data buffers. If congestion on the PCI bus occurs, the data buffer can serve as a data reservoir, allowing the Fibre Channel fabric to continue delivering data to the HBA and maintain link performance. Without a large data buffer, a heavily loaded PCI bus could force the Fibre Channel link to stop delivering data until the congestion clears. This ability to maintain SAN fabric data flow, even in the case of a heavily loaded PCI bus, enables PCI scalability, allowing increased I/O performance through the addition of HBAs to the system.
Fibre Channel data is organized in frames that can be concatenated into sequences to create large block transfers. The frame size depends on the HBA and the target and can be in the range of 512 bytes to 2KB. Multiple sequences can be combined into a single exchange, permitting up to 128MB of data to be transferred with a single I/O command. The maximum amount of frame data that can be in flight at any given time is governed by buffer credits that, if insufficient for the link distance and speed, can severely limit performance.
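The frame arithmetic behind this hierarchy can be sketched in a few lines. The following is illustrative only; the 2048-byte payload is an assumed value based on the typical frame sizes cited above, and the actual size is negotiated between the HBA and the target.

```python
MAX_FRAME_PAYLOAD = 2048  # bytes; assumed typical negotiated frame size

def frames_for_transfer(nbytes, frame_bytes=MAX_FRAME_PAYLOAD):
    """Number of frames needed to carry nbytes of data."""
    return -(-nbytes // frame_bytes)  # ceiling division

# A maximum 128MB exchange at 2KB per frame spans 65,536 frames:
print(frames_for_transfer(128 * 2**20))  # 65536
```

At the maximum exchange size, a single I/O command therefore fans out into tens of thousands of frames, which is why credit-based flow control matters so much over distance.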
Figure 1: Chart compares the performance of an HBA with a large data buffer to an HBA with a relatively small data buffer.
Buffer credit is a mechanism defined by the Fibre Channel standard that establishes the maximum amount of data that can be in flight at any one time. This scheme makes Fibre Channel self-throttling, allowing it to establish a reliable connection without having to accommodate frames dropped due to congestion. Buffer credit ensures guaranteed delivery using the hardware mechanism provided by the Fibre Channel protocol rather than software error correction, delivering significantly higher performance and lower server CPU overhead than networking technologies that rely on software retransmission.
Buffer credit limits between each device and the fabric are communicated at the time of fabric login. One buffer credit allows a device to send one frame of data (typically 1KB or 2KB) before a "receiver-ready" acknowledgement is received. To send more data than a single frame, more buffer memory must be available. The size of the data buffer dictates the amount of buffer credit that can be extended by the device and, in turn, limits the amount of data that can be in flight at any given time.
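The credit accounting described above can be sketched as a toy model. The class and method names here are invented for illustration; in a real port, buffer-to-buffer credit accounting is implemented in hardware.

```python
from collections import deque

class CreditedLink:
    """Toy model of credit-based flow control on a Fibre Channel link.

    Illustrative sketch only: the credit count is granted at fabric
    login, one credit is consumed per frame sent, and one is restored
    when the receiver returns a "receiver-ready" acknowledgement.
    """
    def __init__(self, credits):
        self.credits = credits    # buffer credit granted at login
        self.in_flight = deque()  # frames awaiting acknowledgement

    def can_send(self):
        return self.credits > 0

    def send_frame(self, frame):
        if not self.can_send():
            raise RuntimeError("link stalled: no buffer credit")
        self.credits -= 1         # one credit consumed per frame
        self.in_flight.append(frame)

    def receiver_ready(self):
        # receiver acknowledges a frame, restoring one credit
        self.in_flight.popleft()
        self.credits += 1
```

With only two credits, a sender stalls after two unacknowledged frames, which is exactly the bottleneck that long links expose: acknowledgements take longer to return, so more credits are needed to keep the pipe full.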
Calculating buffer credit requirements
The amount of buffer credit required to sustain maximum bandwidth over extended distances can be easily calculated. Light traveling in an optical fiber has a latency of 5ns per meter, so the round trip of a frame through a 10km fiber-optic cable and the returning "receiver-ready" acknowledgement takes 100 microseconds. At 1Gbps, transmitting a 2KB frame takes about 20 microseconds, so a single outstanding frame would use only 20% of the total available bandwidth. Multiple frames must be in flight to more fully utilize the link bandwidth, and five 2KB credits are the minimum required to fill a 10km pipe. Similarly, a 100km cable requires a minimum of 50 buffer credits to maintain maximum link bandwidth.
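This calculation can be expressed as a short routine. The 5ns-per-meter propagation figure comes from the text; the effective data rates of roughly 100MB/s at 1Gbps and 200MB/s at 2Gbps are assumptions consistent with the 20-microsecond frame time cited above.

```python
import math

NS_PER_METER = 5  # propagation latency of light in fiber, from the text

def credits_required(distance_m, rate_mb_per_s, frame_bytes=2048):
    """Minimum buffer credits to keep a link fully utilized.

    rate_mb_per_s is the effective data rate in MB/s (assumed
    ~100 for 1Gbps Fibre Channel, ~200 for 2Gbps).
    """
    round_trip_us = 2 * distance_m * NS_PER_METER / 1000.0
    # MB/s is numerically bytes per microsecond, so this yields µs:
    frame_time_us = frame_bytes / rate_mb_per_s
    return math.ceil(round_trip_us / frame_time_us)

print(credits_required(10_000, 100))  # 10km at 1Gbps -> 5 credits
print(credits_required(10_000, 200))  # 10km at 2Gbps -> 10 credits
```

Doubling either the distance or the data rate doubles the credit requirement, which is the scaling relationship discussed next.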
Figure 2: Interleaved frames from multiple outstanding I/Os can be returned to the host server in an arbitrary order.
When operating at 2Gbps instead of 1Gbps, twice as many buffer credits are needed to achieve the same link utilization over a given distance, since each frame's transmission time is cut in half while the round-trip propagation delay stays the same. The proposed 10Gbps data rate will depend even more heavily on large data buffers to sustain high bandwidth utilization.
Fibre Channel architectures
Figure 1 compares an HBA with a large data buffer to one with a small buffer. The HBA with a large data buffer maintains high throughput over extended distances, while the other's performance is dramatically impacted due to its limited number of frame buffers. In a 10km test, the HBA with a large data buffer provides more than twice the performance of the HBA without a large buffer.
Hardware context cache
When an HBA initiates an I/O command, it stores information that uniquely identifies the command, its attributes, and the requested storage device. This information, referred to as "context information," must be correctly matched to the I/O command for the command to complete properly. Context information allows the host adapter to keep track of the many open I/O commands and link them to the data sent to, or returned from, each storage device, even when the I/Os are interleaved.
Whenever data from an I/O command is returned to the host server, the HBA must have the context information readily available to complete the I/O's processing. As data from various interleaved I/O commands is returned to the HBA, the context information held by the HBA must match that required by each incoming I/O. Since thousands of I/Os can execute concurrently, context switching allows the host adapter to multiplex I/O processing by replacing one set of context information with another to complete each I/O.
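A minimal sketch of this bookkeeping follows, using a dictionary keyed by exchange identifier. Fibre Channel frames do carry an originator exchange ID that ties each frame back to its I/O, but the field names and structure below are simplifications invented for illustration.

```python
class ContextCache:
    """Sketch of on-adapter context lookup for interleaved frames.

    Each open I/O registers its context under an exchange ID; every
    arriving frame, in whatever order, is matched back to its I/O.
    """
    def __init__(self):
        self.contexts = {}  # exchange ID -> per-I/O context

    def open_io(self, ox_id, target, length):
        self.contexts[ox_id] = {"target": target,
                                "expected": length,
                                "received": 0}

    def on_frame(self, ox_id, payload_len):
        # Frames from different I/Os may arrive interleaved; the
        # cache matches each one to its command without host help.
        ctx = self.contexts[ox_id]
        ctx["received"] += payload_len
        if ctx["received"] >= ctx["expected"]:
            return self.contexts.pop(ox_id)  # I/O complete
        return None
```

The key point the sketch illustrates is locality: when this lookup lives on the HBA, an interleaved frame is matched to its context without consuming host bus bandwidth or CPU cycles.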
Figure 3: Chart compares the performance of an HBA with hardware context switching to an HBA that relies on host-based context switching.
In a simple SAN configuration such as a Fibre Channel arbitrated loop, frames for an entire I/O operation are typically delivered sequentially to the server, which allows HBAs that support a very small number of internal contexts to perform reasonably well. The demand for context switching increases dramatically in a switched fabric, where frames from multiple switch ports are aggregated. As shown in Figure 2, interleaved frames from multiple outstanding I/Os can be returned to the host server in any arbitrary order, so the HBA must switch contexts rapidly to maintain acceptable performance.
The inability to switch context quickly in response to interleaved I/Os can have a negative impact on HBA performance and, therefore, on server I/O processing. As switched fabrics have grown in popularity, so too has the need for specialized hardware that enables high-performance context switching in the HBA.
Context information can be stored either in the server's system memory or in memory on the HBA. Some HBA architectures store context information in server memory, which interrupts the server CPU whenever context information must be processed. Although using the server's memory reduces the cost of the HBA, it steals bus bandwidth and CPU cycles from the server: each time an I/O is returned from a storage device, the context information must be downloaded across the bus to the HBA before I/O processing can complete. This adds latency, slowing I/O processing and Fibre Channel throughput.
HBAs with a context cache store context information locally on the HBA without involving the server. This tight coupling of the context information and the Fibre Channel data enables the HBA to maintain maximum I/O performance and Fibre Channel bandwidth.
While HBAs with hardware context switching can provide significant performance advantages, host-based context switching can dramatically limit performance in switched-fabric SANs. Figure 3 compares an HBA with hardware context switching to one with host-based switching. The HBA with hardware context switching maintains high throughput under heavy fabric I/O load, while the performance of the host-based design is dramatically impacted. In a switched-fabric environment, the HBA with hardware context switching provides more than twice the performance. Its architecture is also more scalable and provides greater server CPU offload.
HBAs that feature a hardware context cache and a large data buffer can provide significant performance advantages in Fibre Channel SANs. A hardware context cache enables higher bandwidth, minimizes server CPU utilization, and maintains high performance even in complex SANs. A large data buffer allows performance to scale with increases in link data rate and supports the addition of more host connections per system.
Michael Smith is executive vice president of worldwide marketing at Emulex Corp. (www.emulex.com) in Costa Mesa, CA.