TCP/IP offload engines alleviate bottlenecks

TOEs offload TCP/IP processing from hosts to network adapters. Storage applications that benefit include NAS and iSCSI.

By Peter Aylaian

Although Ethernet has increased in speed to 1Gbps, with 10Gbps soon to follow, high-performance servers cannot support gigabit connections while maintaining fast application response times. Today's servers with gigahertz processors would seem capable of handling these demands, yet new networks run at only a fraction of their bandwidth capacity while applications continue to respond slowly. The problem will grow more severe as 10Gbps networks are deployed.

The source of the problem may come as a surprise, but emerging "network acceleration cards" will alleviate bottlenecks while lowering infrastructure costs.

Performance bottlenecks

A common rule of thumb holds that 1MHz of CPU speed is needed to process 1Mbps of TCP/IP traffic. As a result, TCP/IP stack processing can severely degrade system performance, and higher bandwidth aggravates the problem. Processing 100Mbps (Fast Ethernet) of TCP/IP traffic requires approximately 100MHz of a CPU's available capacity, which today's gigahertz CPUs can handle without difficulty.

However, moving TCP/IP traffic at Gigabit Ethernet rates demands approximately 1GHz of CPU capacity, or 2GHz at full duplex. Under these conditions, CPU speed alone cannot address the problem because the I/O capacity reaches its limits.
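The rule of thumb above reduces to simple arithmetic. The Python sketch below applies it to the link speeds mentioned in the text. The 2GHz host CPU is an assumed example value, and the 1MHz-per-1Mbps ratio is an approximation rather than a measured cost.

```python
def cpu_mhz_required(link_mbps: float, full_duplex: bool = False) -> float:
    """Estimate CPU MHz consumed by TCP/IP processing using the
    1MHz-per-1Mbps rule of thumb (an approximation, not a measurement)."""
    directions = 2 if full_duplex else 1
    return link_mbps * directions  # 1MHz per 1Mbps in each direction


def cpu_share(link_mbps: float, cpu_mhz: float, full_duplex: bool = False) -> float:
    """Fraction of a CPU's clock consumed by protocol processing."""
    return cpu_mhz_required(link_mbps, full_duplex) / cpu_mhz


ASSUMED_CPU_MHZ = 2000  # hypothetical 2GHz server CPU

for label, mbps, duplex in [
    ("Fast Ethernet", 100, False),
    ("Gigabit Ethernet", 1000, False),
    ("Gigabit Ethernet, full duplex", 1000, True),
]:
    need = cpu_mhz_required(mbps, duplex)
    share = cpu_share(mbps, ASSUMED_CPU_MHZ, duplex)
    print(f"{label}: ~{need:.0f}MHz, {share:.0%} of a 2GHz CPU")
```

Under these assumptions, full-duplex Gigabit Ethernet alone would claim the entire 2GHz CPU, leaving nothing for applications.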


With so much CPU capacity devoted to TCP/IP processing, little remains for applications. The problem worsens when a server handles many clients, each of which opens one or more TCP connections.

With standard Ethernet network interface cards (NICs), TCP/IP processing can consume most of a CPU's processing power. At small packet sizes, CPU utilization can exceed 80%, significantly reducing the number of CPU cycles available for application processing.

TCP/IP "choke points"

There are several places where TCP/IP communications clog a server's operations. These "choke points" include

  • CPU interrupts for TCP/IP packet transmission;
  • The number of times a packet must cross the memory bus; and
  • The number of times a packet must cross the PCI bus.

All of these choke points are exacerbated at Gigabit Ethernet speeds, where small packets can arrive roughly every 800 nanoseconds. A closer examination of each choke point shows how the problem compounds.
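To see where the sub-microsecond figure comes from, the sketch below computes how long one Ethernet frame occupies a 1Gbps link, which is the shortest possible gap between back-to-back arrivals. It counts the standard 8-byte preamble/start-of-frame delimiter and 12-byte inter-frame gap; frame sizes run from destination address through FCS, and the sizes chosen are illustrative.

```python
# Ethernet adds an 8-byte preamble/start-of-frame delimiter and a
# 12-byte inter-frame gap to every frame on the wire.
PREAMBLE_BYTES = 8
IFG_BYTES = 12


def frame_time_ns(frame_bytes: int, link_bps: float = 1e9) -> float:
    """Time one frame occupies the wire, i.e., the minimum gap between
    back-to-back frame arrivals at line rate."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + IFG_BYTES) * 8
    return wire_bits * 1e9 / link_bps


def frames_per_second(frame_bytes: int, link_bps: float = 1e9) -> float:
    """Maximum frame arrival rate at line rate."""
    return 1e9 / frame_time_ns(frame_bytes, link_bps)


for size in (64, 128, 512, 1518):
    print(f"{size:>5}-byte frames: one every {frame_time_ns(size):7.0f} ns, "
          f"{frames_per_second(size):>9,.0f} per second")
```

Minimum-size 64-byte frames arrive every 672 ns and full-size 1,518-byte frames every 12,304 ns, so it is small packets that approach the figure above. Because per-packet costs such as interrupts and header processing are paid once per frame, a stream of 64-byte frames presents roughly 18 times as many frames per second as a stream of 1,518-byte frames.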

CPU interrupts—CPUs process instructions serially and must be interrupted to handle a new request or process. TCP/IP processing through conventional Gigabit Ethernet NICs can generate many interrupts for every transaction, and this inefficiency can add up to thousands of interrupts per second.

Memory—With conventional NICs, the host uses system memory to hold connection context information and to buffer data until it is delivered to the application. These repeated memory accesses can cause significant bottlenecks.

PCI bus—The PCI bus connects the NIC to the system and, in turn, is connected to the memory bus. Every TCP/IP packet that comes through the NIC must cross the PCI bus multiple times. That trip multiplier adds latency, slowing down the system and reducing the performance of other PCI bus peripherals.
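The PCI bottleneck can be checked with back-of-the-envelope numbers. The sketch below compares the theoretical peak of conventional PCI buses (nominal 33MHz and 66MHz clocks) with the bandwidth full-duplex Gigabit Ethernet needs if each byte crosses the bus once or twice. The crossing counts are hypothetical inputs rather than measured values, and real sustained PCI throughput is well below the theoretical peak.

```python
def pci_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak PCI throughput: one transfer of width_bits per clock."""
    return width_bits / 8 * clock_mhz  # bytes per clock * million clocks/s = MB/s


def pci_demand_mb_s(link_mbps: float, crossings: int, full_duplex: bool = True) -> float:
    """Bus bandwidth needed if every byte of network traffic crosses the
    bus `crossings` times (a hypothetical parameter)."""
    directions = 2 if full_duplex else 1
    return link_mbps / 8 * directions * crossings


for width, clock in ((32, 33), (64, 66)):
    peak = pci_peak_mb_s(width, clock)
    for crossings in (1, 2):
        need = pci_demand_mb_s(1000, crossings)
        verdict = "fits within" if need <= peak else "exceeds"
        print(f"{width}-bit/{clock}MHz PCI (peak {peak:.0f}MB/s): "
              f"{crossings} crossing(s) needs {need:.0f}MB/s, {verdict} peak")
```

Both directions are summed because PCI is a shared bus: transmit and receive traffic compete for the same cycles. Under these assumptions a 32-bit/33MHz bus cannot carry full-duplex Gigabit Ethernet even once, and each extra crossing eats into the headroom of a 64-bit/66MHz bus.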

Together, these choke points prevent users who have deployed Gigabit Ethernet from reaping its performance benefits. Vendors and administrators typically use the following methods to work around the choke points, but each is insufficient:

Replace CPUs with faster processors—A more powerful CPU processes packets faster, improving application performance. However, this method is incomplete because trips to memory and the PCI bus continue to slow performance. The result is marginal performance improvement at a fairly high price.

Balance IP load across multiple identical servers—Using hardware to solve a software problem is the "brute force" method of increasing server capacity and application availability. Spreading TCP/IP requests across many new servers increases capacity, but it also increases the work of managing, synchronizing, and maintaining the systems. This approach requires additional capital expenditures and may require hiring extra support staff, reducing ROI with each server added. In addition, many applications (such as databases) cannot be load balanced across multiple servers without synchronization messaging, which significantly reduces application performance.

This approach also involves added expenses, including multiple software licenses (and/or expensive software licenses for larger servers), increased management time and resources, and increased networking to the new servers (e.g., cables, switch ports, and fabric setup).

Use multi-processor architectures with dedicated CPUs for TCP/IP processing—Although this solution addresses the CPU choke point, it does not solve problems with memory requirements or PCI bus overhead. This method is also expensive because it requires multi-processor servers.

With no complete solution available, the industry is turning to TCP/IP offload engines (TOEs), a new way of handling TCP/IP processing without sacrificing application availability.

GbE adapters with TOEs

A TOE offloads TCP/IP protocol processing from the host to the adapter, clearing choke points without incurring high costs. Some TOEs fully offload all TCP/IP processing, while others use a "partial offload" approach.

A TOE ASIC can eliminate the performance penalties described above:

CPU interrupts—By passing complete data blocks to the operating system, TOEs sharply reduce the number of CPU interrupts per transaction. A standard NIC handling 64KB blocks can generate as many as 40 interrupts per block; a TOE handling the same block size can generate as few as two.
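The interrupt comparison can be approximated by counting segments. The sketch below assumes a conventional NIC that interrupts once per received segment (optionally coalescing several), with a 1,460-byte MSS as an assumed typical value, versus a TOE that reports only block-level completion events. The function names and the coalescing parameter are illustrative, not any vendor's interface.

```python
import math

BLOCK = 64 * 1024  # 64KB block, as in the text
MSS = 1460         # assumed TCP payload per 1,500-byte Ethernet frame


def nic_interrupts(block_bytes: int, mss: int, segments_per_interrupt: int = 1) -> int:
    """Conventional NIC: the host is interrupted as segments arrive.
    segments_per_interrupt > 1 models interrupt coalescing (hypothetical knob)."""
    segments = math.ceil(block_bytes / mss)
    return math.ceil(segments / segments_per_interrupt)


def toe_interrupts(completion_events: int = 1) -> int:
    """TOE: reassembly happens on the adapter, so the host sees only
    completion events for the whole block (one or two, depending on design)."""
    return completion_events


print("standard NIC, one interrupt per segment:", nic_interrupts(BLOCK, MSS))
print("standard NIC, coalescing 2 segments:", nic_interrupts(BLOCK, MSS, 2))
print("TOE, block-level completions:", toe_interrupts(2))
```

Under these assumptions a 64KB block spans 45 segments, in the same range as the 40 interrupts cited above. Coalescing roughly halves the count, but only block-level completion brings it down to one or two.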

Memory—Since all context information, TCP segmentation, and re-assembly are managed by the TOE, memory accesses drop dramatically. Instead of multiple trips, there is only one: from adapter to destination.

PCI bus—As trips to memory decrease, trips across the PCI bus decrease. Instead of multiple trips per packet, there is only one: Packets are written directly to the destination.
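The memory savings can be illustrated with a simplified receive-path model. The sketch assumes a conventional NIC path in which data is DMA'd into a kernel buffer and then copied by the CPU into the application buffer (three memory-bus crossings), versus a TOE writing directly to the destination (one crossing). It ignores header and connection-context accesses, so both figures are lower bounds.

```python
def memory_traffic_mb_s(throughput_mbps: float, crossings: int) -> float:
    """Memory-bus traffic generated when each received byte crosses the
    memory bus `crossings` times."""
    return throughput_mbps / 8 * crossings


# Simplified model (an assumption, not a measured profile):
#   conventional NIC: DMA write into a kernel buffer, then the CPU copies it
#   to the application buffer (one read + one write) -> 3 crossings
#   TOE with direct placement: one DMA write into the destination -> 1 crossing
PATHS = {"conventional NIC": 3, "TOE, direct placement": 1}

for name, crossings in PATHS.items():
    traffic = memory_traffic_mb_s(1000, crossings)
    print(f"{name}: {traffic:.0f}MB/s of memory traffic at 1Gbps")
```

In this model, receiving at 1Gbps costs about 375MB/s of memory traffic on the conventional path and 125MB/s with direct placement, and the copy on the conventional path also consumes CPU cycles.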

TOE benefits

Adapters with embedded TOE ASICs can offer a number of benefits:

  • Reduced costs of server purchases and upgrades via the use of smaller servers and less expensive application licenses;
  • Significantly increased CPU cycles for application processing; and
  • Significant drops in latency, reducing client response times and making service level agreements (SLAs) easier to achieve.

Overall, systems using adapters with TOE ASICs can increase throughput significantly across all payload sizes. Tests conducted by PC Magazine on Gigabit Ethernet adapter performance at the server level have shown only marginal gains over Fast Ethernet. The typical non-TOE Gigabit Ethernet adapter in a Windows 2000 server delivered approximately 650Mbps. A Gigabit Ethernet adapter with an embedded TOE ASIC can reach 2,000Mbps in aggregate at full duplex.

Applications that benefit most

Applications that benefit from adapters with embedded TOEs range from network-attached storage (NAS) and iSCSI storage area networks (SANs) to video servers, general-purpose servers, and appliance servers (e.g., Web caching, security, and e-mail servers). These cards can also be used to accelerate LAN-based backup.

One of the key challenges for NAS is to increase IP I/O without increasing the number of filers. Adding filers increases management tasks and requires the addition of disruptive mount points on all attached servers. One way to meet this challenge is to upgrade Gigabit Ethernet NICs to TOE adapters. This increases TCP/IP performance and I/Os per second so that the NAS filer can achieve gigabit throughput levels. In addition, integrating TOEs into servers attached to NAS boxes can significantly improve application performance, because much of a NAS-attached server's CPU capacity is typically consumed by intensive TCP traffic between the server and the NAS box.

TOEs can enable emerging IP storage protocols such as iSCSI to work as efficiently as current block-storage technologies. And block-storage systems that use familiar Ethernet and TCP/IP technology can be less expensive and easier to manage than current SAN alternatives.

Other applications that can benefit significantly from TOE-embedded adapters include large shared-file applications such as CAD/CAM and video editing; streaming data applications, including video conferencing; and databases, eCommerce, and OLTP. TOEs provide significant gains in the number of transactions per second without increasing the processing power required.

Today's network performance choke points are primarily a result of TCP/IP protocol processing. Adapters with embedded TOEs can eliminate these choke points and fully exploit Gigabit Ethernet networks. TOE benefits include performance gains, reduced server and software license costs, and lower infrastructure costs.

Peter Aylaian is director of marketing at Adaptec's Storage Network Group (www.adaptec.com) in Milpitas, CA.

This article was originally published on January 01, 2003