Issues include performance, high availability, interoperability, manageability, and scalability.
By Mike Strickland
The ratification of the iSCSI specification and Microsoft's delivery of iSCSI driver support last year are expected to accelerate implementation of iSCSI-based storage systems. iSCSI promises the benefits of storage area networks (SANs) at less cost and complexity than Fibre Channel SANs.
These developments have led to several iSCSI-based products. End users finally have options for iSCSI implementation and will have to sift through them to determine the best fit for their storage networking requirements.
End users should evaluate a number of features, including performance, high availability, interoperability, manageability, and scalability (see table).
Performance
iSCSI in software running over a standard Ethernet NIC can be very cost-effective for an application server that has limited I/O traffic or spare CPU capacity. For servers with more demanding I/O profiles, however, an iSCSI host bus adapter (HBA) may be required.
When evaluating performance, one of the first considerations is I/O block size. Backups may use large blocks such as 64KB, but many databases use 2KB, 4KB, or 8KB blocks. Databases that run over a file system or use buffered I/O typically use the file system block size or a multiple of it (usually 4KB or 8KB). Databases that run over a raw partition with raw I/O typically have the option to use smaller block sizes. When a file is opened and tens to hundreds of blocks are read, the storage device must be able to sustain a high rate of cached read IOPS (I/Os per second). Because I/O traffic is bursty, a Gigabit Ethernet port may need to handle 75,000 IOPS to keep up during temporary line saturation when the workload is two-thirds reads and one-third writes with 2KB blocks.
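The 75,000 IOPS figure can be sanity-checked with simple arithmetic. This Python sketch is a rough model, not a vendor benchmark. It assumes a full-duplex link carrying about 125MB/s in each direction, with read data flowing from target to initiator and write data from initiator to target, and it takes an optional per-I/O overhead (the `overhead_bytes` parameter, a hypothetical stand-in for Ethernet, TCP/IP, and iSCSI header costs). It reports the total IOPS at which the busier direction saturates.

```python
def gbe_iops_at_saturation(block_bytes, read_fraction, overhead_bytes=0,
                           link_bits_per_s=1_000_000_000):
    """Rough total IOPS at which the busier direction of a full-duplex link saturates.

    Read payloads flow target -> initiator; write payloads flow initiator -> target.
    overhead_bytes is an assumed per-I/O protocol cost (framing, TCP/IP, iSCSI
    headers). Command and status PDUs in the opposite direction are ignored.
    """
    bytes_per_s = link_bits_per_s / 8              # capacity of each direction
    per_io = block_bytes + overhead_bytes          # bytes on the wire per I/O
    busier_fraction = max(read_fraction, 1 - read_fraction)
    # The busier direction carries busier_fraction of all I/Os, so it fills first.
    return bytes_per_s / (per_io * busier_fraction)


ideal = gbe_iops_at_saturation(2048, 2 / 3)
with_overhead = gbe_iops_at_saturation(2048, 2 / 3, overhead_bytes=250)
print(f"No overhead:        {ideal:,.0f} IOPS")
print(f"250 bytes overhead: {with_overhead:,.0f} IOPS")
```

With no overhead, a 2KB, two-thirds-read workload saturates the read direction at roughly 91,500 total IOPS. Adding per-I/O header and framing overhead lowers that ceiling, which is consistent with the 75,000 IOPS figure in the text.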
In addition to IOPS, another performance factor to consider is latency. Some database operations, such as indexed searches, require that one I/O complete before the next one is issued. Sequential reads of blocks in a file, on the other hand, are less sensitive to latency, because caching on the disks and in the storage device typically supplies data as fast as the network link can consume it. Average I/O response times of several milliseconds are common within a data center, because disk access times are much larger than the other delays. An HBA typically contributes about 30 microseconds of latency, so even when the I/O response time is as low as one millisecond, the HBA accounts for less than 5% of the total.
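The latency arithmetic in this paragraph can be written out directly. In this sketch the 30-microsecond HBA delay comes from the text, while the 5-millisecond response time is an assumed stand-in for "several milliseconds." The second function shows why dependent I/Os such as indexed searches are latency-bound: when each I/O must complete before the next is issued, a single chain of requests can finish at most one I/O per response time.

```python
HBA_DELAY_S = 30e-6  # typical HBA contribution cited in the text


def hba_share(response_time_s, hba_delay_s=HBA_DELAY_S):
    """Fraction of a single I/O's response time contributed by the HBA."""
    return hba_delay_s / response_time_s


def max_dependent_iops(response_time_s):
    """Upper bound for a chain of I/Os issued strictly one after another."""
    return 1.0 / response_time_s


print(f"HBA share at 1 ms response time: {hba_share(1e-3):.1%}")
print(f"Dependent I/Os per second at 5 ms (assumed): {max_dependent_iops(5e-3):.0f}")
```

Even in the 1-millisecond case the HBA's share stays near 3%, so for dependent I/O chains the disk and cache behavior of the storage device dominates.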
The GbE switch also affects latency. A switch that begins forwarding an Ethernet frame before the frame has been completely received (so-called "cut-through" switching) can deliver latency below 10 microseconds. Less-expensive switches that use a store-and-forward approach, buffering each whole frame first, may add as much as 40 microseconds.
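Part of the gap between the two switch designs comes from serialization delay. A store-and-forward switch cannot start transmitting a frame until its last bit has arrived, so each hop adds at least the frame's transmission time on the link, whereas a cut-through switch can begin forwarding once it has read the destination address. This sketch computes that minimum for Gigabit Ethernet; it ignores the switch's own lookup and queuing time, which add to the figure.

```python
GIGABIT_BPS = 1_000_000_000


def serialization_delay_us(frame_bytes, link_bps=GIGABIT_BPS):
    """Time to clock one frame onto (or off) the wire, in microseconds."""
    return frame_bytes * 8 / link_bps * 1e6


# Minimum and maximum standard (untagged) Ethernet frame sizes.
for size in (64, 1518):
    print(f"{size:5d}-byte frame: {serialization_delay_us(size):.3f} us")
```

For a full-size 1,518-byte frame this is about 12 microseconds per store-and-forward hop before any processing time, one reason such switches trail cut-through designs.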
A final performance consideration is the maximum round-trip time (RTT) from initiator to target that a TCP/IP offload engine (TOE) or iSCSI HBA can sustain at full speed. This is generally not a concern within a data center. For WAN traffic, however, the HBA must have enough inbound buffer memory to hold all the data in flight during one round trip (roughly the link bandwidth multiplied by the RTT) in order to achieve line speed. On a fast private network, the RTT can be as low as 55 milliseconds over a distance of 5,000km, so 32MB of DRAM buffering is sufficient. Longer distances may require additional memory.
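The buffer requirement follows from the bandwidth-delay product: to keep a link busy, the receiver must be able to accept everything that can be in flight during one round trip. This sketch works through the 55-millisecond, 5,000km example at Gigabit Ethernet speed. The propagation estimate assumes light travels through optical fiber at roughly 200,000km per second, about two-thirds of its speed in a vacuum.

```python
FIBER_KM_PER_S = 200_000  # approximate speed of light in optical fiber


def min_fiber_rtt_s(distance_km):
    """Propagation-only round-trip time; real networks add equipment delays."""
    return 2 * distance_km / FIBER_KM_PER_S


def bandwidth_delay_product_bytes(link_bps, rtt_s):
    """Bytes in flight needed to keep the link full for one round trip."""
    return link_bps * rtt_s / 8


rtt = 55e-3  # RTT cited for a 5,000km fast private network
bdp = bandwidth_delay_product_bytes(1_000_000_000, rtt)
print(f"Propagation-only RTT for 5,000km: {min_fiber_rtt_s(5_000) * 1e3:.0f} ms")
print(f"Bandwidth-delay product at 55 ms: {bdp / 1e6:.1f} MB")
print(f"Fits in 32MB of buffer: {bdp <= 32 * 2**20}")
```

The propagation delay alone accounts for about 50 of the 55 milliseconds, and the roughly 6.9MB in flight per direction fits well within 32MB. Because the product grows linearly with RTT, and RTT grows with distance, longer paths need proportionally more buffering.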
High availability and reliability
High-availability (HA) requirements vary from user to user and depend on cost. Before making the business case for reliability, however, users need to look at data integrity.
TCP delivers reassembled data in the correct order and carries a checksum for the data in each packet. The architects of the iSCSI specification were nevertheless concerned about infrequent scenarios, such as a router that changes a data bit while fragmenting a packet and then generates a new, valid checksum for the corrupted fragment. To protect against these rare events, iSCSI offers a Data CRC option (a CRC32C data digest negotiated at login) that ensures data integrity at the SCSI level. Users may want to enable the iSCSI Data CRC option, and IT managers should ask for vendor benchmarks run with it enabled. Performance may vary significantly for NIC and TOE implementations that rely on the host CPU to handle the compute-intensive CRC calculations.
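The iSCSI specification (RFC 3720) uses CRC32C, based on the Castagnoli polynomial, for its header and data digests. The sketch below is a plain table-driven software CRC32C, included only to illustrate the per-byte work a host CPU must do when a NIC or TOE leaves the CRC to software. It is not a complete iSCSI digest implementation: PDU padding and the digest's placement on the wire are not handled.

```python
# Reflected form of the CRC32C (Castagnoli) generator polynomial 0x1EDC6F41.
_CRC32C_POLY = 0x82F63B78


def _make_table():
    """Precompute the CRC contribution of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Compute CRC32C one byte at a time, as host software would."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


segment = bytearray(b"iSCSI data segment")
good = crc32c(bytes(segment))
segment[3] ^= 0x10  # flip a single bit to simulate silent corruption
print(f"original {good:08x}, corrupted {crc32c(bytes(segment)):08x}")
```

Every byte passes through a table lookup, a shift, and XORs, which is why software CRC can cost noticeable CPU time at gigabit rates. Production code would use hardware support instead; x86 processors with SSE4.2, for example, include a CRC32 instruction for this same polynomial.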
For mission-critical environments, users should look at HA configurations that provide dual paths from the initiator to the target, a capability that is well established in Fibre Channel SANs. The major operating systems already support multi-pathing, which allows load balancing across several paths to storage and failover from one path to another. Multi-pathing works well with the iSCSI option of Error Recovery Level 0. Relying on the operating system for this capability also gives IT managers more flexibility when evaluating iSCSI products and architectures.
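The load-balancing and failover behavior of operating-system multi-pathing can be sketched in a few lines. The `Path` and `MultipathSelector` classes here are hypothetical illustrations, not the interface of any operating system's multipath driver, and round-robin is only one of several policies such drivers offer. Under Error Recovery Level 0, iSCSI itself recovers from errors only by restarting the session, which leaves path selection and retry to a layer like this one.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str            # e.g. one initiator port to one target port
    healthy: bool = True


class MultipathSelector:
    """Round-robin load balancing with failover across healthy paths."""

    def __init__(self, paths):
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)
        self._next = 0  # index where the next search starts

    def set_health(self, name, healthy):
        """Mark a path failed (False) or restored (True)."""
        for path in self.paths:
            if path.name == name:
                path.healthy = healthy

    def pick(self):
        """Return the next healthy path in rotation, skipping failed ones."""
        count = len(self.paths)
        for offset in range(count):
            index = (self._next + offset) % count
            if self.paths[index].healthy:
                self._next = (index + 1) % count
                return self.paths[index]
        raise RuntimeError("no healthy path to the target")


selector = MultipathSelector([Path("hba0->ctrlA"), Path("hba1->ctrlB")])
print([selector.pick().name for _ in range(4)])  # alternates between both paths
selector.set_health("hba1->ctrlB", False)        # simulate a link failure
print([selector.pick().name for _ in range(2)])  # all I/O fails over to hba0
```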
If you choose to use an iSCSI HBA, additional issues to consider include error-detection and bug-fix capability. Data paths within ASICs should be parity-protected, and data paths to off-ASIC memory should be ECC-protected. Also important is the capability to offer fixes for protocol interoperability bugs. If the TCP state machine and other protocol logic are hardwired, there is limited capability to implement fixes and to respond to changes in emerging standards.
Interoperability
The ratification of the iSCSI specification and the interoperability "plugfests" hosted by the University of New Hampshire and by Microsoft give vendors opportunities to work out interoperability issues. Most interoperability concerns involve handling the various options that are negotiated during the iSCSI login phase.
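Login negotiation uses text key=value pairs, and each key has a rule for combining the two sides' values. This sketch applies the result functions RFC 3720 defines for a few keys that appear in this article, such as the digest options and ErrorRecoveryLevel. It is deliberately simplified: the key tables and the `parse_text` and `negotiate` helpers are hypothetical, real logins span several request/response exchanges, and many other keys and declarative parameters are ignored. Differences in how implementations apply rules like these are one source of the login-phase issues described above.

```python
# Result functions for a small subset of RFC 3720 login keys (simplified).
MIN_KEYS = {"MaxBurstLength", "FirstBurstLength", "ErrorRecoveryLevel"}
OR_KEYS = {"InitialR2T"}                    # Yes if either side says Yes
AND_KEYS = {"ImmediateData"}                # Yes only if both sides say Yes
LIST_KEYS = {"HeaderDigest", "DataDigest"}  # first offered value acceptor supports


def parse_text(data: str) -> dict:
    """Split NUL-separated key=value pairs, as carried in login text fields."""
    pairs = {}
    for item in data.split("\0"):
        if item:
            key, _, value = item.partition("=")
            pairs[key] = value
    return pairs


def negotiate(offer: dict, acceptor: dict) -> dict:
    """Return the acceptor's response to an initiator's offered values."""
    known = MIN_KEYS | OR_KEYS | AND_KEYS | LIST_KEYS
    response = {}
    for key, value in offer.items():
        mine = acceptor.get(key)
        if key not in known or mine is None:
            response[key] = "NotUnderstood"
        elif key in MIN_KEYS:
            response[key] = str(min(int(value), int(mine)))
        elif key in OR_KEYS:
            response[key] = "Yes" if "Yes" in (value, mine) else "No"
        elif key in AND_KEYS:
            response[key] = "Yes" if value == mine == "Yes" else "No"
        else:  # LIST_KEYS
            supported = mine.split(",")
            choice = next((v for v in value.split(",") if v in supported), None)
            response[key] = choice or "Reject"
    return response


offer = parse_text("HeaderDigest=CRC32C,None\0DataDigest=CRC32C,None\0"
                   "ErrorRecoveryLevel=2\0InitialR2T=No\0ImmediateData=Yes\0")
target = {"HeaderDigest": "None", "DataDigest": "CRC32C,None",
          "ErrorRecoveryLevel": "0", "InitialR2T": "Yes", "ImmediateData": "Yes"}
print(negotiate(offer, target))
```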
Interoperability will also be improved by Microsoft's "Designed for Windows" logo requirements for iSCSI implementations. Microsoft has established a series of requirements and tests for both initiator HBAs and iSCSI storage devices.
Manageability
Some have argued that management tools for iSCSI SANs will take a while to develop, but that shouldn't be the case. iSCSI SANs use the same Ethernet switches as ordinary TCP/IP data traffic, and those switches already have well-developed management capabilities.
Most storage devices are configured and managed with SCSI commands that use vendor-specific opcodes. iSCSI carries these "pass-thru" commands just as easily as Fibre Channel does.
Other TCP infrastructure considerations are also worth examining. One example is the choice between a GbE switch with managed or unmanaged ports. A switch with managed ports costs more, but remote access to the switch for troubleshooting can make it well worth the expense. It may also be useful to have a switch with a diagnostic port that can mirror any of the other ports. Another consideration is integrating the iSCSI end nodes into an existing SNMP-based network and systems management package.
iSCSI HBAs or storage devices with SNMP agents can expose valuable link, TCP, and iSCSI management information base (MIB) data.
Scalability
While a currently available technology or product may fit short-term needs, it may not be able to meet future demands. In addition to scaling storage capacity, there are other considerations such as high availability, port aggregation, and large numbers of sessions. When purchasing a low-end iSCSI storage system, for instance, can more ports be added later for greater aggregate throughput or an HA configuration? For servers with a limited number of PCI slots, are there dual-port HBAs that offer adequate performance without an extra PCI bridge? As traffic grows, can an initiator HBA handle the increased number of sessions, and can a target HBA support enough sessions per port?
There is also the potential for convergence of IP-based block-level (iSCSI) and file-level I/O. It may be advantageous to consider a converged storage solution that allows disks to be allocated to iSCSI block mode or to network-attached storage (NAS) shared file-access mode as needed.
iSCSI products are expected to proliferate rapidly, and potential adopters should review performance, availability, interoperability, manageability, and scalability considerations to ensure that current and future needs are met.
Mike Strickland is director of product management at Silverback Systems (www.silverbacksystems.com) in Campbell, CA.
iSCSI offload approaches
iSCSI in software over a NIC—The iSCSI and TCP protocol processing is done in the host processor. Each Ethernet packet can potentially generate an interrupt, and the lack of iSCSI awareness means that data is copied from the kernel to the user-space application buffers. Any application server with moderate I/O requirements will consume significant CPU cycles for I/O processing, and application performance may suffer from contention for memory and peripheral buses.
iSCSI over TOE—The iSCSI protocol processing is done in the host, but TCP processing is offloaded. There is the potential to reduce some TCP interrupts, but lack of iSCSI awareness means that an extra data copy is still present.
iSCSI HBA with a general purpose processor—The host bus adapter (HBA) moves TCP/IP and iSCSI processing to an embedded processor without mapping upper-layer protocol functions to hardware. This approach delivers CPU offload without the extra data copy, but data movement bottlenecks may limit performance.
iSCSI HBA with storage network processor—HBAs have a processor that is specifically designed to map protocol functions to hardware (such as TCP header and data splitting). Performance and CPU offload metrics are similar to Fibre Channel HBAs.