Using an analyzer for SAN troubleshooting can uncover hidden problems and unlock full performance potential.
By Lawrence Bain and Steve Klotz
The surge in networked storage traffic is making it increasingly difficult to maintain a well-tuned, high-performance storage area network (SAN). Fortunately, new tools are becoming available to monitor and manage storage networks. These tools are helping administrators and integrators resolve SAN problems and bottlenecks so that the full performance benefits of these networks can be realized.
SAN performance monitoring encompasses low-level FC-2, FC-4, and upper-layer protocols that are responsible for most of the performance characteristics of a SAN. This involves all aspects of Fibre Channel, from low-level flow control (such as credits) to attributes of the operating system and file system (such as queue depth and I/O sizes).
There are many potential problems that can occur in SANs that are essentially invisible to users and can result in degraded performance or even data loss. These "hidden behaviors" can occur for a number of reasons. And while SANs are highly reliable, when network problems do occur, the enormous amounts of data involved can lead to "train wrecks." Network error-recovery mechanisms can often mask the early symptoms of these problems, so finding, analyzing, and correcting problems in a timely manner are essential to avoiding such wrecks.
SANs are growing in size and are also becoming more complex and diverse. Speeds are increasing, new protocols are evolving that will soon result in SANs with heterogeneous transports, port counts are growing, and storage network topologies are becoming more complex.
Fibre Channel networks can be deployed in many different configurations to match customer requirements and are very diverse in their capabilities, from arbitrated loop configurations to long-haul inter-switch links using WDM technologies. As such, SAN administrators and integrators need a new level of monitoring and analysis capability.
A protocol analyzer captures transmitted information from the physical layer of the Fibre Channel network. Being physically located on the network (versus at a software re-assembly layer like most Ethernet analyzers), Fibre Channel analyzers can monitor data from the 8b/10b level all the way to the embedded upper-layer protocols.
Contrary to popular belief, Fibre Channel network devices (HBAs, switches, and storage subsystems) are not capable of monitoring most SAN behavior patterns. Also, management tools that gather data from these devices are not necessarily made aware of problems occurring at the Fibre Channel physical, framing, or SCSI upper layers for a number of reasons.
Fibre Channel devices spend most, if not all, of their time dealing with the distribution and handling of incoming and outgoing data streams. When devices are under maximum loads, which is when problems often occur, the device resources available for error reporting are typically at a minimum and are frequently inadequate for accurate error tracking. Also, Fibre Channel host bus adapters (HBAs) do not provide the ability to "sniff" raw network data, as is possible with many Ethernet network interface cards.
There are a number of common SAN problems that occur in deployed systems as well as pre-production environments. The most common problems, which are visible only with a Fibre Channel analyzer, include credit starvation, undetected physical errors, file-system I/O splitting, and device bursting.
Credit starvation
Fibre Channel maintains strict flow control between devices by utilizing credits. Each credit received by a device allows that device to transmit one frame. When a device does not have credits available, it cannot transmit. When this occurs, SAN performance can suffer significantly.
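The credit mechanism can be sketched as a simple counter. The class and method names below are illustrative, not from any real Fibre Channel API; the sketch assumes one credit per frame and an R_RDY primitive returned for each freed receive buffer.

```python
# Minimal sketch of Fibre Channel buffer-to-buffer credit flow control.
# Transmitter, bb_credit, and the method names are hypothetical.

class Transmitter:
    def __init__(self, bb_credit):
        self.credits = bb_credit   # credits granted by the receiving port

    def send_frame(self):
        """Send one frame if a credit is available; otherwise stall."""
        if self.credits == 0:
            return False           # out of credits: transmission waits
        self.credits -= 1          # each frame consumes exactly one credit
        return True

    def receive_r_rdy(self):
        """The receiver returns an R_RDY for each buffer it frees."""
        self.credits += 1

tx = Transmitter(bb_credit=2)      # many devices grant only two credits
assert tx.send_frame() and tx.send_frame()
assert not tx.send_frame()         # stalled until an R_RDY arrives
tx.receive_r_rdy()
assert tx.send_frame()             # one credit returned, one frame sent
```

With only two credits outstanding, the transmitter stalls as soon as R_RDY responses lag behind frames sent, which is exactly the starvation scenario described below.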
Devices run out of credits for a number of reasons, including fabric congestion and the inability of a device to receive and process frames at Fibre Channel speeds. There are a number of factors that affect this, including:
- The number of available credits from devices, which generally have a fixed number of credits. Many devices have four credits, while some have 64 or more. Devices typically reserve half of their available credit buffers for frame transmission, with the other half given out as credits. This means that the majority of devices on the market only have two credits.
- Capabilities of the devices to quickly process data. A device has to be able to offload incoming data at the same rate that it is receiving it, or it will run out of credits. This can cause a domino effect: If one device runs out of credits, every device attempting to transmit data to it has to wait until credits become available again. Thus the others run out of credits as well.
- Link round-trip delays. This is the delay imposed by the physical length of the SAN. Each 2km of optical cable represents approximately 20 microseconds of round-trip delay. Typically, SAN design rules imply that for each 2km of round-trip delay, two credits are needed to sustain 100MBps at 1Gbps Fibre Channel rates. But this is deceptive because many devices do not transmit 2KB data frames. Instead, they may transmit 1KB or even 512-byte data frames. Also, this calculation doesn't account for the "little" frames required by SCSI, such as status, command, and transfer-ready frames. Since each frame consumes a credit regardless of size, more credits are needed to cover a given link delay than the 2KB rule of thumb suggests.
- In addition, the round-trip delay is affected not only by cable length, but also by the number of devices participating on a loop. Each device on a link adds approximately a 0.5 microsecond delay. This is equivalent to adding 100m of cable to the link per device. This may not seem like much, but 2KB reads on an arbitrated loop (typical in database applications) can be degraded by more than 18% for every 100m of cable delay.
- Arbitrated-loop architectures require additional round-trip delays to arbitrate, open, and close the loop. Typically, there are three round-trip delays per loop access. If the loop is being poorly utilized or small I/O operations are being requested, the imposed delay can be much larger than the amount of time required to send the data.
- Finally, the amount of degradation due to out-of-credit situations will vary depending upon I/O sizes. Small I/O operations (512 bytes to 4KB) are generally affected more than larger ones by out-of-credit situations, because the average frame size is smaller and more credits are necessary to sustain consistent data flow.
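The rule of thumb above can be checked with a short calculation. This is an illustrative model, not a design tool: it assumes roughly 10 microseconds of round-trip delay per kilometer of fiber (the 20 microseconds per 2km cited above), 100MBps of link bandwidth, and one credit per frame in flight.

```python
import math

def credits_needed(cable_km, frame_bytes, rate_bytes_per_s=100e6):
    """Credits needed to keep a link busy across its round-trip delay.

    Illustrative model: ~10 us of round-trip delay per km of fiber,
    one credit per frame, frames of a fixed payload size.
    """
    rtt_s = cable_km * 10e-6                        # round-trip delay
    frame_time_s = frame_bytes / rate_bytes_per_s   # frame serialization time
    return math.ceil((rtt_s + frame_time_s) / frame_time_s)

# Full-size 2KB frames over 2km match the "two credits per 2km" rule:
print(credits_needed(2, 2048))    # -> 2
# Smaller frames need more credits over the same cable:
print(credits_needed(2, 512))     # -> 5
```

The second result shows why the 2KB rule is deceptive: a device sending 512-byte frames needs roughly two and a half times the credits to keep the same cable full.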
There are many factors that contribute to credit starvation. Many of these can be avoided by proper consideration of performance factors in the initial design of the SAN; however, it is often a good idea to analyze the data to test theory against reality.
Undetected physical errors
Undetected physical errors can be caused by a number of factors, including bad cable bends, termination problems, failing lasers, and faulty cables; none of these may be visible at the user level.
Take, for example, the 62.5-micron cable for FDDI or ATM that many facilities have installed in their infrastructure. Fibre Channel multimode lasers are designed for 50-micron cable, but will run on 62.5-micron cable as long as the distance is less than around 200m. Longer cable runs will cause intermittent errors, code violations, and other network problems. Because Fibre Channel analyzers view the actual network, they can be a critical component in troubleshooting this type of error.
As with the other "hidden behaviors," physical errors are often undetected by devices and management tools. Fibre Channel is defined in a set of standards that is fully redundant down to the ordered set level. In combination with the recovery methods for SCSI, many framing errors may be automatically recovered and retried at the lowest physical layers and never reported to management tools. In addition, devices can automatically discard frames affected by many link-level errors, such as code violations, which then go unreported on the SAN.
Fibre Channel has many ways in which devices can recover from error situations. Existing SAN management tools traditionally look for link resets and CRC errors as indicators of problems, but these tools do not attempt to look for protocol-level error-recovery mechanisms in the traffic.
Devices are also not capable of seeing errors that they have transmitted due to a faulty GBIC or cable; this leaves it up to the error-receiving device to report them.
When errors do get reported up to the operating system and/or file system, they can result in the SCSI subsystem being "throttled" to allow only one outstanding I/O at a time (per device). Since most enterprise servers rely heavily on overlapping I/Os for storage performance, this can drop throughput to a crawl. For most operating systems, the only recovery for this is to reboot the server.
File-system I/O splitting
When a user reads or writes data to a SAN file system, there are several layers of drivers that "touch" the data and massage it before it gets transmitted on the SAN. These layers are responsible for aligning data so that it can be properly stored or retrieved from storage. The result is that these layers may either sub-divide these I/O operations, thus making smaller requests on the SAN, or aggregate them with other operations and make larger requests. As a result, what the user requests is not necessarily what gets translated into I/O operations on the SAN.
File-system performance is highly affected by the size of the I/O operations. Traditionally, larger requests result in higher throughput, but if they get too large they can result in starvation problems. Most monitoring tools, like Windows NT's performance monitor, can only "see" the request sizes being made by the end-user application to the operating system. This makes it very hard to accurately tune file systems.
Take, for example, a tape-backup operation. Most tape backups perform best at 256KB I/O sizes. However, a single 256KB request could end up being divided into four 64KB requests by lower-level drivers. As a result, it requires additional overhead in the number of frames and accesses that the backup needs to make in order to complete the request. This results in slower performance.
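The added overhead from splitting can be approximated by counting frames per transfer. The model below is a simplification invented for illustration: it assumes each SCSI read costs one command frame and one status frame plus full-size 2KB data frames, and ignores transfer-ready frames and other exchange overhead.

```python
def frames_per_transfer(total_bytes, io_size, frame_payload=2048):
    """Approximate Fibre Channel frame count for reading total_bytes
    in I/Os of io_size. Hypothetical model: one command frame plus
    one status frame per I/O, plus ceil(io_size/payload) data frames."""
    ios = -(-total_bytes // io_size)            # SCSI commands issued (ceil)
    data_frames = -(-io_size // frame_payload)  # data frames per command
    return ios * (2 + data_frames)              # command + status + data

# One 256KB request vs. the same data split into four 64KB requests:
print(frames_per_transfer(256 * 1024, 256 * 1024))  # -> 130
print(frames_per_transfer(256 * 1024, 64 * 1024))   # -> 136
```

The frame count itself grows only modestly, but each extra I/O also adds a full command turnaround on the SAN, and it is that added per-command latency that slows a streaming backup.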
Also, devices vary widely based upon how they are accessed. Sequential I/O operations are almost always significantly faster than random operations. Access patterns and methods are also invisible to users. Most servers today use their excess RAM as a large file cache. What may be requested as sequential reads by an application might end up being randomized over the network to the drives, which occurs when some fragments of the request are cached in server memory while other fragments are not. The operating system will often send out many individual I/Os for the missing fragments, which can severely degrade database or Web server performance.
Analyzers are the only way to view the true characteristics of the I/O requests as they are actually issued to the drives.
Device bursting
Unlike many other network technologies, the Fibre Channel and SCSI protocols are capable of sustained throughput at near theoretical levels. This places a lot of demand on all of the components involved. Servers and host bus interfaces such as PCI can be limiting factors for Fibre Channel performance.
For example, many servers have 33MHz, 64-bit PCI interfaces, which are theoretically capable of sustaining roughly 264MBps (33MHz x 8 bytes per transfer). Would this easily handle a 2Gbps Fibre Channel SAN? Maybe not.
Most 2Gbps Fibre Channel SAN configurations are capable of sustaining 400MBps in full-duplex mode. In reality, most SANs don't require this level of throughput for typical application processing. However, this can be a bottleneck in some applications such as backups.
Consider the example of a server running backup operations on a 2Gbps SAN, doing sequential reads from the drive subsystem and sequential writes to a tape subsystem. The server receives data from the drive subsystem in bursts at 200MBps. To maintain a streaming tape backup, it must turn the data around to the tape subsystem very quickly. A 33MHz, 64-bit PCI interface is not capable of transmitting 200MBps and receiving it simultaneously. The result is usually increased command latencies and decreased overall performance. These situations are easy to monitor with Fibre Channel analyzers.
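The bottleneck in this backup scenario comes down to simple arithmetic. The function below is an illustrative sketch (the name and parameters are invented): it compares the peak burst rate of a shared, half-duplex PCI bus against a full-duplex SAN load in which reads and writes both cross the bus.

```python
def pci_headroom(bus_mhz, bus_bits, read_mbps, write_mbps):
    """Compare a shared PCI bus's peak bandwidth against a full-duplex
    SAN load. PCI is half-duplex: reads and writes share the same bus."""
    peak = bus_mhz * (bus_bits // 8)   # MBps, theoretical burst rate
    demand = read_mbps + write_mbps    # both directions cross the bus
    return peak, demand

peak, demand = pci_headroom(33, 64, read_mbps=200, write_mbps=200)
print(peak, demand)   # 264MBps of bus vs. 400MBps demanded
assert demand > peak  # the bus cannot sustain the streaming backup
```

Even at its theoretical peak, the 264MBps bus falls well short of the 400MBps the backup stream demands, before accounting for bus arbitration and other PCI overhead.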
Analyzers are capable of capturing gigabytes of data. They can have 16 or more capture ports, with 1GB of data captured per port. In only a few seconds, an analyzer can capture hundreds of millions of frames and ordered set events. This is an enormous amount of data to sift through when you are only looking for the handful of events that may be causing a problem.
Traditionally, a Fibre Channel analyzer only formats and dumps data to the screen, forcing the administrator to be the "analyzer." Even today, interpreting analyzer output takes a trained person with in-depth knowledge of Fibre Channel and SCSI.
The solution to this problem is for analyzers to become "smarter." They have to help users identify errors and performance anomalies. Smarter analyzers will reduce the need for expert SAN "firefighters," who spend long hours poring through traces to find problems. Using analyzers is the only way to find these hidden problems in the face of overwhelming amounts of data.
In an ideal world, a SAN administrator would have the ability to monitor huge amounts of low-level packet data and proactively maintain peak, or at least adequate, network performance. This data would come from across the entire network, not just at specific switches or servers.
Analyzing, interpreting, and understanding SAN behavior are the key to maximizing and maintaining SAN performance. Hidden in the massive quantities of data traffic are the early indicators of data loss and corruption. The harbingers of system downtime and performance degradation frequently exist as "needles in the haystack" of bits.
Fortunately, analyzers increasingly assist non-experts in finding the "needles in the haystack." Dedicated run-time performance monitoring tools are emerging that automatically characterize key performance metrics and anomalous behavior on SANs. Analyzers are gaining the capability to automatically collapse vast amounts of trace data into an easy-to-view format with a list of potential bottlenecks and problems. This eliminates the need to examine millions of events and allows SAN administrators to focus on the handful of events that are important. Analyzers also provide graphing and reporting capabilities to give visual clues and summaries of the data. This allows administrators to look at a data trace on a single page versus thousands of pages.
Better SAN performance depends, in part, on the ability to analyze and interpret vast amounts of low-level data and then nip problems before they become more serious. Advanced analyzers that provide administrators with the ability to diagnose problems when they occur can help maintain SAN reliability and performance.
Lawrence Bain is director of engineering at Finisar Corp. (www.finisar.com) in Sunnyvale, CA, and Steve Klotz is a founder and principal engineer at Medusa Labs (www.medusalabs.com), a Finisar company.