By Thomas M. Ruwart
Disk subsystems range from single disk drives to large installations of disk arrays. They can be directly attached to individual computer systems or configured as larger shared-access storage area networks (SANs). It is a significant task to evaluate the performance of these subsystems, especially when considering the performance requirements of any particular installation or application.
Storage subsystems can be designed to meet different performance criteria, such as bandwidth, transactions per second, latency, capacity, and connectivity. But how a subsystem actually performs depends on the total configuration of software and hardware layers and on the number of layers an I/O request must traverse to perform an operation.
Figure 1: The storage subsystem hierarchy describes the levels of hardware and software that an I/O request traverses.
As an I/O request traverses more software and hardware layers, alignment and request size fragmentation can produce performance anomalies that degrade overall bandwidth and transaction rates. These layers significantly improve functionality and performance when used properly, but traversing them can still reduce the observed performance of even the fastest hardware components for a variety of reasons.
Storage subsystem hierarchy
The storage subsystem hierarchy describes the levels of hardware and software that an I/O request must traverse to initiate and manage the movement of data between the application memory space and a storage device. The I/O request is initiated by the application when data movement is required either explicitly (in the case of file operations) or implicitly (e.g., in the case of memory-mapped files).
The I/O request is initially processed by several layers of system software, such as the file system manager, logical device drivers, and low-level hardware device drivers. During this processing, the application I/O request may be split into several inter-related "physical" I/O requests that are subsequently sent out to the appropriate storage devices to satisfy these requests. These physical I/O requests must pass through the physical connection layer, which connects the host bus adapter and the storage device.
Figure 2: In a wide-striped logical volume, data is laid out on the disk in "stripe units."
After arriving at the storage device, the I/O requests may be further processed and split into several more I/O requests to the actual storage "units," such as disk drives. Each storage unit processes its requests, and data is eventually transferred between the storage unit and the application memory space (see Figure 1).
Each layer of software and hardware between the application and the storage device adds overhead and other anomalies that can result in highly irregular performance. Overhead is essentially the time it takes for the I/O request to traverse a given layer. The source of overhead is specific to each layer and is not necessarily constant within that layer.
An example is the overhead induced by the physical connection layer. A physical connection consisting of a short cable introduces virtually no overhead because the propagation delay of a signal traveling at close to the speed of light over a three-foot distance is not significant. On the other hand, a similar signal traversing a 20-mile storage area network through multiple switching units incurs noticeable overhead.
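The difference between the two cable lengths can be estimated with a short calculation. The sketch below is illustrative only: the velocity factor of two-thirds the speed of light is an assumption (the actual figure depends on the medium), and the function and constant names are made up for this example. Switch latency, which the text also mentions, is not modeled.

```python
# Rough one-way propagation delay for the two distances in the text.
# ASSUMPTION: signals travel at about two-thirds the speed of light in
# copper or fiber; the exact factor depends on the medium.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
VELOCITY_FACTOR = 2 / 3  # assumed, not taken from the article

FEET_TO_METERS = 0.3048
MILES_TO_METERS = 1609.344


def propagation_delay_s(distance_m: float) -> float:
    """One-way delay in seconds over distance_m meters of cable."""
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * VELOCITY_FACTOR)


short_cable = propagation_delay_s(3 * FEET_TO_METERS)
long_san = propagation_delay_s(20 * MILES_TO_METERS)

print(f"3 ft cable:  {short_cable * 1e9:.1f} ns")
print(f"20 mile SAN: {long_san * 1e6:.1f} us (before any switch latency)")
```

Under these assumptions the short cable costs a few nanoseconds, while the 20-mile path costs roughly 160 microseconds in each direction, a figure that starts to matter when a subsystem is expected to complete thousands of requests per second.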
An interesting result from the interaction of the components in the storage subsystem hierarchy is analogous to the impedance matching problem in electrical signals on wires. The term "impedance matching" is used as an analogy to what happens when there is a mismatch of operational characteristics between two interacting objects.
In an electrical circuit, an impedance mismatch affects circuit "performance" in terms of its gain or amplitude at particular frequencies. In the storage subsystem hierarchy, an impedance mismatch has more to do with things like I/O request size and alignment mismatches that affect the performance (bandwidth or transaction rate) of the storage subsystem as viewed by the application. The effects of these mismatches can be viewed from several different perspectives, including the application, disk device, and system. Consider the following example.
Figure 3: Graph shows performance anomalies, and is an example of the "impedance mismatch" problem.
The logical volume device drivers provide a mechanism to easily group storage devices into a single "logical" device to increase capacity or performance, or to simplify the management of large numbers of devices. The logical device driver presents a single device object to the application. The driver is then responsible for taking a single I/O request from the application (or file-system manager) and mapping this request onto the lower-level storage devices, which may be either actual storage devices or other logical volumes.
There are many ways to configure a logical volume that consists of multiple underlying storage devices. One common configuration is to stripe across (also known as striping wide) all the storage devices to increase available bandwidth or throughput (operations per second). In a wide-striped logical volume, data is laid out on the disk in "stripe units."
A stripe unit is the amount of sequential data that is transferred to/from a single storage device within the logical volume before moving to the next storage device in the volume. The stripe unit can be any number of bytes from a single 512-byte sector to several megabytes, but is generally a constant within a logical volume (see Figure 2).
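The stripe unit definition translates into a simple address calculation. The following is a minimal sketch of round-robin striping, assuming devices are numbered from zero and stripe units are assigned to them in order; the function name `locate` and its parameters are hypothetical and do not correspond to any particular volume manager.

```python
from typing import Tuple


def locate(offset: int, stripe_unit: int, width: int) -> Tuple[int, int]:
    """Map a logical byte offset to (device index, byte offset on that device).

    stripe_unit: bytes written to one device before moving to the next.
    width: number of devices in the striped volume.
    """
    unit_index = offset // stripe_unit   # which stripe unit, counted over the whole volume
    device = unit_index % width          # stripe units rotate across the devices
    row = unit_index // width            # complete stripes that precede this unit
    return device, row * stripe_unit + offset % stripe_unit


KB = 1024
# An eight-wide volume with a 16KB stripe unit.
print(locate(0, 16 * KB, 8))
print(locate(16 * KB, 16 * KB, 8))
print(locate(9 * 16 * KB + 100, 16 * KB, 8))
```

In this layout the first 16KB lands on device 0, the next 16KB on device 1, and so on. After all eight devices have received one stripe unit, the ninth unit wraps around to device 0 at a device offset of 16KB.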
Even though logical volumes allow for scalable performance, there are performance anomalies within these volumes that are not obvious. These anomalies manifest themselves as dramatic shifts in performance that are triggered simply by a change in the amount of requested data or from the alignment of the data on the logical volume (as shown in Figure 3). This figure also illustrates the impedance mismatch problem.
Figure 3 shows the performance of three eight-wide logical volumes with different striping factors. The graph also shows how the overall sequential read bandwidth, as well as the variability in bandwidth, increases as the stripe unit size increases. For example, the performance of a logical volume using a 16KB stripe unit can vary from 18MBps up to 44MBps simply by choosing a different request size (i.e., the number of bytes requested by the application on each read operation).
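One way to see why request size and alignment matter is to count the physical requests that a single logical request turns into. The sketch below assumes simple round-robin striping; `split_request` is a hypothetical helper, not the driver used for the measurements, and the code makes no attempt to reproduce the measured bandwidth figures.

```python
KB = 1024


def split_request(offset, length, stripe_unit, width):
    """Split one logical request into (device, device_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        unit_index = offset // stripe_unit
        within = offset % stripe_unit
        # A piece stops at the end of the stripe unit or the end of the request.
        nbytes = min(stripe_unit - within, end - offset)
        device = unit_index % width
        device_offset = (unit_index // width) * stripe_unit + within
        pieces.append((device, device_offset, nbytes))
        offset += nbytes
    return pieces


STRIPE_UNIT = 16 * KB  # the 16KB stripe unit used as an example in the text
WIDTH = 8              # eight-wide volume

aligned = split_request(0, 128 * KB, STRIPE_UNIT, WIDTH)
unaligned = split_request(4 * KB, 128 * KB, STRIPE_UNIT, WIDTH)
small = split_request(0, 24 * KB, STRIPE_UNIT, WIDTH)

for name, pieces in [("aligned 128KB", aligned),
                     ("unaligned 128KB", unaligned),
                     ("aligned 24KB", small)]:
    devices = {device for device, _, _ in pieces}
    print(f"{name}: {len(pieces)} physical requests on {len(devices)} devices")
```

An aligned 128KB request maps to exactly one 16KB piece per device. Shifting the same request by 4KB produces nine pieces, with device 0 serving both a 12KB fragment and a 4KB fragment. A 24KB request keeps only two of the eight devices busy.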
Figure 4 illustrates another aspect of the impedance matching problem due to processor allocation. In this graph, the peak read bandwidth for an eight-wide logical volume is plotted against the peak performance of two groups of eight I/O threads, each running to a single disk. One of the eight-disk I/O thread groups is assigned to a single processor in an eight-processor computer system. The other thread group is distributed across all eight processors, with one thread assigned to each processor.
The distributed case performs significantly better than either the logical volume or the single-processor case, since more requests are processed per second when the request sizes are smaller. It turns out that a single processor becomes overwhelmed with processing requests when between six and eight of these disks are each running at full speed.
When the request processing is distributed across multiple processors, a higher overall performance rate is observed.
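The reasoning can be captured in a toy model: keeping the disks busy takes a certain number of requests per second, and the processors can handle only so many. Every number below is hypothetical and chosen purely for illustration; none of them is taken from the measurements, and the model assumes request handling scales linearly with the number of processors.

```python
def delivered_bandwidth(request_size, disks, disk_bw, cpu_requests_per_s, processors):
    """Bandwidth in bytes/s when request handling is capped by processor capacity."""
    # Requests per second needed to keep every disk streaming at full speed.
    wanted = disks * disk_bw / request_size
    # ASSUMPTION: each processor handles a fixed number of requests per second.
    handled = min(wanted, processors * cpu_requests_per_s)
    return handled * request_size


DISK_BW = 10_000_000   # hypothetical 10 MB/s per disk
CPU_RATE = 5_000       # hypothetical requests/s one processor can handle

for size in (4096, 16384, 65536):
    one_cpu = delivered_bandwidth(size, 8, DISK_BW, CPU_RATE, 1)
    eight_cpu = delivered_bandwidth(size, 8, DISK_BW, CPU_RATE, 8)
    print(f"{size:6d}-byte requests: 1 CPU {one_cpu / 1e6:5.1f} MB/s, "
          f"8 CPUs {eight_cpu / 1e6:5.1f} MB/s")
```

With these made-up parameters, small requests saturate the single processor long before the disks run out of bandwidth, while large requests need so few operations per second that the processor count stops mattering.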
Figure 4: The graph demonstrates another aspect of the "impedance mismatch" problem relating to processor allocation.
Also, because the single-processor case closely follows the eight-wide logical volume case, it can be concluded that the performance limitations of the logical volume are due to a problem with funneling the logical volume request processing through a single processor.
The conclusion: The logical volume performance variations shown in the previous graph are a function of the logical volume software and associated implementation parameters.
The purpose of these figures is to demonstrate that things can go wrong, and how they go wrong. Fortunately, other layers in the hierarchy, including the application layer and the file system manager layer, can compensate for many of these issues.
There are many other factors involved in the I/O path that can significantly affect I/O performance, including zoned-bit recording (on the disk drive itself), caching on the disk drive or array controller (or both), rotational latency, seek time, on-board disk processor overhead, and command queues. Some of these factors can be controlled or tuned to compensate for impedance mismatches, but it is important to know which ones to tune and how to tune them.
There can be a large variation in the I/O performance of a disk subsystem, as demonstrated in the example for logical volumes. This variation is caused by the impedance matching problem, which is primarily the result of an I/O request traversing too many levels in the storage subsystem hierarchy.
At each level, the I/O request may be resized or re-aligned in space and time, and by the time the I/O request gets to the storage subsystem, it appears as many smaller requests distributed across many devices.
Furthermore, what the application sent as a "parallel" request can be broken up into a series of smaller, serialized requests to the storage subsystem. The result is somewhat erratic performance when a series of large requests is made to subsystems with different request sizes.
These are just some of many examples of the impedance matching problem in storage subsystems. Similar problems can occur in disk drive/array caches with respect to their size and algorithms, multi-host storage area networks, and the ever-changing bandwidths and latencies of subsystem interfaces.
Thomas Ruwart is assistant director of the Laboratory for Computational Science and Engineering at the University of Minnesota. This article was adapted from a larger work that was supported in part by the National Science Foundation and the Department of Energy.