SPC benchmark helps IT evaluate storage

The Storage Performance Council's SPC-1 benchmark will provide objective, verifiable performance data.


As all new car buyers learn, kicking the tires is one thing, but the only true way to evaluate a car is to experience its acceleration and handling from behind the wheel. Most IT managers would prefer to "test drive" new storage servers in the same way. Usually, however, they find this impractical because of time, cost, and resource constraints. As a result, storage professionals find it difficult to obtain reliable data about the performance of the products they want to evaluate, under conditions similar to their own operational environments.

As storage becomes increasingly important in enterprise computing and e-business, the need for reliable and comparable data on storage server performance continues to grow. Engineers representing most of the major storage vendors have formed the Storage Performance Council (SPC) in response to this need (see InfoStor, August 2000, p. 1).

The SPC is a non-profit organization founded to define, standardize, and promote storage subsystem benchmarks. Its goals are to disseminate objective, verifiable performance data and serve as a catalyst for performance improvement in storage subsystems. The council has reached its first milestone in developing the SPC-1 benchmark, which is the first of several benchmarks to be offered by the SPC for public use. The draft SPC-1 benchmark specification was circulated for public review this summer. Comments received during the public-review period are currently being studied for possible use within the specification.

The SPC had to resolve a number of issues:

  • Whether benchmark measurements should be made at the application level or the storage level;
  • Whether to use extreme workloads or realistic mixes of workload content;
  • If realistic workloads are used, what types of workload environment should be reflected;
  • How the workload should respond to the presence of a storage subsystem cache; and
  • Whether the individual components of a workload mix should be allowed to run independently.

The following sections examine each of these questions.

What system level should be the target for measurements?


Probably the most fundamental question facing the SPC was the level at which performance measurements should be taken. The approach adopted previously by the Transaction Processing Performance Council (TPC) was to define measurements at the transaction level, taking advantage of a specified database design. With this approach, the benchmark consists, in part, of real database software. Because of its desire to offer the benchmark in the form of a simple kit available for download over the Internet, the SPC ruled out an approach based upon transaction-level measurements.

The graphs depict two approaches to defining the activity of a queuing system. Each graph shows the results of an example in which two contrasting types of I/O work ("heavy" and "light") are run at the same time.

At the opposite extreme, an alternative approach was to take direct hardware measurements at the host adapter interface (e.g., using a SCSI tracing tool). This approach was also ruled out, due to the SPC's desire to emphasize benchmark ease of use.

The approach adopted by the SPC represents a middle ground in which measurements are taken by software, but the software does not consist of a real database. Instead, the SPC workload is a synthetically generated pattern of requests. Expressed in terms of Unix, the requests produced by the SPC workload generator are issued at the level of the raw, unblocked I/O interface. This ensures no caching of the requests can occur in the file system but allows for re-mapping of request addresses under the control of system software (often referred to as logical volume management). The key results produced by running the benchmark are the I/O response times obtained at a series of identified load levels.

Should the workload resemble a real application?


Given its choice to generate I/O requests synthetically, the SPC could choose any desired pattern of requests, without regard to constraints normally present in real database software. For this reason, a natural possibility might have been to construct the SPC benchmark as a series of "stress tests." For example, three specific tests of a given storage system might assess its throughput capability relative to reads out of cache, reads from the media, and sequential writes. The advantage of this approach is that the results of each test have a very simple interpretation. The drawback, however, is that this method does not produce a clear "bottom line" from the collective tests.

Another approach might be to devise a large number of separate benchmarks, each targeted to a very specific application environment (e.g., bank ATM processing or direct sales over the Web). Despite the ability of this approach to produce a clear "bottom line" for those environments being examined, the sheer number of potentially important environments caused the SPC to rule it out.

The SPC's strategy is to develop the appropriate access patterns needed to reflect a small number of broadly defined environments. In this way, a single benchmark workload can be used to get an overall sense of the I/O capability that can realistically be expected under an identified class of production conditions. The intent is that the many users whose workloads resemble that of an SPC benchmark will be able to apply benchmark results directly to planning within their own environments.

What environments should be reflected?


In its initial benchmark, the SPC chose to focus on applications characterized by predominantly random I/O operations that require queries and updates (e.g., mail server and online transaction processing applications). The members of the council felt this segment of applications is by far the largest in the storage system marketplace. Also, many production environments falling into this category have demanding performance requirements and would find the SPC-1 benchmark valuable.

The SPC's intent is to continue adding to its coverage of realistic production environments by introducing additional benchmarks as the need for them is identified by the members of the council. It is anticipated that SPC-2 will focus on applications that require high sequential throughput (e.g., backup/restore) or concentrate on network-attached storage (NAS).

How should performance depend upon cache?


Modern storage controls incorporate a wide range of features and capabilities related to the use of cache, including:

  • Cache size;
  • Policies for managing modified data, including strategies for redundancy;
  • Sequential detect;
  • Policies for staging, demotion, and de-staging of sequential data;
  • Cache segment size; and
  • Cache stage size.

In a perfect world, the SPC-1 benchmark would incorporate realistic behavior across the entire spectrum of alternative cache implementations. So far, the SPC has accomplished a "down payment" on this objective. Performance under the SPC-1 workload reflects a realistic level of sensitivity to the management of modified data; to the presence of sequential detect algorithms; to the policies for staging, demotion, and de-staging of sequential data; and (through the mix of SPC-1 request sizes) to the cache segment size.


Open systems have a different relationship with control unit caches than OS/390. Most open systems operating systems have a file cache in host memory that is usually managed by a Least Recently Used (LRU) algorithm. Frequently accessed data therefore comes from system memory rather than from control unit cache. The notable exception occurs when a device is opened raw or unbuffered: requests then bypass the operating system's file cache and go directly to the disk subsystem.

Having data in system memory allows low-latency access at speeds measured in gigabytes per second (orders of magnitude faster than SCSI or other I/O interfaces). One of the more effective ways to increase the performance of disk-intensive applications is to increase system memory and, consequently, file cache hits.

An analysis of many traces showed that, on the majority of systems, records were not re-read from the storage subsystem once they had been read. The second and subsequent accesses to a record were satisfied from the system file cache and never reached the storage subsystem. The SPC-1 workload specification reflects the resulting lack of record re-use in I/O access patterns.
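The filtering effect of a host file cache can be illustrated with a small sketch. The function below is an illustration only, not part of the SPC-1 kit: the name storage_reads, the cache_records parameter, and the sample trace are all hypothetical, and the cache is a textbook LRU rather than any particular operating system's policy. It replays an application's record reads through an LRU cache and returns only the reads that would reach the storage subsystem.

```python
from collections import OrderedDict


def storage_reads(trace, cache_records):
    """Return the reads that miss an LRU host file cache and reach storage.

    trace: record identifiers in the order the application reads them.
    cache_records: how many records the (hypothetical) file cache can hold.
    """
    cache = OrderedDict()
    misses = []
    for record in trace:
        if record in cache:
            cache.move_to_end(record)      # hit: served from host memory
            continue
        misses.append(record)              # miss: request goes to storage
        cache[record] = True
        if len(cache) > cache_records:
            cache.popitem(last=False)      # evict the least recently used record
    return misses


# The application re-reads records 1 and 2 several times.
application_trace = [1, 2, 1, 3, 2, 4, 1, 5, 2]
seen_by_storage = storage_reads(application_trace, cache_records=8)
print(seen_by_storage)
```

With a cache large enough to hold the working set, each record appears only once in the stream seen by storage, which is the pattern the trace analysis found. Shrinking cache_records to 2 in this example lets repeated reads leak through to the storage subsystem again.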

Trace analysis also showed substantial levels of activity that was sequential in nature (therefore, a good sequential detect algorithm should be expected to produce significant read hits). The SPC-1 specification calls for a variety of distinct access patterns, including reads and writes to randomly selected records, sequential reads, and sequential writes.
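To make the link between sequential activity and read hits concrete, here is a minimal sketch of a sequential detect heuristic. It is an assumption-laden illustration, not the algorithm of any real controller and not taken from the SPC-1 specification: the function name, the trigger threshold, and the single-stream tracking are all hypothetical simplifications. The idea is that once the controller has seen a run of back-to-back contiguous reads, it stages data ahead, so the following contiguous reads are counted as hits.

```python
def sequential_read_hits(requests, trigger=2):
    """Count reads that a simple single-stream sequential detector would pre-stage.

    requests: (start_block, length) pairs in arrival order.
    trigger: length of a contiguous run after which pre-staging is assumed to begin.
    """
    next_expected = None   # block address that would continue the current run
    run = 0                # number of requests in the current contiguous run
    hits = 0
    for start, length in requests:
        if start == next_expected:
            if run >= trigger:
                hits += 1  # data was already staged ahead of this request
            run += 1
        else:
            run = 1        # a random jump starts a new candidate run
        next_expected = start + length
    return hits


sequential = [(0, 8), (8, 8), (16, 8), (24, 8)]
scattered = [(10, 8), (300, 8), (42, 8)]
print(sequential_read_hits(sequential), sequential_read_hits(scattered))
```

A detector this simple loses track as soon as another request is interleaved with the stream, which is one reason real implementations differ widely and why a benchmark workload that mixes random and sequential access can distinguish among them.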

Should patterns of access be independent of each other?


Since the SPC-1 benchmark contains a variety of distinct access patterns, the manner in which each access pattern is actually produced becomes important to overall benchmark performance. At one extreme, all I/O requests can be scheduled collectively through a single "master" routine. In contrast, it is also possible to implement the needed requests by using various types of I/O threads, each with a fixed pattern of behavior. For example, a given type of thread might calculate the location at which to request data when issuing its next random I/O; issue the I/O; wait until it completes; go to sleep for a random "think time" with an average length of 40ms; and then wake up and repeat the cycle. The number of such threads would then determine the amount of work of the identified type to be performed by the benchmark.
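The thread cycle just described can be sketched in a few lines. This is a hedged illustration rather than the SPC-1 workload generator: run_io_thread, run_workload, and the issue_io callback are hypothetical names, the caller is assumed to supply a function that performs one synchronous I/O, and the exponential distribution for think time is an assumption (the description above fixes only the 40ms average).

```python
import random
import threading
import time


def run_io_thread(issue_io, capacity_blocks, cycles,
                  think_mean_s=0.040, seed=0, sleep=time.sleep):
    """Run one I/O thread for a fixed number of cycles; return response times."""
    rng = random.Random(seed)
    response_times = []
    for _ in range(cycles):
        block = rng.randrange(capacity_blocks)       # choose the next random location
        started = time.perf_counter()
        issue_io(block)                              # issue the I/O and wait for it
        response_times.append(time.perf_counter() - started)
        sleep(rng.expovariate(1.0 / think_mean_s))   # "think time", 40ms on average
    return response_times


def run_workload(issue_io, capacity_blocks, threads, cycles, **thread_options):
    """Start several identical threads; the thread count sets the amount of work."""
    results = [None] * threads

    def worker(index):
        results[index] = run_io_thread(issue_io, capacity_blocks, cycles,
                                       seed=index, **thread_options)

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return results
```

Because each thread waits for its own I/O before thinking and issuing the next one, a thread's I/O rate is roughly one request per (think time + response time). Slower responses therefore throttle the thread automatically, which is the behavior the following paragraphs contrast with a centrally scheduled workload.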

Readers familiar with queuing theory will recognize the distinction just stated as being a familiar one. The two contrasting approaches correspond, respectively, to the open and closed methods of defining the activity of a queuing system.

The details of how a storage subsystem responds to a bottleneck differ sharply depending upon whether the benchmark implements an open or closed system. The figure presents open and closed versions of an example in which two contrasting types of I/O work are run at the same time, each against its own dedicated disk. The two types of work differ in their requirements for disk service time per I/O ("heavy" work is assumed to take twice as long as "light" work). In the open system, the two types of work are required to run at the same I/O rate. In the closed system, each type of work runs independently but uses the same number of threads. The threads, in turn, are all designed to run, in the absence of competing work, at the same I/O rate. The disk performing "heavy" work becomes a bottleneck in both examples. However, its impact is much more severe in the case of the open system, due to the requirement to keep the I/O rates of the two components of the workload in lock step, regardless of load level or response time.
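The contrast can be reproduced numerically with a small queuing sketch. Everything in it is an assumption made for illustration and is not taken from the SPC-1 draft or from the figure: each disk is modeled as a single-server queue with exponential service times, the service times of 5ms ("light") and 10ms ("heavy") are invented, and the 40ms think time is borrowed from the thread example above. The open case uses the standard single-server result R = S / (1 - rate × S); the closed case uses mean value analysis for one disk plus a think-time delay.

```python
def open_response(rate, service):
    """Mean response time of one disk when requests arrive at a fixed rate."""
    utilization = rate * service
    if utilization >= 1.0:
        return float("inf")              # demand exceeds what the disk can serve
    return service / (1.0 - utilization)


def closed_response(threads, think, service):
    """Mean value analysis for `threads` I/O threads sharing one disk.

    Returns (I/O rate, mean response time) with every thread alternating
    between one I/O and a think time of `think` seconds on average.
    """
    queue, rate, response = 0.0, 0.0, service
    for n in range(1, threads + 1):
        response = service * (1.0 + queue)   # wait behind requests already queued
        rate = n / (think + response)        # each thread cycles: think, then I/O
        queue = rate * response              # Little's law
    return rate, response


LIGHT, HEAVY, THINK = 0.005, 0.010, 0.040    # seconds; assumed values

# Open system: both components are forced to the same I/O rate.
for rate in (50, 90, 100):
    print("open", rate, open_response(rate, LIGHT), open_response(rate, HEAVY))

# Closed system: same thread count per component, rates free to diverge.
for threads in (2, 5, 8):
    print("closed", threads,
          closed_response(threads, THINK, LIGHT),
          closed_response(threads, THINK, HEAVY))
```

Under these assumptions, the open system at 90 I/Os per second gives a response time of about 9ms on the "light" disk and 100ms on the "heavy" disk, and at 100 I/Os per second the "heavy" disk can no longer keep up at all. In the closed system, the threads driving the "heavy" disk simply slow down as their response times grow, so its I/O rate levels off below the disk's maximum while the "light" work continues unaffected.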

If differences among the components of an I/O workload are considered to result from a diversity of applications and users running on the system, then it is more realistic to build a benchmark that allows such components to run independently, as in a closed system. The impact that the elongation of a thread's response times may have on its I/O rate, however, makes the behavior of such a system, in some sense, more complex. From this standpoint, a benchmark that behaves as an open queuing system has the advantage of forcing the ratios of various types of I/O requests to be completely predictable.

The SPC-1 specification, as currently drafted, calls for the benchmark to behave as an open queuing system. This choice, however, was one of the issues identified in comments received via public review of the draft specification and is currently being reexamined.

Invitation to participate

This article provides only a brief synopsis of the work going on in the SPC. The council offers individuals, company representatives, and educational institutions the opportunity to participate in the SPC or observe its progress. For more information, visit www.storageperformance.org. Readers can also use this Web site to submit comments.

Eric Stouffer is chairman of the Storage Performance Council and manager of the Shark program management team in IBM's Storage Systems Group. The author thanks Steven Johnson (Sun Microsystems) and Bruce McNutt (IBM) for their permission to include in this article some material about the SPC-1 benchmark that will also be presented in a paper to be published in the Transactions of the Computer Measurement Group.

This article was originally published on January 01, 2001