A quick review of storage benchmarks

First you need to know what benchmarks are available, and then you need to know how to interpret the benchmark results.

By The Taneja Group

Purchasing new technology can be a daunting task. Many factors come into play, including product features, price, performance, service, upgradeability, warranty, and existing vendor relationships. Although answers to many of the key decision criteria can be readily obtained, determining what type of performance users can expect in their own environments is not so black and white. This is especially true when the products being considered are from different vendors since each may include unique components and specifications that make meaningful comparisons extremely difficult.

The best way to evaluate performance is to test the product in your own environment. However, this is often not feasible due to limited budgets and resources required to conduct the tests. As a result, many users rely on published benchmark test results to evaluate performance. Benchmark tests allow buyers to more easily compare various platforms and workloads, regardless of vendor.

Some storage benchmarks are created, regulated, and audited by standards organizations such as the Standard Performance Evaluation Corporation (SPEC) and the Storage Performance Council (SPC). Regulated benchmarks typically have well-defined, documented test procedures that can be used to reproduce test results in the field.

Before considering a specific benchmark, users should clearly understand the workload and what the benchmark is designed to measure. Users should also determine if the tested workload resembles the actual applications that will run on their system.

Storage benchmarks

Although a variety of benchmarks are available for server performance testing, relatively few are available specifically for enterprise storage systems. This can be partially attributed to the historical practice of purchasing storage directly from the server vendor, especially in the case of direct-attached storage. However, the growth of NAS and SAN environments has led users to increasingly view storage purchases as a stand-alone decision. Another reason is the sheer complexity and effort required to create standardized benchmarks in a consortium environment, where consensus can be difficult to obtain. This is no easy task given the large number of vendors in the storage market and the many components (e.g., disks, disk arrays, controllers, host bus adapters, and protocols) that can affect benchmark results.


Several storage benchmarks are available from the Storage Performance Council (SPC), a non-profit corporation founded to define, standardize, and promote storage subsystem benchmarks as well as to disseminate objective, verifiable performance data. SPC benchmarks are designed to be vendor- and platform-independent. While anyone can run SPC benchmarks, only results that have been audited by SPC for accuracy and authenticity may be disseminated and published.

SPC membership is open to storage systems manufacturers, integrators, and industry analysts. The organization currently has more than 25 members, including large storage vendors such as Dell, Fujitsu, Hewlett-Packard, Hitachi, IBM, Network Appliance, Sun, and Symantec.

Notably absent from the SPC membership is EMC, which prefers to use its own internally developed test scenarios that are based on EMC’s experience with end-user workloads, according to the company. EMC’s decision may also be influenced by the fact that rivals IBM and Sun spearheaded the SPC.

There are currently two SPC benchmarks: SPC Benchmark 1 (SPC-1) and SPC Benchmark 2 (SPC-2).


The SPC-1 benchmark was the first industry-standard storage benchmark and the first standard benchmark for SANs. The benchmark consists of a single workload designed to demonstrate the performance of a storage subsystem while performing the typical functions of business-critical applications. These applications are characterized by mostly random I/O operations and require both queries and update operations. Examples include online transaction processing (OLTP), database operations, and mail server applications.

SPC-1 benchmark results include three primary metrics: First, the results must include the I/Os per second (IOPS), which represents the maximum I/O request throughput at the 100% load point. Next, the results must include the total Application Storage Unit (ASU) capacity, which represents the total storage capacity read and written to during the benchmark test. Finally, the results must include the price-performance ratio, represented as cost per I/O.

The SPC requires that the tested storage configuration pricing reflect a customer-orderable configuration. In addition, the data-protection level (e.g., RAID 5), tested storage product category, and SPC-1 audit identifier must be stated.
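As a minimal sketch of how the price-performance metric relates to the other reported figures, the Python snippet below divides the total price of a tested configuration by its reported IOPS. The class name, field names, and both result rows are hypothetical illustrations and are not taken from any published SPC-1 report.

```python
from dataclasses import dataclass


@dataclass
class Spc1Result:
    """Illustrative container for the headline figures of one result (hypothetical)."""
    name: str
    total_price_usd: float   # price of the customer-orderable tested configuration
    iops: float              # I/O request throughput at the 100% load point
    asu_capacity_gb: float   # total ASU capacity read and written during the test

    def price_per_iops(self) -> float:
        # Price-performance expressed as cost per I/O per second.
        return self.total_price_usd / self.iops


# Made-up numbers, chosen only to make the arithmetic easy to follow.
results = [
    Spc1Result("Array A (hypothetical)", 500_000.0, 50_000.0, 4_000.0),
    Spc1Result("Array B (hypothetical)", 300_000.0, 25_000.0, 6_000.0),
]

# List the results from best (lowest) to worst cost per IOPS.
for r in sorted(results, key=Spc1Result.price_per_iops):
    print(f"{r.name}: ${r.price_per_iops():.2f} per IOPS, "
          f"{r.asu_capacity_gb:.0f} GB ASU capacity")
```

With these made-up figures, Array A works out to $10 per IOPS and Array B to $12 per IOPS, even though Array B has the lower total price and the larger capacity. That is why the three metrics should be read together rather than individually.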


The SPC-2 benchmark consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business-critical applications that require large-scale, sequential movement of data. These applications are characterized by large I/Os organized into one or more concurrent sequential patterns. The three workloads include large file processing, large database queries, and video-on-demand.

The large file processing workload addresses applications in a wide range of fields, which require simple sequential processing of one or more large files. Examples include scientific computing and large-scale financial processing. The large database query workload addresses applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence applications. Finally, the video-on-demand workload addresses applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.

As with SPC-1, SPC-2 benchmark results include three primary metrics: First, the results must include the data rate, which is measured in megabytes per second (MBps) and represents the aggregate data rate of all three SPC-2 workloads (large file processing, large database query, and video-on-demand). Next, the results must include the total Application Storage Unit (ASU) capacity, which represents the total storage capacity read and written to during the benchmark test. Finally, the results must include the price-performance ratio, represented as cost per MBps.

The SPC requires that the tested storage configuration pricing reflect a customer-orderable configuration. In addition, the total system price, data-protection level (e.g., RAID 5), tested storage product category, and SPC-2 audit identifier must be stated.

Future SPC benchmarks

The SPC plans to unveil three new storage benchmarks. Although the details are still being finalized, the new benchmarks will likely be called SPC-1C, SPC-1F, and SPC-3.


The SPC-1C benchmark will be the first industry-standard performance benchmark to provide product performance comparisons for the individual storage components that make up a larger storage solution. It will cover the performance of disks, HBAs/controllers, small storage subsystems (1 to 16 disks), and logical volume managers. The SPC-1C benchmark will use the same workload and reporting requirements as SPC-1.

An optional SPC-1C Availability Test will replace the required SPC-1 Persistence Test. This test is applicable in configurations that provide uninterrupted operation and “hot” rebuild/recovery in the case of a single drive failure. The SPC-1C Availability Test will report if processing continues in the case of a drive failure, the level of performance when operating in “degraded mode” after the failure, the time required to recover from the failure, and the level of performance during the recovery.


The SPC-1F benchmark is designed to measure file system performance. It will use the SPC-1 workload and include some modifications so that it can run across a file system. As a result, this benchmark could be used to assess the performance of NAS systems and clustered file systems. SPC-1F will report the same primary metrics as SPC-1, but the secondary metrics are still being determined.


The SPC-3 benchmark will take the end-user view of the virtualized storage environment. This benchmark will measure file system performance, but is focused on file system operations (e.g., create, open, close, metadata, and document repositories) and is not designed to measure transactional environments. It will use an entirely different workload than SPC-1F, and the remaining details are still being finalized.

For more information on the SPC benchmarks, visit www.storageperformance.org.


The Standard Performance Evaluation Corporation offers a suite of performance benchmarks. Founded in 1988, SPEC is a non-profit organization formed to establish, maintain, and endorse a standardized set of benchmarks designed to measure computer performance. SPEC benchmarks are designed to be vendor- and platform-independent. The results published on the SPEC Website have been reviewed by SPEC members for compliance with SPEC’s disclosure rules.

Membership in SPEC is open to any company or entity that is willing to commit to SPEC’s standards. The current roster has more than 60 members, including most of the leading storage vendors.

SFS 3.0

The SPEC System File Server 3.0 (SFS 3.0) benchmark is designed to evaluate the speed and request-handling capabilities of NFS systems and is often applied to NAS environments. SPEC SFS 3.0 is a system-level benchmark that heavily exercises CPU, storage, and network components. The greatest emphasis is on I/O, especially as it relates to operating and file system software.

To obtain the best performance for a system running SFS 3.0, vendors typically add hardware such as memory, disk controllers, disks, network controllers, and buffer cache to help alleviate I/O bottlenecks and to ensure CPUs are fully utilized. Shared resources that might limit performance include disk controllers, disks, network controllers, concentrators, switches, and clients.

SPEC SFS benchmark results include two performance metrics: a throughput measure (in operations per second) and an overall response time measure (the average response time per operation in milliseconds). The larger the peak throughput, the better, and the lower the overall response time, the better. The overall response time is an indicator of how quickly the system under test responds to NFS operations over the entire range of the tested load. In real-world situations, servers are not run continuously at peak throughput, so peak response time provides only minimal information. The overall response time is a measure of how the system will respond under an average load.
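To make the difference between response time at peak load and overall response time concrete, the sketch below averages a response-time curve over the tested load range. It assumes the overall figure can be approximated as the area under the response-time-versus-throughput curve (trapezoid rule) divided by the throughput range covered. That is an approximation for illustration, not SPEC's exact procedure, and the function name and load points are hypothetical.

```python
def overall_response_time(points):
    """Approximate the average response time across the tested load range.

    points: iterable of (throughput_ops_per_sec, response_time_ms) pairs,
    one per measured load point. Hypothetical helper, not part of SPEC SFS.
    """
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two load points")
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Trapezoid between two neighbouring load points.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / (pts[-1][0] - pts[0][0])


# Made-up load points: response time climbs steeply near peak throughput.
load_points = [(1000, 1.0), (2000, 1.5), (3000, 2.5), (4000, 5.5)]

peak_throughput, response_at_peak = max(load_points)
print(f"peak throughput: {peak_throughput} ops/sec")
print(f"response time at peak: {response_at_peak:.2f} ms")
print(f"overall response time: {overall_response_time(load_points):.2f} ms")
```

In this made-up curve the response time at peak load is 5.5 ms, while the average across the tested range is about 2.4 ms, which is closer to what a server running below peak would deliver.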

For more information on the SPEC benchmarks, visit www.spec.org.


While the SPEC SFS benchmark is useful for measuring the performance of NAS environments using NFS, a similar benchmark for measuring CIFS is not available from SPEC at this time. However, some vendors run the NetBench benchmark, which measures how well a server handles network file operations. In a nutshell, NetBench sends a variety of I/O requests to the server and measures how long the server takes to handle them.

NetBench benchmark results include two primary metrics for the server: an overall I/O throughput score (in megabits per second) and an average response time (in milliseconds). Individual scores are also reported for the clients. The metrics can be used to measure, analyze, and predict how well a server can handle file requests from clients.

The Storage Performance Council is currently working on a new benchmark (SPC-3) that will assess CIFS performance.

Using benchmark results

Benchmarks are designed to simulate real-world environments. However, in many cases the workloads tested may not reflect the actual deployment environment. For this reason, users must be sure to compare the benchmark workload to their real-world workloads.
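One rough way to make that comparison is to describe both workloads with the same few characteristics and flag the ones that differ substantially. The sketch below is only an illustration: the three attributes, the 25% tolerance, and every number in it are assumptions made for this example, not values defined by any benchmark specification.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkloadProfile:
    """Hypothetical summary of an I/O workload."""
    read_fraction: float    # share of I/Os that are reads, 0..1
    random_fraction: float  # share of I/Os that are random rather than sequential, 0..1
    avg_io_kib: float       # average transfer size in KiB


def mismatches(benchmark, actual, tolerance=0.25):
    """Return the attribute names whose relative difference exceeds tolerance."""
    flagged = []
    for f in fields(WorkloadProfile):
        b = getattr(benchmark, f.name)
        a = getattr(actual, f.name)
        scale = max(abs(a), abs(b))
        # Relative difference, measured against the larger of the two values.
        if scale > 0 and abs(a - b) / scale > tolerance:
            flagged.append(f.name)
    return flagged


# Made-up profiles: a random, small-block benchmark versus a streaming workload.
benchmark_profile = WorkloadProfile(read_fraction=0.60, random_fraction=0.90, avg_io_kib=8.0)
production_profile = WorkloadProfile(read_fraction=0.65, random_fraction=0.20, avg_io_kib=256.0)

print(mismatches(benchmark_profile, production_profile))
```

Here the read/write mix is similar, but the access pattern and transfer size are not, which suggests that results from this benchmark would say little about the production workload.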

Although regulated benchmarks typically require vendors to state the workload configurations, interpreting benchmark results can still be challenging. For example, some vendors may specifically tune their products to maximize benchmark results.

Enabling specific settings (e.g., disk cache) may result in superior performance results, but might not be recommended for real-world workloads. Also, the benchmark workload may assume that the server or storage device is dedicated to a specific application. However, the real-world deployment model may task the server with running multiple applications, which makes the benchmark results less useful.

If a specific workload does not closely match that of the intended deployment environment, then users should ask the vendors to provide access to customer-reference sites. By communicating with other users who have similar real-world environments, users can better understand the performance they can expect. Most vendors maintain a list of customer-reference sites intended for this purpose. Reference sites also provide a means to obtain unbiased feedback because end users are typically very willing to share their experiences with other users.

While the benchmark results discussed in this article focus mainly on performance, this is just one piece of the decision criteria. Other important purchasing factors include total cost of ownership, return on investment (ROI), training, reliability, security, and service. For example, if performance is outstanding but the product is not reliable and requires expensive maintenance contracts, then the user may want to reconsider the purchase. Some vendors have documented the overall ROI associated with real-world product deployments. Users should ask vendors to provide ROI information before they make major purchases, and they should feel free to contact the customer references listed in the report.

The type of network used in the real-world environment can also affect the relevance of benchmark results. For example, if a user has a SAN environment but the benchmark assumes a directly attached device, then the benchmark results may be irrelevant. Also, grid-computing environments, which can scale linearly in terms of processing power, capacity, and availability simply by adding nodes to the network, are not covered by standard benchmark tests. Clustered file systems are not addressed by existing benchmarks either, which makes standardized benchmark results hard to apply.

After the benchmark results and limitations are understood, users need to consider the tested configuration along with the price. If the tested configuration does not resemble the configuration to be purchased, then the benchmark results will be useless. Extrapolating benchmark test results for lesser configurations can be difficult or impossible, since results are unlikely to be linearly correlated with any single optional component in the test configuration. Unfortunately, this scenario is likely to occur given that vendors typically want to show maximum performance results even though the bulk of their sales may be based on significantly different configurations.

Also, users must be sure to obtain the list price of the tested configuration and verify that the configuration is available for sale. Some vendors will omit pricing because the tested system is outside the target market price range. Again, vendors are torn between posting the best performance results and using real-world, affordable configurations that might have lower performance.

Performance benchmarks are valuable because they can enable users to measure performance across a range of competing products. However, the results are only meaningful if users fully understand the key benchmark results, workloads, tested configurations, and the price of the systems. In some cases, vendors will run benchmark tests using configurations that are not likely to be used in real-world environments or not available for sale. And if the system configuration and price are not disclosed, then users will have no idea if the described performance results are within their budget.

IT professionals should demand that storage vendors provide meaningful benchmark results that include the tested configuration and pricing. Also, the performance of grid-computing environments and clustered file systems needs to be measured, which means that new benchmarks must be created.

Vendors should be proactive in guiding the development of benchmarks through third-party consortiums such as the SPC and SPEC. To date, storage benchmarks have been relatively limited, but if all goes according to plan, the SPC will unveil three new benchmarks by year-end. This is quite a feat given that gaining consensus in a consortium environment comprising competing vendors can make herding cats look easy.

This article was excerpted from a larger report (which includes information on server benchmarks) by the Taneja Group (www.tanejagroup.com).

This article was originally published on December 01, 2006