As with any product, there are several criteria users assess when making a RAID subsystem buying decision. These include features, price, support, ease of use, and performance. When comparing performance of storage products, it is important to measure them fairly, with equivalent configurations, in an “apples-to-apples” comparison. Equally important is to measure performance under conditions in which the product will be operating.

Ideally, a company purchasing RAID arrays would evaluate them in a real-world environment, using production applications. However, this is unrealistic given the required resources and costs. As a substitute, controlled RAID configurations can be created on a test server, where synthetic workloads are applied. The synthetic workloads simulate the organization’s real-life applications. This method allows performance measurements to be made in an apples-to-apples test environment with repeatable results.

Intel’s Iometer is a widely used workload generation and performance measurement tool for RAID arrays. Iometer is not a benchmark in the sense of a test that generates a single score for the product being tested. Rather, it is a tool that can be used to evaluate a RAID subsystem using synthetic workloads defined by the user. Creating and using an appropriate workload is critical if the test results are to accurately predict the relative performance difference between RAID subsystems.

Each application generates a unique I/O profile to the storage subsystem. For many applications, the I/O profile consists of only a couple of unique I/O types, and a synthetic workload can be easily defined. The following are I/O profiles for several common server applications.

1. Transaction Processing Workload

The I/O profile for transaction processing applications is variously referred to as a transaction processing workload, database server workload, OLTP workload, or TPC-C workload. Under any of these names, the synthetic workload reflects the I/O profile of a database server accessing its storage subsystem while processing transactions.

In Iometer terms, the transaction processing workload is modeled with the following specifications:

8KB or 2KB transfer size
100% random I/O
67% read I/O
Full disk capacity
1 user per CPU
Vary number of outstanding I/Os

In the past, the size of transaction processing I/Os was typically 2KB. The more common size of transaction processing I/Os is now 8KB. For this reason, many benchmark reports will provide both 2KB and 8KB transaction processing performance.
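As a rough illustration, the specification above can be written down as a small test matrix, with one entry per combination of transfer size and number of outstanding I/Os. This is only a sketch: the `AccessSpec` class, its field names, and the particular queue depths swept are assumptions made for this example, not Iometer's own file format or settings.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AccessSpec:
    """One synthetic workload definition (field names are illustrative)."""
    name: str
    transfer_size_bytes: int
    percent_random: int
    percent_read: int
    outstanding_ios: int


def transaction_processing_sweep(transfer_sizes_kb=(2, 8),
                                 queue_depths=(1, 2, 4, 8, 16, 32, 64)):
    """Build one AccessSpec per (transfer size, queue depth) combination.

    The queue depths are an assumed sweep; the article only says to vary them.
    """
    return [
        AccessSpec(
            name=f"OLTP {size_kb}KB qd{depth}",
            transfer_size_bytes=size_kb * 1024,
            percent_random=100,  # 100% random I/O, per the specification
            percent_read=67,     # 67% read I/O, per the specification
            outstanding_ios=depth,
        )
        for size_kb, depth in product(transfer_sizes_kb, queue_depths)
    ]


if __name__ == "__main__":
    for spec in transaction_processing_sweep():
        print(spec)
```

Running every subsystem under test through the same matrix is one way to keep the comparison apples-to-apples.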

2. File Server Workload

The I/O profile that a file server application generates for its storage subsystem is similar to the transaction processing workload described above. The differences for the file server workload are the transfer size and the ratio of read I/Os to write I/Os.

The specifications for the file server workload are:

4KB, 8KB, or 16KB transfer size
100% random I/O
80% read I/O
Full disk capacity
1 user per CPU
Vary number of outstanding I/Os

The 4KB transfer size for the file server workload represents a typical size. However, some file server applications may generate an I/O profile that has larger transfer sizes of 8KB or 16KB.
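To make terms such as "100% random I/O", "80% read I/O", and "full disk capacity" concrete, the following sketch generates a request stream with those properties. It is not how Iometer is implemented; the function name, the 100GB volume size, and the choice to align offsets to the transfer size are assumptions for illustration.

```python
import random


def generate_requests(capacity_bytes, transfer_size, read_fraction, count, seed=0):
    """Yield (operation, byte_offset) pairs for a 100% random workload.

    Offsets are spread uniformly over the whole volume ("full disk capacity")
    and aligned to the transfer size (an assumption of this sketch).
    """
    rng = random.Random(seed)
    slots = capacity_bytes // transfer_size  # number of aligned positions
    for _ in range(count):
        operation = "read" if rng.random() < read_fraction else "write"
        offset = rng.randrange(slots) * transfer_size
        yield operation, offset


# File server profile: 4KB transfers, 80% reads, on a hypothetical 100GB volume.
file_server_requests = list(generate_requests(
    capacity_bytes=100 * 1024**3,
    transfer_size=4 * 1024,
    read_fraction=0.80,
    count=10_000,
))
```

Changing `transfer_size` to 8KB and `read_fraction` to 0.67 gives the transaction processing profile instead, which shows how closely the two workloads are related.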

3. Web Server Workload

The I/O profile given here for a Web server workload assumes a read-only server (i.e., one that only serves content); processing Web requests generates no write activity on this server. The workload consists of a mix of transfer sizes, ranging from very small I/Os (512 bytes) to very large I/Os (512KB). The mix provided below is based on the I/O profile generated by a standard Web benchmark.

Transfer size mix: 22% 512 bytes, 15% 1KB, 8% 2KB, 23% 4KB, 15% 8KB, 2% 16KB, 6% 32KB, 7% 64KB, 1% 128KB, and 1% 512KB

100% random I/O
100% read I/O
Full disk capacity
1 user per CPU
Vary number of outstanding I/Os
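Because this workload mixes ten transfer sizes, it can help to check that the percentages add up and to see what the average I/O looks like. The sketch below does both and also draws random sizes according to the mix; the function names are illustrative only.

```python
import random

# Transfer size in bytes -> share of I/Os in percent, from the mix above.
WEB_MIX = {
    512: 22, 1024: 15, 2048: 8, 4096: 23, 8192: 15,
    16384: 2, 32768: 6, 65536: 7, 131072: 1, 524288: 1,
}


def mean_transfer_size(mix):
    """Weighted average transfer size in bytes."""
    total = sum(mix.values())
    return sum(size * share for size, share in mix.items()) / total


def sample_sizes(mix, count, seed=0):
    """Draw transfer sizes at random, weighted by their share of the mix."""
    rng = random.Random(seed)
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


if __name__ == "__main__":
    print("total share:", sum(WEB_MIX.values()), "%")
    print("mean transfer size: %.2f KB" % (mean_transfer_size(WEB_MIX) / 1024))
```

The shares sum to 100%, and the weighted mean works out to about 15.7KB, even though 83% of the I/Os are 8KB or smaller; the few very large transfers dominate the average.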

4. Video Server Workload

This workload models the I/O profile for video servers that are streaming digital video data to clients. The characteristics for the workload are:

64KB transfer size
100% sequential I/O
100% read I/O
1GB disk capacity per user
1 outstanding I/O per user
Vary number of users

The unique characteristic for this workload, compared to the previous workload examples, is the access setup for the Iometer users. In the case of a video server, each client is expected to be accessing a different digital video file from a different portion of the disk, whereas clients would typically access the entire capacity of the disk for the transaction processing, file, and Web server workloads. One method to simulate separate ‘files’ is to configure each user with a different starting and maximum sector value, with a range around 1GB.
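The per-user ranges described above can be computed with simple arithmetic. The sketch below assumes 512-byte sectors and back-to-back, non-overlapping 1GB ranges; both are assumptions for illustration, and the resulting values would still have to be entered for each user in the test tool.

```python
SECTOR_BYTES = 512     # assumption: 512-byte sectors
RANGE_BYTES = 1024**3  # 1GB per user, as in the specification above


def user_sector_ranges(user_count, sector_bytes=SECTOR_BYTES,
                       range_bytes=RANGE_BYTES):
    """Return (user, starting_sector, size_in_sectors) for each user.

    Ranges are laid out back to back so that no two users share sectors.
    """
    sectors_per_user = range_bytes // sector_bytes
    return [
        (user, user * sectors_per_user, sectors_per_user)
        for user in range(user_count)
    ]


if __name__ == "__main__":
    for user, start, size in user_sector_ranges(4):
        print(f"user {user}: start sector {start}, size {size} sectors")
```

With these assumptions each user covers 2,097,152 sectors, so the second user starts at sector 2,097,152, the third at 4,194,304, and so on.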

5. Write Streaming Workload

The I/O size and sequential access pattern for this workload are the same as for the video server workload described above, but there are two significant differences: the I/Os are writes rather than reads, and all users are assigned to the entire disk capacity instead of dedicated ranges. Workstation applications such as M-CAD, E-CAD, and video and graphics editing generate this I/O type.

The characteristics for the workload are:

64KB transfer size
100% sequential I/O
100% write I/O
Full disk capacity
Vary number of outstanding I/Os
1 user

6. E-Mail Server Workload

E-mail servers generate I/O more similar to a database, file, or Web server application than a video server or workstation application. A synthetic workload to model an e-mail server application is not currently available, but would be random in nature, with smaller transfer sizes and a majority of read I/Os. Using results of the database, file or Web server workloads should provide insight into the performance of the storage subsystem under an e-mail server workload.

Workload Tuning

Some workloads used to benchmark RAID subsystems are not based on application I/O profiles. These benchmarks can sometimes generate extremely high, and misleading, performance results. The performance results from these workloads, while dramatic, provide little benefit when evaluating RAID subsystems that will support one of the server applications described earlier.

Some of the methods used to produce these “specialty” benchmark workloads include:

  • Reduce transfer sizes: The smaller the I/O transfer size, the more I/Os per second can be delivered. This is most apparent for small sequential I/O transactions. Disk caches enhance the performance of this type of workload by “hiding” the disk mechanical delays of seek time and rotational latency. RAID subsystems can also be tuned to accelerate this workload by batching the large numbers of small I/O transactions from the host into a single, larger disk I/O. This type of RAID firmware tuning can deliver very high I/O rates; however, there are no applications that generate this I/O profile in the real world. Detecting sequential I/O streams adds overhead to the RAID software. This can negatively impact performance under workloads where small sequential I/Os are not present.
    As seen in Figure 1, the small I/O transfer size of 2KB delivers approximately 10 times the I/O rate of the 64KB transfer size. The actual throughput, however, is less than one-third. For applications that generate streaming I/Os, the transfer size is large (32KB or 64KB), and the throughput of the storage subsystem with large sequential I/Os is the key metric for performance.
  • Reduce disk capacity: One method to reduce disk seek latency and its impact on random disk I/O performance is to reduce the disk capacity used by the user. Seek latency refers to the time the servo motor takes to move the magnetic heads across the disk platter. By reducing the disk capacity used, the distance the heads must move becomes smaller, and seek latency is reduced. As illustrated in Figure 2, reducing the disk capacity can provide an apparent performance advantage of more than 2x when compared to testing with full drive capacity. Again, this does not reflect real-world usage, and performance differences between RAID subsystems using this workload may not be reflected in a real application.
  • Drive count: The number of drives used in a benchmark configuration can be a significant factor for performance. This primarily applies to random I/O workloads, and is more significant for workloads with smaller transfer sizes than larger transfer sizes. Figure 3 illustrates the performance impact of drive count for a small random I/O workload.
  • Change queue depth: The number of outstanding I/Os issued to the storage subsystem is also referred to as the queue depth. Higher queue depths are used to simulate heavier usage of the server. A high queue depth is equivalent to having many requests for data. A typical heavy server load occurs at a queue depth of 32 to 64 outstanding I/Os.
    Figure 4 illustrates the impact of queue depth on I/O performance. For small transfer sizes, performance is limited by the disk mechanical latency. As described above, this latency can be minimized by increasing the disk count, and by reducing the capacity or seek range of the volume. As the transfer size increases, each I/O becomes more and more limited by the throughput of the I/O bus.
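The gap between I/O rate and throughput noted in the transfer-size discussion above follows from a simple relationship: throughput equals I/O rate multiplied by transfer size. The sketch below uses hypothetical I/O rates, chosen only to reproduce the roughly 10:1 ratio described for 2KB versus 64KB transfers; they are not measurements from the figures.

```python
def throughput_mb_per_s(iops, transfer_size_kb):
    """Throughput in MB/s from I/Os per second and transfer size in KB."""
    return iops * transfer_size_kb / 1024


# Hypothetical I/O rates with a 10:1 ratio, for illustration only.
small = throughput_mb_per_s(iops=10_000, transfer_size_kb=2)
large = throughput_mb_per_s(iops=1_000, transfer_size_kb=64)
ratio = small / large

if __name__ == "__main__":
    print(f"2KB:  {small:.1f} MB/s")
    print(f"64KB: {large:.1f} MB/s")
    print(f"2KB throughput is {ratio:.0%} of 64KB throughput")
```

Ten times the I/O rate at one thirty-second of the transfer size yields 10/32 of the throughput, which is just under one-third. This is why a headline I/Os-per-second figure says little about a streaming application.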

 

Other performance considerations include:

  • RAID level: The RAID level affects the performance of a RAID configuration, and its effect on disk reads differs from its effect on disk writes. For this reason, subsystems being compared should be tested with disks configured in the same RAID level.
    In general, RAID-0 will provide the highest performance, while RAID-5 will yield the lowest performance for disk writes. However, with disk reads, storage subsystems configured with RAID-0 should deliver the same performance as the same subsystem configured for RAID-5.
  • Strip size: When creating a RAID configuration, the strip size or block size (minimum size segment per disk in RAID volume) is also specified. The default setting is generally the best choice; however, there can be specific cases where tuning the strip size can increase performance. The cases where this tuning is beneficial are limited, and are impacted by the drive count, RAID level, cache setting, and the I/O transfer size.
  • RAID parity initialization: Newly created volumes using RAID levels that provide data protection (3, 5, and sometimes 1) require initialization. Some solutions will allow for immediate access of the volume, and complete initialization in the background. This is a great feature for end users; however, running benchmarks while background initialization is taking place will have a detrimental effect on performance results. When benchmarking subsystems for comparison, ensure that newly created RAID volumes are completely initialized prior to beginning benchmark tests.
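One common way to reason about why RAID level matters more for writes than for reads is to count the back-end disk I/Os needed per host I/O. The sketch below uses the usual textbook figures for small random writes (one disk write for RAID-0, two for RAID-1, and four for RAID-5, where old data and old parity are read before new data and new parity are written). It is a simplified model that ignores controller cache and full-stripe writes, and it is not taken from the measurements in this article.

```python
# Back-end disk I/Os per host write: textbook values, assuming small random
# writes with no cache coalescing and no full-stripe writes.
WRITE_PENALTY = {"RAID-0": 1, "RAID-1": 2, "RAID-5": 4}


def backend_ios_per_host_io(raid_level, read_fraction):
    """Average back-end disk I/Os generated by one host I/O.

    Each host read is assumed to cost one disk read at every RAID level.
    """
    write_fraction = 1.0 - read_fraction
    return read_fraction + write_fraction * WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    for level in WRITE_PENALTY:
        oltp = backend_ios_per_host_io(level, read_fraction=0.67)
        reads = backend_ios_per_host_io(level, read_fraction=1.0)
        print(f"{level}: {oltp:.2f} per I/O at 67% read, {reads:.2f} at 100% read")
```

Under this model a read-only workload costs the same at every level, while the 67% read transaction processing mix generates roughly twice as many disk I/Os on RAID-5 as on RAID-0.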

Other Metrics
Other performance metrics may be as important as I/O rate (I/Os per second) and throughput (MB per second). These include CPU utilization, CPU effectiveness (throughput divided by CPU utilization), average response time, and PCI bus utilization (not measured by Iometer).
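Two of these metrics are easy to derive from the others. CPU effectiveness follows directly from the definition above. Average response time can be cross-checked with Little's law, which relates it to the number of outstanding I/Os and the I/O rate when the queue is kept full; that cross-check is an addition for illustration, not something the tool reports this way. All numbers in the example are hypothetical.

```python
def cpu_effectiveness(throughput_mb_s, cpu_utilization_percent):
    """Throughput delivered per percentage point of CPU utilization."""
    return throughput_mb_s / cpu_utilization_percent


def estimated_response_time_ms(outstanding_ios, iops):
    """Average response time implied by Little's law, in milliseconds.

    Assumes the stated number of I/Os is outstanding at all times.
    """
    return outstanding_ios * 1000 / iops


if __name__ == "__main__":
    # Two hypothetical subsystems with equal throughput but different CPU cost.
    print("A:", cpu_effectiveness(100, 20), "MB/s per % CPU")
    print("B:", cpu_effectiveness(100, 40), "MB/s per % CPU")
    print("response time:", estimated_response_time_ms(32, 4000), "ms")
```

In this example both subsystems deliver 100 MB per second, but subsystem A does so at half the CPU cost, which leaves more processor capacity for the application itself.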

Using synthetic workloads that model I/O profiles to measure the performance of storage subsystems is a valuable method to measure performance differences in an isolated environment. The workloads used should reflect the I/O profile of the applications to be implemented. The synthetic workloads provided in this article represent common server applications. Tuning the workload can achieve increased performance metrics; however, the modified workload no longer reflects the I/O profile of applications used in the real world.