For some applications, solid-state disks may significantly increase performance, but preliminary I/O analysis is a must.
By Michael Casey
As enterprises scale up mission-critical applications such as e-mail, messaging, and e-business transaction processing, many are finding that traditional cached disk arrays do not deliver the needed transaction rates and response times. This is inevitable, because disk drive capacities are growing exponentially while disk I/O performance is not. One solution is to identify a small set of files that consume most of the I/O activity, and place those files in a high-performance file cache. This approach typically uses solid-state disk (SSD) for file caching, and can increase overall transaction performance by 200% or more on existing servers.
Figure 1: Disk drive capacities have increased by a factor of 30x in the past eight years, while random-access performance has improved only slightly.
A variety of performance-analysis software tools can help users analyze the I/O profile of an application workload and determine which files are the best candidates for caching. This article outlines methods for performance analysis and tuning, illustrated with real-world examples.
Figure 1 illustrates the divergence between disk drive capacity and I/O performance increases. Disk drive capacities have increased by a factor of 30x in the past eight years, while the random-access performance of an individual mechanical disk drive has improved only slightly. On a per-GB basis, disk subsystem performance actually decreases with each improvement in disk storage density and cost. Figure 2 illustrates this trend in terms of access density: the I/Os per second (or IOPS) divided by the maximum capacity per disk drive.
Figure 2: On a per-GB basis, disk subsystem performance decreases with each improvement in disk storage density.
For example, a single-bay EMC disk array can hold 96 disk drives. Each 3.5-inch drive could hold 9GB when EMC moved from 5.25-inch to 3.5-inch drives about three years ago. Now each drive can hold 18GB, 36GB, or 50GB. However, each of the 96 drives still delivers only about 100 IOPS, so the raw back-end performance of the box is around 9,600 IOPS, no matter which capacity disk drives are used.
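The arithmetic behind this example can be sketched in a few lines of Python. The drive count, per-drive IOPS, and capacities come from the text above; the function names are illustrative only and do not correspond to any vendor tool.

```python
# Figures taken from the example above: 96 drives per bay, about 100 IOPS each.
DRIVES_PER_BAY = 96
IOPS_PER_DRIVE = 100


def access_density(iops: float, capacity_gb: float) -> float:
    """Return the I/Os per second available per GB of capacity."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return iops / capacity_gb


def bay_summary(capacity_gb: float) -> tuple:
    """Return (raw back-end IOPS, total GB, IOPS per GB) for one full bay."""
    total_iops = DRIVES_PER_BAY * IOPS_PER_DRIVE
    total_gb = DRIVES_PER_BAY * capacity_gb
    return total_iops, total_gb, access_density(total_iops, total_gb)


if __name__ == "__main__":
    for capacity_gb in (9, 18, 36, 50):
        iops, gb, density = bay_summary(capacity_gb)
        print(f"{capacity_gb:>2}GB drives: {iops} IOPS over {gb} GB "
              f"= {density:.1f} IOPS per GB")
```

With these inputs the bay total stays at 9,600 IOPS in every case, while access density falls from roughly 11 IOPS per GB with 9GB drives to 2 IOPS per GB with 50GB drives.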
Figure 3: Solid-state file caching can offer improvements over RAID controller-based block caching.
To improve performance on random-I/O workloads, disk array vendors incorporate semiconductor cache inside disk subsystems. The performance benefit of such a cached RAID approach depends on the cache hit ratio, which in turn depends on the data access patterns of the applications being supported.
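The dependence on hit ratio can be made concrete with a simple weighted-average model. This is only a sketch: the 10 ms disk figure is inferred from the roughly 100 IOPS per drive cited earlier (assuming one outstanding request at a time), and the 0.1 ms cache figure is an illustrative assumption, not a measured value for any product.

```python
DISK_MS = 10.0   # inferred from ~100 IOPS per drive, one request at a time
CACHE_MS = 0.1   # assumption: illustrative semiconductor cache service time


def average_service_ms(hit_ratio: float,
                       cache_ms: float = CACHE_MS,
                       disk_ms: float = DISK_MS) -> float:
    """Average time per I/O when a fraction `hit_ratio` is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    for hit_ratio in (0.0, 0.5, 0.9, 0.99, 1.0):
        print(f"hit ratio {hit_ratio:4.0%}: "
              f"{average_service_ms(hit_ratio):.2f} ms per I/O")
```

Under these assumptions, the misses dominate: even at a 90% hit ratio, the average service time is still about ten times the cache service time, which is why access patterns with poor locality see limited benefit from controller cache.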
In many of today's most transaction-intensive applications, enterprises and service providers are finding that they must spread the data over a large number of disk drives and array frames to achieve an adequate number of I/Os per GB of data stored. Even then, mechanical disk latencies can significantly constrain performance and scalability: The application server may be wasting a significant portion of its potential processing power while it waits for the disk subsystem to complete I/O requests.
Disk I/O performance scaling
Today's performance issues with cached RAID solutions are part of a long-term trend, driven by the rapid increase in disk capacity. Over the past 10 years, as the capacity per drive has increased faster than the I/O performance per drive, system designers have adopted new architectures to maintain application performance levels and to respond to demands for increased transaction throughput. Figure 3 illustrates this progression.
Figure 4: A small, I/O-intensive fraction of the data can be moved from cached RAID to a separate, non-volatile file cache.
The authors of the original RAID papers (a group of computer scientists at UC Berkeley) described the problem with a SLED (a single large expensive disk) and proposed various RAID architectures as possible solutions to the performance problem.
In essence, the idea was that system designers could improve performance by striping the data across an array of independent disk drives. This data-striping architecture provided significant performance improvements in suitable applications.
Figure 5: In the example of a Sendmail message server, solid-state file caching virtually eliminates the I/O wait time.
Next, in response to the ever-increasing demands for application performance and scalability, RAID vendors began to design RAID controller hardware with "block cache." And RAID controller cache still adequately addresses the I/O performance needs of most applications.
However, cached RAID approaches may run out of gas in high-performance, transaction-intensive applications. While new disk drive generations are decreasing the access density of the back-end disk arrays, the performance demands on e-business infrastructures are being driven to new levels by exploding demand, unpredictable peak loads, and an increasingly impatient population of on-line customers. As Figure 3 suggests, one possible solution is a move to hot-file caching on external, persistent solid-state storage.
File cache vs. block cache
For many transaction-intensive applications, it is possible to identify a small set of files that consume most of the I/O activity and place those files in a high-performance cache. This approach typically uses SSD for file caching and can increase overall transaction performance by 200% or more. File caching differs from block caching in several respects.
Figure 6: Graph shows the CPU utilization analysis for a Peoplesoft application running on an Oracle database. In this example, the I/O wait time represents 63% of CPU time at peak load.
RAID cache ("block caching" in a RAID controller) is based on data blocks. Each block is identified by a SCSI block address (for example), and the RAID controller has no knowledge of which blocks are part of which file. The controller chooses what to cache (and what to flush from cache) based on historical usage statistics on the individual blocks (or disk tracks, in the case of EMC). The caching algorithm looks at the usage history, and tries to determine what will be needed next by the application. Because the controller cache is much smaller than the total amount of data stored on the disk drives, only a small percentage of the data can be kept in the cache. A given data block may reside in cache for only a few seconds, or minutes, before it is flushed from cache.
The effectiveness of block caching depends on the application's data access patterns, and usually an application will reach a point of diminishing returns, where adding more cache will not deliver much additional performance improvement. Once the application reaches that point, the next step is to start caching entire files in a very fast I/O device.
Figure 7: Screen shot shows the results of hot-file analysis in a Peoplesoft application.
File cache effectiveness depends on the user's understanding of the application structure, i.e., the identification of the application-specific "hot files." Once the hot files have been identified, selected files are moved to the file cache as a policy decision, not a statistical extrapolation. The hot files may reside on the SSD for days, weeks, or months. The "hit ratio" on data in the file cache is always 100%, because the entire file is always in the cache and available for access.
Alternative cache locations
A system or application architect can establish a file cache (or a block cache) in any of several locations. For example, Oracle buffer caches and Unix file system buffers generally make use of the system's main memory. This is a suitable location for read caching, but is not recommended for write caching due to the volatility of main memory.
Some computer hardware vendors do offer non-volatile cache inside the server, using DRAM with battery backup, and that is a suitable approach for in-line block cache in single-server configurations of limited size. However, if the facility is to be used as a file cache and shared among multiple servers in a cluster or storage area network (SAN), then an alternative is a separate, non-volatile file caching device or appliance.
Figure 8: With an architecture that combines cached disk arrays and solid-state file caching, it is possible to add performance and capacity as independently scalable features.
Several RAID subsystem vendors offer a non-volatile file caching facility as an optional feature of the RAID controller cache. Notable examples include EMC's Symmetrix, Hitachi Data Systems' 7700E, and Hewlett-Packard's XP256 disk arrays. In essence, the RAID vendor offers the ability to dedicate part of the cache for storage of specific files (or virtual volumes). EMC calls this feature "PermaCache," and HP calls it "HP SureStore E Cache LUN XP."
This approach may satisfy a need for a small amount of file cache, if an adequate number of spare cache slots are available within the disk array frame. However, it is not easily scalable beyond the cache size limit of the existing frame. Few enterprises will want to buy another frame just to install another one or two gigabytes of file cache.
Also, in many cases, the RAID subsystem needs all the block cache it can get; allocating part of that limited resource to a file caching facility will impact performance of the cached RAID facility. A separate file cache may provide a more scalable architecture because it enables users to add file-caching resources independently from the RAID frames.
When does it make sense?
Two conditions are necessary to make solid-state file caching a good bet:
- The application server must be I/O bound
- The I/Os must be skewed (i.e., a small percentage of the files must drive a large percentage of the I/O activity)
When analyzing a transaction performance problem on an application server, the first step is to determine whether the server is I/O bound. If the server is compute-bound rather than I/O bound, then improving the I/O performance is unlikely to help. System administrators typically use tools such as the Unix sar utility and Oracle's V$SYSSTAT view to determine whether or not the server is "waiting for I/O" a significant percentage of the time under peak load conditions.
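A minimal sketch of this first check follows. It assumes CPU utilization samples have already been collected (for example, from sar output) and reduced to user, system, wait-for-I/O, and idle percentages. The field names, the 25% threshold, and the choice to treat the samples with the least idle time as "peak load" are all assumptions made for illustration, not part of any tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CpuSample:
    """One CPU utilization sample, in percent (hypothetical field names)."""
    usr: float
    sys: float
    wio: float   # time spent waiting for I/O
    idle: float


def peak_load_io_wait(samples: List[CpuSample], busiest: int = 3) -> float:
    """Average I/O wait over the `busiest` samples with the least idle time."""
    if not samples:
        raise ValueError("no samples supplied")
    peak = sorted(samples, key=lambda s: s.idle)[:busiest]
    return sum(s.wio for s in peak) / len(peak)


def is_io_bound(samples: List[CpuSample],
                threshold_pct: float = 25.0,   # assumption: site-specific cutoff
                busiest: int = 3) -> bool:
    """True if I/O wait at peak load meets or exceeds the chosen threshold."""
    return peak_load_io_wait(samples, busiest) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    samples = [
        CpuSample(usr=20, sys=5, wio=5, idle=70),
        CpuSample(usr=35, sys=10, wio=40, idle=15),
        CpuSample(usr=30, sys=12, wio=45, idle=13),
    ]
    print(f"peak I/O wait: {peak_load_io_wait(samples, busiest=2):.1f}%")
    print("I/O bound" if is_io_bound(samples, busiest=2) else "not I/O bound")
```

The threshold is a judgment call for each site; the point of the sketch is only that the decision should be based on peak-load samples rather than a daily average, which can hide the problem.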
If the server is indeed I/O bound for a critical application at peak load, the next step is to determine whether the I/O activity is skewed. (If it isn't, then the application's requirements can generally be met by adding more disk spindles in a cached RAID subsystem.) In some applications, answering this question is fairly easy, because the identities of the hot files are inherent in the design of the application.
For example, in many e-mail and messaging applications, the message queues represent a high percentage of the total I/O, on a small percentage of the data. They are also very write-intensive files. This skewed I/O distribution is a feature of the application design, and thus the application is a good fit for adoption of solid-state file caching.
Figure 4 illustrates this concept: a small, I/O-intensive fraction of the data is moved from cached RAID to a separate, non-volatile file cache.
In suitable applications, addition of a solid-state file cache to an existing server can boost throughput by a factor of four or even eight. This enables the system administrator to deliver the required performance and service levels, without purchasing and managing several additional servers.
Figure 9: File caching with solid-state disks can be added to storage area network (SAN) environments.
Figure 5 shows the impact of solid-state file caching on the performance and scalability of a Sendmail message server. In the original configuration (a Sun server with all data stored on hard disk drives), the server was in "wait for I/O" mode 40% of the time at peak load. After the message queues were moved to a solid-state file cache, the I/O wait was largely eliminated, and the server was able to process four times as many messages per second. The Internet service provider (ISP) was able to scale the application by increasing the throughput of each server, rather than buying four times as many servers and managing a more complex hardware environment.
As this ISP's business continues to grow, the e-mail application-server configuration is simply replicated as part of a scalable infrastructure: one solid-state file-cache hardware module for each pair of new servers installed.
Oracle databases are often more complex than Sendmail servers, and more difficult to analyze. Oracle tends to spread disk I/Os across multiple devices, even within a single file, and operating-system tools do not readily reveal which files are causing most of the device I/O traffic. In general, the most likely candidates for file caching in an Oracle database are very active index files, re-do logs, and temp spaces. However, a large Oracle implementation might have multiple files of each type, so hot-file identification tools and procedures are important parts of a database/system administrator's toolbox.
As in the previous example, the first step in I/O performance analysis is to determine whether the application server is I/O bound. Sar scripts and other tools can provide the basis for such an analysis. Various vendors offer services and tools to facilitate the process.
Figure 6 shows the CPU utilization analysis for a Peoplesoft application running on an Oracle database. In this example, the I/O wait represents 63% of CPU time at peak load.
Based on the high percentage of wasted CPU cycles (time the processor spends waiting for disk I/O requests to complete during peak-load conditions), this Peoplesoft application appears to be a good candidate for I/O improvement. However, at this point, we do not know for sure whether the solution is simply to add more disk spindles and I/O paths, or whether adding solid-state file cache would be the most effective solution.
In other words, we still must answer the second key question: Is the disk I/O uniform or skewed? And in Oracle applications, simply looking at system-level device I/O statistics is not very useful, because, as mentioned previously, Oracle tends to spread the hot files across many devices. Such a policy may be efficient in a world where mechanical disk drives are the only I/O devices available, but it ties the application performance to the average performance of the disk farm, and it complicates the task of moving high-activity files onto a faster device category.
I/O dynamics for Oracle
For an Oracle application, DBAs and system administrators need tools that help them identify the most I/O-intensive files or tables. (A tool for this purpose is available for free download at http://www.soliddata.com/products/iodreg.html.)
These tools monitor the file activity of an Oracle database, and can help identify files with high levels of I/O activity that may be having a significant impact on overall system performance. The tools can monitor and analyze one or more running Oracle instances.
Figure 7 shows the results of hot-file analysis on the Peoplesoft application considered in Figure 6. The display combines I/O-rate data for each file (reads per second and writes per second) with file size information for each file. It also presents a computed ratio called I/O density, which reflects the percentage of overall I/O activity represented by the file, divided by its percentage of the total storage consumed by the application.
These statistics allow users to identify files that represent a large percentage of the I/O on a small amount of total disk space, and to target those files for relocation to a solid-state file cache.
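The I/O density ratio described above can be sketched as follows. The calculation follows the definition in the text (a file's share of total I/O divided by its share of total storage); the class, function names, file names, and sample figures are hypothetical and are not taken from the tool shown in Figure 7.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileStats:
    """Per-file statistics (hypothetical structure for illustration)."""
    name: str
    reads_per_sec: float
    writes_per_sec: float
    size_mb: float


def io_density(files: List[FileStats]) -> Dict[str, float]:
    """Map each file name to (share of total I/O) / (share of total storage)."""
    total_io = sum(f.reads_per_sec + f.writes_per_sec for f in files)
    total_mb = sum(f.size_mb for f in files)
    if total_io <= 0 or total_mb <= 0:
        raise ValueError("need positive total I/O and total size")
    densities = {}
    for f in files:
        if f.size_mb <= 0:
            raise ValueError(f"{f.name}: size must be positive")
        io_share = (f.reads_per_sec + f.writes_per_sec) / total_io
        size_share = f.size_mb / total_mb
        densities[f.name] = io_share / size_share
    return densities


def hot_files(files: List[FileStats], min_density: float = 1.0) -> List[str]:
    """File names with density above `min_density`, hottest first."""
    densities = io_density(files)
    ranked = sorted(densities, key=densities.get, reverse=True)
    return [name for name in ranked if densities[name] > min_density]


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    sample = [
        FileStats("redo01.log", reads_per_sec=0, writes_per_sec=300, size_mb=100),
        FileStats("data01.dbf", reads_per_sec=80, writes_per_sec=20, size_mb=9900),
    ]
    for name, density in io_density(sample).items():
        print(f"{name}: I/O density {density:.2f}")
    print("candidates:", hot_files(sample))
```

A density of 1.0 means a file receives I/O in proportion to its size; in the hypothetical data above, the small redo log carries 75% of the I/O on 1% of the storage, for a density of about 75, which marks it as a relocation candidate.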
Figure 8 illustrates one aspect of an architecture that combines cached disk arrays and solid-state file cache. With this architecture, it is possible to add performance and capacity as independently scalable features of the infrastructure.
In the future, SANs will be widely deployed and supported by sophisticated storage management tools, such as virtual storage architectures and policy-based storage management consoles, as shown in Figure 9.
As a shared facility on the SAN, solid-state storage will be easy to deploy and manage. As this occurs, file-cache appliances may prove increasingly beneficial for deployment in a wide range of transaction-intensive applications.
Michael Casey is vice president of marketing at Solid Data Systems, in Santa Clara, CA. www.soliddata.com. This article is based on a technical paper, presented at the International Oracle Users Group Americas conference in Anaheim, CA, on May 13, 2000. See http://www.ioug.org/