Analyst View: Examining I/O performance issues

By Greg Schulz

It should come as no surprise that businesses continue to rely upon larger amounts of disk storage and higher I/O performance, which feed the hungry needs of applications striving to meet SLAs and QoS objectives. Even with efforts to reduce storage capacity or improve capacity utilization with information lifecycle management (ILM), applications leveraging rich content will continue to consume more storage capacity and require additional I/O performance.

The continued need for accessing more storage capacity results in an alarming trend: The server-to-I/O performance gap between fast CPUs and slower disks continues to widen. The net impact is that bottlenecks associated with the server-to-I/O performance gap result in lost productivity for users who must wait for transactions, queries, and data-access requests to be resolved.

I/O bottlenecks
There are many applications that require timely data access and that can be negatively impacted by common I/O performance bottlenecks. For example, as more users access a popular file, database table, or other stored data item, resource contention will increase. One way resource contention manifests itself is in the form of database "deadlock," which translates into slower response time and lost productivity. Given the rise in popularity of Internet search engines and online shopping, some businesses have been forced to create expensive read-only copies of databases. These read-only copies are used to support more queries and prevent workloads from impacting time-sensitive transaction databases.

Generally speaking, as I/O activity and application workload are added to a system, I/O bottlenecks show up as increased response time, or latency. The more workload that is added, the more severe the bottlenecks become, until response times rise above acceptable levels.
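The relationship between workload and response time can be illustrated with a simple queueing approximation. The sketch below is not taken from any particular product or measurement; it assumes a single device modeled as an M/M/1 queue, where mean response time equals service time divided by (1 - utilization). The 5 ms service time and the utilization levels are hypothetical values chosen only for illustration.

```python
def response_time_ms(service_time_ms: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U).

    service_time_ms: time to service one I/O with no queueing (assumed value).
    utilization: fraction of time the device is busy, in the range [0, 1).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be at least 0 and less than 1")
    return service_time_ms / (1.0 - utilization)


# Hypothetical device: 5 ms per I/O when idle.
SERVICE_TIME_MS = 5.0

if __name__ == "__main__":
    # Response time grows slowly at first, then sharply as the device saturates.
    for busy in (0.10, 0.50, 0.80, 0.90, 0.95):
        latency = response_time_ms(SERVICE_TIME_MS, busy)
        print(f"{busy:.0%} busy -> {latency:.1f} ms average response time")
```

Under these assumptions, response time doubles by the time the device is 50% busy and keeps climbing steeply from there, which is why a modest increase in workload on an already busy system can push latency past an acceptable level.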

Another common challenge, and cause of I/O bottlenecks, is seasonal and/or unplanned workload increases that result in application delays and frustrated users and customers. For example, eCommerce transactions typically peak during holiday shopping. When seasonal activity spikes, response times often rise above the acceptable threshold, and performance falls below what users expect.

Besides impacting user productivity, I/O bottlenecks can result in system instability or unplanned application downtime.

The typical approaches to I/O bottlenecks have been to do nothing (and deal with the service disruptions) or to over-configure by throwing more hardware and software at the problem. For example, IT organizations sometimes add storage hardware to mask the problem. However, this leads to extra storage capacity being added to make up for a shortfall in I/O performance. By over-configuring to support peak workloads and prevent loss of business revenue, excess storage capacity must be managed throughout the non-peak periods, adding to data-center and management costs. The resulting ripple effect is that more storage now needs to be managed, including allocating storage network ports, configuring, tuning, and backing up data. This results in environments that have storage utilization rates well below 50%. The solution is to address the problem rather than mask it.
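A rough calculation shows how sizing for I/O performance instead of capacity drives utilization down. The sketch below is illustrative only: the function names and every figure in it (peak I/O rate, data size, per-drive I/O rate, and per-drive capacity) are hypothetical assumptions, not measurements from any specific environment.

```python
import math


def drives_needed(peak_iops: int, data_gb: int,
                  iops_per_drive: int, gb_per_drive: int) -> int:
    """Drives required to satisfy both the I/O rate and the capacity need."""
    by_iops = math.ceil(peak_iops / iops_per_drive)
    by_capacity = math.ceil(data_gb / gb_per_drive)
    # Whichever requirement is larger dictates the configuration.
    return max(by_iops, by_capacity)


def capacity_utilization(data_gb: int, drives: int, gb_per_drive: int) -> float:
    """Fraction of the installed raw capacity that holds data."""
    return data_gb / (drives * gb_per_drive)


# Hypothetical workload and drive characteristics (assumed values).
PEAK_IOPS = 20_000      # I/O operations per second at seasonal peak
DATA_GB = 10_000        # data actually stored
IOPS_PER_DRIVE = 150    # assumed sustainable rate for one disk drive
GB_PER_DRIVE = 300      # assumed capacity of one disk drive

drives = drives_needed(PEAK_IOPS, DATA_GB, IOPS_PER_DRIVE, GB_PER_DRIVE)
utilization = capacity_utilization(DATA_GB, drives, GB_PER_DRIVE)

if __name__ == "__main__":
    print(f"Drives required: {drives}")
    print(f"Capacity utilization: {utilization:.0%}")
```

With these assumed figures, capacity alone would call for 34 drives, but the peak I/O rate calls for 134. Roughly three-quarters of the installed capacity then sits empty yet still has to be powered, connected, and managed.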

Putting a value on the performance of applications and their importance to your business is a necessary step in the process of deciding where and what to focus on for improvement. It is important to understand the value of performance, including the response times and I/O rates required for particular environments and applications. While the cost per raw terabyte may seem relatively low, the cost for I/O response-time performance also needs to be effectively addressed and put into the proper context as part of the data-center QoS cost structure.

There are many approaches to address data-center I/O performance bottlenecks, with most centered on adding more storage hardware to address bandwidth or throughput issues. Time-sensitive applications depend on low response times as workloads increase, so latency cannot be ignored. The key to removing data-center I/O bottlenecks is to address the problem instead of simply moving or hiding it with more storage hardware.

Greg Schulz is founder and senior analyst at the StorageIO group, as well as the author of Resilient Storage Networks.