High velocity data and access density


While doing some research for a white paper, I came across the term “high velocity data”—a term that resonated with me but caused me to wonder what the implications of high velocity data were for storage architectures.

The term refers to the speed at which critical business data flows in and out of an enterprise, how quickly data movement is assimilated into the data pool, and how quickly data updates are available to support ongoing business operations.

Consider Snapfish, the online photo outfit. Unless its data delivery infrastructure is very nimble and responsive, its online customers will become disillusioned and switch to a competitor. This high level of responsiveness needs to be maintained despite a highly variable query volume, which on a peak day can reach 10 million print requests. This is a good illustration of high velocity data at work, where the survival of an enterprise depends on a storage infrastructure that must not only meet the challenge effectively, but do so affordably.

But what measures the efficiency of a storage solution in regard to how it will handle high velocity data?

Obviously, bandwidth is important—and, depending on the application, it could be the critical performance parameter. But just as critical is data latency. This is the measure of how quickly data is made available following a request. The obvious efficiency factor is the speed at which users’ screens are refreshed following a query.

The determining variable for data latency is access density, which establishes how long wait times are under the pressure of a high query hit rate. Access density is defined as the ratio of drive I/Os per second (IOPS) to capacity, expressed in IOPS/GB. If capacity doubles and performance doubles, then access density remains unchanged.
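The definition can be checked with a few lines of Python. This is only a sketch: the function name `access_density` and the drive figures (150 IOPS, 300 GB) are made up for illustration and do not describe any particular product.

```python
def access_density(iops: float, capacity_gb: float) -> float:
    """Return access density in IOPS per GB."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return iops / capacity_gb


# Hypothetical drive: 150 IOPS and 300 GB of capacity.
base = access_density(150, 300)

# Double both performance and capacity: the ratio does not move.
doubled = access_density(300, 600)

print(f"base: {base} IOPS/GB, doubled: {doubled} IOPS/GB")
```

Both calls return the same value, which is the point of the definition: access density changes only when performance and capacity grow at different rates.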

But as disks scale, capacity grows much faster than I/O performance, and despite techniques to counter the performance problem (such as larger cache and actuator-level buffers), the imbalance remains. In short, as areal density has increased, more capacity sits under each actuator and the access performance of the drives drops. This access performance hit extends to the subsystems that use these drives.
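To see how quickly the ratio erodes, the sketch below steps a drive through several generations. The growth rates are assumptions chosen for illustration (capacity doubling per generation, IOPS staying flat), as are the starting figures and the function name `density_by_generation`; real drive generations will differ.

```python
def density_by_generation(iops, start_gb, generations,
                          capacity_growth=2.0, iops_growth=1.0):
    """Return a list of (capacity_gb, iops_per_gb), one entry per generation.

    capacity_growth and iops_growth are per-generation multipliers.
    The defaults are illustrative assumptions, not measured values.
    """
    results = []
    capacity = float(start_gb)
    current_iops = float(iops)
    for _ in range(generations):
        results.append((capacity, current_iops / capacity))
        capacity *= capacity_growth
        current_iops *= iops_growth
    return results


# Hypothetical starting point: 160 IOPS on a 100 GB drive.
history = density_by_generation(iops=160, start_gb=100, generations=4)
for capacity, density in history:
    print(f"{capacity:6.0f} GB -> {density:.2f} IOPS/GB")
```

Under these assumptions the access density halves with every generation, even though nothing about the drive has become slower in absolute terms.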

The solution to improving data latency is to have less data under each actuator. The question is, how?

One approach is to short-stroke the drives. This means that the placement of highly active data is restricted to the outer bands of the disk. This is a good way to waste a lot of expensive disk space unless policies are in place to harvest the inner bands for the not-so-high-velocity data. (Pillar Data, for example, uses onboard QoS policies to determine where data resides on the disk.)
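The trade-off in short-stroking can be put in numbers. The sketch below is a simplification: it assumes the drive delivers the same IOPS whether or not it is short-stroked (in practice shorter seeks tend to raise it), and the function name `short_stroke` and the drive figures are hypothetical.

```python
def short_stroke(iops: float, capacity_gb: float, hot_fraction: float) -> dict:
    """Model a drive whose active data is confined to a fraction of its capacity.

    Assumes IOPS is unchanged by short-stroking, which is a conservative
    simplification for illustration.
    """
    if not 0 < hot_fraction <= 1:
        raise ValueError("hot_fraction must be in (0, 1]")
    hot_gb = capacity_gb * hot_fraction
    return {
        "hot_gb": hot_gb,                     # capacity holding active data
        "idle_gb": capacity_gb - hot_gb,      # inner bands left to harvest
        "hot_density": iops / hot_gb,         # IOPS/GB seen by active data
    }


# Hypothetical drive: 150 IOPS, 300 GB, active data limited to the outer 25%.
result = short_stroke(iops=150, capacity_gb=300, hot_fraction=0.25)
print(result)
```

In this example the active data sees four times the access density of the full drive, but three quarters of the capacity sits idle unless a placement policy puts less active data there.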

Another option is to increase the number of spindles in order to increase I/O performance, preferably in the same or smaller physical footprint. A large number of densely packaged spindles is what the new generation of clustered, high-density storage solutions employs. Today, most solutions use 3.5-inch drives, but by substituting them with 2.5-inch drives, packaging density and spindle count can be increased without increasing footprint. For a given total capacity, spreading the data across more spindles increases the access density ratio.
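The effect of spindle count can be sketched by building the same total capacity from two different drive types. All figures here are hypothetical, including the assumption that both form factors deliver the same IOPS per drive; the helper name `build_pool` is likewise made up for this example.

```python
import math


def build_pool(target_gb: float, drive_gb: float, drive_iops: float) -> dict:
    """Size a pool of identical drives to reach at least target_gb."""
    drives = math.ceil(target_gb / drive_gb)
    installed_gb = drives * drive_gb
    total_iops = drives * drive_iops
    return {
        "drives": drives,
        "total_iops": total_iops,
        "density": total_iops / installed_gb,  # IOPS per GB
    }


TARGET_GB = 6000  # same usable capacity in both cases

# Hypothetical drives: equal IOPS per spindle, different capacity.
large = build_pool(TARGET_GB, drive_gb=600, drive_iops=180)  # 3.5-inch
small = build_pool(TARGET_GB, drive_gb=300, drive_iops=180)  # 2.5-inch

print("3.5-inch pool:", large)
print("2.5-inch pool:", small)
```

With these numbers the 2.5-inch pool needs twice as many spindles and delivers twice the access density for the same capacity. Adding more drives of the same type would raise total IOPS and total capacity together and leave the ratio where it was.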

A number of vendors are already exploiting 2.5-inch drives in enterprise-class storage systems, and judging from the recent Seagate announcement of 15,000rpm 2.5-inch drives, a number of major systems/storage vendors are not far behind. Green-conscious companies should note that 2.5-inch drives consume much less power, typically 70% less than 3.5-inch drives.

Access density is a blend of I/O performance and capacity, with solution possibilities tempered by cost. Each must be balanced to reach the compromise that will deliver the performance needed at a cost that is affordable. Scale-out storage offers the most likely opportunity for succeeding in this quest.

High velocity data is the transactional lifeblood of commerce. For a storage system to manage large volumes of high velocity data affordably, it must be optimized to deliver a high access density that keeps data latency low.

BILL MOTTRAM is a managing partner at Veridictus Associates and an analyst with Data Mobility. He can be contacted at: bill@veridictusassociates.com, and blogs at www.storagetopics.com

This article was originally published on March 01, 2009