In the cloud world, the tiering options have in the past been fairly straightforward: standard online storage for hot data, and less expensive offline storage (such as Amazon's Glacier service) for cold data. Standard data storage services offer near-instant retrieval times, while data stored in services such as Glacier is generally available with a retrieval time of several hours.
Google's Nearline storage appears to be a completely new cloud storage tier, offering the rock-bottom cost of Glacier (currently 1c per gigabyte), but with a time-to-first-byte measured in seconds rather than hours. It is designed for infrequently accessed data, and Google promises a throughput of 4MBps per TB of data stored.
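To see what a throughput of 4MBps per TB stored means in practice, here is a minimal Python sketch. It assumes decimal units (1 TB = 1,000,000 MB) and that the promised rate is sustained for the whole transfer; the function names are illustrative only and are not part of any Google API.

```python
# Sketch only: throughput that scales with the volume stored, using the
# 4 MB/s-per-TB figure quoted in the article.

MB_PER_TB = 1_000_000          # assumption: decimal units
THROUGHPUT_MB_S_PER_TB = 4.0   # figure quoted in the article


def throughput_mb_s(stored_tb: float) -> float:
    """Aggregate read throughput promised for a given stored volume."""
    if stored_tb <= 0:
        raise ValueError("stored_tb must be positive")
    return THROUGHPUT_MB_S_PER_TB * stored_tb


def full_restore_hours(stored_tb: float) -> float:
    """Hours needed to read back everything stored, at the promised rate."""
    total_mb = stored_tb * MB_PER_TB
    return total_mb / throughput_mb_s(stored_tb) / 3600


if __name__ == "__main__":
    for tb in (1, 10, 100):
        print(f"{tb:>4} TB stored: {throughput_mb_s(tb):6.0f} MB/s, "
              f"full restore in about {full_restore_hours(tb):.1f} hours")
```

Because the rate grows in step with the amount stored, a full restore under these assumptions takes roughly the same time (a little under three days) whatever the size of the data set. The fast time-to-first-byte matters most when only part of the data is needed quickly.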
In fact, both Amazon and Google already offer an intermediate storage tier – Google's Durable Reduced Availability (DRA), and Amazon's Reduced Redundancy Storage (RRS). Both are intended to offer similar performance to the standard storage offerings, but at a slightly lower cost. (For example, Google charges 2.6 cents per GB for Standard storage, and 2 cents per GB for DRA.)
The catch is that DRA may suffer delayed availability at peak demand times, while RRS can only sustain the loss of data in one Amazon facility rather than the standard offering's two.
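The per-gigabyte prices quoted above can be turned into a rough bill comparison. The sketch below assumes the prices are per GB per month and ignores retrieval, network and request charges, which the article does not quote; the dictionary and function names are hypothetical.

```python
# Sketch only: compare monthly storage cost across Google's tiers using
# the prices quoted in the article (cents per GB, assumed to be per month).

PRICE_CENTS_PER_GB = {
    "standard": 2.6,
    "dra": 2.0,
    "nearline": 1.0,
}


def monthly_cost_usd(stored_gb: float, tier: str) -> float:
    """Storage-only monthly cost in US dollars for the given tier."""
    return stored_gb * PRICE_CENTS_PER_GB[tier] / 100


if __name__ == "__main__":
    stored_gb = 10_000  # example volume: 10 TB in decimal units
    for tier in PRICE_CENTS_PER_GB:
        print(f"{tier:>8}: ${monthly_cost_usd(stored_gb, tier):,.2f} per month")
```

On these figures, moving data from Standard to DRA saves a little under a quarter of the storage bill, while moving it to Nearline saves more than 60 percent.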
"The improved access time (of Nearline over Glacier) should simplify things for customers since they do not have to factor in a few hours of wait time for every archive request," says Henry Baltazar, a senior analyst at Forrester Research. "I'm hoping it will ultimately drive down the pricing of archive storage in the future."
Glacier has an element of mystery to it because Amazon has never revealed what the underlying storage medium is. Possibilities include tape systems, spin-down hard disks, low-performance disks, or even optical systems. But whatever it is, the data is not quickly available, and is therefore of no use for content serving or for running applications.
Google is being a little more upfront about what's behind Nearline. "The underlying technology is exactly the same as our Standard and DRA offerings, with the same user interface and APIs to learn," says Tom Kershaw, director of product management for the Google cloud platform. Nearline data can be accessed in the same way as data in any other Google storage type, using the XML API, the JSON API, the gsutil command line tool, or the Google Developers Console.
How can Google offer this type of storage at a price comparable to offline storage? "The combination of a-priori knowledge that the data will be infrequently accessed, and slightly relaxing the availability and latency expectations means that storage can be provided at a significantly lower cost per GB per month, allowing it to be offered at a price that is competitive with offline storage services," is how the company attempts to explain it.
What does that actually mean? "It has to do with the number and location of copies and network usage," explains Kershaw. "As you relax these criteria, it gives you the flexibility to do things more efficiently. For example, if the location is relaxed you can put data where there is excess retrieval capacity. So you can make more economic decisions."
An interesting question then is: what does Google foresee this new storage tier being used for? Regular cloud storage is designed to offer access to data with minimum latency and is aimed at applications including content serving and data analysis. Offline storage is designed for the same kinds of applications that tape libraries serve in data centers: long-term archiving, backup and disaster recovery. So where does Nearline fit in?
Kershaw sees three main use cases. The first is disaster recovery and backup data, which may only rarely or even never be required – but in the case of a disaster or data loss it is highly desirable to start getting it back quickly rather than having to wait several hours before the data starts flowing from offline storage. And if it's the same price as offline storage, that would seem a no-brainer.
The other two cases are more interesting: enterprises storing their older log files, and consumer services storing customers' videos and photographs.
These cases are interesting because Kershaw foresees companies moving log files from Standard to Nearline storage after about 24 hours, when they become less likely to be needed. And since most videos and photographs more than about a month old are rarely or never accessed again by their owners – but when owners do want them, they want them almost immediately – this type of data is also a good match for Nearline storage.
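The age thresholds Kershaw describes amount to a simple classification rule. The sketch below is a hypothetical illustration of such a rule, not a Google feature: the function name, the data categories and the approximation of "about one month" as 30 days are all assumptions.

```python
# Sketch only: suggest a storage tier from the kind of data and its age,
# using the rough thresholds mentioned in the article.
from datetime import timedelta

# Age at which each kind of data becomes a Nearline candidate.
# "About one month" is approximated as 30 days (assumption).
NEARLINE_AFTER = {
    "log": timedelta(hours=24),
    "media": timedelta(days=30),
}


def suggest_tier(kind: str, age: timedelta) -> str:
    """Return 'nearline' once data of a known kind passes its threshold."""
    threshold = NEARLINE_AFTER.get(kind)
    if threshold is None:
        # Unknown kinds stay on the default tier in this sketch.
        return "standard"
    return "nearline" if age >= threshold else "standard"


if __name__ == "__main__":
    print(suggest_tier("log", timedelta(hours=36)))   # past the log threshold
    print(suggest_tier("media", timedelta(days=3)))   # still recent
```

A real policy would probably also weigh access patterns and retrieval costs, not age alone.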
Because Nearline, Standard and DRA storage all run on the same backend systems, data owners can easily move their data from one tier to another by manually changing the bucket type.
"You just have to select 'Nearline' as a bucket type (instead of Standard or DRA), so data can be reclassified as Nearline easily," says Kershaw.
This raises the question of whether we are likely to see auto-tiering of cloud data, and Kershaw expects that we will, eventually. "That's certainly where the industry will go over time. We will get to the point where we will recommend or automate classification based on age and so on. So the customer will be able to choose to let the cloud do the work and auto-optimize their data."
That still leaves offline storage as a fourth cloud storage tier option, but it seems likely that Amazon will have to cut the price of its Glacier storage service – perhaps substantially below the current 1c per GB mark – if it is to remain attractive. If it doesn't then Nearline storage will eat offline storage's lunch. There's no official word from Amazon on Glacier price cuts in reaction to the announcement of Nearline, but the company has cut its storage prices many times in the past, so further reductions wouldn't be out of the ordinary.
What will offline storage then be used for? "I think that offline storage will end up only being used to store regulatory stuff with little utility that you never access," says Kershaw. "After all, what's the point in saving stuff if it is almost impossible to use?"
In fact offline storage services probably have plenty of legs left in them even if prices aren't cut – if only for the fact that moving data out of them can be inconvenient, and there are charges associated with doing so too.
In enterprise data centers there are a number of storage tiers that administrators can take advantage of to minimize storage costs and match latency and bandwidth characteristics to what's required. These include internal server memory, flash caches and storage devices, various hard drive-based systems, and tape storage setups. And there's no reason why cloud storage services should be restricted to just three or four tiers, Kershaw concludes.
"You could probably offer an infinite number of storage products," he says. "We want to keep it easy for now, but over time we could make additional knobs available and allow customers to choose additional options like higher throughput – allowing for the fast recovery of large volumes. Those are the kinds of things we may look at in the future."