Data Storage Futures: Do We Really Need to Store Everything?

Posted on September 15, 2015 By Drew Robb


In the previous stories in our “Storage Futures” series, we looked at the growing volume of data in the world and how various technologies such as flash, disk and tape were adapting to meet the demand for storage. In this article, we delve into the management of that data and how to make it more efficient.

This, it turns out, is a vast topic that will likely require at least another piece or two to cover adequately. Let’s begin by digging into the subject of storage longevity and answering the question: do we really need to store everything? This is particularly relevant in the growing field of the Internet of Things (IoT).

Data longevity is a pivotal point to address in the ever-expanding digital universe. With IDC predicting 44 ZB will exist by 2020, where on earth are we going to store it all?

The good news is that a whole lot of machine and sensor data produced within the realm of IoT probably doesn’t need to stick around forever.

Matt Starr, CTO at Spectra Logic, challenges people to ask the question, “How long is data really valid?” Do you need to hold onto all the data from a sensor in a Boeing 777 forever? Remember that there are thousands of sensors per engine churning out signals and recordings every few milliseconds. Maybe all of this needs to be captured for the lifespan of that plane. Or perhaps certain parameters are retained after a single flight is completed, and the rest summarized and discarded. It all boils down to being able to assign a value to each piece of information. That, in turn, determines where you store it, for how long you store it and how much it should cost to store it.
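To get a feel for why the question matters, a quick back-of-envelope calculation helps. Every figure below is an illustrative assumption for the sake of the arithmetic, not an actual Boeing specification:

```python
# Back-of-envelope: raw sensor volume for one long-haul flight.
# All figures are illustrative assumptions, not real aircraft specs.
sensors = 5000              # assumed sensors across the aircraft
samples_per_second = 100    # assumed sampling rate per sensor
bytes_per_sample = 8        # one double-precision reading
flight_seconds = 10 * 3600  # a ten-hour flight

raw_bytes = sensors * samples_per_second * bytes_per_sample * flight_seconds
print(f"{raw_bytes / 1e9:.0f} GB of raw data per flight")
```

Even with these modest assumptions, a single flight produces on the order of a hundred gigabytes of raw readings; multiply by a fleet and years of service, and retaining everything forever quickly becomes untenable.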

“Data should be stored in the right layer or area for the proper retrieval per dollar,” said Starr.

What tends to happen, he said, is that value is rarely assigned, so all data is automatically given a high priority, and that leads to storage bloat and soaring costs. For some years now, tiering setups have implemented basic management schemes based on the recency of the data and its rate of access. In addition, overly simplistic retention policies lead to everything being jettisoned after a set number of years. That may satisfy a lawyer complying with regulatory demands, but it is much too simplistic and limited in the face of IoT and the specter of 44 ZB.
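A recency-and-access tiering policy of the kind described here can be sketched in a few lines. The tier names and thresholds below are hypothetical, chosen only to illustrate the idea of matching data to the cheapest layer that still meets its retrieval needs:

```python
from datetime import datetime, timedelta

def choose_tier(last_accessed: datetime, access_count: int) -> str:
    """Assign a storage tier from recency and access frequency.

    Thresholds are illustrative, not taken from any product.
    """
    age = datetime.now() - last_accessed
    if age < timedelta(days=30) or access_count > 100:
        return "flash"  # hot: fast and expensive
    if age < timedelta(days=365):
        return "disk"   # warm: moderate cost
    return "tape"       # cold: cheap, slower retrieval

# Data touched two days ago lands on the hot tier.
print(choose_tier(datetime.now() - timedelta(days=2), access_count=5))
```

A real policy would also weigh the assigned business value of the data, not just its age, which is precisely the gap Starr points to.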

If a piece of digital information is only ever looked at once, surely its value is much less than that of other data. Yet most information is captured, stored and largely ignored.

“More than 90 percent of data is not looked at more than once,” said Vernon Turner, an analyst at IDC.

This very much applies to IoT. With sensors and devices churning out data points by the billion, it is essential to prioritize data and carefully prescribe which data needs to be transmitted to a central repository for analysis, which data can be stored and analyzed at the edge of the network, and which data should be discarded and when.

“Not everything can be centralized so it’s going to take a balance of core and edge storage,” said Greg Schulz, an analyst with StorageIO Group. “The last thing you want to do is be shipping PBs of data around; instead, move the application close to where the data is or to where a cached copy of the data is available.”

He sees a need to change where data gets ingested, initially processed and stored: either held locally for a short time and then discarded, or summarized and transmitted to the core. Smart content distribution networking and related techniques, he believes, will eventually produce a hybrid core-edge environment in which data is ingested and protected close to where it is created, then trickles back to a central location, perhaps a cloud, for subsequent distribution and cached access by others.
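One way to read this core-edge balance in code: the edge node ingests raw readings, keeps them only briefly, and forwards a compact summary to the core. The sketch below is a hypothetical illustration, assuming a simple batch of numeric sensor readings:

```python
def summarize_at_edge(readings: list[float]) -> dict:
    """Reduce a batch of raw sensor readings to a compact summary
    suitable for shipping to a central repository. The raw batch
    can then be aged out locally instead of being sent over the wire."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }

# A thousand raw readings collapse to four numbers bound for the core.
raw = [20.0 + (i % 7) * 0.1 for i in range(1000)]
summary = summarize_at_edge(raw)
print(summary["count"])
```

The point is not the statistics chosen but the ratio: the application (or its summary) moves, rather than petabytes of raw data.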

Some data, therefore, may never make it to a central storage repository for long-term retention. There may be hundreds or even thousands of data points involved, for example, in monitoring traffic on a highway. But it is likely that much of that data needs to be retained locally only for a relatively short period before being ditched. Only a summary would have real value beyond solving the immediate problem of how to set the lights to maximize traffic flow along the road.

Tony Cosentino, an analyst with Ventana Research, said part of the problem might be that there is confusion in the marketplace about what people are trying to achieve from IoT. In his view, it is of more value in immediate operational intelligence than in mid-range and long-term strategic intelligence.

In other words, the highest value to extract from IoT data may only be accomplished by analyzing it shortly after its creation. If you are gathering data to optimize the flight path of a plane, you really don’t need most of that information for very long. Perhaps everything is stored onboard for the duration of the flight then summarized on landing with only the most vital metrics retained long term. That could reduce the storage burden by several orders of magnitude.

“As IoT is real-time or near-real-time, it is of most value in making immediate or same day decisions,” said Cosentino. “The faster you can integrate data in the data store and analyze it, the more successful you will be with IoT.”

Photo courtesy of Shutterstock.
