Big Data storage creates its own growing challenges – and they’re only going to get worse.
It’s hurricane season right now in North America – and storage professionals who think they have weathered the big data storm had better watch out. Courtesy of unstructured data storage technologies such as Hadoop, many have grown comfortable in the face of rampant year-over-year data growth. They ain’t seen nothing yet. Every facet of the storage world – on-prem, private cloud and public cloud – is about to be assailed by a data hurricane that will make the last few years seem like a gentle breeze.
“While big data and the Internet of Things (IoT) comprise a tiny fraction of public cloud workloads today, both are growing rapidly,” said Bert Latamore, an analyst at Wikibon. “By 2020, these two domains will feature large in the growth and dynamics of the public cloud market.”
Here are some key tips to help you cope with the onslaught of big data.
1. Big Data Storage, Big Data Problems
One of the biggest challenges with big data storage is that big data has many different types, faces and aspects, said Greg Schulz, an analyst at StorageIO Group. Some of it is big, fast streaming data such as video and security surveillance feeds; some is log, event and other telemetry data; and then there are large volumes of traditional unstructured files and objects. The common themes, of course, are that there is more data (volume), some of it is larger (size), and it is unstructured. It is therefore important to understand what type of big data you are dealing with in order to address it appropriately.
“Challenges include how to cope with and scale management without increasing cost and complexity, while at the same time addressing performance, availability, capacity and economics concerns,” said Schulz. “What this means is rethinking how and where the data gets stored, which also ties to where the applications are located (on premise or cloud) along with how it is accessed (block, file, object).”
2. Application Location
In the old days you could get away with centralizing all data and having applications feed off it. But that approach tends to introduce too many bottlenecks.
“Put the data close to where the applications using it are located; if those applications are in the cloud, then put the data in the cloud, and vice versa if local,” said Schulz. “The key is to understand the applications, where they are located and how they use data, and then align the various technologies to those needs. Also, understand whether your applications need object storage, and which API they use for access, or whether they function with scale-out NAS.”
For example, some apps might be best served by HDFS or another file-sharing platform, while others should gravitate to Amazon S3, Swift or another form of object storage. Also keep in mind how you will store and manage the metadata that supports big data applications, he added.
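To make the distinction concrete, here is a minimal sketch of object-style access through the S3 API using boto3. The endpoint, bucket and key names are hypothetical placeholders; Swift or any other S3-compatible store is addressed the same way, while an HDFS-oriented app would use file paths instead of keys.

```python
# A minimal sketch of object access via the S3 API, using boto3.
# The endpoint, bucket and key below are hypothetical placeholders.
import boto3

s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

# Write: applications address data by bucket + key, not by file path.
s3.put_object(Bucket="telemetry", Key="2016/08/sensor-feed.json",
              Body=b'{"reading": 42}')

# Read: the same key retrieves the object from wherever it is stored.
obj = s3.get_object(Bucket="telemetry", Key="2016/08/sensor-feed.json")
print(obj["Body"].read())
```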
3. Bifurcated Storage Strategy
451 Research analyst Simon Robinson suggests a future where fast data storage requirements are served from a flash tier (performance) and everything else scales out into cost-optimized tiers backed by object storage (capacity). A variety of storage tiering scenarios can map to specific enterprise requirements. The key is seamless, automated movement between the tiers, so that end users do not even know tiering is going on.
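As a rough illustration of what that automated movement might look like, the sketch below demotes files that have not been accessed within a cutoff period from a flash tier to an object-backed capacity tier. The mount points and cutoff are hypothetical, and a real mover would leave a stub or metadata pointer behind so the move stays invisible to end users.

```python
# A simplified sketch of policy-driven tiering: files untouched for more
# than CUTOFF_DAYS are moved from the flash (performance) tier to an
# object-backed (capacity) tier. Paths and cutoff are illustrative.
import os
import shutil
import time

FLASH_TIER = "/mnt/flash"        # hypothetical fast-tier mount
CAPACITY_TIER = "/mnt/objectfs"  # hypothetical object-backed mount
CUTOFF_DAYS = 30

def demote_cold_files():
    cutoff = time.time() - CUTOFF_DAYS * 86400
    for root, _dirs, files in os.walk(FLASH_TIER):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getatime(path) < cutoff:
                dest = os.path.join(CAPACITY_TIER,
                                    os.path.relpath(path, FLASH_TIER))
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(path, dest)  # demote to capacity tier
```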
4. Think Big Enough For Big Data
When it comes to effectively managing growing volumes of big data, take the time to develop a strategy that not only meets your near-term needs but can scale to support you over time. Otherwise, you end up with software and hardware components that no longer scale effectively. Check carefully how well a technology scales before buying; in a big data world, it had better scale enough to absorb the coming influx of data.
“You can tell when existing software and hardware components have reached a point where they no longer effectively scale: when each additional storage volume added seems to take increasingly more time to manage and when the result of adding it does not seem to add the expected volume and performance,” said Michael King, Senior Director of Marketing Strategy and Operations, DataDirect Networks (DDN).
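King’s heuristic can be approximated with a simple measurement: time each volume addition and flag the platform when the per-volume cost starts climbing. In this sketch, provision_volume is a hypothetical stand-in for whatever your platform’s actual provisioning call is.

```python
# A rough sketch of the heuristic above: record how long each volume
# addition takes and warn when the per-volume management time is
# trending up instead of staying flat.
import time

history = []  # seconds taken by each volume addition

def add_volume(provision_volume, *args):
    start = time.monotonic()
    provision_volume(*args)   # hypothetical platform provisioning call
    history.append(time.monotonic() - start)
    if len(history) >= 6:
        early = sum(history[:3]) / 3
        recent = sum(history[-3:]) / 3
        if recent > 2 * early:
            print("Warning: per-volume management time has doubled; "
                  "the platform may be reaching its scaling limit.")
```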
5. Categorize Metadata
Categorizing data is wise because it lets you know what you have and lets you search the metadata to find it. Long, descriptive file names may have worked in the past, but they no longer do at growth rates as high as 100 percent year over year.
“Categorizing data is one of the best approaches for dealing with exponential data growth,” said Matt Starr, CTO, Spectra Logic. “Collect metadata at the time of creation, and store at least two copies on different media, such as one on tape and one on disk.”
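As a minimal sketch of that advice, the ingest routine below captures descriptive metadata the moment a file enters the system and writes two copies to different targets. The category scheme and target paths are hypothetical; a tape target would normally sit behind an archive interface rather than a plain directory.

```python
# A minimal sketch: capture metadata at creation time and keep two
# copies on different targets. Categories and paths are hypothetical.
import hashlib
import json
import os
import shutil
import time

def ingest(path, category, disk_target, tape_target):
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    meta = {
        "name": os.path.basename(path),
        "category": category,      # searchable, unlike long filenames
        "created": time.time(),
        "sha256": digest,
        "copies": [disk_target, tape_target],
    }
    for target in (disk_target, tape_target):
        os.makedirs(target, exist_ok=True)
        shutil.copy2(path, target)  # second copy on different media
        with open(os.path.join(target, meta["name"] + ".meta.json"),
                  "w") as m:
            json.dump(meta, m)
    return meta
```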
6. Decouple Capacity And Compute
Another tip is to build scale-out storage that decouples capacity from compute. As data grows, it is crucial to build an IT infrastructure that scales to match actual needs without over-provisioning resources.
“A way to accomplish this is to invest in storage infrastructures that can scale capacity and compute independently,” said Shachar Fienblit, Chief Technology Officer, Kaminario.
A storage solution for big data should support multiple protocols and simplify the way data is processed. Real-time analytics makes storage workloads less and less predictable, which is why flash has become the favored medium for storing and processing big data workloads. As the cost of flash media continues to fall rapidly, the industry will see more and more big data workloads running on all-flash arrays.
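A toy sizing exercise shows why the decoupling matters. All node specifications and figures below are hypothetical; the point is that a capacity shortfall adds storage nodes without forcing a compute purchase, and vice versa.

```python
# Illustrative sizing for a cluster where capacity nodes and compute
# nodes scale independently. All specs below are hypothetical.
import math

CAPACITY_PER_STORAGE_NODE_TB = 120   # hypothetical
IOPS_PER_COMPUTE_NODE = 50_000       # hypothetical

def size_cluster(needed_tb, needed_iops):
    storage_nodes = math.ceil(needed_tb / CAPACITY_PER_STORAGE_NODE_TB)
    compute_nodes = math.ceil(needed_iops / IOPS_PER_COMPUTE_NODE)
    return storage_nodes, compute_nodes

# Data doubles but the analytics workload stays flat: only storage grows.
print(size_cluster(600, 150_000))    # (5, 3)
print(size_cluster(1200, 150_000))   # (10, 3)
```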
7. Commodity Hardware
Scale-out object storage is one of the most effective ways to deal with runaway data growth, because data is continuously protected without the need for backups. But how do you keep hardware costs down?
“Running on commodity x86 servers, object storage allows you to upgrade hardware seamlessly, as these devices function as modular units that can be aggregated without diminishing efficiency,” said Tony Barbagallo, Vice President of Product, Caringo.
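The modular-unit idea can be sketched in a few lines: objects are placed by hashing keys across whatever nodes are present, so adding a commodity server simply widens the pool. This is a deliberately naive illustration; production object stores use consistent hashing or placement maps to keep reshuffling to a minimum when nodes join.

```python
# A toy sketch of treating commodity x86 servers as interchangeable
# units: object placement is recomputed over whatever nodes exist.
# Real systems use consistent hashing to limit data movement.
import hashlib

nodes = ["node-a", "node-b", "node-c"]   # hypothetical commodity servers

def place(key):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

print(place("2016/08/sensor-feed.json"))
nodes.append("node-d")                   # seamless capacity upgrade
print(place("2016/08/sensor-feed.json")) # placement now spans 4 nodes
```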