by Jeff Kato, Senior Analyst, The Taneja Group
Let’s face it: Storage is dumb today. Mostly it is a dumping ground for data. As we produce more and more data, we simply buy more and more storage and fill it up. We don’t know who is using what storage at a given point in time, which applications are hogging storage or have gone rogue, what and how much sensitive information is stored, moved or accessed, and by whom, and so on. Basically, we are blind to whatever is happening inside that storage array.
Am I exaggerating? Of course, I am, but only to a degree.
Can we extract information from the storage array today? Yes, we can. But one has to use a myriad of tools from a variety of vendors and do a lot of heavy lifting to get some meaningful information out of storage. The information is buried deep inside and some external application has to work hard to expose it. This activity is generally so cumbersome that most users simply don’t attempt it unless it is required by law. In such cases (compliance or governance, for instance), external software is used to pull relevant information at great cost in money and time.
Of course, over the past decade, technologies such as auto-tiering have helped in moving less active data to lower-cost storage, and one may even find software that automatically deletes files when their retention period has expired. But these are all one-off solutions, and the basic premise still stands: storage today is basically dumb.
What if storage were aware of the data it stored? What if all data were catalogued upon creation, indexed and analyzed? What if analytics were built-in and real-time? What if storage were aware of all activity taking place inside? What if data protection were an inherent part of storage and there was no need for media servers and tapes and separate disk systems? What if search and discovery were an integral part of the array?
Wouldn’t smart storage like this be a paradigm shift? Wouldn’t it fundamentally change how we manage, protect and use storage?
Of course, it would.
Welcome to the new era of data-aware storage.
The Need for Data-Aware Storage
This advance could not have come at a better time. Storage growth, as we all know, is out of control. Granted, the cost per gigabyte keeps falling by about 40 percent per year, but we keep growing capacity by about 60 percent per year. This causes both the cost and capacity to keep increasing every year.
While the cost increase is certainly an issue, the bigger issue is manageability. And not knowing what we have buried in those mounds of data, if anything, is an even bigger issue. Instead of data being an asset, it is a dead weight that keeps getting heavier. If we don’t do something about it, we will simply be overwhelmed, if we are not already.
Why is it possible to develop data-aware storage today when we couldn’t yesterday? Flash technology, virtualization and the availability of "free" CPU cycles make it possible for us to build storage today that can do a lot of heavy lifting from the inside. This was technically possible in the past, but implementing it would have slowed primary storage to the point where it would have been useless. Today we can build in a lot of intelligence without impacting performance or quality of service. We call this new type of storage data-aware storage.
When implemented correctly, data-aware storage can provide insights that were not possible yesterday. It can reduce the risk of non-compliance and improve governance. It can automate many of the storage management processes that are manual today. It can provide insights into how well the storage is being utilized. It can identify when a dangerous situation is about to occur, whether it involves compliance, capacity, performance or an SLA.
In this article we will define the attributes of data-aware storage, examine the business benefits of deploying these systems and provide an industry landscape of up-and-coming storage companies that are introducing these pioneering products.
Data-Aware Storage Defined
All storage systems are getting smarter with each new generation, but to be categorized as data-aware storage, Taneja Group believes they must meet most, if not all, of the criteria described below:
- Increased Awareness: The storage understands more about the content or attributes of the data stored on the device than traditional storage devices do. Examples include enhanced metadata about quality of service, file attributes and application-aware metrics, as well as actually scanning the data in real time, looking for contextual patterns or keywords for security and regulatory compliance.
- Real-Time Analytics: It is not enough for these storage systems to gather enhanced metadata without making it useful in real time. Therefore these systems must provide instantaneous updates of the enhanced analytics so that administrators or policy engines can react before issues become critical. An example would be the detection and suppression of a rogue application before it can sap IOPS from a more important application. Another example would be understanding who is accessing which files and their relationship to others accessing the same files; this would help a business understand which types of data are more important and to which groups of people.
- Advanced Data Services: The storage system should also offer data services that enable better business outcomes based on the increased awareness. Examples would be the availability of archiving functions for dormant data, bursting the application to cloud once a threshold has been met, or balancing QoS across different application workloads. Other examples could include triggering compliance workflows or alerts or even built-in intelligent data protection.
- Open and Accessible APIs: In order for this new category of storage to flourish, all the capabilities of these new systems must be open and available to enable a rich ecosystem of integrated applications and tools to come alongside and complement the data-aware storage. There are far too many vertical application requirements that could take advantage of unique data-aware features for any one company to provide them all. Over time, de facto industry standard APIs will emerge for the most popular enhanced capabilities, similar to how the Amazon S3 data protocol became a standard.
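To make the criteria above a little more concrete, here is a minimal Python sketch of what "awareness" could look like in miniature. It is an illustration only, not any vendor's implementation: the `DataAwareStore` class, the `rogue_apps` helper, the two regular-expression patterns and the IOPS limits are all hypothetical names and values invented for this example. The sketch catalogs each file at write time (Increased Awareness), flags applications that exceed an IOPS limit (Real-Time Analytics) and lists dormant files that an archiving service could act on (Advanced Data Services).

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration; real compliance rules would be far richer.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "confidential": re.compile(r"\bconfidential\b", re.IGNORECASE),
}


@dataclass
class CatalogEntry:
    path: str
    owner: str
    size: int
    last_access: float
    tags: set = field(default_factory=set)


class DataAwareStore:
    """Toy in-memory store that catalogs and scans data as it is written."""

    def __init__(self):
        self.catalog = {}

    def write(self, path, owner, text, now):
        # Scan the content on the way in and record which patterns matched.
        tags = {name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)}
        size = len(text.encode("utf-8"))
        self.catalog[path] = CatalogEntry(path, owner, size, now, tags)

    def read(self, path, now):
        # Track activity so that dormant data can be identified later.
        self.catalog[path].last_access = now

    def sensitive_files(self):
        return sorted(p for p, e in self.catalog.items() if e.tags)

    def dormant_files(self, now, max_idle):
        # Candidates for archiving: not accessed for longer than max_idle.
        return sorted(
            p for p, e in self.catalog.items() if now - e.last_access > max_idle
        )


def rogue_apps(iops_by_app, limits):
    """Return applications whose observed IOPS exceed their assumed limit."""
    return sorted(
        app for app, iops in iops_by_app.items()
        if iops > limits.get(app, float("inf"))
    )
```

A policy engine could poll helpers like `sensitive_files`, `dormant_files` and `rogue_apps` and trigger alerts, archiving or throttling. In the spirit of the fourth criterion, the same queries would be exposed through an open API rather than kept inside the array.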