By Mike Matchett, Sr. Analyst and Consultant, Taneja Group
With the advent of big data and cloud-scale delivery, companies are racing to deploy cutting-edge services. “Extreme” applications like massive voice and image processing or complex financial analysis modeling can push storage systems to their limits. Examples of high-visibility solutions include large-scale image pattern recognition applications and financial risk management based on high-speed decision-making.
These ground-breaking solutions, made up of very different activities but with similar data storage challenges, create incredible new lines of business representing significant revenue potential.
Every day here at Taneja Group we see more and more mainstream enterprises exploring similar “extreme service” opportunities. But when enterprise IT data centers take stock of what is required to host and deliver these new services, it quickly becomes apparent that traditional clustered and even scale-out file systems (the kind that most enterprise data centers or cloud providers have racks and racks of) simply can’t handle the performance requirements.
There are already great enterprise storage solutions for applications that need raw throughput, high capacity, parallel access, low latency or high availability, and maybe even for two or three of those at a time. But when an “extreme” application needs all of them at the same time, only supercomputing-type storage in the form of parallel file systems provides a functional solution.
The problem is that most commercial enterprises simply can’t afford or risk basing a line of business on an expensive research project.
The good news is that some storage vendors have been industrializing former supercomputing storage technologies, hardening massively parallel file systems into commercially viable solutions. This opens the door to creating revolutionary services, enabling mainstream enterprise data centers to support new extreme applications.
High Performance Computing in the Enterprise Data Center
Organizations are creating more and more data every day, and that data growth challenges storage infrastructure that is already creaking and groaning under existing loads. On top of that, we are starting to see mainstream enterprises roll out exciting heavy-duty applications as they compete to extract value out of all that new data, creating new forms of storage system “stress.” In production, these extreme applications can require systems that perform more like high-performance computing (HPC) research projects than like traditional business operations or user productivity solutions.
These new applications include “big data” analytics, sensor and signals processing, machine learning, genomics, social media trending and behavior modeling. Many of these have evolved around capabilities originally developed in supercomputing environments, but are now being exploited in more mainstream commercial solutions.
We have all heard about big data analytics and the commoditization of scale-out, map-reduce-style computing for data that can be processed in “embarrassingly parallel” ways, but there are now also extreme applications emerging that require high-throughput shared data access. Examples of these include some especially interesting business opportunities in areas like image processing, video transcoding and financial risk analysis.
Finding Nemo on a Big Planet
A good extreme application example would be image pattern recognition at scale. Imagine the business opportunity in knowing where customers are located, what kind of buildings they live in, how they relate geographically to each other and how much energy they use. Some of the more prominent examples of image-based geographic applications we have heard about include prioritizing the marketing of green energy solutions, improving development and traffic planning, route optimization and retail/wholesale targeting.
For example, starting with detailed “overhead” imagery (of the kind you find on Google Maps' satellite view), it is now commercially possible to analyze that imagery computationally to identify buildings and estimate their shape, siting (facing), parking provisions, landscaping, envelope, roof construction and pitch, and construction details. That intelligence can be combined with publicly available data from utilities, records of assessments, occupancy, building permits and taxes, and then again with phone numbers, IP, mail and email addresses (and fanning out to any data those link to) in order to feed a “big data” analysis. At scale this entails processing hundreds of millions of imagery and data objects over multiple stages of a high-performance workflow.
A World of Devices Hungry for Content
As another example, the demand and use cases for rapid transcoding of video are growing every day thanks to the exploding creation and consumption of media on mobile devices. In today’s world of Internet-connected devices, each piece of video that is created gets converted via “transcoding” into potentially 20 or more different formats for consumption.
Transcoding starts with the highest resolution files and is usually done in parallel on a distributed set of servers. Performance is often paramount, as many video applications are related to sports or news and have a very short time window of value. Competitive commercial transcoding solutions require fast storage solutions optimized for both rapid reads and massive writes.