Software defined storage (SDS) is a tricky thing to tie down: there’s no strict definition, but you know it when you see it. It’s offered in one form or another by major storage vendors like EMC, IBM, NetApp and HP, as well as more recently founded storage companies like Nexenta, Nutanix, and SimpliVity.
But the SDS landscape moves at breakneck speed, and these newcomers are now part of the SDS establishment, while a new generation of SDS startups such as Hedvig, Springpath, Maxta and SwiftStack are beginning to mature and are getting ready to offer some serious competition. They are arriving on the scene just in time to cash in on the expected growth in SDS usage: a 451 Research report last year found that 96 per cent of companies surveyed were “somewhat or very likely” to adopt SDS in the future.
Software defined storage is a complex business, so it’s not surprising that a good number of these startups have been founded by some of the smartest minds in the business – people who have already gained years of experience working for some big name technology companies.
A good example is Springpath, a California-based software defined storage company founded in 2012 by Mallik Mahalingam and Krishna Yadappanavar that came out of stealth mode in February. Mahalingam is one of the fathers of software defined networking and led networking and storage product development at VMware for almost a decade, while Yadappanavar was a senior engineer at VMware who led work on the company’s Virtual Machine File System (VMFS), a clustered file system, and on vFlash, its flash virtualization framework.
The challenge with software defined storage is taking this formidably complex product and turning it into something that simplifies storage operations, according to Ashish Gupta, Springpath’s marketing chief. “With many vendors’ storage solutions today there is a management challenge: you need to train staff to use the product, and there is never enough time to get them trained adequately,” he says. “That means the equipment is often not used properly or fully. It was the same in the early 2000s with VMware – it was good but the simplicity wasn’t there.”
By contrast, Springpath’s Hardware Agnostic Log-Structured Objects (HALO) architecture is almost transparent to the user, organizing storage automatically. It’s designed to work with a cluster of servers that may be running applications in virtual machines, containers or on bare metal, using the memory, flash and disk resources of the servers in that cluster as a shared resource. The servers themselves need to appear on a hardware compatibility list that currently includes vendors such as Dell, Cisco and Supermicro.
The first release is actually designed for VMware/vCenter environments, and is basically a virtual appliance that creates a shared data store. “You use it to provision data stores from vCenter, so you don’t have to be a Springpath expert, just a VMware expert or a vCenter expert,” says Gupta. (Support for other hypervisors such as Hyper-V and KVM will be added in future releases.)
The company says the software uniformly stripes data from all the applications across all the servers in the cluster, helping to make use of all the storage resources available. Gupta adds that an organization could, for example, build a cluster with high performance blades and high capacity storage nodes. “If your application needs performance it can run in the blade environment. If it needs high storage capacity it can use the node environment.”
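Striping is easiest to picture with a toy example. The Python sketch below assumes simple round-robin placement of fixed-size stripe units across the servers in a cluster; Springpath hasn’t said how its software actually places data, so the `stripe` function and the node names are hypothetical.

```python
from collections import defaultdict


def stripe(data: bytes, nodes: list[str], stripe_size: int) -> dict[str, list[bytes]]:
    """Split data into fixed-size stripe units and deal them out round-robin across nodes.

    Hypothetical illustration only; not Springpath's placement algorithm.
    """
    placement = defaultdict(list)
    for offset in range(0, len(data), stripe_size):
        unit_index = offset // stripe_size
        node = nodes[unit_index % len(nodes)]  # unit 0 -> first node, unit 1 -> second, ...
        placement[node].append(data[offset:offset + stripe_size])
    return dict(placement)


layout = stripe(b"abcdefghij", ["node-1", "node-2", "node-3"], stripe_size=2)
print(layout)
# {'node-1': [b'ab', b'gh'], 'node-2': [b'cd', b'ij'], 'node-3': [b'ef']}
```

Every server ends up holding a share of the data, which is the point Gupta makes about using all the storage available. A real system would also replicate each unit and weigh node capacity and performance, which this sketch ignores.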
The Springpath software offers applications file, block and object storage access to the underlying storage resources, and also includes a Hadoop plug-in for big data environments. It provides both read and write caching in SSDs, maintaining hot data sets that are frequently or recently read in SSD and DRAM caches, while destaging cooling data to spinning media.
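Keeping hot data in flash while destaging cooling data to disk is, at heart, a cache-eviction policy. The sketch below assumes a plain least-recently-used (LRU) policy, with each block held in exactly one tier; the `TieredStore` class and the choice of LRU are illustrative assumptions, not details Springpath has disclosed.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small 'flash' tier in front of a larger 'disk' tier.

    Assumption: a plain LRU policy, with each block held in exactly one tier.
    """

    def __init__(self, flash_capacity: int):
        self.flash_capacity = flash_capacity
        self.flash = OrderedDict()  # key -> block, ordered least to most recently used
        self.disk = {}              # key -> block, the capacity tier

    def _admit(self, key: str, block: bytes) -> None:
        """Put a block in flash as most recently used, destaging cold blocks if full."""
        self.flash[key] = block
        self.flash.move_to_end(key)
        while len(self.flash) > self.flash_capacity:
            cold_key, cold_block = self.flash.popitem(last=False)  # least recently used
            self.disk[cold_key] = cold_block                      # destage to disk

    def write(self, key: str, block: bytes) -> None:
        self.disk.pop(key, None)  # drop any stale copy on disk
        self._admit(key, block)

    def read(self, key: str) -> bytes:
        if key in self.flash:         # hit: refresh recency
            self.flash.move_to_end(key)
            return self.flash[key]
        block = self.disk.pop(key)    # miss: promote from disk (KeyError if absent)
        self._admit(key, block)
        return block


store = TieredStore(flash_capacity=2)
for key in ("a", "b", "c"):
    store.write(key, key.upper().encode())
store.read("a")  # "a" was destaged by the third write; reading it promotes it again
print(sorted(store.flash), sorted(store.disk))  # ['a', 'c'] ['b']
```

Reading a cold block pulls it back into flash and pushes the next-coldest block out, so the flash tier tracks whatever the applications are currently touching.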
The software also provides the sorts of enterprise data services that you might expect from a big-name vendor’s storage system, including snapshots and clones, as well as inline deduplication and variable-block inline compression. Deduplication works across all the storage media that the software can see: flash, hard disk and memory.
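Deduplication generally works by fingerprinting chunks of data and storing each unique chunk only once. The sketch below assumes fixed-size chunks, SHA-256 fingerprints and zlib compression of each stored chunk; Springpath hasn’t described its chunking or hashing, and its variable-block compression is more sophisticated than this, so treat `DedupStore` as a hypothetical simplification.

```python
import hashlib
import zlib


class DedupStore:
    """Toy deduplicating store (hypothetical): fixed-size chunks, SHA-256 fingerprints."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # fingerprint -> compressed chunk, each unique chunk stored once

    def put(self, data: bytes) -> list[str]:
        """Store data and return its 'recipe': the ordered list of chunk fingerprints."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.chunks:  # only new content consumes space
                self.chunks[fingerprint] = zlib.compress(chunk)
            recipe.append(fingerprint)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in recipe)


store = DedupStore(chunk_size=4)
first = store.put(b"AAAABBBBAAAA")  # chunks: AAAA, BBBB, AAAA
second = store.put(b"BBBBAAAA")     # both chunks are already stored
print(len(store.chunks))            # 2
```

Because the fingerprint index is shared, a duplicate chunk is recognized no matter which medium or application wrote it first, which is the idea behind deduplicating across flash, disk and memory.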
What’s unusual about Springpath is that you can’t buy a perpetual license: its software is available only by subscription, at $4,000 per server node per year (priced per node, not by capacity or processor core), with a minimum cluster size of three nodes.
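Per-node pricing makes the cost simple to estimate. A minimal sketch using only the figures quoted above; the function name is hypothetical, and the calculation ignores any discounts, support or hardware costs.

```python
PRICE_PER_NODE_PER_YEAR = 4_000  # US dollars, the figure quoted for Springpath
MIN_NODES = 3                    # minimum cluster size quoted for Springpath


def annual_subscription(nodes: int) -> int:
    """Yearly subscription cost; drive capacity and core count play no part."""
    if nodes < MIN_NODES:
        raise ValueError(f"a cluster needs at least {MIN_NODES} nodes, got {nodes}")
    return nodes * PRICE_PER_NODE_PER_YEAR


for n in (3, 8):
    print(n, "nodes:", annual_subscription(n))
# 3 nodes: 12000
# 8 nodes: 32000
```

Since capacity never enters the formula, filling each node with larger drives leaves the subscription unchanged.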
Hedvig is another software defined storage company that was founded in 2012 and that has appeared on the scene recently. (Apparently Hedvig stands for Hyperscale Elastic Distributed Virtual Integral Granular, a collection of words that describe the company’s software defined storage product, in case you are interested.)
Like Springpath, it has a high profile industry veteran as a founder. In Hedvig’s case the person in question is Avinash Lakshman, and if his name sounds familiar it’s because he’s the man behind both Apache Cassandra, the database management system originally developed at Facebook, and Amazon’s Dynamo storage system.
The software itself runs on any standard x86- or ARM-based servers (a minimum of two is required) and, like Springpath, allows users to consume storage resources as block storage, file-based storage, object storage or storage for Hadoop. But, unlike Springpath, it supports all hypervisors, not just VMware’s, with a storage proxy running on each host. (The proxy can also run in a Docker container.)
Every feature of the Hedvig storage software is also available to cloud and application developers via a REST-based application programming interface (API).
Like Springpath, Hedvig software is designed to be simple to use. “When you add servers the software finds and spreads to them automatically,” explains Rob Whiteley, Hedvig’s marketing VP. The servers may have flash or disk, and the software knows their performance characteristics and uses that knowledge when placing hot or cold data. “The software will find the fastest nodes and auto-balance and auto-tier into flash if the performance demands require that,” he adds.
Hedvig’s storage platform works with the notion of virtual disks, which users or administrators provision for use by applications. Each virtual disk that is provisioned can be configured for replication, compression and other services, and they can also be provisioned to be flash disks if flash performance is required. “You are almost creating flash in software – although the quality of the underlying hardware determines the performance,” Whiteley says.
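To illustrate how per-disk policies and the automatic placement Whiteley describes might fit together, here is a hypothetical sketch. The `VirtualDiskSpec` fields, the `iops` performance figure and the “fastest eligible nodes first” rule are all assumptions made for illustration; they are not Hedvig’s actual API or placement logic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    media: str  # "flash" or "disk"
    iops: int   # hypothetical measured performance figure


@dataclass
class VirtualDiskSpec:
    """Hypothetical per-disk policy; field names are illustrative, not Hedvig's API."""
    name: str
    size_gb: int
    replication_factor: int = 3
    compression: bool = True  # would be applied on write; unused by this placement sketch
    flash: bool = False       # True = this virtual disk must live on flash nodes


def place_replicas(spec: VirtualDiskSpec, nodes: list[Node]) -> list[str]:
    """Pick a distinct node for each replica, preferring the fastest eligible nodes."""
    eligible = [n for n in nodes if n.media == "flash"] if spec.flash else list(nodes)
    eligible.sort(key=lambda n: n.iops, reverse=True)  # fastest first
    if len(eligible) < spec.replication_factor:
        raise ValueError(f"{spec.name}: need {spec.replication_factor} eligible nodes, "
                         f"found {len(eligible)}")
    return [n.name for n in eligible[:spec.replication_factor]]


cluster = [
    Node("node-1", "disk", 500),
    Node("node-2", "flash", 90_000),
    Node("node-3", "flash", 70_000),
    Node("node-4", "disk", 800),
]
spec = VirtualDiskSpec("oracle-data", 500, replication_factor=2, flash=True)
print(place_replicas(spec, cluster))  # ['node-2', 'node-3']
```

As Whiteley notes, software can only steer data toward the fastest hardware available; the nodes themselves still set the ceiling on performance.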
He believes that at the most basic level, flexible software defined storage can give companies a 20% – 25% cost saving over traditional storage hardware solutions. “If you require a high performance subsystem then by the time you buy high performance servers it will be about that much less expensive, which for many companies may not be a big enough delta,” he says.
That changes, he adds, once you factor in the greater flexibility that comes from defining storage in software, with provisioning taking seconds rather than days.
But the bigger savings come later, he believes. “The value is not in the initial deployment but in incremental changes. So if you implement Hedvig to run an Oracle database you may save 20% over a high-end storage array. But if you then need to deploy OpenStack, the cost is very compelling, because you wouldn’t want to put that on your nice high-end storage array.
“And if you then decide you need an archive for vast amounts of files, you would probably buy storage from a third vendor… so over time your investment (in software defined storage) gets better and better.”
What about Hadoop infrastructure? Applications like Hadoop can manage their own storage, but you run into difficulties when you have two or more instances of Hadoop, Whiteley says. “If you do that you have two islands of locally attached storage, and they can’t see each other, so there is no data efficiency – you have multiple data lakes. Our customers say we want one virtual pool of storage, all deduped and compressed.”
In fact Whiteley believes providing storage for Hadoop environments will be one of the key use cases for Hedvig. (The others are storage for private clouds, and for highly virtualized multi-hypervisor environments where there is a need for a virtual SAN-type product.)
With a software defined storage solution like Hedvig (and particularly with Springpath, because you pay per node) it is far more practical to take advantage of advances in storage technology such as the new high capacity 8TB and 10TB drives that are hitting the market, Whiteley points out. That’s because traditional hardware refresh cycles are around five years, and because RAID rebuild times on 8TB drives would be measured in weeks.
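The link between drive capacity and rebuild time is simple arithmetic: rebuild time is roughly capacity divided by the sustained rebuild rate. The rates below are hypothetical; real arrays throttle rebuilds to protect production I/O, so effective rates vary widely, and the `rebuild_hours` helper is illustrative only.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Rough time to rebuild one failed drive: capacity / sustained rebuild rate."""
    capacity_bytes = capacity_tb * 1e12        # decimal TB, as drive vendors quote
    rate_bytes_per_s = rebuild_mb_per_s * 1e6  # decimal MB/s
    return capacity_bytes / rate_bytes_per_s / 3600


# Hypothetical rates: a lightly loaded array vs. one throttling the rebuild under load.
for rate in (50, 5):
    hours = rebuild_hours(8, rate)
    print(f"8TB at {rate} MB/s: {hours:.0f} hours ({hours / 24:.1f} days)")
```

At an assumed 50 MB/s an 8TB drive takes about 44 hours to rebuild; throttled to 5 MB/s on a busy array, the same rebuild takes about 444 hours, or roughly 18.5 days, and the array runs with reduced redundancy the whole time.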
By contrast, with an SDS solution, nodes with these drives can be added easily without ever taking the system down. (But it’s worth noting that cost-wise these disks may not benefit Hedvig users as much as Springpath users, because Hedvig’s software is sold on a perpetual license based on the storage capacity of the cluster as a whole.)
The release of VMware’s Virtual SAN product last year was a major step towards making software defined storage mainstream, but the technology still has some way to go before it is seen as mature enough for the traditionally ultra-conservative storage market. But Whiteley is hopeful that – like virtualization – full acceptance is only a matter of time.
“I don’t think it is an uncrossable chasm,” he concludes. “Eventually even the most risk-averse people will be making the leap.”