Software-defined flash provides a way to unleash the full performance of solid state drives (SSDs), while cutting the cost per megabyte of flash storage substantially.
This price and performance double whammy has already been achieved in the confines of data centers operated by Baidu, the Chinese web services giant. Technology developed in web-scale data centers in the cloud invariably trickles down to enterprise data centers, so it’s a good bet that software-defined flash will be available to medium and large businesses within the next few years.
Before we take a closer look at software-defined flash, let’s examine the problem with SSD storage. Although it offers a significant performance benefit over hard disk drives, it’s relatively expensive. And a significant part of the cost of SSDs comes down to the fact that they are not designed to work at maximum efficiency, explains Jim Handy, a solid state storage expert and semiconductor analyst at Objective Analysis. “SSDs are not optimized for storage — they are designed to look as much like hard disk storage as possible, and that means that there are processes going on to hide the nature of SSD storage from the software that uses it,” he says.
The reason for that is that most software — especially legacy applications and operating systems — has been designed to work with spinning disks. It’s what applications expect, so SSDs have to emulate hard disks. In very basic terms, most software talks in the language of hard disk drive storage, and SSDs are expected to translate that language into SSD language for themselves.
“The difficulty with legacy software is that it would take phenomenal efforts to upgrade it to take (full) advantage of flash,” says Handy, “so the approach has been to leave it well alone and try to use something that looks like a very fast hard drive.”
That means that while the flash controllers in SSDs are busy writing and reading data, they are also managing all the things that an application or operating system doesn’t bother with.
What sorts of things? Things like wear levelling — ensuring that the cells of flash memory are all used equally so that no cells get overused and burn out prematurely; and garbage collection — copying pages of valid data to new blocks and leaving the invalid data behind, so that the original block, now containing only invalid data, can be erased and freed for reuse. (That’s necessary because although SSDs can write at the page level, they can only erase an entire block of multiple pages. So to delete a single page of invalid data, the SSD has to erase the whole block that contains it.)
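To make that concrete, here is a minimal Python sketch of the garbage-collection idea. It is a toy model rather than actual SSD firmware, and the four-page block size is chosen purely for illustration: pages can be written individually, but reclaiming space means copying any still-valid pages out of a block and then erasing the whole block.

```python
# Toy model of flash garbage collection (illustrative only, not vendor code).
PAGES_PER_BLOCK = 4

class Block:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK   # None = erased/empty
        self.valid = [False] * PAGES_PER_BLOCK  # True = holds live data

def garbage_collect(victim: Block, spare: Block) -> None:
    """Copy live pages out of 'victim' into 'spare', then erase 'victim' whole."""
    dst = 0
    for i in range(PAGES_PER_BLOCK):
        if victim.valid[i]:                      # only still-valid pages move
            spare.pages[dst] = victim.pages[i]
            spare.valid[dst] = True
            dst += 1
    victim.pages = [None] * PAGES_PER_BLOCK     # erase is block-level
    victim.valid = [False] * PAGES_PER_BLOCK

# Example: a block with two live and two stale pages gets compacted.
b, spare = Block(), Block()
b.pages = ["A", "stale", "B", "stale"]
b.valid = [True, False, True, False]
garbage_collect(b, spare)
print(spare.pages[:2])  # ['A', 'B']: live data preserved, whole block freed
```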
Garbage collection is connected to something else that’s necessary in an SSD: overprovisioning. Essentially, an SSD with a given physical capacity exposes a lower usable capacity. A typical 128GB consumer SSD may offer 120GB of usable capacity (7% overprovisioning), while a 128GB enterprise SSD may be 28% overprovisioned, offering 100GB of usable space.
The difference between physical and usable capacity is set aside as an area of empty blocks. These empty blocks can be used by the SSD’s controller to store pages of data which have been moved from other blocks that also contain invalid pages of data so that those invalid pages can be deleted. Overprovisioning also gives the controller more flexibility when it comes to wear levelling.
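For the arithmetic behind those figures: overprovisioning is typically quoted relative to the usable capacity rather than the physical capacity. A quick sketch, where the formula and rounding are the only assumptions:

```python
# Overprovisioning as commonly quoted: OP% = (physical - usable) / usable.
def overprovisioning(physical_gb: float, usable_gb: float) -> float:
    return (physical_gb - usable_gb) / usable_gb * 100

print(round(overprovisioning(128, 120)))  # ~7  (typical consumer SSD)
print(round(overprovisioning(128, 100)))  # 28  (typical enterprise SSD)
```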
So the question is this: What if SSDs could be made speedier by taking out those functions of the controller that could be better performed by a host server (with a reconfigured operating system or application or both)?
Baidu’s software-defined flash system involves host software modification, but also modification to the SSDs themselves by reprogramming their field-programmable gate array (FPGA)-based controllers, according to a paper published by Baidu engineers and a researcher from Peking University and Wayne State University. This provides an interface so that the software can more directly access the SSD’s internal operations and interact with them in a way more suited to their performance characteristics.
Baidu removed the DRAM buffer from its SSDs and stripped the controller of the logic for garbage collection, static wear levelling and inter-chip parity coding. These SSD housekeeping functions were instead made the responsibility of the host software. Baidu then exposed all 44 internal NAND channels to the system software.
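A rough sketch of what host-managed flash of this kind might look like, purely for illustration: the class and method names below are invented, not taken from Baidu’s software, and the design described in the paper is considerably more sophisticated. The point is simply that once the raw channels are exposed, placement decisions and wear-levelling state live in host software rather than in the drive’s controller.

```python
# Hypothetical host-side flash layer (names and structure are illustrative).
NUM_CHANNELS = 44              # Baidu's SDF exposes 44 NAND channels per device

class HostFlashLayer:
    def __init__(self):
        self.erase_counts = [0] * NUM_CHANNELS   # wear-levelling state, in software
        self.next_channel = 0

    def write_block(self, data: bytes) -> int:
        """Stripe whole-block writes round-robin across the raw channels."""
        ch = self.next_channel
        self.next_channel = (self.next_channel + 1) % NUM_CHANNELS
        # issuing the actual I/O to channel 'ch' is omitted here
        return ch

    def erase_block(self, ch: int) -> None:
        """Host-issued erase; the host, not the drive, tracks wear."""
        self.erase_counts[ch] += 1
```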
One more modification was made to the software running on the host: write units were required to be the same size as SSD erase blocks. This essentially gets rid of the need for wear levelling and overprovisioning altogether.
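As a sketch of that constraint (the 2MB block size below is assumed for the example; the real erase-block size depends on the device), the host’s write path simply refuses anything that is not exactly one erase block, so a block is only ever entirely live or entirely stale and never needs spare blocks to relocate valid pages into before it is erased:

```python
# Illustrative constraint check; the block size is an assumption, not Baidu's figure.
ERASE_BLOCK_BYTES = 2 * 1024 * 1024   # assumed 2 MB erase block

def write(data: bytes) -> None:
    if len(data) != ERASE_BLOCK_BYTES:
        raise ValueError("writes must be exactly one erase block")
    # ... issue the block-aligned write to a raw channel ...
```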
The result was that the host software, given direct access to the raw flash channels of the SSD, could organize its data and schedule its data accesses to better realize the SSD’s raw performance potential. The metrics are impressive: I/O bandwidth increased by 300%, and with overprovisioning removed the storage cost per gigabyte was cut by 50%. The company believes that the system now delivers 95% of the raw flash bandwidth, and that 99% of the raw flash capacity is usable.
The big question, then, is when software-defined flash technology will filter down to enterprises that would like to benefit from lower costs and higher performance but are unlikely to be able to modify their own applications, operating systems and SSD firmware the way Baidu has.
The good news is that the International Committee for Information Technology Standards’ (INCITS) T10 (SAS and SCSI) and T13 (ATA) technical committees and the NVM Express industry association are all moving towards a standards initiative for storage intelligence that will allow otherwise standard SSDs to offload control of functions to the host.
There’s also an open source specification under development called LightNVM that allows a host to manage data placement, garbage collection and parallelism on “open-channel SSDs.” These are devices that share responsibilities with the host in order to implement and maintain features that typical SSDs keep strictly in firmware. According to the LightNVM project, a number of open-channel SSDs are under development, as is a storage platform supporting LightNVM from CNEX Labs, a company currently in stealth mode.
It’s impossible to predict when practical, usable standards will emerge, but in the meantime established vendors aren’t sitting around doing nothing. For example, OCZ’s Saber 1000 SSD storage system, announced in October 2015, controls, manages and coordinates garbage collection, wear levelling, log dumps and other background tasks on connected SSDs. And it can choose to send data to drives only when they are not occupied in housekeeping activities that negatively impact performance.
The reverse is also true: when the system knows that it is not about to send data to the drives, it can tell them to use the opportunity afforded by this lull in write activity to carry out any housekeeping tasks that need doing.
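A conceptual sketch of that kind of coordination, assuming nothing about OCZ’s actual HMS interface (the class and method names below are invented for illustration): the host tracks which drives are busy with housekeeping, steers writes away from them, and hands out housekeeping time when it knows a lull in writes is coming.

```python
# Conceptual sketch only; not OCZ's HMS API. Names are hypothetical.
class Drive:
    def __init__(self, name: str):
        self.name = name
        self.housekeeping = False   # True while GC/wear levelling runs

class HostManagedPool:
    def __init__(self, drives):
        self.drives = drives

    def write(self, data: bytes) -> str:
        # Prefer a drive that is not mid-housekeeping, so writes never
        # queue behind background work.
        idle = [d for d in self.drives if not d.housekeeping]
        target = idle[0] if idle else self.drives[0]
        return target.name          # stand-in for issuing the actual I/O

    def idle_period(self) -> None:
        # No writes expected: let every drive use the lull for housekeeping.
        for d in self.drives:
            d.housekeeping = True
```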
The system uses special SSDs that include what OCZ calls Host Managed SSD (HMS) technology.
“We are essentially removing all the performance hits (associated with carrying out background tasks while receiving data) from the entire pool’s operation, thereby increasing performance and performance consistency and the latency of the entire aggregated pool,” explains Grant Van Patten, OCZ’s product manager.
There’s no question that this type of approach is different from Baidu’s software-defined flash setup in that Baidu’s system does away with the need for garbage collection and wear levelling by altering its software to make it SSD-aware. With OCZ’s approach, these housekeeping activities still have to happen, but the storage system manages when SSDs carry them out in a performance-efficient fashion, rather than allowing each SSD to think for itself without regard for the performance implications of doing so.
The OCZ approach has the benefit that no modifications are required to the operating system or applications running on it.
The Baidu approach may well suit companies (like Baidu) which run a small number of applications at vast scale, and which have the internal resources to modify their applications and operating systems to make use of suitably programmed SSDs.
But for enterprise and even SMB usage, it’s more likely that something like OCZ’s approach will prevail. Storage software will control and optimize the behavior of SSDs that comply with some software-defined flash standard to increase their performance, while applications and operating systems will continue to believe that the storage they are addressing is made up of conventional spinning disks.