Storage Enters The Age Of Erasure Coding

The inefficiencies of RAID and replication mean the time has finally come for erasure coding-based data protection

Erasure coding is a storage technology that’s about to explode into the storage mainstream.

Its appeal is obvious: it’s a data protection system that’s more space efficient than straight replication, and one that tolerates more faults and lets you recover lost data far more quickly than is possible with traditional RAID systems.

Here are just a few examples of storage offerings that are getting serious about the technology: Intel and Cloudera are developing erasure coding in HDFS for release in Hadoop 3.0, and Nutanix has begun showing off its own proprietary erasure coding called EC-X in the current versions of its Nutanix OS in preparation for its launch in NOS 5. Ceph, the open source software storage platform, introduced erasure coding last year with the Firefly (v0.80) release, and erasure coding is at the heart of Cleversafe’s dispersed storage systems. (Earlier this month IBM announced that it had acquired Cleversafe for an undisclosed sum.)

Erasure Coding: Why Now?

Erasure coding is not new – it’s been around for over 50 years – but one reason its time may finally have come is that enterprises are accumulating and storing vast (and rapidly increasing) amounts of data every day. That means space efficiency is becoming more important.

A platform like Hadoop typically provides data protection through replication: three copies of each piece of data are stored on different cluster nodes. The problem is that a Hadoop system may be storing many terabytes of information, and that makes replication very expensive. That’s because this level of replication has a storage efficiency of just 33%: you can only use 33% of your storage capacity to store data – the other 67% is then used up by replicas of this data.

By contrast – as we shall see – some erasure coding schemes offer storage efficiency as high as 71% while offering an even greater level of data protection.
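
To make the arithmetic concrete, here is a minimal sketch of the storage-efficiency calculation. The function name is made up for illustration, and the layout of 10 data pieces plus 4 parity pieces is an assumption: it is one arrangement that yields roughly 71%, not necessarily the scheme the figure above refers to.

```python
def storage_efficiency(data_pieces, total_pieces):
    """Fraction of raw capacity that holds original (non-redundant) data."""
    return data_pieces / total_pieces

# Three-way replication: 1 original out of 3 stored copies.
print(f"{storage_efficiency(1, 3):.0%}")    # 33%
# Assumed erasure coding layout: 10 data pieces stored as 14 pieces in total.
print(f"{storage_efficiency(10, 14):.0%}")  # 71%
```

Under those assumptions the replicated data survives the loss of two copies, while the 14-piece layout survives the loss of any four pieces, provided the code lets any 10 of the 14 pieces rebuild the data.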

There are other factors driving the adoption of erasure coding too. One is that Moore’s Law has ensured that the processing overhead required to operate erasure coding is rapidly becoming insignificant as processing power becomes cheaper and more abundant.

And Scott Sinclair, a storage analyst at Enterprise Strategy Group, also identifies the trend toward software defined storage running on commodity hardware as another important driver.

“Custom storage hardware is more expensive but more resilient than standard hardware, which is not designed to have a single point of failure,” he says. “To cope with this some software defined storage solutions use replicas across nodes so that if one node goes down another can take over, but this is very inefficient. So they are taking advantage of the processor gains in standard server hardware to use erasure coding with software defined storage.”

RAID Problems

RAID systems are also designed to overcome the inefficiency of replication. But the vast amounts of data that enterprises are accumulating are increasingly being stored on very high capacity disks – in some cases 10TB drives – and this causes a number of different problems for RAID systems.

First, high capacity drives are more likely to suffer bit errors as there are more bits stored on them. When errors lead to a RAID rebuild there’s the problem of reduced or no data protection if another disk in the RAID array fails before the rebuild is complete. And another failure is more likely since the disks have such high capacity.

Second, it was never intended that RAID be used with such high capacity disks. Since capacities have grown far faster than data transfer rates to and from disks, rebuilds can now take many hours or even days.
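
A rough back-of-the-envelope sketch shows the scale of both problems. Everything numeric here apart from the 10TB capacity is an assumption chosen for illustration: the sustained transfer rates of 150 MB/s and 50 MB/s, the unrecoverable read error rate of one per 10^14 bits, and the idea that errors occur independently. The function names are hypothetical too.

```python
import math

def rebuild_hours(capacity_tb, rate_mb_per_s):
    """Hours needed to read or write a whole drive at a steady transfer rate."""
    capacity_bytes = capacity_tb * 10**12
    rate_bytes_per_s = rate_mb_per_s * 10**6
    return capacity_bytes / rate_bytes_per_s / 3600

def read_error_probability(capacity_tb, errors_per_bit=1e-14):
    """Chance of at least one unrecoverable read error when reading every bit once."""
    bits = capacity_tb * 10**12 * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to limit floating-point rounding loss
    return -math.expm1(bits * math.log1p(-errors_per_bit))

print(f"{rebuild_hours(10, 150):.1f} hours")  # 18.5 hours
print(f"{rebuild_hours(10, 50):.1f} hours")   # 55.6 hours
print(f"{read_error_probability(1):.0%}")     # 8%
print(f"{read_error_probability(10):.0%}")    # 55%
```

Real rebuilds depend on the RAID level, controller and competing workload, so these figures only indicate the order of magnitude: a tenfold increase in capacity at the same transfer rate means a tenfold longer rebuild and a much higher chance of hitting a read error along the way.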

Sinclair points out that RAID also offers far less storage flexibility than erasure coding. “With RAID 6, you take your disks and say ‘these disks are in RAID 6,’” he explains. “But with erasure coding you can be more flexible and say ‘this virtual pool has this protection model’ – you can abstract from the hardware.”

He adds that erasure coding also lets you scale larger than the inefficiencies of RAID will allow. Replicas can do this too, but with replicas you need far more storage space.

How Erasure Coding Works

Erasure coding works by splitting a file into a number of equally sized pieces, and then applying a mathematical encoding to produce a larger number of pieces. For example, you could start with a single file, split it into 6 pieces, and then do the encoding to produce 10 pieces.

What’s clever about the encoding is that you would only need 6 of the 10 encoded pieces to get back to the original file – you can lose any four without any data loss.
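
The "any 6 of 10" property can be sketched with a toy code. This is an illustration under stated assumptions, not the scheme any particular product uses: it treats the 6 data values as coefficients of a polynomial, evaluates that polynomial at 10 points in arithmetic modulo the prime 257, and recovers the coefficients from any 6 points by Lagrange interpolation. The names `encode`, `decode` and `P` are hypothetical, and each "piece" is a single byte rather than a slice of a file.

```python
from itertools import combinations

P = 257  # smallest prime above 255, so every byte value fits in the field

def encode(data, n):
    """Evaluate the polynomial whose coefficients are `data` at x = 1..n."""
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
            for x in range(1, n + 1)]

def decode(pieces, k):
    """Recover the k coefficients from any k (x, y) pieces by interpolation."""
    pieces = pieces[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pieces):
        basis = [1]  # coefficients, lowest power first, of prod (x - xm)
        denom = 1    # prod (xj - xm) over the other pieces
        for m, (xm, _) in enumerate(pieces):
            if m == j:
                continue
            # Multiply the basis polynomial by (x - xm).
            grown = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                grown[i] = (grown[i] - xm * b) % P
                grown[i + 1] = (grown[i + 1] + b) % P
            basis = grown
            denom = denom * (xj - xm) % P
        # pow(..., -1, P) is the modular inverse (Python 3.8+).
        scale = yj * pow(denom, -1, P) % P
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs

data = list(b"ERASED")       # 6 data pieces, one byte each
pieces = encode(data, 10)    # 10 encoded pieces
survivors = pieces[4:]       # lose the first four pieces
assert decode(survivors, 6) == data

# Every possible choice of 6 survivors out of 10 recovers the original data.
assert all(decode(list(s), 6) == data for s in combinations(pieces, 6))
```

Two simplifications are worth noting. Encoded values can reach 256 and so do not fit in a byte, and none of the encoded pieces is a verbatim copy of the data; production systems typically use Reed-Solomon codes over a 256-element field and often keep the original pieces unchanged alongside the parity pieces.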

To get an idea of how erasure coding works, let’s look at a very simple example where you split a file into 2 pieces, and then encode those into 4 encoded pieces.

So we start with a single file, split it into 2 pieces which we’ll call P1 and P2, and then encode those into 4 encoded pieces: EP1, EP2, EP3 and EP4.