Now let's imagine that EP1 is simply a copy of P1 and EP2 is simply a copy of P2. So far so simple.
To generate two extra encoded pieces, EP3 is P1 + P2, and EP4 is P1 + (2 × P2).
So what happens if two of these encoded pieces, EP2 and EP4, are lost?
We are left with EP1 and EP3. We know that EP1 is identical to P1, and EP3 is simply P1 + P2, so subtracting EP1 from EP3 gives back P2. With that little piece of equation solving it is possible to get the original file back from just these two encoded pieces.
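The toy scheme above can be sketched in a few lines of Python. The pieces are modeled here as plain integers purely for illustration; real systems apply this arithmetic to blocks of bytes, and the example values are arbitrary:

```python
# Toy erasure code from the example above: two data pieces, four encoded pieces.
P1, P2 = 42, 99  # original data pieces (arbitrary example values)

EP1 = P1          # copy of P1
EP2 = P2          # copy of P2
EP3 = P1 + P2
EP4 = P1 + 2 * P2

# Suppose EP2 and EP4 are lost; recover everything from EP1 and EP3 alone.
recovered_P1 = EP1
recovered_P2 = EP3 - EP1  # solve EP3 = P1 + P2 for P2

assert (recovered_P1, recovered_P2) == (P1, P2)
print(recovered_P1, recovered_P2)  # 42 99
```

Losing EP1 and EP3 instead works just as well: EP2 gives P2 directly, and P1 = EP4 − 2 × P2.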
That's the principle. In practice erasure coding is more complex than that. A common form of erasure coding is called Reed-Solomon (RS) erasure coding, invented in 1960 at MIT Lincoln Laboratory by Irving S. Reed and Gustave Solomon. It uses linear algebra operations to generate extra encoded pieces, and can be configured in different ways so that a file is split into k pieces, which are then encoded to produce an extra m encoded pieces that are effectively parity pieces.
That means with RS (k,m) you can lose any m encoded pieces out of the total (k+m) encoded pieces, and still reassemble the original file.
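The (k, m) property can be demonstrated with a minimal sketch of an RS-style code. This illustration evaluates a polynomial built from the data at k + m distinct points and recovers the data from any k survivors by solving the resulting linear system. Real Reed-Solomon implementations do this over a finite field for efficiency; exact rational arithmetic is used here only to keep the illustration simple, and the function names are ours, not from any library:

```python
from fractions import Fraction
from itertools import combinations

def encode(data, m):
    """Encode k data values into k + m pieces: piece i is the polynomial
    sum(data[j] * i**j) evaluated at x = i (a Vandermonde-style code)."""
    k = len(data)
    return [sum(Fraction(data[j]) * i**j for j in range(k)) for i in range(k + m)]

def decode(pieces, k):
    """Recover the k data values from any k surviving (index, value) pairs
    by Gaussian elimination on the Vandermonde system."""
    rows = [[Fraction(i)**j for j in range(k)] + [Fraction(v)] for i, v in pieces]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [int(rows[r][k]) for r in range(k)]

data = [3, 1, 4, 1]      # k = 4 data values
coded = encode(data, 2)  # k + m = 6 encoded pieces

# Any 4 of the 6 pieces suffice to rebuild the original data:
for surviving in combinations(enumerate(coded), 4):
    assert decode(list(surviving), 4) == data
```

Because any k of the evaluation points determine the degree-(k − 1) polynomial uniquely, losing any m pieces is survivable, which is exactly the RS(k, m) guarantee described above.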
Typical RS configurations are RS(6,3) and RS(10,4), meaning any 3 pieces of the 9 for RS(6,3), and any 4 pieces of the 14 for RS(10,4), can be lost without losing any data.
So let's take a look at the efficiency of these Reed-Solomon EC systems compared to simple Hadoop-style replication and no replication at all:
Single copy of data: no failures can be tolerated, 100% storage efficiency
Triple replication: 2 failures can be tolerated, 33% storage efficiency
RS(6,3): 3 failures can be tolerated, 67% storage efficiency
RS(10,4): 4 failures can be tolerated, 71% storage efficiency
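The figures in the list above come from simple ratios: an n-fold replicated system stores 1/n original data per unit of raw capacity, while RS(k, m) stores k/(k + m). A quick illustrative calculation:

```python
def storage_efficiency(k, m):
    """Fraction of raw capacity holding original data for RS(k, m)."""
    return k / (k + m)

def replication_efficiency(copies):
    """Fraction of raw capacity holding original data with n-fold replication."""
    return 1 / copies

print(f"triple replication: {replication_efficiency(3):.0%}")  # 33%
print(f"RS(6,3):  {storage_efficiency(6, 3):.0%}")             # 67%
print(f"RS(10,4): {storage_efficiency(10, 4):.0%}")            # 71%
```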
What's pretty clear then is that although triple replication sounds like it should be a very effective form of data protection – after all, you have two extra copies of the data – it turns out that a system such as RS(10,4) may be far better. That's because it offers more protection – 4 failures rather than 2 – and because it is far more storage efficient, allowing you to store much more original data in any given storage resource.
The downside of EC is that processing data into encoded pieces and reassembling them when the data is needed takes processing power, as mentioned earlier. It can also introduce an element of latency compared to reading a file off a single disk – especially when pieces are stored in geographically remote locations.
However, it is possible to mitigate this latency problem to some extent, Sinclair says. "Some implementations of erasure coding use WAN optimization to speed up how bits are moved across the wire," he says. "Other solutions build a scheme so that if they suffer one or two drive failures they can rebuild across local servers, while they can still cope with a more general site failure, although recovering will be slower."
Companies such as Nutanix are introducing their own proprietary erasure coding systems rather than using Reed-Solomon, and these are likely to be optimized for their storage software and may include WAN optimization as well.
So where is erasure coding most likely to be used? Today it's most common in massive capacity storage environments with large objects, says Sinclair.
"In these massive active archives erasure coding is great, but as we continue to benefit from Moore's Law we will see more organizations look to commodity hardware and software-defined storage," he says. "When that happens I think erasure coding will move into transactional and virtual workloads."
One of the places where erasure coding is less likely to be adopted is in all-flash storage spaces in high performance, mission critical environments, Sinclair believes. "In terms of capacity, all-flash environments may be smaller environments where the challenges of RAID are not as severe," he says.
He adds that the differing failure rates of solid state media compared to spinning disks may also mean that erasure coding does not bring the same benefits, and also observes that the faster performance of flash storage may make the latency issues of geographically distributed erasure coding less important.
One final thought: the main drawback to replication is that it is very storage inefficient, but the falling cost of storage capacity could eventually make this a non-issue. To match the data durability of RS(10,4), which can tolerate four device failures, you would have to use five-fold replication – the original plus four copies – but if the cost of storage becomes negligible then the benefit of erasure coding is less clear cut.
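The trade-off can be stated as a one-line calculation: tolerating f device failures with replication requires f + 1 full copies, while RS(k, m) tolerates m failures for a raw-storage overhead of only (k + m)/k:

```python
# Raw storage needed per unit of data to survive f = 4 device failures.
failures = 4
replication_overhead = failures + 1  # 5x: original plus four copies
rs_overhead = (10 + 4) / 10          # RS(10,4): 1.4x raw storage

print(replication_overhead, rs_overhead)  # 5 1.4
```

So replication needs roughly 3.5 times the raw capacity of RS(10,4) for the same failure tolerance – a gap that only stops mattering if storage becomes effectively free.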
So could erasure coding be a technological dead end?
Sinclair thinks that's unlikely, and the reason comes back to Moore's Law: storage costs may be falling, but the cost of computing resources is falling too.
So if the cost of storage and compute resources both become negligible then there is no direct cost-based reason to favor replication over erasure coding or the other way round.
In that case then erasure coding may still end up being preferable – if only because it's more efficient in terms of basic requirements like power and rack space.