Henry Newman's Storage Blog Archives for May 2013

The LP Is 65 Years Old

There was a great article on changes in music recording technology in The Register by Chris Mellor. The LP is now 65 years old—only about 11 years older than me, which is pretty scary for me.

Anyway, LPs are few and far between now, with even CDs going the way of the dodo bird in favor of digital music.

That got me thinking again about the longevity of all interfaces. The needles used for LPs are not much different from any other interface.

When I first started, interfaces to disk drives were built by the vendor. There was not much of a standard, and the same was true for tape. Fast forward 30+ years, and we have standard interfaces today: SAS, SATA and Fibre Channel (though Fibre Channel is clearly on the way out as a disk drive interface, while it remains for tape). Most RAID controller interfaces are SAS or Fibre Channel, and some tape drives support SAS.

Most people do not keep old disk drives around because they are not cost effective to run, given the cost of support along with power and cooling. But tape is a different story, at least in open systems.

On the IBM mainframe, the standard is FICON. Generation after generation, it works and is supported for the right amount of $$. In open systems, go try to find a 1 Gb or 2 Gb Fibre Channel interface that is still officially supported, and soon the same will be true of 4 Gb.

Now you might point out that these interfaces are backward-compatible, and you would be correct. But in the world of archive, in my opinion, that is not good enough. You need to have real support.

I remember long ago, when I had a cassette player in my car, I would copy my LPs to cassette so I could listen to music in the car. But those were copies, not migration in the real sense, as I was losing resolution.

Long term, we need to have open systems behave not like LPs over the last 65 years but more like IBM mainframes to reduce the cost of migration to new media for large archives. Or maybe we need to move to IBM mainframes for large archives. A thought.

Labels: disk drives, SAS, SATA, fibre channel, tape, archive, interface, Storage

posted by: Henry Newman

Parallel File System vs. REST/SOAP

I have gotten a lot of feedback from people in the parallel file system community about my article on cloud computing's impact on file systems. All of the feedback has been positive, but a number of people pointed out that I should have also discussed file systems that support a single namespace but not parallel I/O to a single file. I like to call these "shared file systems."

It is likely easier for these file systems to deal with the POSIX atomicity issues, but they still face many of the same problems with POSIX and file system metadata requirements as file systems that support parallel I/O do.

The vendor community that controls the standard does not want to add anything, so what are we to do?

Since adding new features seems to be out of the question, the other option is to relax the current requirements.

Maybe we could have a mount option that relaxes the metadata atomicity requirements. File systems that are NFS mounted have historically had different metadata atomicity guarantees than the underlying server file system. Maybe we should use this model as an option for shared and parallel file systems.

Clearly, this is not up to me but up to the people and companies that have paid to be part of The Open Group and the standards process. If those people are reading this blog entry, please consider it. Without some changes, I fear that the REST/SOAP interface might make the POSIX file system go the way of the dodo bird, which went extinct because of human decisions.

Labels: REST, Cloud, SOAP, standards, metadata, POSIX, Parallel File System

posted by: Henry Newman

Big Data and HSM

It was not long ago that vendor after vendor declared hierarchical storage management (HSM) dead. They said power-managed drives and disk density would solve all the world’s problems.

Even with those technologies, we are still talking about storage tiers, which are really no different from HSM, except that HSM is associated with tape.

Now that we need to store untold numbers of exabytes of data—which costs significantly more with disk than tape—will HSM make a comeback?

I think the answer is yes, and the reason is simple. We do not know what we do not know about the data, and in the future we will be able to find out more information.

It was not that long ago that the DNA between chromosomes was called junk DNA. We now know that was a completely false description, and we are learning new things about diseases and replication of DNA that we had no idea about. The same is true for all kinds of other information; we would be naïve to think anything different.

Data must be saved for future generations, because today we do not know how that data could be turned into information, whether that data is seismic traces in the search for oil, DNA, climate data or a myriad of other information.

The only way I know that we can actually save the raw information today is via an HSM and tape. There are those who have talked about other storage types for years, but nothing has entered the market. For you young people out there, it might be time to learn about HSM, as it might be a good long-term career move.

Labels: Hierarchical Storage Management, HSM, big data, Storage

posted by: Henry Newman

DNA and Storage

Did anyone hear the NPR story on storing some of Shakespeare’s sonnets in DNA?

The two researchers are not the first to develop an encoding scheme and store things in DNA, but their approach was very interesting. The researchers, Ewan Birney and Nick Goldman, even addressed a concern that I had with DNA storage—mutation, which we call bit rot or silent data corruption.

So is it time to develop an agreed-upon encoding scheme for DNA storage, including an agreed-upon ECC encoding?
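To make the idea concrete, here is a minimal sketch of what a standardized encoding with error checking could look like. This is not the scheme Birney and Goldman used (theirs is considerably more sophisticated and avoids repeated bases); it simply maps two bits to each base and appends a toy parity base per byte, so everything in it is illustrative only.

```python
# Purely illustrative: map 2 bits per base, with one parity base per byte.
# A real standard would need a much stronger ECC (e.g., Reed-Solomon) and
# constraints such as avoiding long runs of the same base.

BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}

def encode(data: bytes) -> str:
    """Encode each byte as four bases plus one parity base."""
    strand = []
    for byte in data:
        pairs = [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]
        strand.extend(BASE_FOR_BITS[p] for p in pairs)
        parity = pairs[0] ^ pairs[1] ^ pairs[2] ^ pairs[3]
        strand.append(BASE_FOR_BITS[parity])   # detects a single corrupted base
    return "".join(strand)

def decode(strand: str) -> bytes:
    """Decode five-base blocks back to bytes, checking the parity base."""
    data = bytearray()
    for i in range(0, len(strand), 5):
        bits = [BITS_FOR_BASE[base] for base in strand[i:i + 5]]
        pairs, parity = bits[:4], bits[4]
        if pairs[0] ^ pairs[1] ^ pairs[2] ^ pairs[3] != parity:
            raise ValueError(f"corruption detected in block starting at base {i}")
        data.append((pairs[0] << 6) | (pairs[1] << 4) | (pairs[2] << 2) | pairs[3])
    return bytes(data)

sonnet_line = b"Shall I compare thee to a summer's day?"
assert decode(encode(sonnet_line)) == sonnet_line
```

A single parity base only detects errors rather than correcting them; the point is simply that an agreed-upon mapping plus ECC is straightforward to specify once somebody owns the standard.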

The cost of this DNA storage was extremely high at $12,400 per MB, so the cost is going to have to come way down before anyone gets spun up doing this. It is not clear from the article if the cost includes the ECC or just the raw data, but a terabyte of storage would cost $12,400,000,000. Even if the price comes down at 50 percent per year, we are still talking the next decade before this becomes affordable.
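For what it is worth, here is the back-of-the-envelope math behind those numbers, assuming a decimal terabyte and a steady 50 percent price decline per year (both assumptions, not figures from the article):

```python
# Cost trajectory for DNA storage at $12,400 per MB (figure quoted above),
# assuming the price halves every year. Decimal units (1 TB = 1,000,000 MB).

COST_PER_MB = 12_400          # dollars per MB, from the article
MB_PER_TB = 1_000_000         # assumption: decimal terabyte
ANNUAL_FACTOR = 0.5           # assumption: price halves each year

cost_per_tb = COST_PER_MB * MB_PER_TB   # $12.4 billion per TB today
for years in (0, 5, 10, 15, 20):
    print(f"after {years:2d} years: ${cost_per_tb * ANNUAL_FACTOR ** years:,.0f} per TB")
```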

The other issues that need to be considered include the following:

  1. What is the long-term shelf life and what are the shelf life conditions? There needs to be reliability testing to determine the amount of ECC that will be needed.
  2. The current read and write time is abysmal—weeks to write and hours to days to read. Nanotechnology must be developed for DNA storage to be successful.
  3. There needs to be a standards body. Will it be an ANSI IT group or will it be a genetics group?

A few years ago at a conference, I thought the idea of DNA storage was absurd. I said to someone, "You have red hair and blue eyes, and genes mutate, which is why we do not look alike." The idea of having ECC built into DNA storage had not crossed my mind.

I think it is time to start thinking about some long-term reliability studies to see if this could be part of the storage hierarchy.

Labels: standards, Storage, DNA

posted by: Henry Newman

Is SATA Going the Way of the Dodo Bird?

Everyone is talking about 12 Gbit SAS, which, after a long build-up, should finally appear later this year.

Over the last ten or so years, we have seen SATA take over the second tier of the enterprise. This vastly increased the size and scope of the SATA market space.

But about a year ago, the major drive manufacturers began building these enterprise drives with both SAS and SATA interface options. By using SAS, you get a richer command set and better error recovery on the drive, as well as a number of other features not available in the SATA command set. Performance is also better for most vendors' drives when using the SAS interface.

Now SATA is still used for the home and business PC market, but that market is not growing and is actually shrinking. So that got me wondering what the interfaces will be for future disk drives.

The SATA organization just released its roadmap for increased performance, and it falls far behind SAS, supporting only an 8 Gbit interface rather than 12 Gbit. So what is going to happen?

There are still some SATA-only vendors out there, a prominent example being Intel with their SATA line of SSDs. What will WD, Toshiba and Seagate do about SAS vs. SATA?

The latest Intel server processors support SAS on the socket, but a quick check of some higher-end motherboards turned up none that support SAS on the motherboard.

I suspect that sometime this year we will find out what the disk drive makers decide. I for one would be happy to pay a few dollars more to get a more reliable, faster drive with a SAS interface.

Labels: disk drives, SAS, SATA, Storage

posted by: Henry Newman

Stranger Than Fiction: Hitachi Discusses Holographic Storage

I am currently at the IEEE Mass Storage conference and was shocked to hear Hitachi talking about holographic storage with availability in early 2015.

Yes, we have heard this before from companies, but never from a company that is not trying to raise money from investors. Hitachi is a very conservative company, and for them to announce this means, in my opinion, that they are very close to having a working production product.

During his talk, Hitachi's Ken Wood said that a 2 TB holographic drive will be able to read at 2 Gb/sec, or 238 MiB/sec. Ken also discussed a 12 TB media option that would support reads at 6 Gb/sec, or 715 MiB/sec. Write speeds are far lower than read speeds because data is validated end-to-end before it is committed: the host side verifies that the data transferred to the drive matches what was in the buffer before it is committed to the media. This seems like a huge advantage over current methods and ensures end-to-end preservation of data.
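For anyone checking the unit conversion, the quoted MiB/sec figures line up with the line rates if you treat the Gb as decimal gigabits and the MiB as binary mebibytes:

```python
# Convert a line rate in decimal gigabits per second to binary MiB per second.

def gbit_to_mib_per_sec(gbit: float) -> float:
    bits_per_sec = gbit * 1_000_000_000      # decimal gigabits -> bits
    return bits_per_sec / 8 / (1024 * 1024)  # bits -> bytes -> MiB

print(f"2 Gb/sec ~= {gbit_to_mib_per_sec(2):.0f} MiB/sec")  # ~238
print(f"6 Gb/sec ~= {gbit_to_mib_per_sec(6):.0f} MiB/sec")  # ~715
```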

Next came Akinobu Watanabe, who discussed the error rate. With ECC added to each block, it improves to roughly one error in 10^21 bits, which is far better than current optical technology.
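To put that number in perspective (assuming it means one unrecoverable error per 10^21 bits read), this little calculation shows how much data you could expect to read before hitting a single bad bit:

```python
# Rough scale of a 1-in-10^21-bit error rate.

BITS_PER_ERROR = 10 ** 21                   # one error per 10^21 bits (as quoted)
bytes_per_error = BITS_PER_ERROR / 8
exabytes_per_error = bytes_per_error / 10 ** 18
print(f"~{exabytes_per_error:.0f} EB read per expected bit error")   # ~125 EB
```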

If the information provided is accurate, then I would consider this the first disruptive technology to hit the storage industry in a very long time. Needless to say, I am very impressed and am hopeful this technology does make it.

By the way, you can see the conference presentations at the IEEE site.

Labels: Hitachi, holographic storage

posted by: Henry Newman