Sage Weil started the Ceph filesystem as a research project. Today it is a bona fide enterprise option for Big Data and cloud storage.
Ceph is now in the Linux kernel and Weil is the founder and CTO of startup Inktank, which provides commercial support for Ceph.
In an in-depth interview with InfoStor, Weil detailed where Ceph is today and where it’s headed in the future.
“For a long time, Ceph existed as a research project where there were a few engineers working on it, but there was no organization behind it,” Weil said. “It’s only recently that we’ve been able to bring the technology to the point where we can use it in production.”
Weil explained that Ceph is not a replacement for a general-purpose filesystem like Ext4, the default filesystem in many Linux distributions. He added that Ceph is a clustered, distributed storage system that runs on top of a local filesystem such as Ext4.
“That’s both storing virtual disk block devices for virtual machine hosting that is shared and replicated and reliable,” Weil said.
“RADOS is the underlying object store that everything is built on top of,” Weil said.
The world of open source clustered distributed filesystems is a competitive one. Ceph is often compared to GlusterFS, an open source storage filesystem that is now managed by Red Hat. Ceph can also be used to complement the Hadoop Distributed File System (HDFS) in Big Data deployments.
Ceph vs. Inktank
Even though Inktank is the lead commercial sponsor behind Ceph, Weil stressed that Ceph remains an open source project.
“The open source project has distributed copyright,” Weil said. “The code is not actually owned by Inktank, we just happen to have most of the developers working for us.”
The goal is to build a vibrant developer community that is not just funded by Inktank. Inktank as a company will then compete against others on the basis of offering support and professional services.
“What that means for us is making sure our internal development processes are as open as possible,” Weil said. “We encourage all of our developers, even when they are in the same room, to interact over the (mailing) list, over IRC and share our roadmap discussion as much as possible.”
Ceph has been part of the mainline Linux kernel since the 2.6.34 release in May of 2010. Weil said that in more recent Linux kernel releases, most of the Ceph changes have been bug fixes.
There are, however, some new Ceph features that are expected to land in the Linux 3.7 kernel, which will be out later this year. One of those features is a new RBD (RADOS Block Device) layer that will enable users to clone a device.
“The main thing it means is that you will be able to create a new virtual machine in your cloud environment that boots up instantly,” Weil said. “Currently most cloud stacks will instantiate an image file on the local disk and then they actually have to download or copy to the local disk, which is a slow operation.”
Weil added that the new RBD feature will also enable cloud administrators to migrate the storage from one Ceph pool of storage to another.
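To illustrate why a cloned device can be available immediately, here is a minimal Python sketch of the general copy-on-write idea. It is a toy model and not Ceph's implementation or API: the `BlockImage` class, its methods and the sample data are all hypothetical, and the sketch assumes the parent image is treated as read-only once it has been cloned.

```python
class BlockImage:
    """Toy block image (hypothetical): a sparse map of block index -> bytes.

    A clone keeps a reference to its parent instead of copying data,
    so creating one does not depend on the size of the parent.
    """

    def __init__(self, parent=None):
        self.parent = parent   # image that reads fall back to, if any
        self.blocks = {}       # only the blocks written to this image

    def read(self, index):
        # Walk up the parent chain until some image holds the block.
        image = self
        while image is not None:
            if index in image.blocks:
                return image.blocks[index]
            image = image.parent
        return b"\x00"  # blocks never written anywhere read as zero

    def write(self, index, data):
        # Writes always land in this image; the parent is never modified.
        self.blocks[index] = data

    def clone(self):
        # No data is copied here; the clone starts out empty.
        return BlockImage(parent=self)


# A "golden" image shared by many virtual machines (sample data).
golden = BlockImage()
golden.write(0, b"bootloader")
golden.write(1, b"kernel")

# A new VM disk is created without copying the golden image.
vm_disk = golden.clone()
vm_disk.write(1, b"patched-kernel")
```

In this model the clone stores only the blocks that differ from its parent, which is the property that lets a new virtual machine start without first copying a full image to local disk.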
The biggest challenge for Weil now is building out the engineering team for Ceph both inside and outside of Inktank.
Weil joked that a year from now he sees Ceph dominating the world. On a more serious note, he does have high hopes for his one-time research project.
“I would expect to see Ceph underpinning a range of public and private cloud deployments,” Weil said. “I expect to see it in PoC (Proof of Concept) stages for filesystem deployments as well.”