The Gluster open source distributed filesystem is out with version 3.3 after a long development cycle. Gluster 3.3 unifies file and object storage and provides a Hadoop HDFS-compatible API for Big Data storage.
GlusterFS 3.3 was originally expected to be released by the end of 2011. John Mark Walker, Community Leader for Gluster.org, explained to InternetNews.com that Gluster 3.3 was an unusually long release for several reasons, including the fact that the scope of the release was fairly large.
“Between HDFS compatibility, Unified File and Object storage, self-healing improvements, granular healing, quorum enforcement and other improvements, a lot went into this release,” Walker said. “We also retooled our community for open governance and transparency and were able to put more resources into QA and release management.”
Walker added that all those changes have improved the project in significant ways, but they may also have delayed this particular release. Among the significant changes is unified file and object storage. Walker explained that both object storage and file storage have their pros and cons.
“The file Interface — a low-level rich API, a la POSIX, allows the developer to do many things, but for some operations it is overkill,” Walker said. “Object storage does not give developers nearly the same level of functionality because it is, by design, a simplified data access interface.”
On the other hand, object storage lets you access data from remote, high-latency environments, where a chatty, low-level file interface would perform poorly. In Walker's view, the modern enterprise needs both file and object storage, which is what Gluster 3.3 now delivers. Having both also enables an easier migration to the cloud for enterprise applications.
“Enterprise developers and admins don’t want to rewrite their entire storage layer just to create cloud and mobile apps,” Walker said. “We want to allow them to continue to use the file interface and let remote devices and web applications access the same data via the object interface.”
Scalability
Gluster 3.3 also provides improvements that make the system more resilient and scalable. Among those improvements are a series of self-healing additions including something known as granular healing. Walker explained that with granular healing, the self-heal process goes block by block on each file to be healed to determine which parts of the file to heal.
“Before this release, you risked service outages while your large, multi-GB VM image recovered from a failure,” Walker said. “In those bad old days, the entire VM would be locked while a self-heal took place. No more.”
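The block-by-block idea behind granular healing can be illustrated with a minimal sketch. This is not GlusterFS code: the function name `granular_heal`, the in-memory byte buffers standing in for file replicas, and the tiny `BLOCK_SIZE` are all assumptions made for illustration. It compares a good copy against a stale copy one block at a time and rewrites only the blocks that differ.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, kept tiny so the demo is easy to follow


def granular_heal(good: bytes, stale: bytearray, block_size: int = BLOCK_SIZE) -> list:
    """Heal `stale` in place from `good`, touching only blocks that differ.

    Returns the indices of the blocks that were rewritten.
    """
    healed = []
    n_blocks = (len(good) + block_size - 1) // block_size
    for idx in range(n_blocks):
        start = idx * block_size
        end = start + block_size
        good_block = good[start:end]
        stale_block = bytes(stale[start:end])
        # Compare checksums rather than raw bytes, as a stand-in for
        # comparing blocks held on two different servers.
        if hashlib.sha256(good_block).digest() != hashlib.sha256(stale_block).digest():
            stale[start:end] = good_block
            healed.append(idx)
    # Drop any trailing data the good copy does not have.
    del stale[len(good):]
    return healed


good = b"aaaabbbbccccdddd"
stale = bytearray(b"aaaaXXXXccccdddd")
rewritten = granular_heal(good, stale)
print(rewritten, bytes(stale) == good)
```

In this toy run only the second block is rewritten, which is the point of the approach: the rest of a large file does not need to be copied or locked as a whole.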
The Gluster 3.3 release also has server-side and proactive healing. Walker explained that in proactive self-healing, when a file is modified, each server remembers the list of pending files for a recovered node. The recovered node queries other servers for what must be healed, and then copies over good files.
“Proactive self-heal is implemented by remembering pending operations via use of symlinks to the to-be-healed files. In a synchronous replication volume, each replicated server keeps a list of symlinks in case one or more nodes go down. The healing process, once the recovered node establishes what needs to be healed, is actually quite interesting. The healing process is performed with what is effectively a ghost client, operating on the server, accessing the N+1 synchronously replicated server nodes.
“Previously, for self-heal of replicated volumes to work, there needed to be an actual GlusterFS client mounting the filesystem and kicking off the self-heal process,” Walker commented. “In order to achieve smarter, proactive self-healing, we created server-side self-healing that mimics the use of a GlusterFS client.”
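The bookkeeping Walker describes can be modeled in a few lines. The sketch below is a simplified in-memory model, not the GlusterFS implementation: the `Server` class, `replicated_write` and `recover` are hypothetical names, and a plain set of paths stands in for the list of symlinks each server keeps. Live servers record which files a down peer missed, and the recovered node later asks its peers what is pending and copies the good versions.

```python
class Server:
    """Toy replica: holds files plus a pending-heal list per down peer."""

    def __init__(self, name):
        self.name = name
        self.files = {}    # path -> contents
        self.pending = {}  # down peer name -> set of paths it missed
        self.online = True


def replicated_write(servers, path, data):
    """Write to every live server and note the write for each down peer."""
    live = [s for s in servers if s.online]
    down = [s for s in servers if not s.online]
    for server in live:
        server.files[path] = data
        for missing in down:
            server.pending.setdefault(missing.name, set()).add(path)


def recover(node, servers):
    """Bring `node` back: ask live peers what is pending and copy good files."""
    node.online = True
    healed = set()
    for peer in servers:
        if peer is node or not peer.online:
            continue
        for path in peer.pending.pop(node.name, set()):
            node.files[path] = peer.files[path]
            healed.add(path)
    return sorted(healed)


a, b = Server("a"), Server("b")
servers = [a, b]
replicated_write(servers, "vm.img", b"v1")
b.online = False                      # node b fails
replicated_write(servers, "vm.img", b"v2")
replicated_write(servers, "log", b"x")
print(recover(b, servers), b.files == a.files)
```

Because the pending list lives on the servers, no separate client has to mount the volume and walk every file to discover what changed while the node was down.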
Next Steps For Gluster
Now that Gluster 3.3 is generally available, the Gluster community is looking at a six-month release cycle. The release cycle will be accompanied by community developer/design events, with the first developer summit the week of July 9 in the Bay Area.
Over the course of the Gluster 3.3 development period, Red Hat acquired Gluster for $136 million. GlusterFS is now the cornerstone of the Red Hat Storage solution. Red Hat Storage 2.0 was last updated in April of this year.
As part of Red Hat, Gluster as a community has also gone from an open core approach to an open source one.
While Red Hat Storage is based on the core open source Gluster project, it’s not entirely clear when the final Gluster 3.3 release will be reflected in a generally available product from Red Hat.
“We made a decision that Gluster would aim to be the Linux of distributed storage and become the standard for unstructured data,” Walker said. “To do that, we must always take into account the best interests of the community at Gluster.org.”