New approaches to managing reference information

By Heidi Biggar

Although it's one of this year's leading buzzwords, "reference information" is a term that should not, and in many cases cannot, be ignored. In fact, some analysts say the sheer scale of the impending reference information boom demands prompt end-user attention.

According to forecasts from the Enterprise Storage Group (ESG), reference information will surpass non-reference information in terms of total capacity next year and will grow at a 92% CAGR through 2006.
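To put the growth rate in perspective, compounding at a fixed annual rate can be sketched in a few lines of Python. Only the 92% rate comes from the ESG forecast; the 100-unit starting capacity and the three-year horizon are illustrative assumptions, and the function name is hypothetical.

```python
def project_capacity(base, cagr, years):
    """Return capacity for year 0..years, compounding at a fixed annual rate.

    base  -- starting capacity in arbitrary units (an assumed figure)
    cagr  -- compound annual growth rate as a fraction (0.92 means 92%)
    years -- number of years to project
    """
    return [base * (1 + cagr) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Assumed baseline of 100 units; only the 92% rate is from the forecast.
    for year, capacity in enumerate(project_capacity(100.0, 0.92, 3)):
        print(f"year {year}: {capacity:.1f} units")
```

At that rate, capacity nearly doubles every year and grows roughly sevenfold over three years.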

While EMC, Persist Technologies, and StorageTek were among the first vendors to offer products directly geared to reference information storage, the field opened up in the second half of this year. Recent entrants include Permabit, whose Permeon software some see as a direct competitor to EMC's Centera, as well as Isilon and Panasas, which cross over into both reference and non-reference (or primary and secondary) storage segments.

However, analysts and vendors disagree over whether these two classes of products should be compared in a discussion of ways to store and manage reference information. Arun Taneja, founder of The Taneja Group, puts Permabit and EMC's Centera in the same competitive category, and Panasas and Isilon in a different category. On the other hand, Peter Gerr, a research analyst at ESG, contends that while the two groups of products are different, they should both be discussed in the context of reference information storage. Gerr defines reference information as "any digital asset retained for active reference and value."

Permabit COO Richard Vito says that the two product classes are entirely different. "Panasas and Isilon are considered 'tier-1' storage; Permabit [and EMC are] 'tier-2.' "

"It depends on how you define 'tier 1' and 'tier 2,' " counters Gerr. "Panasas and Isilon might be 'tier 1' in certain vertical markets or they could also serve as 'tier 2' for longer-term archival storage."

"Content-addressed storage [CAS] systems such as Permabit's Permeon and EMC's Centera are not designed to be high-performance primary [or 'tier-1'] storage systems that support lots of users requesting files at the same time," claims Sujal Patel, founder and CTO of Isilon.

Patel says Isilon's architecture is not optimized for long-term archival of files. However, he says it is conceivable for the two product types to run up against one another in the reference information market, specifically in instances where users are looking for "a storage product for the active archiving of frequently used digital files."

Both the Panasas and Isilon systems are network-attached storage (NAS)-based architectures that serve files over standard NAS protocols (NFS, CIFS, HTTP, and FTP). However, they differ from traditional NAS systems in that they reportedly overcome some common NAS limitations.

For example, Isilon's IQ system eliminates "islands" of storage, removes performance bottlenecks, and creates a single namespace and single pool of storage. "In essence, it merges the simplicity and ease of use of NAS with the performance of a SAN [storage area network]," claims Patel.

The "secret sauce" of Isilon's IQ architecture is its OneFS distributed file system, which integrates three disparate layers (file system, volume manager, and RAID) into a single layer; directly controls the layout of data on disks; creates a single volume, single-shared namespace; and enables a variety of software capabilities, including automatic content balancing, real-time policy management, self-healing data integrity, intelligent algorithms, and intuitive Web management.

Each Isilon IQ cluster consists of three nodes (each 2U rack-mountable node provides 1.44TB of disk storage and 4GB of cache memory) and Gigabit Ethernet connectivity (see Figure 1). The software runs on Intel-based hardware and includes NDMP support for backup and restore.

Figure 1: Isilon IQ distributes content across a storage cluster, creating a single, shared pool of storage.

Isilon is focused on the digital content market; initial customers include Technicolor, Corbis, ResearchChannel, and Harborview Medical Center/UW Medical.

Panasas, meanwhile, is targeting Linux-based technical computing applications in life science, government, oil and gas, and media markets. Its customers include Los Alamos National Laboratory, GeoTrace Technologies, NuTec Energy, Sandia, and a number of university-based genomics research facilities.

Panasas' ActiveScale Storage architecture is based on a clustered, object-based file system (see Figure 2). The architecture turns files into data objects (comprising data, metadata, and other data attributes), which are spread across self-managing ActiveStorageBlades. Clusters of DirectorBlades orchestrate activity between clients and the StorageBlades and balance objects across the StorageBlades.

Figure 2: Panasas' ActiveScale Storage architecture is based on a clustered, object-based file system.

A Gigabit Ethernet switch provides parallel data paths (separate from the control path) to the Serial ATA-based StorageBlades, which are available in 160GB, 240GB, and 500GB capacities.
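The general idea of spreading a file's objects across blades so they can be fetched over parallel paths can be sketched as follows. This is a minimal illustration, not Panasas' actual layout algorithm: the round-robin placement, the tiny object size, and all names (`StorageObject`, `stripe_file`, `read_file`) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StorageObject:
    """One piece of a file plus the metadata needed to reassemble it."""
    file_name: str
    index: int      # position of this object within the file
    data: bytes


def stripe_file(file_name, payload, blade_count, object_size=4):
    """Split payload into objects and place them round-robin on blades.

    Round-robin placement is an assumption for illustration only.
    """
    blades = [[] for _ in range(blade_count)]
    for index, offset in enumerate(range(0, len(payload), object_size)):
        chunk = payload[offset:offset + object_size]
        blades[index % blade_count].append(StorageObject(file_name, index, chunk))
    return blades


def read_file(blades, file_name):
    """Gather a file's objects from every blade and restore their order."""
    objects = [obj for blade in blades for obj in blade
               if obj.file_name == file_name]
    return b"".join(obj.data for obj in sorted(objects, key=lambda o: o.index))


if __name__ == "__main__":
    blades = stripe_file("scan.dat", b"abcdefghij", blade_count=3)
    print([len(blade) for blade in blades])
    print(read_file(blades, "scan.dat"))
```

Because each object carries its own position, a client could in principle request objects from several blades at once and reassemble them locally, which is the benefit the separate data paths are meant to deliver.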

The entire cluster can be managed as a single system and connects to an Ethernet network. Protocol support includes NFS and CIFS.

Similarly, Permabit's Permeon software presents standard CIFS/NFS interfaces, rather than a proprietary interface like Centera's, to external applications such as e-mail archiving, document imaging, and digital asset management. Like EMC's Centera and Panasas' ActiveScale, Permeon is object-based.

By not writing to a proprietary interface, Permabit claims it has addressed some of the integration issues with Centera. However, unlike Centera, Permeon is a software play. It is designed to be layered on top of off-the-shelf hardware to create a content-addressed storage (CAS) system for applications such as e-mail archiving, document management, hierarchical storage management (HSM), and disk-based backup.

Figure 3: In Permabit's Permeon architecture, intelligence runs on portals, which translate records, blocks, or files into objects.

The Permeon "intelligence" runs inside the Permeon Portal, which essentially acts as a translator for incoming records, blocks, or files, and turns them into objects (see Figure 3).
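As a rough illustration of what content addressing means in general, the sketch below stores each object under a digest of its own bytes. It is not Permeon's or Centera's implementation: the choice of SHA-256, the in-memory dictionary, and the `ContentStore` class are all assumptions made for the example.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: an object's address is a hash of its bytes."""

    def __init__(self):
        self._objects = {}  # address -> data (in memory, for illustration only)

    def put(self, data: bytes) -> str:
        """Store data and return its content address."""
        address = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumed choice
        # Identical content maps to the same address, so it is kept only once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        """Return the stored bytes, verifying them against their address."""
        data = self._objects[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object failed its integrity check")
        return data

    def object_count(self) -> int:
        return len(self._objects)


if __name__ == "__main__":
    store = ContentStore()
    first = store.put(b"quarterly report")
    second = store.put(b"quarterly report")  # same bytes, same address
    print(first == second, store.object_count())
```

Two properties follow from addressing by content: writing the same bytes twice consumes space only once, and any object can be re-hashed on read to confirm it has not changed.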

This month, Permabit is expected to announce partnerships with CommVault, iLumin, and KVS (which are also EMC partners).

Other key features of the Permeon software include data coalescence, which eliminates unnecessary data redundancy; encryption and file- and volume-access control; fault tolerance (data is replicated on more than one server); capacity and performance scalability (to 40TB and 125MBps, respectively); snapshot-based data retention; data integrity checks; data offload and off-site replication; a scalable file system; and a variety of self-managing and self-healing capabilities.

This article was originally published on December 01, 2003