EMC targets 'content addressable storage'

By Heidi Biggar

EMC last month rattled storage markets with three announcements: Centera, a purpose-built online storage system for fixed content; a partner list of more than 30 vendors; and a new division devoted to Centera development.

Based largely on technology acquired through its April 2001 acquisition of Belgium-based software developer FilePool, Centera gives end users a new way of storing, managing, and retrieving growing stockpiles of fixed-content data (see "Centera up close").

It is the first system designed from the ground up to support fixed content—data that, for economic reasons, has historically been kept on offline storage (e.g., tape or optical), claims Joe Tucci, EMC president and CEO.

EMC loosely defines fixed content, otherwise known as reference information, as objects that are unchanging and long lasting (e.g., electronic documents, digital X-rays, check images, movies, e-mail, and broadcast content). This contrasts with "changing data," which is updated frequently and has a life span of hours or days.

According to a recent study conducted by Enterprise Storage Group (ESG), a consulting firm in Milford, MA, 51% of new corporate and government information in 2004 will be reference information. ESG estimates that the market opportunity this year will be about 100PB, with 2,800PB possible by 2006.

Competitors were quick to counter EMC's claims. For example, Sun Microsystems says its StorEdge SAM-FS software is an alternative to Centera for Sun environments. "Archiving of fixed-content objects isn't a new storage requirement," says Paul Giroux, Sun's senior director of global network sales. Sun acquired SAM-FS (Storage Archive Manager File System) from LSC Inc. in February 2001.

According to Sun's Giroux, the SAM-FS file system features built-in archival capabilities, which give users flexibility over where they store the data (e.g., on disk, tape, or optical), without requiring applications to be rewritten for a new platform. On the downside, response time for SAM-FS is slow if the fixed content has been archived to tape.

As for Centera's impact on tape and optical markets, at $0.02 per megabyte, Centera is priced to compete against both technologies. The challenge for EMC, says Randy Kerns, a partner at The Evaluator Group research firm, in Greenwood Village, CO, will be getting users to switch to Centera, particularly those in the records retention area who are comfortable with their tape and/or optical technologies.

"That market is already being serviced by a small group of resellers and tape/optical vendors," explains Kerns. "Introducing Centera means new processes for users. It will change the way they do business."

Kerns says users should also consider whether Centera, which is a disk-based technology, meets the legal retention requirements for the records retention market and others. "Tape has a shelf life of about 70 years; optical, 50 years; disk, years to minutes," he says.

In a poll of InfoStor readers, 46% of the respondents said they expect Centera to negatively affect sales of tape and optical libraries, while 54% said they do not. EMC says its primary target isn't the fixed content that currently resides on tape or optical libraries but, rather, fixed content that isn't yet online (e.g., that's still on paper).

"We'll leave it up to users to decide how much of their current data they will migrate from tape or optical to Centera," explains Roy Sanford, vice president of marketing and alliances in EMC's Centera business group. "[But over time], Centera will bleed tape of this type of data."

Which data should you migrate to Centera? "If you're leveraging the historical value of the fixed content, and that is your only business objective, you should migrate slowly," says Sanford. Migrate faster if you can improve service levels or generate new revenue, he says.

Other considerations include the amount of fixed content an end user needs to bring online and how fast they expect their fixed-content storage requirements to grow. As a general rule of thumb, EMC says that Centera is targeted at end users with 5TB or more of fixed content to bring online and who expect their fixed content to grow by at least 50% over the coming year.

Centera up close
The Robert Frances Group, a market research firm in Westport, CT, describes Centera as "the first storage solution that adequately addresses the specific requirements of storing large amounts of fixed content online." EMC refers to the new product category as content addressable storage (CAS).

Centera is a bundled hardware-software system with off-the-shelf hardware (e.g., 1U 850MHz Pentium servers, 160GB ATA/IDE drives) and specialized content-addressing software called CentraStar. The servers, or nodes, are arranged in a "redundant array of independent node" (RAIN) configuration.

CentraStar uses a content-based addressing scheme for storing, managing, and retrieving fixed content. The software creates a unique digital "fingerprint" for each data object, and that fingerprint serves as the object's address. This contrasts with traditional addressing schemes, which track information based on physical location. EMC claims that content addressing not only drastically simplifies the management of huge numbers of data objects (by five to 10 times), but also ensures the authenticity of data and eliminates duplication of identical objects.
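
The idea of content addressing can be illustrated with a minimal Python sketch. It is not EMC's implementation: the class name, the in-memory dictionary, and the choice of SHA-256 as the fingerprint function are all assumptions made for illustration, since the article does not say how CentraStar computes its fingerprints.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store keyed by a digest of each object's bytes.

    Hypothetical illustration only; SHA-256 is an assumed fingerprint.
    """

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not from a location.
        address = hashlib.sha256(data).hexdigest()
        # Identical content yields the same address, so it is kept once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hashing on read detects content that was altered after storage.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object no longer matches its address")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ContentAddressedStore()
first = store.put(b"check image 0001")
second = store.put(b"check image 0001")  # same bytes, same address
```

In this sketch, storing the same bytes twice returns the same address and keeps a single copy, which mirrors the duplicate-elimination claim. Checking the fingerprint on retrieval mirrors the authenticity claim.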

Fully configured, each 19U Centera cabinet supports 32 nodes for up to 10TB of capacity. By clustering cabinets together, users can increase capacity to 160TB. The cabinets can also be clustered into domains (across multiple locations via WANs) for more than 1PB of mirrored capacity.
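
As a rough check on these figures, the arithmetic can be worked through in a few lines of Python. The variable names are illustrative, and the conversion of 1TB to 1,024GB is an assumption; the article does not say whether the 10TB figure is raw or mirrored capacity.

```python
# Figures quoted in the article.
NODES_PER_CABINET = 32
TB_PER_CABINET = 10
CLUSTER_TB = 160

# Assumption: 1TB = 1,024GB.
GB_PER_TB = 1024

# Cabinets needed to reach the quoted clustered capacity.
cabinets_per_cluster = CLUSTER_TB // TB_PER_CABINET

# Total nodes in a cluster of that size.
nodes_per_cluster = cabinets_per_cluster * NODES_PER_CABINET

# Capacity contributed by each node.
gb_per_node = TB_PER_CABINET * GB_PER_TB / NODES_PER_CABINET

print(cabinets_per_cluster, nodes_per_cluster, gb_per_node)
```

Under these assumptions, a 160TB cluster corresponds to 16 cabinets and 512 nodes, with about 320GB per node. That per-node figure would be consistent with two of the 160GB drives mentioned above, although the article does not state how many drives each node holds.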

Centera is fully redundant (data objects are mirrored across nodes), self-configuring (the system automatically reconfigures itself as nodes are added and load balances capacity across active nodes), and self-diagnosing/self-healing (in the event of a disk-drive failure). Centera has an IP, not Fibre Channel, interface and is designed to be a stand-alone unit outside of a SAN environment. Migration of data from other storage technologies (e.g., disk, tape, or optical) is done over the IP connection.
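
The mirroring and self-healing behavior described above can be sketched as a toy model in Python. This is a simplified illustration, not Centera's actual algorithm: the class and method names are hypothetical, placement by "least-loaded node" is an assumed policy, a whole-node failure stands in for a disk-drive failure, and SHA-256 is again an assumed fingerprint function.

```python
import hashlib


class MirroredCluster:
    """Toy model: every object is kept on two different nodes."""

    def __init__(self, node_names):
        # Each node maps content address -> object bytes.
        self.nodes = {name: {} for name in node_names}

    def _least_loaded(self, exclude=()):
        # Assumed placement policy: the node holding the fewest objects.
        candidates = [n for n in self.nodes if n not in exclude]
        return min(candidates, key=lambda n: (len(self.nodes[n]), n))

    def holders(self, address):
        return [n for n, objs in self.nodes.items() if address in objs]

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        current = self.holders(address)
        while len(current) < 2:
            target = self._least_loaded(exclude=current)
            self.nodes[target][address] = data
            current.append(target)
        return address

    def fail_node(self, name):
        # The failed node's copies are gone; re-mirror from the survivors.
        # Assumes at least two nodes remain after the failure.
        lost_addresses = list(self.nodes.pop(name))
        for address in lost_addresses:
            survivors = self.holders(address)
            source = survivors[0]
            target = self._least_loaded(exclude=survivors)
            self.nodes[target][address] = self.nodes[source][address]


cluster = MirroredCluster(["n1", "n2", "n3", "n4"])
addresses = [cluster.put(bytes([i])) for i in range(8)]
cluster.fail_node("n1")
```

After the simulated failure, each object is again held by two of the remaining nodes, which is the property the "self-healing" label refers to in this sketch.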

A 16-node system lists for $210,000, including hardware and software. Centera is available through EMC and its partners (see vendor list). The system supports APIs for applications running on Windows 2000/NT, Solaris, and Linux. Support for HP-UX and IBM AIX is expected later this summer.

This article was originally published on June 01, 2002