By Ann Silverthorn
Paul Carpentier, one of the inventors of content-addressed storage (CAS), is also one of the founders of Caringo, a start-up with a software-based CAS product designed for fixed content. Caringo’s CAStor software, which was introduced this month, runs on industry-standard hardware, including systems based on high-capacity Serial ATA (SATA) drives, which the company says could eliminate the need for multiple tiers of storage.
Carpentier sold his company, FilePool, to EMC in 2001. FilePool’s technology became part of EMC’s Centera CAS platform.
“We don’t believe CAS should be in the archiving corner of the data center, which is where EMC parked it,” says Carpentier, Caringo’s CTO. “CAS has its own place next to NAS and SAN in a much broader role for storage of fixed content, which is the fastest-growing sector of the market. Our goal is to have something that’s fast enough for primary storage and cheap enough to keep it there.”
Although the hardware may be inexpensive, the data must retain its integrity, especially if it needs to be stored for long periods of time for regulatory compliance. The CAS technology that Carpentier developed uses hashes of an object’s content as unique identifiers for objects. He admits that hash algorithms tend to break as computers get faster, and that algorithms once considered infallible become victims of attack. He cites the breaking of MD5 and SHA-1 hashes and believes that 256-bit and 512-bit varieties are also at risk.
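The general idea of content addressing can be illustrated with a short Python sketch. This is a toy in-memory example, not Caringo's or EMC's implementation: the `ContentStore` class, its method names, and the choice of SHA-256 are assumptions made only for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (illustrative assumption only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # content always maps to the same address.
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hash on read: a mismatch means the stored bytes have changed.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"quarterly report")
```

Because the address depends entirely on the hash function, a weakness in that function undermines both the identifier and the integrity check, which is the risk Carpentier describes.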
CAStor upgrades hashes without compromising the associated content by keeping the identifier separate from the seal (the fingerprint that guarantees the content). The identifier remains constant, and audit logs ensure that the integrity of the data can be verified in court.
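One way to picture the separation of identifier and seal is the following Python sketch. It is an assumption-laden illustration rather than CAStor's actual design: the `StoredObject` class, the random UUID identifier, the `upgrade_seal` method, and the log format are all hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical object whose identifier is independent of its seal."""

    data: bytes
    # Assumption: the identifier is a random UUID, never derived from a hash.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    seal_algorithm: str = "sha1"
    seal: str = ""
    audit_log: list = field(default_factory=list)

    def __post_init__(self):
        self.seal = hashlib.new(self.seal_algorithm, self.data).hexdigest()
        self.audit_log.append(("sealed", self.seal_algorithm, self.seal))

    def verify(self) -> bool:
        # Recompute the fingerprint and compare it with the stored seal.
        current = hashlib.new(self.seal_algorithm, self.data).hexdigest()
        return current == self.seal

    def upgrade_seal(self, new_algorithm: str) -> None:
        # Confirm the content under the old seal before issuing a new one.
        if not self.verify():
            raise ValueError("content failed verification; refusing to reseal")
        new_seal = hashlib.new(new_algorithm, self.data).hexdigest()
        self.audit_log.append(("resealed", new_algorithm, new_seal))
        self.seal_algorithm = new_algorithm
        self.seal = new_seal


record = StoredObject(b"signed contract")
original_id = record.object_id
record.upgrade_seal("sha256")
```

In this sketch, applications that stored `original_id` keep working after the upgrade, while the audit log records each seal that has vouched for the content.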
The CAStor software installs from a bootable USB flash drive that can be plugged into any computer. Nodes can be added to form a cluster. Other options include a network-boot capability and remote replication.
All nodes are symmetric, and there is no single point of failure in a cluster. The system manages itself, including functions such as retiring failing nodes. If a failing disk is detected, CAStor switches the disk to read-only mode and replicates all of the disk’s data. When that process is complete, the disk is shut down and can be replaced during scheduled maintenance.
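The disk-retirement sequence described above can be sketched in Python. This is a simplified model, not Caringo's code: the `Disk` class, the `retire_disk` function, the state names, and the round-robin placement of copies are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical disk holding objects keyed by identifier."""

    name: str
    objects: dict = field(default_factory=dict)
    state: str = "online"  # "online", "read-only", or "offline"


def retire_disk(failing: Disk, cluster: list) -> None:
    # Step 1: stop accepting writes on the failing disk.
    failing.state = "read-only"

    # Step 2: copy every object to the remaining healthy disks.
    targets = [d for d in cluster if d is not failing and d.state == "online"]
    if not targets:
        raise RuntimeError("no healthy disk available for replication")
    for i, (object_id, data) in enumerate(failing.objects.items()):
        # Assumption: simple round-robin placement across healthy disks.
        targets[i % len(targets)].objects[object_id] = data

    # Step 3: take the disk out of service once its data is safe elsewhere.
    failing.state = "offline"


disk_a = Disk("a", {"obj1": b"x", "obj2": b"y"})
disk_b = Disk("b")
disk_c = Disk("c")
retire_disk(disk_a, [disk_a, disk_b, disk_c])
```

The ordering is the point of the sketch: the disk is shut down only after every object it holds exists on another disk, so replacement can wait for scheduled maintenance.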
CAStor does not have proprietary APIs and can be accessed through an HTTP interface. It requires an x86 host with 500MB or more of RAM, hard drives, and support for Gigabit Ethernet. Pricing is $500 per disk.
In addition to Caringo and EMC, representative CAS vendors include Archivas, Avamar, BridgeHead, Bycast, Hewlett-Packard, Hitachi Data Systems, IBM, Nexsan, Permabit, and Sun/StorageTek.