By Kevin Komiega
Hitachi Data Systems has unveiled a new version of its Content Archive Platform with a slew of enhanced features in replication, security, de-duplication, and compression, but the main selling point may be the sheer size of the system.
Version 2.0 of the Hitachi Content Archive Platform (HCAP) can support up to 20PB of storage in an 80-node archive system. A single HCAP node can scale up to 400 million objects (files, metadata, and policies), and an 80-node system can support up to 32 billion objects. Hitachi claims the platform outperforms previous-generation CAS systems by 470%.
When it comes to building out the archive, Hitachi's approach is to scale archive server nodes and storage capacity independently rather than requiring additional servers and processing power to scale storage.
The launch of HCAP 2.0 comes on the heels of the debut of Hitachi's latest high-end storage array: the Universal Storage Platform (USP) V. And it's not by coincidence that the two platforms have a lot of technology in common.
"The new release of the Content Archive Platform shares the same philosophy of disaggregating servers and storage as the recently announced USP V platform," says Asim Zaheer, senior director of business development for content archiving at HDS.
The USP V touts the combination of a virtualization layer with thin-provisioning software to offer users consolidation, external storage virtualization, and the power and cooling advantages of thin provisioning.
The combination of these technologies allows for the management of a theoretical maximum of 247PB of virtualized capacity, approximately 670% more than the previous-generation TagmaStore USP platform. The company also claims a maximum performance of 3.5 million I/Os per second (IOPS), a 5x increase over its previous arrays.
The HCAP can attach to a virtual storage pool with the USP V, thereby acting as an archive tier of storage where aged data on primary storage can be moved. Data in the archive can be offloaded from expensive disk to less expensive ATA or Serial ATA (SATA) storage.
Until now, Hitachi's archiving product had been offered only as an appliance based on the TagmaStore Workgroup Modular Storage model WMS100, with servers providing software connectivity into the infrastructure. Zaheer says Hitachi will continue to offer appliance-based versions of the archiving platform at various capacity points for customers who want a turnkey product. But there is also now the HCAP-DL (diskless) version, which supports all of Hitachi's storage systems, including the USP V, the USP (formerly branded as TagmaStore), the Network Storage Controller, Adaptable Modular Storage systems, and Workgroup Modular Storage arrays.
"The salient point here is that Hitachi is divorcing the concept of what the software does from the whole hardware stack," says John Webster, principal IT advisor with the Illuminata research and consulting firm. "That makes the HCAP much more appealing to customers because now they can potentially take legacy storage devices and include them under the umbrella."
However, Webster admits, to add legacy or commodity storage to the virtual pool, end users have to put a USP V in between the HCAP-DL and the arrays. "But for USP V customers that's great," he says. "Now they have a number of different ways to [implement the archiving platform]."
Pricing for different models of the HCAP varies considerably based on the storage platform being used on the back-end, but, for example, an entry-level, 5TB HCAP system is priced at approximately $70,000.
In an effort to limit the need for proprietary APIs, the HCAP uses standards-based interfaces such as NFS, CIFS, Web-based Distributed Authoring and Versioning (WebDAV), and HTTP as well as storage management standards such as the Storage Management Initiative Specification (SMI-S) to integrate content-producing applications into the archive.
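Because the archive exposes plain HTTP and WebDAV, content can in principle be written with ordinary HTTP tooling rather than a proprietary API. The sketch below builds (without sending) a WebDAV-style PUT request in Python; the host name and the `/fcfs_data` namespace prefix are illustrative assumptions, not confirmed HCAP details.

```python
import urllib.request


def build_archive_request(host: str, path: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a WebDAV-style PUT request for an archive write.

    The "/fcfs_data" namespace prefix is a hypothetical mount point used
    here only to illustrate a standards-based, API-free write path.
    """
    url = f"http://{host}/fcfs_data{path}"
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Content-Type", "application/octet-stream")
    return req
```

In practice an application would pass the request to `urllib.request.urlopen()` (or use any other HTTP client); the point is that archiving a file reduces to a standard HTTP PUT.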
Hitachi also introduced a new encryption solution referred to as "Secret Sharing." The patent-pending technology allows users to store their security key within the HCAP and share that key across multiple nodes within the archive.
"As content comes into the system we protect it with standard AES encryption, but the differentiator is our distributed key management system based on our Secret Sharing technology," says Zaheer. "Rather than having a single key in a single location we distribute pieces of the key across the environment. Users need all of the pieces of that key in order to gain access to and decrypt the data."
Secret Sharing ensures that only a fully operational system, with all of its nodes connected to the archive, is able to decrypt the content, metadata, and search index. Zaheer says that if a server or storage device is stolen or removed from the cluster, the data on that device remains encrypted and is immediately unreadable by any other device.
Hitachi has thrown data de-duplication into the mix to eliminate storing redundant data in the archive. Zaheer claims Hitachi's approach to data de-duplication is "collision-proof," in that it performs both hash comparisons and binary comparisons to ensure objects are actual duplicates, therefore avoiding "hash collisions" where different objects could have the same cryptographic hash key. "Most de-duplication methods use a hash key to compare hash values between files, but it is sometimes possible to have the same hash key for different files. We perform a binary comparison before we collapse a file and reclaim the capacity," he says.
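The hash-then-binary-compare approach Zaheer describes can be sketched as follows. SHA-256 stands in for whatever hash HCAP actually uses (the article does not say); the key point is that a matching hash alone never triggers de-duplication until a full byte-for-byte comparison confirms the objects are identical.

```python
import hashlib


class DedupStore:
    """Minimal sketch of hash-plus-binary-comparison de-duplication."""

    def __init__(self):
        # Map hash digest -> list of distinct contents sharing that digest.
        # The list allows correct behavior even in the (astronomically
        # unlikely) event of a hash collision between different objects.
        self._by_hash: dict[bytes, list[bytes]] = {}

    def put(self, data: bytes) -> bytes:
        """Store data; return the stored copy, deduplicated if identical."""
        digest = hashlib.sha256(data).digest()
        for existing in self._by_hash.setdefault(digest, []):
            if existing == data:  # binary comparison guards against collisions
                return existing   # true duplicate: reuse it, reclaim capacity
        self._by_hash[digest].append(data)  # new object (or a real collision)
        return data
```

Hash-only schemes would collapse two different objects that happened to share a digest; the binary comparison is what makes the scheme "collision-proof" in the sense Zaheer claims.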
Hitachi's archiving system comprises homegrown HDS hardware and software, plus technology the company acquired through its purchase of digital archiving start-up Archivas last February.
Archivas' software, Archivas Cluster (ArC), simultaneously indexes metadata and content as files are written to the archive, with the built-in ability to extract text and metadata from 370 file formats.
ArC also provides event-based updating of the full-text and metadata index as retention status changes or as files are deleted. The ArC software is what enables HCAP to scale to 80 nodes, support a single global namespace with more than 2PB of capacity, and manage more than two billion files.