EMC World opened today in Las Vegas with a morning full of major announcements from EMC-acquired companies Greenplum and Isilon. These included the world’s largest single file system, at 15.5 PB, and a full-blown embrace of the open source model, at least for EMC’s Greenplum division.
The announcements all tied in to the conference theme of Big Data Meets the Cloud.
EMC acquired Greenplum last year and has wasted no time shaking up several markets with its strategy in big data, open source, analytics and unstructured data.
Greenplum founder Scott Yara, who now runs a new EMC Data Computing Products Division, has been working with EMC to build a data warehouse appliance that incorporates Greenplum software. Looking ahead, VMware, SAS data analytics and the Apache Hadoop platform are being folded in to create what was described as the next-generation cloud-based data warehouse and analytics platform.
“Apache Hadoop has emerged as an important data technology and processing platform for unstructured data,” Yara said. “Hadoop is playing a significant part in the establishment of our big data/analytics stack.”
Apache Hadoop is an open-source technology inspired by Google’s MapReduce and Google File System designs. It is a software framework that supports data-intensive distributed applications and is effective for storing and analyzing massive amounts of data. Yahoo, Facebook, eHarmony, Twitter, eBay and others have been using it to be more agile and to mine unstructured data, which now makes up most of the data organizations hold. It combines software, commodity hardware and simple interconnects. Now EMC Greenplum plans to make it enterprise-ready.
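To make the MapReduce model behind Hadoop a little more concrete, here is a minimal single-process sketch of the classic word-count job in Python. It only illustrates the map, shuffle and reduce phases; it is not Hadoop’s API (real Hadoop jobs are typically written in Java or run through Hadoop Streaming), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, map tasks would run in parallel on the nodes storing each data
    # block; in this sketch every phase runs sequentially in one process.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data meets the cloud", "big data analytics"]
    print(word_count(docs))
```

The value of the model is that map and reduce are independent per record and per key, which is what lets a framework like Hadoop spread the work across many commodity machines.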
EMC announced a series of new products in its Greenplum line:
The EMC Greenplum HD Data Computing Appliance for Hadoop combines real-time and deep analytics with massive scale-out storage.
“We can run analysis across structured and unstructured data at the same time,” said Yara. “Both engines running together and one query can pull data from both engines.”
The Greenplum HD Community Edition is a virtual machine-based appliance, with all core features contributed back to Apache Hadoop. In other words, EMC is embracing the open-source community in a big way.
The Greenplum HD Enterprise Edition goes beyond the Community Edition by adding advanced features for large, mission-critical enterprise environments. These include data management features such as snapshots and wide area replication; simple data loading and access through a native network file system (NFS) interface; and end-to-end manageability, including simple cluster deployment, automatic failure detection and notification, multi-site management and rolling upgrades.
For the Enterprise Edition, EMC has been working quietly with startup MapR, which is developing a new, faster Hadoop distribution. MapR CEO John Schroeder brought the company out of stealth mode at EMC World, speaking publicly for the first time about MapR’s work to make Hadoop much easier to build, deploy and manage.
“We’ve moved Hadoop from being batch only to supporting data analytics while also making it screaming fast,” said Schroeder. “You can run faster on half the hardware than any other Hadoop distribution.”
EMC’s big data platform takes advantage of commodity servers using Intel processors, SATA disks and Just a Bunch of Disks (JBOD) storage systems. A software distribution will be available later this quarter.
“Hadoop has played a leading role in the transformation from traditional data warehousing to Big Data Analytics,” said John Webster, Senior Partner, Evaluator Group. “EMC’s Hadoop commercialization strategy is aimed at streamlining and bulletproofing Hadoop for enterprise users, making Hadoop more of a must-have real-time analytics tool for the enterprise.”
The EMC Greenplum HD Community Edition, EMC Greenplum HD Enterprise Edition and the EMC Greenplum HD Data Computing Appliance are expected to be available in the third quarter of calendar 2011.
Isilon Releases Big Data NAS Platform
Sujal Patel, president of EMC’s Isilon Storage Division (and founder of Isilon), followed with more big data announcements centered on the company’s NL Series, which tops out at 15.5 PB in one file system. The EMC Isilon 108NL uses 3-terabyte (TB) Hitachi drives in a 4U node.
“We have created the world’s largest single file system,” Patel said.
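As a rough sanity check on the 15.5 PB figure, the arithmetic below assumes that the “108” in the model name denotes raw terabytes per node (which, with 3 TB drives, implies 36 drives per node) and uses decimal units (1 PB = 1,000 TB). Neither assumption is stated in the announcement.

```latex
\frac{108\ \text{TB/node}}{3\ \text{TB/drive}} = 36\ \text{drives/node}
\qquad
\frac{15.5\ \text{PB}}{108\ \text{TB/node}} = \frac{15{,}500\ \text{TB}}{108\ \text{TB/node}} \approx 143.5\ \text{nodes} \;\Rightarrow\; 144\ \text{nodes}
```

Under those assumptions, a single file system at the quoted maximum would span on the order of 144 nodes.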
Rick Villars, an analyst with IDC, said this capability is important for the future of big data.
“Leveraging big data into business value can only be accomplished with a simple, scalable and highly flexible storage foundation as the core of IT infrastructure,” Villars said. “Scale-out NAS is that foundation, and Isilon’s new products deliver on that promise by combining new levels of performance, scale and simplicity to aggregate big data and enable real-time collaboration.”
In addition, the 108NL comes with SmartLock software, which brings Write Once Read Many (WORM) capability to these large file systems.
“Once locked, a file cannot be moved or changed in any manner,” Patel said. “Each protected file is given a unique, verifiable signature validating its integrity and status within the file system. When combined with Isilon’s 108NL, SmartLock provides the highest levels of protection for nearline big data archives.”
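SmartLock’s internals are not described in the announcement. As a conceptual illustration only, the hypothetical in-memory store below shows the WORM behavior Patel describes: once a file is locked it can no longer be overwritten, moved or deleted, and a SHA-256 digest recorded at lock time serves as the verifiable integrity signature. The class and method names are invented for this sketch and do not reflect SmartLock’s design.

```python
import hashlib
from typing import Dict


class WormStore:
    """Toy write-once-read-many store (hypothetical; not SmartLock's implementation)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        # path -> SHA-256 hex digest recorded when the file was locked
        self._locked: Dict[str, str] = {}

    def _check_unlocked(self, path: str) -> None:
        if path in self._locked:
            raise PermissionError(f"{path} is locked (WORM)")

    def write(self, path: str, content: bytes) -> None:
        self._check_unlocked(path)
        self._data[path] = content

    def move(self, src: str, dst: str) -> None:
        # Neither a locked source nor a locked destination may be touched.
        self._check_unlocked(src)
        self._check_unlocked(dst)
        self._data[dst] = self._data.pop(src)

    def delete(self, path: str) -> None:
        self._check_unlocked(path)
        del self._data[path]

    def lock(self, path: str) -> str:
        """Lock a file and return its integrity signature (a SHA-256 digest)."""
        digest = hashlib.sha256(self._data[path]).hexdigest()
        self._locked[path] = digest
        return digest

    def read(self, path: str) -> bytes:
        return self._data[path]

    def verify(self, path: str) -> bool:
        """Check that a locked file's current contents still match its recorded signature."""
        return hashlib.sha256(self._data[path]).hexdigest() == self._locked[path]
```

For example, after `store.write("/archive/q1.log", b"...")` and `store.lock("/archive/q1.log")`, any further `write`, `move` or `delete` on that path raises `PermissionError`, while `verify` keeps returning `True` as long as the stored bytes are intact.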
The Isilon 108NL hardware product and SmartLock software application are both available immediately. The 108NL has a list price of $123,500 per node. SmartLock starts at a list price of $1,950 per node.