Greenplum Seeks to Inspire Big Data Innovation

By Drew Robb

Day Two of EMC World saw far fewer releases than day one, when literally dozens of new or updated products were unveiled. In addition to its latest acquisition, the company announced some Documentum information rights management news and some VCE-related news. But the highlight of the day probably came from Greenplum, the big data company EMC acquired about two years ago.

The Greenplum Analytics Workbench is a 1,000-node cluster that can act as a lab environment for Big Data development. EMC intends to use it to test the limits of its own scale-out infrastructure technology and to develop new approaches to big data analytics.

EMC is being generous with the workbench, making it freely available to the open source community in conjunction with the Apache Software Foundation. The intention is to add the Greenplum Analytics Workbench to the development efforts around Hadoop as part of the rush to big data.

OK, so what is it? The Greenplum Analytics Workbench contains technology from EMC, Intel, Mellanox Technologies, Micron, Seagate, SuperMicro, Switch and VMware. The resulting cluster of 1,000 hardware nodes (or 10,000 nodes with the addition of virtual machines) is housed in a test bed with 24 petabytes of physical storage.

Scott Yara, senior vice president of products and co-founder of Greenplum, said that the Workbench is currently free to use. However, there is already a waiting list of scientists and corporations eager to use it. As a result, an onboarding process has been developed to vet candidates based on the value of their contribution to the big data knowledge base.

"The Greenplum Analytics Workbench may evolve into a commercial offering," said Yara. "But right now we are more interested in uncovering business cases for big data so we have made it research-focused at the moment."


Another interesting product released at EMC World came from Virtual Computing Environment (VCE), which is an initiative of Cisco, EMC, VMware and Intel. VCE has previously released a product known as Vblock, which is essentially a cloud system in a box. Everything is preconfigured to work together, and you simply add more blocks as you expand. The company noted that it has several hundred users to date. At the show, VCE announced new Vblock data mobility and data protection features.

The announcement concerns Vblock integration with EMC VPlex virtual storage to move workloads across geographies and between service providers. Pooling the resources of multiple Vblock systems makes it possible for applications and data to be shared and shifted easily. This is considered an important feature for the cloud, because it prevents trouble when a company seeks to change providers or wants to move its data around.

The Vblock Series 700 Model LX system can support thousands of VMs simultaneously in the cloud. It includes a VMax 10K storage array, a Cisco UCS server and Cisco Nexus switches, and it runs VMware vSphere 5. This new Vblock release also adds data protection functions, provided by a combination of other EMC technologies such as Avamar, Data Domain, RecoverPoint and VPlex.

Drew Robb is a freelance writer specializing in technology and engineering. Currently living in California, he is originally from Scotland, where he received a degree in geology and geography from the University of Strathclyde. He is the author of Server Disk Management in a Windows Environment (CRC Press).

This article was originally published on May 23, 2012