Henry Newman's Storage Blog Archives for January 2014

Lenovo, IBM and the Impact on Data Storage

I’ve been thinking about what IBM’s sale of the x86 business means for storage. Here are some of the possible outcomes as I see them.

My first thought is that it is clear in this decade that hardware is not driving software; software is driving hardware design. IBM is keeping its software, such as GPFS, but likely must now depend on Lenovo for the development of hardware that maps to IBM’s software development plans and needs. It is unclear whether this is what will happen, but for the sake of argument, assume it will.

IBM has a number of x86 storage platforms, and the up-and-coming one, the way I see it, is the GPFS appliance. The critical issue here is that hardware margins for appliances are low these days in most cases, but you make up for that low hardware margin with software margin. Lenovo is going to have to design and support a hardware product that has low margins and, worse yet, relatively low volume. What’s in it for them to create a hardware platform that IBM can profit from handsomely? Something will have to give here if both are to be successful.

My second thought is: What happens to all of the rebranded NetApp storage products that IBM is currently selling? Do they go to Lenovo, with IBM then buying them back from Lenovo? Does IBM keep them? None of this is clear.

What is clear is that there is turmoil in the market, and what happens when there is? Vendors jockey for a better position, taking advantage of uncertainty and potential weakness. It always happens this way, and it always will. It is human nature, and we are not going to change how we are wired. The best way to head off this jockeying is a quick, clear statement of the actual direction so there is no uncertainty.


Labels: data storage, IBM, Lenovo

posted by: Henry Newman

Archival Data Storage and Weather Forecasting

You might think that there is no relationship between archival data storage and weather forecasting, but without storage – and I mean lots of archival storage – our forecasts would not be improving much.

Take the following example of archival storage using HPSS, which is well known in the archival community as the most scalable HSM. Note that four of the top ten archive sites in the world are weather sites: ECMWF, NOAA, UK Met, and DKRZ. The reason is that each day these sites archive all of the input data to the weather forecast.

This includes everything from satellite input of all kinds, to temperatures and wind velocities at various altitudes, to ocean buoy readings and lots of other information. Combine that with ground stations, airplanes, and shipboard sensors, and you are talking about many terabytes of input data.

Then the forecast is run, sometimes a few times a day, and the output of each run is saved. This goes on for months or years until a new forecast model is developed, and then the new model has to be validated.

So the weather sites rerun the forecast with all of the old input data and create a new forecast. Sometimes the new model has new inputs, because, for example, a new satellite has been put into service, but the models can always be run without that new data.

The new model output is compared to the old model output, and statistical analysis is done to make sure that the new model provides a better solution than the old model. This is especially true when a forecast is just plain wrong: the sites make sure that the new model does a better job at prediction than the old model that got the forecast wrong. Weather forecasting is yet another example of an application that has seen an explosion of data with no end in sight.
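To make that validation step concrete, here is a minimal sketch. The data layout and the use of RMSE as the comparison statistic are my own illustrative assumptions (real verification suites use far more sophisticated skill scores); it simply scores an old and a new model against archived observations for the same dates.

```python
# Minimal sketch of model validation against archived data. The arrays and the RMSE
# statistic are illustrative assumptions, not any site's actual verification code.
import numpy as np

def rmse(forecast, observed):
    """Root-mean-square error of a forecast against the verifying observations."""
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))

# Pretend these were produced by rerunning both models on the archived input data,
# then read back alongside the archived observations for the same forecast dates.
observed  = np.array([15.2, 14.8, 16.1, 13.9])   # e.g. 2 m temperature in degrees C
old_model = np.array([14.1, 15.9, 17.0, 12.5])
new_model = np.array([15.0, 15.1, 16.4, 13.6])

print("old model RMSE:", rmse(old_model, observed))
print("new model RMSE:", rmse(new_model, observed))  # lower is better
```

None of this works unless every run's inputs and outputs are still sitting in the archive, which is exactly why the weather sites keep so much archival storage.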


Labels: data storage, archive, data storage capabilities

posted by: Henry Newman

Flash in the Cloud: Why or Why Not?

IBM just released a new product that provides high-IOPS support for data analysis in clouds. This is far different from the approach, proposed by many, of using flash to store all data, which makes no sense given the latency over the Internet compared with the latency of disk drives and flash (flash likely gives you, at best, a 50 percent improvement in latency).
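A quick back-of-the-envelope calculation shows why. The numbers below are assumptions for illustration, not measurements, but they capture the shape of the problem: once a request has to cross the Internet, the network round trip swamps the difference between disk and flash.

```python
# Illustrative latency budget with assumed numbers (not measurements).
wan_rtt_ms    = 40.0   # assumed round trip over the Internet to the cloud
disk_read_ms  = 8.0    # assumed average disk seek plus rotational delay
flash_read_ms = 0.1    # assumed flash read latency

with_disk  = wan_rtt_ms + disk_read_ms
with_flash = wan_rtt_ms + flash_read_ms

print(f"disk-backed request:  {with_disk:.1f} ms")
print(f"flash-backed request: {with_flash:.1f} ms")
print(f"end-to-end improvement: {100 * (1 - with_flash / with_disk):.0f}%")
```

With these assumed numbers the end-to-end win is well under 20 percent, and it only shrinks as the network gets slower.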

Using flash for data analytics makes complete sense, especially for anyone using the MapReduce algorithm, either within Hadoop or with other commercial products that use MapReduce. Flash is going to be very important for high-speed data ingest problems, like collecting security logs from thousands of connections with deep packet inspection, point-of-sale information for a very large retailer, or failure analysis in a large network of systems. I think this is a great idea, but someone forgot one big issue.

If you have high-speed ingest, how are you going to get the data to the cloud for analysis? In my opinion, flash is important for analytics in addressing a few types of problems:

1. High-speed ingest problems, where the processors need higher performance given the incoming data rate.

2. Workloads where you are running lots of different experiments and have many different uses, with data sets constantly moving in and out.

3. Workloads where the ratio of ingest, correlation and shuffle (distribution of the correlated data to the nodes in the cluster for searching) is far greater than the amount of searching done over time (see the sketch after this list).
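To illustrate the third point, here is a rough, hypothetical heuristic of my own (not anything from IBM or any vendor): it compares the data volume that has to be ingested, correlated and shuffled with the volume actually scanned by searches. When the first number dominates, the workload is the write- and redistribution-heavy kind where flash can earn its cost.

```python
# Hypothetical back-of-the-envelope heuristic for point 3 above; the threshold and
# the input numbers are illustrative assumptions, not vendor guidance.
def ingest_to_search_ratio(ingest_tb_per_day, shuffle_tb_per_day,
                           searches_per_day, tb_scanned_per_search):
    moved    = ingest_tb_per_day + shuffle_tb_per_day       # write/redistribute-heavy work
    searched = searches_per_day * tb_scanned_per_search     # read-mostly query work
    return moved / max(searched, 1e-9)                      # avoid division by zero

ratio = ingest_to_search_ratio(ingest_tb_per_day=50, shuffle_tb_per_day=120,
                               searches_per_day=25, tb_scanned_per_search=2)
print(f"ingest+shuffle vs. search ratio: {ratio:.1f}")
if ratio > 1:
    print("Ingest and shuffle dominate; this is where flash is likely to pay off.")
else:
    print("Searches dominate; cheaper disk may be good enough.")
```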

My view is that if you do not have one or more of these issues, then flash is not going to make much of a difference in the cloud. And from a cost point of view, it is far too expensive compared to disk. Amazon and others, I am sure, are doing this, but we shall see if the market is big enough to justify the costs of using flash.


Labels: Flash, data storage, cloud computing

posted by: Henry Newman

Data Storage Buyouts Reflect Industry Trends

There’s an interesting article in Storage Newsletter on this year’s buyouts, along with a look at some of the biggest buyouts in data storage history. What I found most interesting is what the buyouts say about industry trends.

For example, look at the buyouts in disk drives in the 1980s and 1990s, when most of the drive companies were consolidated. Of course, there were a few drive company holdouts that were snapped up in the 2000s, like Hitachi’s purchase of the IBM disk drive group and WD buying Hitachi’s drive business. SSD technology buyouts really perked up in 2012 and 2013 and, as far as I can tell, are showing no signs of slowing down. This was one of my storage predictions back at the start of 2013, and I do not think it will stop in 2014.

Over time you can see trends in buyouts as they relate to various technologies. Clearly WD has been on a buying spree over the last few years. And since Seagate purchased Xyratex, we might be seeing a new trend where drive vendors start to buy again and maybe enter new markets. WD’s purchase of Hitachi’s disk drive division did not seem much different from Hitachi’s purchase of the IBM disk drive group, but the purchases of STEC, Virident and other technologies strike me as a change of direction. Of course, both are SSD vendors, but Virident makes PCIe cards, and though Seagate resold a PCIe card, it did not make one.

I believe most things in the data storage industry are cyclical. The major disk drive consolidation of the 1980s and 1990s might repeat itself, with Seagate and WD buying companies and expanding beyond the traditional disk drive market. Only time will tell, but with some of the changes happening on the CPU side of the equation, I would not be surprised.


Labels: data storage, acquisition

posted by: Henry Newman