Lessons in Troubleshooting - courtesy VMworld 2010

VMworld 2010 wrapped up a couple of weeks ago, and the two weeks since that whirlwind of activity have finally brought a few moments to revisit the experience. But I'll have to confess, I have two dimensions to my observations this year - the objective and the non-objective. On the objective front: While VMworld seemed less thematic this year than in years past, there were nonetheless some heavy trends. The most notable of those in my eyes - there was tangible service provider presence at VMworld. VMware started the full march toward cloud solutions over a year ago, and this year had fruit to show that they were able to move rapidly to execution within one of the hardest markets for an enterprise solution provider to penetrate. Time will tell whether service provider presence equals a functioning ecosystem where there will be real VMware solutions business across the cloud. While there was much activity at VMworld, my money says that the pairing of VMware and the service provider will have more impact upon IT than any other single trend at VMworld this year.

Now on to the non-objective. This year, courtesy of presenting at VMworld, I also found myself involved in an assortment of other peripheral activities. Somehow, all of those activities were labeled "performance". Word to the wise: sit down at unstructured sessions labeled "performance" at a conference, and you'd better expect to be troubleshooting. This is not exactly my day in and day out job these days, but nonetheless, this was a blast, and led to lots of great conversations around performance problems. This leads to my second observation: performance is becoming a core concern everywhere - a reflection of the widespread use of virtualization in production, and the underlying complexity of virtual infrastructures that is too often overlooked.

The funny thing is, 80% of those performance problems were storage related. You can go survey the web to see some tremendous performance numbers out there for well built virtual infrastructures (great example over here by Jeff Buell by the way). But for the day-to-day admin, building a good virtual infrastructure is still challenging, and this was obvious at VMworld. Inevitably, when we dug into various performance problems that often were showing up as vCenter disconnects or notable application sluggishness, the back of napkin calculations for the storage behind these virtual servers suggested that VMs are often provisioned with storage that delivers fewer IOs than a laptop. Sometimes it was DAS, sometimes NFS, but most often FC SAN. The thing is, sometimes this will work okay - in aggregate you'll have more IO than any single VM needs; this is consolidation in action making oversubscription work. But when you suddenly have simultaneous demands, the problems surface. And at that point, you'll likely not realize that your environment has less storage performance than it needs, as it has been trucking along fine up to that point. The disk drive in your laptop may do better than the storage in your virtual infrastructure.
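
To make that back of napkin calculation concrete, here is a minimal sketch in Python. The per-spindle IOPS figures and the RAID write penalty are common rules of thumb rather than measurements, and the function and variable names are purely illustrative - plug in numbers for your own array.

```python
# Rough random IOPS per spindle by drive speed. These are rule-of-thumb
# assumptions, not vendor specifications.
SPINDLE_IOPS = {"7.2k": 80, "10k": 130, "15k": 180}


def iops_per_vm(drive_type, spindles, vm_count,
                write_fraction=0.3, raid_write_penalty=4):
    """Estimate the host-visible random IOPS each VM gets from a shared disk group."""
    raw = SPINDLE_IOPS[drive_type] * spindles
    # Each host write costs `raid_write_penalty` back-end IOs
    # (4 is the usual figure quoted for RAID 5); reads cost one.
    usable = raw / ((1 - write_fraction) + write_fraction * raid_write_penalty)
    return usable / vm_count


if __name__ == "__main__":
    per_vm = iops_per_vm("15k", spindles=14, vm_count=60)
    print(f"~{per_vm:.0f} IOPS per VM")  # ~22 IOPS per VM
```

Under those assumptions, 14 fast drives shared by 60 VMs leave each VM with roughly 22 IOPS when they all get busy at once, which is less than the rule-of-thumb figure for even a single slow spindle.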

Why is this? There are two parts to the answer. 1) There remains insufficient top-to-bottom integration in the virtual infrastructure, and 2) businesses too often don't realize the interdisciplinary nature of server virtualization - it is not just servers. The two add up into a one-two punch for those not paying attention - performance problems can be amplified, and the consequences can pile up much faster. Now a single oversight - lack of attention to the storage layer for example - can impact 20 or 200 VMs. Trying to correlate and troubleshoot while you're in a performance hole, and the sides and bottom are caving in, may put you in a place where you can't dig out. Pay attention - it's about top to bottom. If you're not comfortable with your top to bottom skills, there are various vendors and VARs out there who can help you - the product landscape includes additional tools from your storage vendors (EqualLogic's SAN HQ is a great example, but most vendors now have some approach, many of them vCenter integrated), and the likes of Virtual Instruments, Akorri, and others. Meanwhile, there's a lot you can do by honing your own virtual server admin skills, including getting better familiarity with kernel latency and the various performance counters available in console tools like esxtop and vscsiStats. Figure out what happens with storage transactions; figure out how IO wait times impact system latency, especially if memory swapping is going on. Those things will carry you far.
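
As a starting point for reading those counters: esxtop breaks per-device storage latency into DAVG (time spent in the device and array), KAVG (time spent in the VMkernel), and GAVG (what the guest sees, roughly DAVG + KAVG). The sketch below takes values you have read off the esxtop disk screens and points at where the latency is accumulating. The thresholds are commonly quoted rules of thumb that I am treating as assumptions, and the function name is hypothetical.

```python
def classify_latency(davg_ms, kavg_ms, davg_limit=25.0, kavg_limit=2.0):
    """Return (approximate GAVG, findings) for one device sample.

    davg_limit and kavg_limit are rule-of-thumb thresholds in milliseconds,
    assumed here for illustration; tune them for your environment.
    """
    gavg_ms = davg_ms + kavg_ms  # guest-visible latency is roughly the sum
    findings = []
    if davg_ms > davg_limit:
        findings.append("device: look at the array, fabric, or spindle count")
    if kavg_ms > kavg_limit:
        findings.append("kernel: IOs are queuing in the hypervisor")
    return gavg_ms, findings or ["ok"]


if __name__ == "__main__":
    print(classify_latency(davg_ms=30.0, kavg_ms=0.5))
    print(classify_latency(davg_ms=5.0, kavg_ms=1.0))
```

High DAVG with low KAVG points below the host, at the storage itself; high KAVG suggests the host is queuing IO faster than the device can drain it.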

As an addendum: I'll confess that I didn't always hit the obvious in every scenario. To the guy running lots of iSCSI from both the hypervisor and guest out of separate interfaces, and wondering about how to do highly managed 10G Ethernet in a more cost effective manner than hypervisor vSwitch licensing, I probably should have been on my toes enough to recommend something like Voltaire's Ethernet switches that can do some granular QoS type stuff very cost effectively. If anybody knows that virtual server admin from Cincinnati, hope you'll relay the message.

Moreover, the guys who really stumped me were two product engineers trying to run software builds on VMs. They were getting great performance via GNU make on Solaris and Linux, but on Windows, Visual Studio gmake was a dog, taking 3 times longer than on basic hardware. If you have any suggestions on what could be slowing C++ builds in VS on a virtual proc, drop me an email.

If we talked at VMworld, hope you'll drop me a line and let me know how your performance problem is working out.


posted by: Jeff Boles

Jeff Boles, InfoStor Guest Blogger

Jeff has a broad background of hands-on operational IT management and infrastructure engineering experience, with more than 20 years of experience in the trenches of practicing IT.
Prior to joining the Taneja Group, Jeff was director of an infrastructure and application consulting practice at CIBER and, more recently, an IT manager with a special focus on storage management at the City of Mesa, Ariz.