Lessons In Troubleshooting – Courtesy VMworld 2010

Jeff Boles's Storage Blog Archives For September 2010

A couple of weeks ago marked the wrap on VMworld 2010, and the two weeks since the whirlwind of activity have finally brought a few moments to revisit those experiences. But I’ll have to confess, I have two dimensions to my observations this year – the objective and the non-objective. On the objective front: while VMworld seemed less thematic this year than in years past, there were nonetheless some heavy trends. The most notable of those in my eyes: there was a tangible service provider presence at VMworld. VMware started the full march toward cloud solutions over a year ago, and this year they had fruit to show that they could move rapidly to execution within one of the hardest markets for an enterprise solution provider to penetrate. Time will tell whether service provider presence equals a functioning ecosystem with real VMware solutions business across the cloud. While there was much activity at VMworld, my money says that VMware and the service providers will have more impact upon IT than any other single trend at the show this year.

Now on to the non-objective. This year, courtesy of presenting at VMworld, I also found myself involved in an assortment of peripheral activities. Somehow, all of those activities were labeled “performance”. Word to the wise: sit down at an unstructured conference session labeled performance, and you had better expect to be troubleshooting. This is not exactly my day-in, day-out job these days, but it was a blast nonetheless, and it led to lots of great conversations around performance problems. Which leads to my second observation: performance is becoming a core concern everywhere – a reflection of the widespread use of virtualization in production, and of the too-often-overlooked complexity underlying virtual infrastructures.

The funny thing is, 80% of those performance problems were storage related. You can survey the web and find some tremendous performance numbers for well-built virtual infrastructures (great example over here by Jeff Buell, by the way). But for the day-to-day admin, building a good virtual infrastructure is still challenging, and this was obvious at VMworld. Inevitably, when we dug into performance problems that often showed up as vCenter disconnects or notable application sluggishness, back-of-napkin calculations for the storage behind these virtual servers suggested that VMs are often provisioned with storage that delivers fewer IOPS per VM than a laptop disk. Sometimes it was DAS, sometimes NFS, but most often FC SAN. The thing is, sometimes this will work okay – in aggregate you’ll have more IO than any single VM needs; this is consolidation in action, making oversubscription work. But when you suddenly have simultaneous demands, the problems surface. And at that point, you likely won’t realize that your environment has less storage IO than it needs, because it has been trucking along fine up to then. The disk drive in your laptop may do better than the storage in your virtual infrastructure.
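To make that back-of-napkin math concrete, here is a minimal Python sketch. Every number in it (14 spindles, ~180 IOPS per 15k rpm drive, ~75 IOPS for a laptop disk, a RAID 5 write penalty of 4, a 70/30 read/write mix, 40 VMs) is an illustrative assumption, and `iops_per_vm` is a hypothetical helper, not output from any of the environments we looked at.

```python
# Hypothetical napkin math: all numbers are illustrative assumptions,
# not measurements from any environment discussed above.

LAPTOP_DISK_IOPS = 75  # rough rule of thumb for a single 7,200 rpm laptop drive


def iops_per_vm(spindles, iops_per_spindle, vm_count, read_fraction, write_penalty):
    """Front-end IOPS each VM gets if the array's capacity is split evenly.

    Each front-end write costs `write_penalty` back-end IOs
    (commonly cited values: 2 for RAID 10, 4 for RAID 5).
    """
    raw_backend_iops = spindles * iops_per_spindle
    write_fraction = 1.0 - read_fraction
    # Back-end IOs consumed per front-end IO for this read/write mix
    backend_cost_per_io = read_fraction + write_fraction * write_penalty
    frontend_iops = raw_backend_iops / backend_cost_per_io
    return frontend_iops / vm_count


if __name__ == "__main__":
    # Assumed example: 14 x 15k rpm FC drives (~180 IOPS each) in RAID 5,
    # a 70/30 read/write mix, and 40 VMs sharing the datastore.
    per_vm = iops_per_vm(spindles=14, iops_per_spindle=180, vm_count=40,
                         read_fraction=0.7, write_penalty=4)
    print(f"~{per_vm:.0f} IOPS per VM vs ~{LAPTOP_DISK_IOPS} for a laptop disk")
```

Under those assumptions it should print roughly 33 IOPS per VM against ~75 for the laptop disk. The even split is the pessimistic case: day to day, the VMs rarely all want IO at once, which is exactly why oversubscription works until simultaneous demand shows up.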

Why is this? There are two parts to the answer: 1) there remains insufficient top-to-bottom integration in the virtual infrastructure, and 2) businesses too often don’t realize the interdisciplinary nature of server virtualization – it is not just servers. The two add up to a one-two punch for those not paying attention: performance problems can be amplified, and the consequences can pile up much faster. Now a single oversight – lack of attention to the storage layer, for example – can impact 20 or 200 VMs. Trying to correlate and troubleshoot while you’re in a performance hole, with the sides and bottom caving in, may put you in a place where you can’t dig out. Pay attention – it’s about top to bottom. If you’re not comfortable with your top-to-bottom skills, there are various vendors and VARs out there who can help you. The product landscape includes additional tools from your storage vendors (EqualLogic’s SAN HQ is a great example, but most vendors now have some approach, many of them vCenter-integrated), and the likes of Virtual Instruments, Akorri, and others. Meanwhile, there’s a lot you can do by honing your own virtual server admin skills, including getting more familiar with kernel latency and the various performance counters available in console tools like esxtop and vscsiStats. Figure out what happens with storage transactions; figure out how IO wait times impact system latency, especially if memory swapping is going on. Those things will carry you far.
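One way to start on kernel latency: esxtop reports per-device DAVG/cmd (time spent below the host, at the HBA, fabric, and array) and KAVG/cmd (time spent in the VMkernel), and the guest-visible GAVG/cmd is their sum. The Python sketch below triages a few sample readings. The thresholds (about 2 ms KAVG, about 20 ms DAVG) are commonly quoted rules of thumb rather than VMware-published limits, and the device names and sample values are hypothetical.

```python
# Hypothetical triage helper: thresholds are commonly quoted rules of thumb,
# not VMware-published limits; tune them for your own environment.
from dataclasses import dataclass

KAVG_WARN_MS = 2.0    # time a command spends in the VMkernel (mostly queuing)
DAVG_WARN_MS = 20.0   # time spent below the host: HBA, fabric, array


@dataclass
class DeviceLatency:
    device: str
    davg_ms: float  # device latency per command (esxtop DAVG/cmd)
    kavg_ms: float  # kernel latency per command (esxtop KAVG/cmd)

    @property
    def gavg_ms(self) -> float:
        # Latency the guest sees: device time plus kernel time (GAVG/cmd)
        return self.davg_ms + self.kavg_ms


def triage(sample: DeviceLatency) -> str:
    """Point at the layer most likely responsible for high guest latency."""
    findings = []
    if sample.kavg_ms > KAVG_WARN_MS:
        findings.append("kernel queuing (check queue depths and contention)")
    if sample.davg_ms > DAVG_WARN_MS:
        findings.append("storage below the host (array, fabric, too few spindles)")
    return "; ".join(findings) if findings else "ok"


if __name__ == "__main__":
    samples = [
        DeviceLatency("naa.example1", davg_ms=4.5, kavg_ms=0.1),
        DeviceLatency("naa.example2", davg_ms=38.0, kavg_ms=0.2),
        DeviceLatency("naa.example3", davg_ms=9.0, kavg_ms=6.5),
    ]
    for s in samples:
        print(f"{s.device}: GAVG={s.gavg_ms:.1f} ms -> {triage(s)}")
```

Splitting GAVG this way is the point: a high DAVG says the storage behind the host is the bottleneck (the too-few-IOPS case above), while a high KAVG says commands are stacking up inside the hypervisor before they ever reach the array.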

As an addendum: I’ll confess that I didn’t always hit the obvious in every scenario. To the guy running lots of iSCSI from both the hypervisor and the guest out of separate interfaces, who was wondering how to do highly managed 10G Ethernet more cost-effectively than with hypervisor vSwitch licensing: I probably should have been on my toes enough to recommend something like Voltaire’s Ethernet switches, which can do some granular QoS-type stuff very cost-effectively. If anybody knows that virtual server admin from Cincinnati, I hope you’ll relay the message.

And the guys who really stumped me were two product engineers trying to run software builds on VMs. They were getting great performance with GNU make on Solaris and Linux, but on Windows, their Visual Studio gmake builds were a dog, taking three times longer than on physical hardware. If you have any suggestions on what could be slowing C++ builds in Visual Studio on a virtual processor, drop me an email.

If we talked at VMworld, I hope you’ll drop me a line and let me know how your performance problem is working out.


posted by: Jeff Boles

Gluing the virtual infrastructure together, but can we get this stuff out of the bottle?

I have just set foot on the ground for the annual Citrix analyst event in Dallas, TX, and over the next couple of days I hope to make a call on what the future holds for Citrix. Does the Citrix of tomorrow have more or less relevance in the virtual infrastructure? This year, as in years past, I see tons of potential for more, but time, and this event, will tell. The potential resides in Citrix being more about the sum of the parts than the parts themselves – in fact, Citrix is really a sum specialist, whether it is summing their own stuff or somebody else’s. Put another way, they are specialists in glue, and I’m waiting to see if they’re ready to open up the bottle.

Citrix is no slouch when it comes to delivering unique “glue” between different layers of the infrastructure. For context, their glue experience reaches all the way back to the early days of Server-based Computing, when they started out largely delivering compute across remote access links, and Citrix tenaciously differentiated and held on to that market despite huge disruptions. Taking this analogy to the extreme, you can look at almost any Citrix product – NetScaler, WANScaler, XenDesktop, and more – and think of how it glues together connections, infrastructure layers, management systems, or workloads to overcome the challenges of delivering compute experiences. Even the latest example, Citrix’s StorageLink, has been about better “gluing” together storage systems and virtual machine storage, and I’ve talked with users who are realizing better storage alignment, and preserving their existing practices, by running StorageLink. I have no doubt that Citrix is seeing some uptick in adoption from customers trying to virtualize while preserving their previous practices and technologies, or trying to meet unique needs outside of generalized virtualization. If such customers look deeply enough, they likely realize that Citrix may have some differentiation in gluing together a whole stack of compute experience from a virtualized desktop all the way down to the storage layer, or in integrating other specialized services into a cloud-like infrastructure.

To date, we’ve continued to see some upticks in XenServer adoption (great article here courtesy of Beth Pariseau), but the future still looks cloudy for general Xen adoption when measured purely against competing hypervisors. The problem is that the Xen hypervisor still seems to be at a disadvantage when considered in isolation. Recent testing (warning, the link goes to a PDF of the report) of ESXi, Xen, KVM, and Hyper-V by Taneja Group suggests that ESXi’s competitors simply have a long way to go in density and performance, mostly due to a lack of sophisticated memory virtualization mechanisms. In point of fact, Red Hat made a good demonstration of how Kernel Samepage Merging alone can extend memory headroom without performance-eating paging out of physical memory – but they were the only competitor to demonstrate what I would call a sophisticated memory mechanism.

So I’ve been asked of late: given Taneja Group’s recent work evaluating hypervisor performance, and the disparity we reported in saturated performance capabilities, how can I continue to see a good future for a company that seems to be all about Xen? That gets back to Citrix being about the sum, rather than the parts. Citrix has turned Xen into their platform for gluing things together – with the glue being about how the parts are pulled together – and the construction paper this go-around is more about the data center and clouds. In fact, they’ve made good progress turning Xen to this purpose over the past two years, and Citrix has great cred here, from the (albeit customized) Xen foundation behind Amazon EC2 to the foundations behind fast up-and-coming Amazon competitors like Rackspace. Moreover, we’re reminded periodically of the tremendous merits of open: IBM announced some potentially cool workload orchestration on an open hypervisor the other day (although given the level of supporting detail, I’m tempted to call horse hockey). And beyond cred, Citrix has great cards on the table in their other virtual and physical infrastructure technologies. But the cards too often remain unexposed, or are only flipped up one at a time. If the audience is the cloud, they’re paying attention, and it’s time to show the cards, and not just in a private room off to the side of the parlor: network management well beyond the hypervisor edge, integration with the surrounding infrastructure, an integrated stack from compute service access through workload hosting, instrumentation, and more. But I recall saying similar things two years ago.

Nevertheless, for the practitioner, whatever the visibility of the messaging, any visit with Citrix shows that it’s worth considering what your total infrastructure looks like with virtualization as a key part of the foundation. There’s still lots of stuff outside the boundaries of the virtual server infrastructure, and virtual infrastructure innovation is in equal parts about how to integrate with that stuff. The vendors can paint distinctly different pictures. We’ll see more differentiators pop up over the next year or two, but undoubtedly the differentiators will narrow in degree and number over time. For vendors, today is the day to compete on those differentiators, especially when they show up in important markets like cloud.
