There have been a number of changes in enterprise IT organizations that have overtaken many administrators. While the IT manager’s every day is a swim through waves of invisible bits, there has long been some comfort to be found in the “physicality” and accessibility of key devices. When problems arise, administrators have always been able to identify a switch port for examination, a server at the end of a wire that might be causing problems, an HBA for inspection, or any number of other physical elements for further examination. But in today’s data center, that comfort has vanished.
In part, this is due to virtualization, and while this trend is spearheaded by server virtualization, variations include application virtualization, network device virtualization, I/O virtualization, storage virtualization, and more.
At the same time, today’s IT infrastructure has scaled out of control, with more interdependencies between systems, which often operate at their performance limits. For a given application, dependencies may cross multiple applications, servers, SAN fabrics, I/O adapters, network hops, and even data centers. When the performance limit of a single component in one of these systems is breached, common experience indicates these I/O-laden systems will not see performance gradually degrade, but rather will see latency and performance rapidly spiral out of control under a bombardment of increasingly delayed, dropped and repeated I/O attempts that can no longer be queued, cached, backed off by congestion controls, or otherwise gracefully handled. Moreover, traditional management tools have quickly become obsolete, as today’s systems operating at immense scale have outpaced their capabilities. Confronted with this complexity, and with potentially catastrophic impacts from any change, administrators face the unknown.
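The non-linear collapse described above can be illustrated with a textbook queueing model. The sketch below is not drawn from any product discussed here. It assumes a single component behaving as an M/M/1 queue (random arrivals, one server), for which mean response time is the service time divided by (1 - utilization). The function name and the 5 ms service time are hypothetical.

```python
# Toy M/M/1 queue: one component serving I/O requests one at a time.
# Assumption: Poisson arrivals and exponentially distributed service times.
# Real storage and network components are more complex than this model.

def mean_response_ms(service_ms: float, utilization: float) -> float:
    """Mean time in system (queueing plus service) for an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1); at 1.0 the queue grows without bound")
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    SERVICE_MS = 5.0  # hypothetical per-I/O service time
    for u in (0.50, 0.80, 0.90, 0.95, 0.99):
        rt = mean_response_ms(SERVICE_MS, u)
        print(f"utilization {u:4.0%}: mean response {rt:6.1f} ms")
```

Under these assumptions, a 5 ms service time becomes roughly 10 ms at 50% utilization, 50 ms at 90%, and 500 ms at 99%. The last few percent of utilization cost far more than the first fifty, which is why systems run near their limits tend to fail abruptly rather than gradually.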
Lack of instrumentation
Regardless of whether systems are physical or virtual, it has been hard to find ways to evaluate the big picture and then drill down to perform detailed inspection of the infrastructure. Administrators simply lack the instrumentation to get the data for planning, performance management, routine monitoring, compliance, or troubleshooting.
How are administrators dealing with these challenges today? From our conversations with end users, we estimate that more than 85% of businesses today rely on their initial testing of known “good” configurations or arbitrary rules of thumb rather than real data when they manage and make decisions about their virtual infrastructures. This makes planning an exercise in waste that over-provisions resources, provides no guarantees that SLAs can be met, obfuscates troubleshooting, and restricts flexibility. Moreover, as configurations change over time, administrators have no guarantees about what their current infrastructure capabilities are. We find that more than 80% of VMware users have still not deployed VMotion, and these users are often deterred by their inability to determine performance impacts when making changes to their virtual infrastructure.
The challenge facing administrators is one of getting visible, meaningful data about their infrastructure on an on-going basis, and this leaves IT administrators in the position of driving through unfamiliar terrain with fog on the windshield. But times are changing.
Today, many increasingly sophisticated technologies promise to clear this fog from the IT administrator’s windshield. Such solutions fall under the virtual infrastructure optimization (VIO) banner. VIO tools holistically assess the entire virtual infrastructure, and provide administrators with the data necessary to make intelligent decisions about capacity, utilization, and performance for every layer of the infrastructure – network, server, storage, and applications.
Some of the solutions in this area peer into single dimensions of the infrastructure, such as capacity, and spot clean the windshield. Tried and true solutions, as well as products from newer vendors, fall into this category. The list includes BMC’s VSM solutions, Computer Associates’ ASM, HP’s VSE, Novell’s PlateSpin, TwinStrata’s Clarity AP, Virtugo’s virtualSuite, and VMware’s Capacity Planner. While these solutions have a place in routine planning, they do not address an equally or more important aspect of today’s virtual infrastructure management needs: data about what is happening at the moment.
A number of other solutions enable administrators to peer into multiple dimensions of the infrastructure in real time or near real-time. These solutions deliver the integrated monitoring and analytics required to optimize or troubleshoot virtual infrastructure performance holistically, across all elements – from the application to the spindle. Moreover, such solutions provide the granular data necessary for good decision-making. Without intelligent assessment of performance, capacity and utilization can only be planned on top of assumptions that may or may not apply to any single system, or that may change at any given point in time.
Real-time or near real-time performance-based analytics is the required foundation for building and managing a virtual infrastructure. In this category we place products such as Akorri’s BalancePoint, BlueStripe’s FactFinder, NetApp/Onaro’s SANscreen VMInsight, and Virtual Instruments’ VirtualWisdom.
What it is
VIO is a critical dimension of operating a virtual infrastructure, but what is the set of challenges that these emerging VIO solutions can address better than other solutions? In our view, they build a more comprehensive data set that is the foundation for virtual infrastructure management. To shed light on this, we’ve identified a set of five key strategic questions that end users should ask about their virtual infrastructure management tools. These questions will help users identify the benefits of VIO technologies, and assess how well their VIO or management platform will address the pain points common to virtual infrastructures.
Can I efficiently plan, design, and make decisions about my infrastructure that I know with certainty will make the best use of expensive systems, and have the desired result 100% of the time?
VIO will decrease operational cost and complexity around planning infrastructure and making operational changes. Today, needless hours are spent identifying initial configurations, and assessing the potential performance and utilization impacts each time a change is required. VIO provides a data set that gives administrators visibility into infrastructure capabilities and impacts at the touch of a button, and provides information on how to optimize configurations.
Can I immediately drill into the root cause of performance issues in my environment, and discover what happened or changed?
While a VIO solution may arm an organization with the right data to avoid mis-configurations in the first place, VIO tools can also provide real-time or near real-time visibility into what is happening in an environment, enabling administrators to immediately identify performance anomalies and root causes. VIO solutions can also capture historical information, providing an audit trail that identifies when problems started, and what happened.
Can I tell with certainty that I am getting the optimal use of expensive servers, storage, hypervisors, and other infrastructure?
The right VIO solution will identify operational peaks and troughs, and save capital dollars that are today wasted on over-provisioning. These solutions can help dynamically but intelligently balance an infrastructure, making sure that even the smallest infrastructure never runs into performance problems.
Can I tell with certainty that virtual technologies don’t adversely impact my infrastructure, or concretely identify issues without vendor finger pointing?
VIO solutions can give you the data you need to hold vendors accountable, and determine which systems are the root causes of problems or are having detrimental impacts on your infrastructure.
Can I safely implement and make use of the full capabilities of the virtualization technology that has been purchased to increase my operational efficiency and improve IT capabilities?
Concern about the inability to see what happens when a technology is used is at the heart of why technologies like VMotion are so infrequently used today. Without the right supporting data, automation frameworks and policies can be dangerous landmines when they unintelligently take action on an infrastructure. It is no wonder that policies and tools for VMotion automation, storage changes, guest reconfigurations, I/O management, and other solutions are not widely used today. But VIO can provide the intelligence to take action while avoiding potentially catastrophic impacts. VIO provides predictive visibility into what will happen when changes are made, and provides assurances that what was expected did indeed happen.
VIO for better control
If you answer any of these questions with a ‘no,’ or if these issues sound familiar, then you may indeed be driving your infrastructure over unfamiliar terrain and through a foggy windshield. The cost of doing so in wasted time and effort or wasted resource utilization can be enormous. VIO solutions can clear away this fog, but some perform better than others in virtual environments. The stronger solutions build VIO on top of deep, on-going, performance-centric analysis and correlation of configurations, changes, and events across all layers of the data center – from applications to storage spindles, irrespective of application, hypervisor, OS, network, storage, or SAN vendor. Using collected, instrumented data about cross-domain performance and events that are often captured directly over the wire, VIO can act as a master console for monitoring operations, viewing configurations, and triggering changes across the entire data center.
Much more than lifecycle management, configuration management, or other point solutions, each of which may have unintended repercussions through unintelligent automation, VIO is about total data center orchestration. Let’s take a look at the core capabilities of VIO solutions, and how we see differentiation in current and emerging products.
1: Detailed visibility through actual data
VIO solutions that capture real-time data provide the best foundation for holistic infrastructure management, as only this level of detail can capture the changes that occur over time and provide the detail necessary to detect and correct issues caused by changes. VIO products vary in several dimensions when it comes to capturing detailed data:
First, administrators should assess how much detail a solution captures. Periodic sampling in conjunction with sophisticated algorithms may appear to provide a good foundation for planning, but it does not provide real-time visibility for troubleshooting. There are an increasing number of solutions on the market that can take action on real-time data.
Second, administrators should assess how a solution captures data. Some VIO products rely on agent approaches, some passively poll data through available APIs, and some capture data over the wire. For each approach, the complexity of deployment, on-going management, how much detail is captured, and how potentially enormous sets of data are transferred across an infrastructure and reported upon should be examined.
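To see why periodic sampling can hide what troubleshooting needs, consider a minimal sketch. It assumes a poller that reports only the average latency over each polling interval. The trace values and the function name `polled_view` are hypothetical and are not taken from any product named in this article.

```python
from statistics import mean


def polled_view(per_second_ms, interval_s):
    """Average consecutive windows of `interval_s` seconds,
    as a coarse poller reporting interval averages would."""
    return [
        mean(per_second_ms[i:i + interval_s])
        for i in range(0, len(per_second_ms), interval_s)
    ]


if __name__ == "__main__":
    # Hypothetical trace: five minutes at 5 ms latency,
    # with one 10-second burst at 200 ms.
    trace = [5.0] * 300
    trace[120:130] = [200.0] * 10

    print("peak in per-second data  :", max(trace))
    print("peak in 5-minute averages:", max(polled_view(trace, 300)))
    print("peak in 10-second averages:", max(polled_view(trace, 10)))
```

In this example the five-minute average works out to 11.5 ms, so a burst that users would certainly notice appears as a mild rise. Finer-grained capture preserves the burst, at the cost of more data to move and store, which is the trade-off raised in the second point above.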
2: Holistic view of the entire virtual infrastructure
VIO solutions must encompass every layer of the enterprise. Taneja Group has often written about the increasing importance of cross-domain correlation technologies. With VIO, cross-domain correlation is fundamental to taking intelligent action without guesswork about consequences. Only with this comprehensive set of data can administrators take action to correct one problem, such as excessive LUN traffic, without being worried that they might create another problem, such as excessive switch port traffic. But VIO solutions vary in their ability to capture comprehensive data, sometimes being limited to select operating systems or storage devices. Such solutions are challenged by their inability to see the full environment, but may be able to compensate through examination of behavior patterns with sophisticated algorithms. Users should evaluate the sophistication of varying approaches, and whether a solution provides visibility into the right aspects of their infrastructure.
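The LUN and switch port example can be made concrete with a small what-if check. This is an illustrative sketch only: the inventory, the limits, and the function names are hypothetical, and a real VIO product would work from measured data rather than a static table.

```python
from collections import defaultdict

# Hypothetical inventory: each VM's I/O path and its load in MB/s.
PATHS = {
    "vm-a": {"lun": "lun-1", "port": "port-1", "mbps": 120},
    "vm-b": {"lun": "lun-1", "port": "port-2", "mbps": 90},
    "vm-c": {"lun": "lun-2", "port": "port-2", "mbps": 150},
}
# Hypothetical per-component throughput limits in MB/s.
LIMITS = {"lun-1": 200, "lun-2": 300, "port-1": 400, "port-2": 300}


def component_load(paths):
    """Sum each VM's load onto every component its I/O path crosses."""
    load = defaultdict(int)
    for p in paths.values():
        load[p["lun"]] += p["mbps"]
        load[p["port"]] += p["mbps"]
    return dict(load)


def overloaded(paths, limits):
    """Return the components whose summed load exceeds their limit."""
    load = component_load(paths)
    return sorted(c for c, mbps in load.items() if mbps > limits[c])


def what_if_move(paths, vm, new_lun, new_port):
    """Return a copy of the inventory with one VM re-pathed; the input is untouched."""
    trial = {name: dict(p) for name, p in paths.items()}
    trial[vm].update(lun=new_lun, port=new_port)
    return trial


if __name__ == "__main__":
    print("today:", overloaded(PATHS, LIMITS))
    for port in ("port-2", "port-1"):
        trial = what_if_move(PATHS, "vm-a", "lun-2", port)
        print(f"move vm-a to lun-2 via {port}:", overloaded(trial, LIMITS))
```

With these made-up numbers, lun-1 starts out overloaded. Moving vm-a to lun-2 through port-2 relieves the LUN but pushes port-2 past its limit, while the same move through port-1 leaves nothing overloaded. A tool that sees only the storage layer could not tell those two moves apart.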
3: Actionable correlated data
While holistic visibility is one thing, making data actionable is another. VIO solutions vary in how much data they deliver, which systems they capture data from, and what type of action they can enable or trigger. Solutions that capture over the wire may have an edge here, as they can see everything in an environment, and not just the traffic or performance a single node is exposed to. When a solution is built upon real-time data, it can provide an important perspective into on-going performance, simplify troubleshooting, and enable auditing for SLAs or for compliance purposes.
4: Extensibility and interoperability
Finally, VIO products vary in how well and deeply they integrate with enterprise systems, and they also vary in their ability to automatically trigger actions. Moreover, with the right capabilities, real-time VIO solutions can provide the right data and APIs to enable better use of other management tools, such as HP OpenView, IBM Tivoli, CA Unicenter, VMware vCenter Server, or customized scripts. Users should assess their need to integrate a VIO solution with other technologies, such as storage management tools, hypervisors, and virtual switches. Look for extensibility that matches your needs.
Some solutions under the umbrella of VIO try to manage the fog on your windshield by providing a framework for automatic navigation, without shedding much light on what is going on around you. You still can’t see, but these vendors can provide management frameworks that can help you automate routine actions and maintain adherence to policies and best practices. Unfortunately, the detailed visibility into what is happening within the data center is still lacking.
The performance management-oriented VIO solutions actually clear away the fog, and let you see the road in the form of meaningful, correlated data across multiple elements in the data center. Differentiation today is found in the form of how much detail can be seen, and how VIO tools turn that detail into an actionable set of data, either by providing administrators with meaningful analysis and summarization, or integrating with other systems to automatically take action, or both.
Virtual infrastructure optimization will be a pivotal technology in defining the capabilities of the next generation data center. While today this technology is about creating real-time intelligence that can be a foundation for responsible infrastructure actions and reactions, the technology will eventually drive the evolution of more capable and deeply integrated orchestration across all systems in the data center and help create increasingly autonomic, elastic computing that will respond to dynamically changing business demands. While the future potential of these technologies seems tremendous, they are here and providing key visibility today. VIO solutions are a cornerstone of any virtualization strategy, and should be a required component of data center management tool sets.
JEFF BOLES is a senior analyst and director of validation services with the Taneja Group research and consulting firm (www.tanejagroup.com).