Managing a SAN, whether it’s Fibre Channel or iSCSI, in a physical server environment is tough enough, but managing SANs in virtual server environments is even tougher. And the challenge is compounded if it’s the server or applications administrators, rather than the storage administrators, that are tasked with making sure that the storage network and associated devices are not causing bottlenecks that could compromise the benefits of virtualization.
Until recently, when a performance bottleneck occurred in a virtual server environment, various teams (e.g., applications, server, network and storage) got together and somewhat blindly tried to isolate the problem(s). This process was time-consuming, and time equates to money.
In the majority of cases, performance problems in a virtual server environment can be attributed to the SAN. In fact, in a presentation at the VMworld Europe show in Cannes earlier this year, Scott Drummonds, VMware’s group manager, technical marketing, said that 90% of the performance problems in VMware environments were SAN-related.
Checks with end users and independent analysts suggest that the figure may be closer to 70%, but nonetheless it’s clear that the SAN is the culprit in the majority of instances where performance problems compromise the efficiency of a virtual server environment.
And as virtual server environments grow, and SAN re-configurations become routine, traditional management tools – including device-specific monitoring and management software – often fall short.
The problem is that administrators have not had the tools to enable them to drill deep into the virtual infrastructure to identify, and correct, problems. And that lack of visibility leads to over-provisioning and an inability to optimize performance and diagnose problems. Fortunately, tools are emerging that can help IT managers get better, deeper visibility into the entire virtual infrastructure in order to optimize configurations and performance.
The Taneja Group research and consulting firm refers to these tools collectively as being in the virtual infrastructure optimization (VIO) product category. The firm notes that a variety of vendors and products meet the VIO definition to varying degrees. But for storage-centric administrators, the products that may be of most interest include Akorri’s BalancePoint, NetApp’s SANscreen VM Insight, and Virtual Instruments’ VirtualWisdom (which is based on the company’s NetWisdom and is currently in beta, with general availability slated for the fourth quarter).
Although all of those vendors have roots in the storage side of the infrastructure, their products enable cross-domain monitoring and analysis across the entire virtual server infrastructure; e.g., applications, servers, network and storage. To collect non-SAN-specific information in VMware environments, VIO tools often rely on VMware’s VirtualCenter (vCenter) APIs for data on VMware ESX clusters, specific virtual machines (VMs), hosts, CPUs and memory, as well as on other sources such as SNMP and network-attached probes.
In addition to virtual server environments, SAN management tools can also be used to manage physical server environments, as well as mixed physical-virtual environments. In some cases, the same software can be used in either environment, while in other cases vendors have separate, yet integrated, products for the two types of environments.
The tools vary in their support for SAN types and virtual server environments. Some support only VMware, while others (such as Akorri’s BalancePoint) support VMware, Microsoft’s Hyper-V and Citrix Xen platforms. And some support only Fibre Channel SANs, while others (such as NetApp’s SANscreen VM Insight) also support iSCSI SANs and NAS.
Performance problems in SANs can arise from a number of sources. “In Fibre Channel SANs, you run into a lot of problems with firmware and software revisions or upgrades, and incompatibilities or performance issues between SAN components such as HBAs, disk arrays and switches,” says Jeff Boles, a senior analyst and director of validation services with the Taneja Group, “and administrators simply lack the instrumentation to get the data for planning, performance management, monitoring and troubleshooting.” (For more information on Boles’ view of VIO tools, see “Why you need virtual infrastructure optimization” at infostor.com.)
According to Boles: “VIO tools holistically assess the entire virtual infrastructure, and provide administrators with the data necessary to make intelligent decisions about capacity, utilization, and performance for every layer of the infrastructure – network, server, storage, and applications.”
The key benefits of these tools include a reduction in the time required to troubleshoot problems and determine root causes; a reduction in the costs associated with over-provisioning and buying more resources than are necessary; the ability to gather data to make vendors more accountable; and the ability to know what will happen if you make changes to your infrastructure.
Case Study
To get an idea of how SAN managers are using these tools, consider the case of Ryan Perkowski, SAN manager at a division of a global financial institution. Perkowski oversees a SAN with almost 100 servers; EMC DMX 3, 4 and Clariion arrays; a pair of Cisco 9509 switches; about 600 SAN ports; and more than 420TB of disk capacity.
The servers on Perkowski’s SAN are about 75% AIX, 20% Solaris, and 5% VMware, but the VMware portion is growing rapidly. He uses Virtual Instruments’ NetWisdom to manage the physical infrastructure, but as the VMware portion grows he plans to upgrade to Virtual Instruments’ VirtualWisdom monitoring and analysis tools. VirtualWisdom is built on NetWisdom, but will include virtualization-specific functionality.
“As you add VMs and your SAN gets bigger and bigger, you hit a point where you have to know when you’re running too much on any given piece of hardware and you have to buy more hardware,” says Perkowski. “There’s no way I can get the I/O performance information I need, including response time statistics, without NetWisdom.”
Virtual Instruments' VirtualWisdom provides real-time instrumentation, measurement and analysis of Fibre Channel SAN traffic.
Perkowski uses NetWisdom primarily for analyzing, troubleshooting and tuning his infrastructure, including the SAN. “You don’t know how much you don’t know until you look closely at it,” says Perkowski, echoing Virtual Instruments’ phrase: “You can’t optimize what you can’t measure.”
Prior to installing NetWisdom, Perkowski used monitoring tools from the device vendors, including Cisco and EMC, but he notes that, “with the equipment manufacturers, monitoring seems to be an afterthought.”
In Perkowski’s case, NetWisdom collects statistics from the switch fabric and from probes that sit in front of the disk arrays. He says that 90% of the performance problems that he isolates are at the operating system or HBA level.
For non-SAN-specific data, VirtualWisdom gathers information from vCenter, such as CPU activity and LAN I/O statistics, which is mapped into the SAN-specific information for a holistic view of the infrastructure.
According to Virtual Instruments’ vice president of marketing, Len Rosenthal, some of the differentiators for NetWisdom and VirtualWisdom, vs. competing products, are that both products deliver statistics in real time (as opposed to approaches that “sample” data at intervals) and that they can scale into the tens of petabytes.
“We provide real-time, deterministic measurement, versus sample surveys, and we don’t impact performance,” says Rosenthal.
Real-time gathering of statistics may have advantages in some scenarios. “With real-time monitoring and capture, vs. polling or sampling, you can build a better ‘chain of evidence’ regarding performance problems,” says the Taneja Group’s Boles. “Real-time capture provides deeper root-cause analysis.”
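The distinction Boles draws between continuous capture and polling can be illustrated with a toy example. The sketch below does not represent how NetWisdom, VirtualWisdom, or any other product works; the latency values are invented and the five-measurement polling interval is an assumption, chosen only to show how a short spike can fall between polls.

```python
# Hypothetical per-I/O response times in milliseconds (invented for illustration).
latencies_ms = [2, 2, 3, 2, 2, 2, 48, 51, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 2, 2]

POLL_INTERVAL = 5  # assumed: the poller sees only every 5th measurement


def continuous_capture(samples):
    """Record every measurement, as a real-time capture would."""
    return list(samples)


def polled_capture(samples, interval):
    """Record only the measurements that coincide with a poll."""
    return samples[::interval]


full = continuous_capture(latencies_ms)
polled = polled_capture(latencies_ms, POLL_INTERVAL)

print("max latency, continuous capture:", max(full), "ms")
print("max latency, polled capture:    ", max(polled), "ms")
```

With these invented numbers, the continuous capture reports a 51 ms peak while the polled capture never sees anything above 3 ms, which is the kind of gap in the "chain of evidence" that Boles describes.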
NetWisdom and VirtualWisdom use a combination of ProbeV and ProbeVM software and Traffic Analysis Point (TAP) devices to monitor Fibre Channel SAN traffic.
Case Study
Consolidated Communications, a local telephone company and provider of communications services in Illinois, Texas and Pennsylvania, is in the process of migrating from a physical infrastructure to a virtual infrastructure. The company is using Akorri’s BalancePoint monitoring and management platform to get a holistic view of both environments in order to optimize performance during the physical-to-virtual (P2V) migration.
“We’re in the process of deploying two new virtual farms which are growing daily, and BalancePoint has been extremely useful in providing a performance baseline and comparisons as we move to virtualization,” says Matt Denman, manager of database/Unix administration at Consolidated Communications.
Akorri's BalancePoint provides automated visibility and analysis across virtual and physical IT infrastructure.
The benefits of using BalancePoint, according to Denman, are a significant reduction in the time and resources required to troubleshoot and plan during the migration. “With full visibility into all components attached to our SAN, we can easily pinpoint trouble areas, balance our workloads, and monitor and trend capacity and growth,” Denman explains.
BalancePoint supports VMware, Microsoft and Citrix virtual server platforms, as well as physical servers, and storage virtualization platforms from IBM and Hitachi Data Systems.
“Users are past the early phases of test and development with their virtual server rollouts, and they need tools to analyze performance problems and identify root causes,” says Rich Corley, Akorri’s founder and CTO. “BalancePoint maps performance from the application layer through the ESX layer and down to the LUN level, identifying problem areas such as configuration issues, bottlenecks and contention points. With ‘VM sprawl,’ you have to understand the levels of utilization in order to leverage the VMs more efficiently.”
BalancePoint includes a number of modules, including Cross-Domain Analysis (which collects information from servers, storage and software and correlates performance across those domains), Application Fingerprinting to profile and characterize application workloads and models, and Performance Dynamics Modeling, which delivers warning alerts regarding problems before they happen. The suite also includes Key Performance Indicators (KPIs) that indicate the optimal balance between application requirements and resource capabilities.
SANscreen Evolves
NetApp’s SANscreen software suite is one of the more established SAN management platforms for heterogeneous storage networks. SANscreen was originally just a SAN discovery and change management tool, but since then modules have been added for managing performance, replication, storage and server virtualization, capacity management and NAS environments. NetApp introduced SANscreen VM Insight, one of the optional modules for SANscreen, in late 2007. (Other SANscreen modules include Service Insight, Service Assurance, Application Insight, Replication Assurance, Capacity Manager and Provisioning Manager.)
“SANscreen VM Insight allows you to gather all the performance data from the switch fabric through the storage arrays, maps everything back to the VMs, and provides recommendations on how to rebalance your virtual environment for optimum performance,” says Paul Turner, general manager for NetApp’s SANscreen business unit, “and it provides a lot of path-level policies for management.”
For non-storage data, SANscreen uses VMware’s VirtualCenter to gather information about VMs, datastores, CPU, memory and configuration information.
SANscreen can be used in Fibre Channel or iSCSI SANs, as well as NAS environments.
Turner notes that many SANscreen users combine VM Insight with other storage-related analysis modules, including Capacity Manager (which provides global visibility into storage resource allocation and generates a variety of capacity-oriented reports, including storage charge-back reports) and Application Insight (which aligns storage resources with application service levels, and collects performance data from SAN resources such as disk arrays and fabric switches).
In explaining the need for SAN management tools that are tuned for virtual server environments, Turner cites an Enterprise Strategy Group survey that reported that end users’ top three concerns in the context of virtual servers were 1) performance, 2) enforcing best practices, and 3) managing the storage infrastructure and costs.
HDS Bridges Server-Storage Virtualization Gap
With the goal of creating tighter synergies between server virtualization and storage virtualization, Hitachi Data Systems recently rolled out the Hitachi Storage Cluster for Microsoft Hyper-V, which extends business continuity (BC) and disaster recovery (DR) for Hyper-V environments to remote sites.
HDS’ technology works with Microsoft’s Windows Server 2008 Hyper-V, Multipath I/O and clustering, and includes HDS products such as Hitachi Storage Cluster (which manages replication and works with Microsoft Failover Clustering); TrueCopy and/or Hitachi Universal Replicator (HUR) replication software; and the company’s Adaptable Modular Storage (AMS) and/or Unified Storage Platform (USP) disk arrays. The solution also includes automated failover and data resynchronization, as well as failback of virtual machines (VMs).
HDS refers to the strategy as end-to-end (E2E) virtualization.
“We’re trying to bridge the gap between server and storage virtualization to drive efficiency,” says Heidi Biggar, a solution marketing principal with HDS, “and BC/DR is the first proof point.”
Companies can use the synchronous TrueCopy replication for business continuity scenarios, or the asynchronous HUR replication for disaster recovery scenarios. When implemented with the USP-V (virtualization) disk platform, users can integrate heterogeneous disk arrays into the configuration.
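The practical difference between the two replication modes can be sketched in a few lines. This is a generic model of synchronous versus asynchronous replication, not a representation of TrueCopy or HUR; the class names, the drain step and the simulated failure are all hypothetical.

```python
from collections import deque


class Replica:
    """Stand-in for a remote array; it just stores committed writes."""

    def __init__(self):
        self.blocks = []

    def commit(self, block):
        self.blocks.append(block)


class SyncReplicator:
    """Acknowledge a write only after the remote copy has committed it."""

    def __init__(self, remote):
        self.local = []
        self.remote = remote

    def write(self, block):
        self.local.append(block)
        self.remote.commit(block)  # the host waits for this before the ack
        return "ack"


class AsyncReplicator:
    """Acknowledge immediately; ship writes to the remote copy later."""

    def __init__(self, remote):
        self.local = []
        self.remote = remote
        self.pending = deque()

    def write(self, block):
        self.local.append(block)
        self.pending.append(block)  # queued, not yet at the remote site
        return "ack"

    def drain(self, count):
        """Send up to `count` queued writes to the remote site."""
        for _ in range(min(count, len(self.pending))):
            self.remote.commit(self.pending.popleft())


sync_remote, async_remote = Replica(), Replica()
sync_rep, async_rep = SyncReplicator(sync_remote), AsyncReplicator(async_remote)

for block in ["w1", "w2", "w3"]:
    sync_rep.write(block)
    async_rep.write(block)

async_rep.drain(1)  # only part of the queue is shipped before a simulated failure

print("sync remote holds: ", sync_remote.blocks)
print("async remote holds:", async_remote.blocks)
print("async writes at risk:", list(async_rep.pending))
```

In this model the synchronous remote copy always matches the primary, at the cost of waiting on every write, while the asynchronous copy can lag behind by whatever is still queued when the primary site fails.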
How It Works
In normal operation, VMs run on the primary server and all data is replicated to a remote system. In the event of a server failure or scheduled maintenance, VMs can be migrated to another local server via Microsoft technology. In the event of a major outage at the primary site, all replicated VMs fail over automatically to the recovery site.
When the primary site comes back online, data is automatically re-synchronized, VMs are moved back to the primary site via Hitachi’s failback technology, and normal operations are resumed. —Dave Simpson
UltraBac Streamlines VMware Backups
UltraBac Software recently began shipping the 9.0 version of its UltraBac backup software and the 5.0 version of its UBDR Gold recovery software, both with enhancements for backing up VMware virtual server environments.
UBDR Gold 5.0 now has the ability to create ESX or GSX Virtual Machine Disk (VMDK) files while backing up a live system. This optional functionality eliminates a number of steps required by other approaches to disaster recovery in VMware environments, including the conversion typically required in physical-to-virtual (P2V) restore operations, according to Chana Flynn, UltraBac’s marketing manager.
“The software creates the [ESX or GSX VMDK] file on the fly as the backup is being performed,” explains Chip Coomes, UltraBac’s QA manager. “It’s created as a secondary stream along with the native backup data, and is converted while the stream is being written, which eliminates a separate conversion process.”
The company claims that the P2V operation can take less than three minutes (not counting the original backup time) due to the reduced processing requirements.
Another new feature in UBDR Gold 5.0 is that subsequent differential and incremental backups automatically update the base VMDK file when performed to a UNC share path, ensuring up-to-date data for P2V operations.
The key enhancement to UltraBac 9.0 is an improved management console based on an Office 2007-like tabular format, but the company also added support for Microsoft SQL Server 2008 and SharePoint 2007, as well as VMware Consolidated Backup (VCB) agents.
The VCB agents provide centralized backup for all guest machines, and do not require software to be installed locally. Administrators can select either a file-by-file or image backup operation, and can choose which virtual machines (VMs) to back up while performing a FullVM backup. For more granularity, users can also select which VMs, files and folders to back up in a FileVM backup.
UBDR Gold Server Edition is priced at $995. The UltraBac Server Edition is priced at $495, while the Enterprise Edition is priced at $1,095. The VCB agent costs $495 per socket. —Dave Simpson