By Farid Neema
NAS virtualization is a software-based solution that, in its broadest definition, lets you manage a heterogeneous environment (including stand-alone and clustered file servers) as one storage pool. When one device reaches its limit, the virtualization software automatically allocates storage on another NAS device or file server in the same pool. Some systems stripe files across RAID systems connected to different servers and re-stripe them transparently when new RAID systems are added.
Before delving into NAS virtualization, it’s important to review some of the basics behind both NAS and virtualization.
A NAS server is a single-purpose device designed to be easy to purchase, implement, and use. It combines a file server with a storage array in a single, integrated appliance, also referred to as a filer. A NAS system presents data in the form of files, allowing consolidation of storage that can be shared by other servers on the network. The simplicity of NAS for file services has been very attractive to IT managers. However, even though managing a few NAS devices is easy, managing many devices has proven to be challenging and costly.
A single file system that expands to the limits of a server’s physical capacity requires a time-consuming and complex upgrade to a larger server with more capacity. An easier solution is simply to deploy another server or NAS device. However, administrators must manage each server or NAS device, each with its own file system and each requiring independent manual attention for both file-system and capacity management issues.
With a high server count, this not only proves costly and inefficient, but it also puts a significant strain on the IT manager’s ability to adequately back up and protect critical data. Furthermore, some filers support only a single type of storage, making it difficult to tier different service levels for different business requirements. Each of these factors impacts the total cost of ownership (TCO).
NAS virtualization promises to hide complexity, automate tedious tasks, and streamline administration, while still meeting the requirements of high performance and low TCO.
Traditionally, storage virtualization refers to a level of abstraction implemented in software that provides address mapping between physical entities and virtual entities. Applied to disk drives, virtualization divides available storage into virtual volumes. Virtual volumes are used by an operating system as if they were physical disk drives. The storage virtualization layer redirects I/O requests made against a virtual disk to blocks in physical storage. The system can move physical blocks and update the virtual-to-real mappings at any time. Rather than having to administer data by working with each disk in a disk farm, virtualization allows IT managers to deal with multi-disk storage as a single logical entity.
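The address mapping described above can be illustrated with a minimal Python sketch. It is not taken from any product: the class name `VirtualVolume`, the disk names, and the block counts are all hypothetical, and a real volume manager would add persistence, locking, and error handling. The sketch shows only that a host keeps addressing the same virtual block while the layer moves the physical block and updates the mapping.

```python
class VirtualVolume:
    """Toy virtual volume (hypothetical): virtual block -> (disk, physical block)."""

    def __init__(self, disks, blocks_per_disk):
        self.store = {}    # (disk, physical block) -> data
        self.mapping = {}  # virtual block -> (disk, physical block)
        # Every physical block starts out free.
        self.free = [(d, b) for d in disks for b in range(blocks_per_disk)]

    def write(self, vblock, data):
        # Allocate a physical block the first time a virtual block is written.
        if vblock not in self.mapping:
            if not self.free:
                raise RuntimeError("storage pool is full")
            self.mapping[vblock] = self.free.pop(0)
        self.store[self.mapping[vblock]] = data

    def read(self, vblock):
        # Redirect the request from the virtual block to its physical location.
        return self.store[self.mapping[vblock]]

    def move(self, vblock, target_disk):
        """Relocate one block to target_disk and update the mapping."""
        old = self.mapping[vblock]
        new = next((loc for loc in self.free if loc[0] == target_disk), None)
        if new is None:
            raise RuntimeError("no free block on " + target_disk)
        self.free.remove(new)
        self.store[new] = self.store.pop(old)
        self.mapping[vblock] = new
        self.free.append(old)


vol = VirtualVolume(disks=["disk0", "disk1"], blocks_per_disk=4)
vol.write(0, b"payroll")
before = vol.mapping[0]   # physical location before the move
vol.move(0, "disk1")
after = vol.mapping[0]    # physical location after the move
data = vol.read(0)        # the host still reads virtual block 0
```

The caller never sees `before` or `after` in normal use; they are exposed here only to show that the physical location changed while the virtual address did not.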
You can implement layers of virtualization at several levels and with several degrees of sophistication. One of the basic levels is found in volume managers, which are software utilities that create logical volume groups out of physical disk drives. RAID is an example of disk volume management that provides virtualization. Most operating systems either have some level of volume management built in or can have it layered on with third-party products. Most databases use an underlying file system to provide some level of virtualization so applications can address a space rather than a physical disk. NAS systems also provide a more advanced level of volume management and virtualization.
Storage virtualization implementations use different architectures and can be implemented in many ways along the I/O path. Most current implementations are in the host, the storage system, or the network fabric.
The virtualization function can be implemented in-band or out-of-band. An in-band implementation intercepts the data and controls access to the virtual pool of storage that it manages. RAID and NAS are examples of in-band virtualization. An in-band appliance can become a bottleneck and can constitute a single point of failure. However, multiple-server virtualization alleviates the performance concerns of in-band virtualization and allows for server fail-over to occur when one of the appliances is down.
In an out-of-band implementation, control functions and virtual-to-physical mappings are handled by a separate entity, while the data flows directly between the hosts and the storage devices.
A potential drawback to the out-of-band approach is that it does not provide read/write access to active data, which means that the appliance cannot handle open files to ensure continuous access.
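The difference between the two data paths can be sketched in a few lines of Python. This is an illustration only, under simplifying assumptions: the class names, the address strings such as `vol1:0` and `lun0:17`, and the `bytes_forwarded` counter are all hypothetical, and the sketch ignores caching, locking, and failure handling.

```python
class Storage:
    """Stand-in for a physical storage device (hypothetical)."""

    def __init__(self):
        self.blocks = {"lun0:17": b"invoice data"}

    def read(self, address):
        return self.blocks[address]


class InBandAppliance:
    """Sits in the data path: every read passes through it."""

    def __init__(self, storage, mapping):
        self.storage = storage
        self.mapping = mapping
        self.bytes_forwarded = 0

    def read(self, virtual_address):
        data = self.storage.read(self.mapping[virtual_address])
        self.bytes_forwarded += len(data)  # the appliance carries the payload
        return data


class OutOfBandController:
    """Handles only the virtual-to-physical lookup; no data passes through it."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.bytes_forwarded = 0  # stays at zero by construction

    def lookup(self, virtual_address):
        return self.mapping[virtual_address]


storage = Storage()
mapping = {"vol1:0": "lun0:17"}

# In-band: the host asks the appliance, which fetches and forwards the data.
in_band = InBandAppliance(storage, mapping)
data_in_band = in_band.read("vol1:0")

# Out-of-band: the host asks the controller where the data is,
# then reads it directly from the storage device.
controller = OutOfBandController(mapping)
physical = controller.lookup("vol1:0")
data_out_of_band = storage.read(physical)
```

Both paths return the same data. The in-band appliance has handled every byte, which is why it can become a bottleneck; the out-of-band controller has handled none, which is also why it has no view of the data in flight.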
NAS virtualization aggregates multiple NAS appliances so they can function as a single managed device.
The aggregation is enabled by a new file system that can either replace the native file systems or layer on top of existing file systems.
A global file system allows files to keep their namespaces. Clients only need access to one file system to get access to any data in the system. All files that have been moved will still appear to be in their original directory and path.
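As a rough illustration of this idea, the Python sketch below keeps a table from the client-visible path to the file's current physical location. The class `GlobalNamespace`, the filer names, and the paths are hypothetical, and the sketch updates only the lookup table; an actual product would also copy the data and handle open files.

```python
class GlobalNamespace:
    """Toy global namespace (hypothetical): client path -> physical location."""

    def __init__(self):
        self.location = {}  # client-visible path -> (server, server-local path)

    def add(self, path, server, local_path):
        self.location[path] = (server, local_path)

    def resolve(self, path):
        # Clients call this with the one path they know about.
        return self.location[path]

    def migrate(self, path, new_server, new_local_path):
        # Only the mapping changes; the client-visible path stays the same.
        self.location[path] = (new_server, new_local_path)


ns = GlobalNamespace()
client_path = "/projects/q3/report.doc"
ns.add(client_path, "filer-a", "/vol0/q3/report.doc")
before_loc = ns.resolve(client_path)

ns.migrate(client_path, "filer-b", "/vol7/archive/report.doc")
after_loc = ns.resolve(client_path)
visible_paths = list(ns.location)  # what clients can see
```

After the migration the client still opens `/projects/q3/report.doc`; only the entry behind it points to a different filer.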
The most common implementations of NAS virtualization are the following:
NAS aggregation appliances front-end traditional NAS servers: They can either aggregate homogeneous servers from a single vendor, or heterogeneous servers from different vendors. This approach virtualizes all storage resources across multiple NAS servers. By residing between the clients and existing NAS storage, aggregation appliances virtualize the storage to the clients. A potential drawback to this approach is processing latency, which can negatively affect performance.
A NAS gateway front-ends a SAN fabric, allowing storage to be managed in a pool. Some gateway implementations manage files with storage capacity assigned to file systems across multiple devices.
Distributed server virtualization does not present an additional layer above each server. It eliminates the need for a separate virtualization appliance and alleviates the performance issues sometimes associated with in-band virtualization. By adding pooled NAS servers rather than adding capacity behind a single server, enterprises can scale performance and capacity at the same time. This approach effectively makes each server’s capacity part of a seamless pool.
This type of virtualization is achieved through a distributed file system (DFS). A DFS allows a single file system to span across all nodes in the DFS cluster, essentially creating a unified logical namespace for all files.
The result is an environment where file shares are available from any server node for any client. Distributed server virtualization uses commodity operating systems and hardware platforms (see figure, p. 36).
Products range from software-only solutions to complete systems. The “software-only” solution is installed on a standard server, and software agents are installed on all servers that are aggregated. This approach enables users to connect server and storage resources, and to use any vendor’s disk arrays.
The complete system approach includes a server or filer, virtualization software, and disk arrays. An integrated hardware-software system ensures the vendor has performed complete qualification and will provide support for all the components in the system.
Virtualization provides flexibility for dynamic management and allocation of devices and storage volumes. It allows users to grow or shrink the storage pool transparently as business needs require and to migrate data within the pool dynamically to servers according to user policies. However, NAS virtualization products differ widely in implementation and functionality.
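One way to picture pool growth and policy-driven migration is the Python sketch below. It is a simplified illustration, not a description of any vendor's product: the class `NasPool`, the device and file names, the sizes, and the 80 percent high-water mark are all assumptions chosen for the example. The assumed policy is to place new files on the device with the most free space and, when a device exceeds the high-water mark, to move its smallest files elsewhere.

```python
class NasPool:
    """Toy storage pool (hypothetical) with a simple capacity policy."""

    def __init__(self, capacities_gb, high_water=0.8):
        self.capacity = dict(capacities_gb)                # device -> capacity in GB
        self.files = {name: {} for name in capacities_gb}  # device -> {file: size in GB}
        self.high_water = high_water                       # assumed policy threshold

    def used(self, device):
        return sum(self.files[device].values())

    def free(self, device):
        return self.capacity[device] - self.used(device)

    def add_device(self, name, capacity_gb):
        # Growing the pool is just registering another device.
        self.capacity[name] = capacity_gb
        self.files[name] = {}

    def place(self, filename, size_gb):
        # Policy: put new data on the device with the most free space.
        device = max(self.capacity, key=self.free)
        if self.free(device) < size_gb:
            raise RuntimeError("no device in the pool has room")
        self.files[device][filename] = size_gb
        return device

    def rebalance(self):
        """Move the smallest files off any device above the high-water mark."""
        moves = []
        for device in list(self.capacity):
            while self.used(device) > self.high_water * self.capacity[device]:
                target = max(self.capacity, key=self.free)
                name, size = min(self.files[device].items(), key=lambda kv: kv[1])
                if target == device or self.free(target) < size:
                    break  # nowhere better to put the file
                del self.files[device][name]
                self.files[target][name] = size
                moves.append((name, device, target))
        return moves


pool = NasPool({"filer-a": 100})
pool.place("a.db", 50)
pool.place("b.db", 40)            # filer-a is now 90 percent full
pool.add_device("filer-b", 100)   # grow the pool
moves = pool.rebalance()          # policy moves data to the new device
```

In this run the policy moves `b.db` from `filer-a` to `filer-b`, bringing `filer-a` back under the threshold without any change visible to whoever placed the files.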
Five characteristics (availability, scalability, performance, manageability, and cost) consistently rank at the top of users’ most wanted features list (see figure, above).
For NAS virtualization, two additional functions, connectivity and ease of use, are commonly required. The best way to evaluate products is to assess features in the context of these characteristics.
The most important user concern is data and application availability (which is closely related to reliability). Beyond system and path redundancy and fail-over, data availability may include the ability to connect and disconnect online (hot plug) for non-disruptive maintenance and reconfiguring; automatic fault detection, isolation, and recovery; online repair; and complete system restoration after failures.
For NAS virtualization, the following are a few questions that should be addressed: Does the system have a single point of failure? Is a high-availability cluster with automatic fail-over offered? Are disks/RAID arrays accessible and shareable between NAS heads? Are hot spares used? How many?
Scalability is among the most important features required in a virtualization product (see figure, right). Scalability can include capacity, performance, and connectivity, preferably without affecting ongoing operations.
Scalability-related questions might include the following: Can the system independently scale capacity, performance, and I/O? Can bandwidth or performance be increased without impacting clients? Does the system support heterogeneous arrays? Can a single file system be transparently scaled to multiple servers? How many nodes and Ethernet ports are supported? Does the system cover the long-term growth objectives?
Improved performance is the top benefit users expect from virtualization (see figure, right). Several levels of performance improvements can be expected via features such as real-time load-balancing and dynamic path reallocation. Performance-related questions include the following: Is data migrated for capacity or performance load-balancing? What are the type, number, and speed of processors used? What are the sizes of the write-and-read caches? Is striping of files supported across server nodes? Is automatic load-balancing performed? How many servers can be clustered? Does performance scale linearly with the number of nodes?
Managing storage is one of the largest costs in administering networks. A key criterion in storage management is the ability to manage all storage components and servers from any point on the network.
Management includes the processes of configuration, monitoring, load-balancing, diagnosing, and reporting. NAS virtualization engines can integrate and automate some or all of these steps. Management-related questions include the following: Is there consolidated management of many NAS devices? Is the management centralized or distributed? Can aggregation of any vendor’s NAS server be performed, or is it limited to one vendor’s systems? Can open files be migrated transparently? Does the system perform automatic discovery of new server and storage additions? Is a global namespace available?
The real cost of storage is not in the hardware and software, but in the labor involved in managing storage and the associated productivity loss. Therefore, TCO needs to be taken into account to include productivity gains due to increased performance, simplified management, better utilization of resources, elimination of over-provisioning, and increased data availability.
Questions to be addressed include the following: Is the solution based on low-cost commodity hardware? What is the TCO? Does the system include software to simplify provisioning and maximize storage utilization? Does the price include data management software for sharing, consolidating, and protecting data?
Connectivity features include the ability to add new elements without disrupting ongoing operations, added distance, and serving a variety of platforms and operating systems. For any-to-any connectivity, every host must be able to address every storage device on the network. If a switch is involved, parameters that affect connectivity include the number of ports and the ability for each port to connect to any other port in duplex mode. Connectivity-related questions include the following: Does the system allow connection to Windows, Unix, Linux, and Apple platforms, and can the operating systems seamlessly share files? Can the virtualization product make use of existing hardware and software resources? What disk drive types are supported (ATA, SATA, SCSI, SAS, Fibre Channel, etc.)? Can third-party disk arrays be used? Can the virtualization system aggregate other vendors’ NAS systems? Is the solution qualified with your existing backup and/or replication software? Is WAN connectivity available?
NAS aggregation addresses the scaling, performance, and management problems that plague some NAS installations today. For users, virtualization is fundamentally about simplification. Virtualization enables consolidated management; sustained performance and availability for distributed storage; better utilization of servers, filers, and storage resources; cost optimization; and improved data protection.
Farid Neema is president of Peripheral Concepts. This article was excerpted from a report, NAS and NAS virtualization, published by Peripheral Concepts Inc. and Coughlin Associates. For more information on the report, visit www.periconcepts.com.