Focus on fault resiliency, storage resource management, data recovery, and virtualization techniques.
By Mark Teter
The ability to manage and control today's storage demands is tenuous. In the world of e-business, storage capacity requirements change rapidly, with only a short window of time to change storage allocations. Worse yet, many companies are concerned about emerging industry standards such as SCSI over TCP/IP, storage over IP, and Fibre Channel over IP.
Software and hardware solutions are emerging from vendors such as Compaq, Datacore Software, DataDirect Networks, StorageApps, and Veritas Software that erase the traditional boundaries between multi-vendor storage systems. These vendors have developed SAN appliances, or storage domain managers, that virtualize storage resources into "pools."
Storage pools can be managed as a single resource, providing a true "plug-in" storage model. The only requirement for hosts is a pair of Fibre Channel host bus adapters (HBAs) to gain access to an unlimited amount of disk capacity. Whether or not you are ready to deploy virtualized pools in storage area network (SAN) configurations, or are waiting to see how the standards settle out, there are a number of strategies that can help manage and deploy disk storage.
Perhaps the most important storage consideration is fault resiliency. The underlying storage network and devices must have no single points of failure, providing multiple access paths to redundant switches and HBAs with automatic failover and load-balancing functions. Companies must address the cabling infrastructure, storage management issues, backup and recovery processes, and security requirements.
Figure 1: In a fabric configured with no single points of failure, removing a switch does not affect fabric operations.
An ideal fault-resistant storage fabric has multiple ports for front-end server connectivity and multiple ports for back-end storage devices. Other ports can be allocated to Ethernet LANs or IP clients. Figures 1 and 2 show fabrics built with no single point of failure using Fibre Channel switches. In these configurations, a single switch could be disabled and the fabric would not drop connections.
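The automatic failover and load-balancing behavior such a fabric enables can be sketched in a few lines. This is an illustrative model only, not any vendor's multipath driver; the path names stand in for HBA-to-switch routes:

```python
# Sketch of multipath failover with round-robin load balancing:
# I/O is spread across healthy paths, and a failed path (e.g. a
# disabled switch) is simply skipped, so connections are not dropped.

class MultipathDevice:
    def __init__(self, paths):
        self.paths = {p: True for p in paths}   # path -> healthy?
        self._rr = 0                            # round-robin counter

    def mark_failed(self, path):
        self.paths[path] = False                # e.g. switch removed from fabric

    def pick_path(self):
        healthy = [p for p, ok in self.paths.items() if ok]
        if not healthy:
            raise IOError("no path to storage: single point of failure")
        path = healthy[self._rr % len(healthy)] # balance load across survivors
        self._rr += 1
        return path

dev = MultipathDevice(["hba0->switchA", "hba1->switchB"])
dev.mark_failed("hba0->switchA")                # take one switch out of service
assert dev.pick_path() == "hba1->switchB"       # I/O continues on the other path
```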
Fibre Channel switch vendors include Ancor, Brocade, Gadzoox, Inrange, McData, and Vixel. To reduce the number of required switches, use Fibre Channel directors, available from vendors such as Computer Network Technology (CNT) and McData. Directors are high-end switches with built-in fault resiliency (or redundancy) features that can provide 99.999% uptime availability.
Figure 2 illustrates a highly available, scalable fabric. It is a cross-connect design that can scale in terms of port density and fabric throughput. If more front-end ports are needed, you can add edge switches. If more performance is required, you can add a cross-connect switch.
Figure 2: A highly available, scalable fabric can be built with edge switches and a cross-connect design that can scale in terms of port density and fabric throughput.
There should be no single point of failure with N_PORTs that are attached to the fabric. Redundancy is the key element. All devices must have redundant field replacement units (FRUs) so that component upgrades do not affect storage availability.
There are two general types of SAN disk storage: RAID and JBOD. RAID, or controller-based storage, provides a good way to keep "logic" and LUN management local. With controller-based arrays, SCSI targets can be presented with multiple LUNs, all behind a single F_PORT. LUN limitations normally do not present difficulty in server-attached storage environments, but with SANs they can be troublesome. RAID arrays offer other beneficial features such as online code updates, automatic RAID and failover functions, data replication, cache, and event management at the hardware level.
JBOD (or FC-AL storage) is well suited for application storage consolidation, where disk resources are divided among several application-specific hosts. JBOD presents multiple SCSI targets, each having one LUN, making it an effective long-term solution if it has native dual-ported Fibre Channel drives.
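The addressing difference between the two storage types can be made concrete with a small model. Target and LUN identifiers here are purely illustrative:

```python
# A controller-based RAID array presents one SCSI target with many LUNs
# behind a single F_PORT; a JBOD presents one target per drive, each
# with a single LUN. Sketch, not vendor code.

def raid_view(n_luns):
    # one target, many LUNs behind one F_PORT
    return [("target0", lun) for lun in range(n_luns)]

def jbod_view(n_drives):
    # one target per drive, each with LUN 0
    return [(f"target{i}", 0) for i in range(n_drives)]

assert raid_view(3) == [("target0", 0), ("target0", 1), ("target0", 2)]
assert jbod_view(2) == [("target0", 0), ("target1", 0)]
```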
Health and general welfare
The storage infrastructure must have management tools to facilitate easy allocation and management of storage resources. Storage resource management (SRM) software provides policy-based event and performance management that elevates the scope of management to higher-level attributes. SRM features include centralized views of physical and logical storage resources, device status monitoring, capacity allocation, performance management, and asset inventory. (For more information on SRM, see the Case Study, "User guidelines for network storage policies," on page 76.)
When you remove the management burden of storage, you remove the biggest reason to outsource it. If storage management were not a problem, no one would sign a three-year lease for a commodity (disk drives) that is rapidly declining in value. With the cost of disk drives falling (recent vendor announcements list 100GB drives for less than $100), the problem is no longer having too little storage but having too much of it to manage.
At a minimum, storage management must provide notification that something is broken, and an indication of what caused the problem. The most common difficulty is differentiation between failed connections and failed devices in the fabric. SRM tools help solve data placement and transport problems. Data placement problems include LUN management, RAID levels, partition space thresholds, backup sizing, and caching policies. Data transport deals with problem isolation and detection, performance measurement, and load monitoring and leveling.
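The policy-based space monitoring described above amounts to comparing utilization against thresholds and raising events. A minimal sketch, with made-up partition names and a made-up policy threshold:

```python
# Illustrative SRM-style partition space check: flag any partition
# whose utilization exceeds a policy threshold. Threshold and names
# are assumptions for the example, not from any product.

SPACE_THRESHOLD = 0.85   # policy: alert at 85% full

def check_partitions(partitions):
    """partitions: {name: (used_gb, total_gb)} -> list of alert strings"""
    alerts = []
    for name, (used, total) in partitions.items():
        if used / total >= SPACE_THRESHOLD:
            alerts.append(f"{name}: {used/total:.0%} full, exceeds policy threshold")
    return alerts

alerts = check_partitions({"/data": (90, 100), "/logs": (10, 100)})
assert alerts == ["/data: 90% full, exceeds policy threshold"]
```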
SRM software provides a single point of management for LUN placement, device and interconnect path configuration, and zoning. Disk storage vendors generally provide these management functions for use with their products. However, hardware-independent products are available from vendors such as BMC, Computer Associates, HighGround, and Veritas. With the cost of managing storage spiraling skyward, companies must decide to either manage their storage resources or outsource them.
Point-in-time saves nine
The cost of data recovery has become prohibitive due to today's online storage availability requirements. The most important issue is not how fast backups run, but how fast recovery is, as long as neither affects application server performance.
LAN-free backup and third-party copy technology are now available for SAN-based backup and recovery. However, database and file system backup and recovery times can be reduced without much effort using point-in-time (PIT) copy technology. PIT images (or snapshots) are virtual replicas of data, without the requirement of physical copying.
One approach to creating PIT images works through RAID arrays at the controller level, a technique referred to as a "triple mirror." It requires an additional volume to be associated with the master data volume (which may itself be RAID 1 mirrored, hence the term triple mirror). Once this mirror is broken, it contains a snapshot of the master data and can be mounted as a volume on alternate servers (as long as they are attached to the same array; otherwise, data replication is required), allowing the data to be backed up without impacting performance on the production server.
Point-in-time copies can also be generated in software. At the volume management level, full volume copies are created, which affects I/O performance on application servers. At the file system level, a virtual copy (not a physical copy of the data) is generated via a copy-on-write technique. All writes to the master are first copied to the PIT image, thereby preserving the original contents. By using pointers for the unchanged data blocks and copying only the changed ones, a virtual point-in-time volume is created. The advantage of this approach is that the PIT image generally requires only 20% to 30% of the original volume size, and it can be remounted by alternate servers for backup and recovery.
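The copy-on-write mechanism can be sketched in a few lines: the snapshot starts as nothing but pointers to the master's blocks, and a block is physically copied only the first time the master overwrites it. This is a sketch of the technique, not any vendor's implementation:

```python
# Simplified copy-on-write point-in-time snapshot. The snapshot map
# holds only the original contents of blocks the master has changed;
# all other blocks are read straight from the live volume.

class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snap = None            # block index -> preserved original data

    def take_snapshot(self):
        self.snap = {}              # empty: every block still points at master

    def write(self, idx, data):
        if self.snap is not None and idx not in self.snap:
            self.snap[idx] = self.blocks[idx]   # copy-on-write: save original
        self.blocks[idx] = data

    def snapshot_view(self):
        # virtual PIT image: preserved copies where the master changed,
        # live master blocks everywhere else
        return [self.snap.get(i, b) for i, b in enumerate(self.blocks)]

vol = Volume(["a", "b", "c"])
vol.take_snapshot()
vol.write(1, "B")                          # only this block is physically copied
assert vol.blocks == ["a", "B", "c"]       # master sees the new data
assert vol.snapshot_view() == ["a", "b", "c"]  # PIT image is unchanged
```

Because only changed blocks consume space, the snapshot stays far smaller than the master, which is why PIT images typically need only a fraction of the original volume size.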
By capturing frequent snapshots, or storage checkpoints of online data, files and databases can be kept in production while images are backed up in the background on the SAN without affecting application server performance. Point-in-time images can additionally be checked for validity before data is actually backed up to tape. Disaster rehearsals can be performed to ensure that data can be restored from tape or that the business can be moved to alternate data processing sites.
Database servers have similar technology for capturing point-in-time copies by putting tablespaces in archive mode. However, performing snapshots at the storage system level removes the undesirable side effect of having multiple tablespaces simultaneously in "online mode" for a hot backup.
Block-level incremental backups (BLIBs) are another solution. Rather than backing up entire files that have been modified since the last backup, only the data blocks that have been changed are backed up. This approach again puts a load on the application server for backup/recovery.
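Changed-block tracking, the core of a BLIB scheme, can be sketched as follows. This is an illustrative model, not a backup product's actual on-disk format:

```python
# Block-level incremental backup sketch: record which blocks changed
# since the last backup and ship only those, rather than re-sending
# every modified file in full.

class TrackedVolume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.dirty = set()          # indices changed since last backup

    def write(self, idx, data):
        self.blocks[idx] = data
        self.dirty.add(idx)

    def incremental_backup(self):
        delta = {i: self.blocks[i] for i in sorted(self.dirty)}
        self.dirty.clear()          # next backup starts from a clean map
        return delta                # only the changed blocks go to tape

v = TrackedVolume(["a", "b", "c", "d"])
v.write(2, "C")
assert v.incremental_backup() == {2: "C"}   # one block, not whole files
assert v.incremental_backup() == {}         # nothing changed since
```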
To restore data, the master volume must be resynchronized with the PIT image. The differences between the master and the PIT image are tracked in software, which enables selective refreshes without entire disk-to-disk copies. In most cases, only the changes since the last established point-in-time need to be copied back to the master volume. On average, no more than 5% to 10% of the data on file servers changes daily, permitting multiple snapshots to be taken throughout the day. Snapshots can then be copied from online storage to tape via LAN-free or third-party copy techniques. Most RAID array vendors have point-in-time technology, and hardware-independent approaches are available from vendors such as CrosStor, Legato, and Veritas.
Storage virtualization is a major step in delivering the true promise of SAN technology. Virtualization enables online storage to be parceled out as virtual SCSI disk groups, consolidating multiple heterogeneous storage devices (RAID, JBOD, tape) behind a single management console. For example, virtual RAID-3 storage could be allocated to volumes that store video information, while lower-cost RAID-5 virtual storage could be used for transactional applications. This type of virtual storage makes physical LUN management obsolete, replacing it with attribute-based virtual disk controls that simplify storage management.
Using block-level mapping techniques, storage virtualization presents servers with logical views of storage in the form of virtual disks, while storing data blocks on physical storage devices in a way that is transparent to servers. In theory, this allows storage to be allocated based on price, availability, and performance. JBOD can be divided into hundreds of volumes, each having its own caching policy. Multiple JBOD or RAID arrays can be combined into single, large volumes. Virtualization allows storage (regardless of vendor) to be re-purposed for use on the SAN, in essence providing uniform "volume management" across the enterprise.
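The block-level mapping at the heart of this can be illustrated with a simple translation table: virtual block numbers resolve to (physical device, block) pairs, so one virtual volume can span several arrays transparently. Device names here are hypothetical:

```python
# Sketch of a virtualization mapping table: a virtual disk is built by
# carving block ranges from physical devices, and every virtual block
# address translates in real time to a (device, physical block) pair.

class VirtualDisk:
    def __init__(self):
        self.map = {}               # virtual block -> (device, physical block)
        self.next_block = 0

    def extend(self, device, start, count):
        # append 'count' blocks from 'device' to the end of the virtual disk
        for i in range(count):
            self.map[self.next_block] = (device, start + i)
            self.next_block += 1

    def resolve(self, vblock):
        return self.map[vblock]     # virtual -> physical translation

vd = VirtualDisk()
vd.extend("jbod1", start=100, count=4)      # virtual blocks 0-3 live on a JBOD
vd.extend("raid_array2", start=0, count=4)  # blocks 4-7 on a second array
assert vd.resolve(2) == ("jbod1", 102)
assert vd.resolve(5) == ("raid_array2", 1)  # the server never sees the seam
```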
Storage virtualization products on the market today are either pre-packaged solutions (SAN appliances) or software. Vendor examples include Compaq, Datacore Software, DataDirect Networks, Gadzoox Networks, StoreAge, StorageApps, TrueSAN, and Xiotech (Seagate).
Gartner Group, an IT consulting firm in Stamford, CT, defines two methods for implementing storage virtualization: symmetrical pooling and asymmetrical pooling. Symmetrical pooling puts the abstraction layer directly in the SAN data path, between storage devices and servers. A server, or storage domain controller, owns all the SAN storage and dishes it out (via cache over multiple HBAs) to clients. The domain controller houses the cache, storage pooling, and event management.
In asymmetrical pooling, the storage abstraction sits outside the data path. With this approach, "appliances" control the overall storage virtualization process through client-based software that maintains the virtual disk mapping tables.
With either approach, the virtualization process provides real-time translation of virtual disk addresses to physical LUN block addresses. Without this storage abstraction function, storage management will continue to be a problem.
Virtual storage solutions are a combination of software, hardware, and professional services-often from a storage integrator-used to build and deploy the new storage architecture. Once in place, however, these solutions allow companies to lower the cost of storage management and to react more quickly to their storage requirements.
Companies need ease of administration in assigning storage to hosts, with the ability to grow and shrink volumes online across multiple operating systems. These SAN strategies will help you manage and deploy storage resources more effectively.
Mark Teter is director of storage solutions at Advanced Systems Group (www.virtual.com), an enterprise computing and storage consulting firm in Denver.