What's Next for SAN Management?
In the second of a two-part series, we look at emerging management strategies for storage area networks.
By Scott McIntyre and Paul Zuhorski
The desire to consolidate and share data, move data between applications and file systems, maximize ROI, centralize management, cluster servers, remotely vault data, and increase storage capacity and performance is creating connectivity and availability demands beyond the capabilities of most storage environments.
The emergence of Fibre Channel provides a way to meet these demands by applying networking concepts to storage connectivity. In these environments, storage management is a primary concern. Storage area networks (SANs), in particular, raise new management challenges and possibilities because they must be, in effect, self-managing.
Last month, we examined how data protection can be handled in a SAN today. This month, we explore the future of SAN storage management.
One of the compelling potential benefits of SANs is the ability to manage storage in a way that reduces traffic on production servers. This is most obvious in the case of backup. Last month, we demonstrated how backup traffic from production servers can be off-loaded from the messaging network. The next step is to make the backup task almost transparent to production servers and applications.
Consider a backup server that has a view of all storage on a network--for example, the storage behind a large database server. The goal is to back up the database without involving the database server. To do this, you need to serialize access to the database's data. While there has been discussion of distributed lock managers and universal file systems, viable implementations are not expected in the foreseeable future, so other solutions must be found.
One solution is a shared-storage clustering approach, whereby both Node A and Node B of a cluster have access to the database, understand its structure, and know what updates have been made. It is possible to install a backup "data mover" and a database backup interface module on one of the nodes. The database is backed up from Node B while Node A continues database operation. Because of the added hardware and software costs associated with a clustered implementation, this technique is relatively expensive. However, for users who have already implemented clustering, this type of backup may be attractive.
A second possibility is to handle the backup via integration with mirror and snapshot techniques provided by the host system or disk hardware, such as EMC's TimeFinder and Symmetrix Remote Data Facility (SRDF). In this implementation, the database is briefly put into backup mode while the TimeFinder Business Continuance Volume (BCV) is created or the SRDF mirror is broken. The BCV or mirror volume can be made addressable to a separate host. The database resumes production mode while the BCV or mirror is backed up. While this is not strictly a SAN technology, it is enabled by the Symmetrix capability of supporting multi-host connectivity--one of the defining features of a SAN.
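The ordering of steps in this split-mirror approach can be sketched as follows. This is a minimal illustration of the sequence only; the classes and method names are hypothetical stand-ins for the vendor-specific commands (such as the TimeFinder or SRDF utilities), not a real API.

```python
# Sketch of the split-mirror backup sequence: quiesce briefly, split the
# mirror, resume production, then back up the frozen image elsewhere.
# All names here are illustrative, not vendor commands.

class Database:
    def __init__(self):
        self.mode = "production"
        self.log = []

    def enter_backup_mode(self):
        self.mode = "backup"            # quiesce writes for a consistent image
        self.log.append("backup-mode")

    def resume_production(self):
        self.mode = "production"        # production resumes almost immediately
        self.log.append("production-mode")

class MirrorVolume:
    def __init__(self):
        self.split_off = False

    def split(self):
        self.split_off = True           # BCV/mirror now holds a frozen image

    def resynchronize(self):
        self.split_off = False          # rejoin the mirror for the next cycle

def split_mirror_backup(db, mirror, backup_fn):
    """Back up a frozen image while production continues on the primary."""
    db.enter_backup_mode()
    try:
        mirror.split()
    finally:
        db.resume_production()
    backup_fn(mirror)                   # a separate host backs up the split volume
    mirror.resynchronize()
```

The point of the sequence is that the database spends only the split itself in backup mode; the lengthy data movement happens against the detached volume, off the production path.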
A third possibility is backing up via extensions to the database's existing backup interfaces. Pointers, which identify data that needs to be backed up, are passed on to a separate backup server. Though this type of implementation requires the database to remain in backup mode for the duration of the backup, it does take the database server out of the backup data path.
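The division of labor in this pointer-passing scheme might look like the sketch below: the database identifies which blocks need backing up, and the backup server reads those blocks itself over the shared storage. The data structures are invented for illustration; real interfaces would be database-specific.

```python
# Hypothetical pointer-passing backup: the database supplies pointers,
# the backup server reads the data directly from shared storage.

def changed_block_pointers(block_map, since):
    """Database side: return pointers to blocks modified after `since`.
    `block_map` maps block number -> last-modified timestamp."""
    return [blk for blk, stamp in block_map.items() if stamp > since]

def backup_from_pointers(storage, pointers):
    """Backup-server side: read the referenced blocks straight from shared
    disk, keeping the database server out of the data path."""
    return {blk: storage[blk] for blk in pointers}
```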
While these implementations take the database server out of the backup path, it is important to note that they do not take the server completely out of the backup process. The database (or application) owns the data; therefore, it must be involved in the backup process to ensure a consistent image of its data and to manage contention for its I/O resources.
Each of these examples addresses database backup. However, similar considerations apply to file systems and other applications.
Direct Disk-to-Tape Backup
An extension of the backup techniques described above is direct disk-to-tape backup. This process is automatic and does not involve server resources, except to collect index information associated with the data. However, because data doesn't move of its own accord, a data movement engine is needed to move the data from disk to tape.
This concept raises several questions. First, where does the data movement engine reside? If not resident on a dedicated backup server, it could be a function of a storage controller or intelligent storage server, an intelligent switch, or even disk drive firmware. All of these options will likely emerge.
However, for a data mover to be truly effective, it must understand the objects (i.e., the data associated with a particular file system or application) stored on disk and the data structures they contain. Objects must be automatically discovered when created so that differential and incremental backups are properly handled and indices are created for the objects.
The data mover must be able to dynamically launch multiple streams (i.e., spawn multiple processes) so that the data can be multiplexed to keep tape drives streaming at maximum speed. In addition, communication with the applications that own objects on the disk array is required in order to back up the objects in a file system- or application-aware manner for a consistent snapshot of the data.
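The multiplexing requirement above can be sketched with ordinary threads standing in for the data mover's dynamically launched streams: blocks from several streams are interleaved into one tape image, each tagged with its stream of origin so it can be demultiplexed on restore. This is a toy model of the technique, not any product's implementation.

```python
# Multiplexing several backup streams onto one tape drive: a shared queue
# interleaves blocks so the (simulated) tape drive never starves.

import queue
import threading

def stream_reader(stream_id, blocks, out_q):
    """One backup stream: emit (stream_id, block) records as they are read."""
    for block in blocks:
        out_q.put((stream_id, block))
    out_q.put((stream_id, None))            # end-of-stream marker

def multiplex_to_tape(streams):
    """Interleave blocks from all streams into a single tape image."""
    out_q = queue.Queue()
    workers = [threading.Thread(target=stream_reader, args=(sid, blocks, out_q))
               for sid, blocks in streams.items()]
    for w in workers:
        w.start()
    tape, open_streams = [], set(streams)
    while open_streams:
        sid, block = out_q.get()
        if block is None:
            open_streams.discard(sid)       # that stream has finished
        else:
            tape.append((sid, block))       # each record tagged by stream
    for w in workers:
        w.join()
    return tape
```

Tagging each record with its stream is what makes the interleaving recoverable, which connects directly to the self-describing block format discussed next.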
Further, each block of data written to tape must be self-describing, both to prevent unprivileged users from recovering data and to ensure file recovery in the event of a disaster. Self-description also allows the backup application to automatically skip bad blocks on a tape, recover the good blocks, and recreate indexes by reading the backup tape--functions that tar formats and their variants cannot accomplish.
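A self-describing record, in the sense above, carries enough metadata in every block to rebuild the index by scanning the tape and to detect and skip damaged blocks. The record layout below (a small header with object name, sequence number, and checksum) is invented purely for illustration.

```python
# Sketch of a self-describing tape record: header + payload, where the
# header alone is enough to rebuild an index and verify integrity.

import json
import zlib

def make_record(obj, seq, payload):
    """Prepend a metadata header (object name, sequence, CRC) to a block."""
    header = {"object": obj, "seq": seq, "crc": zlib.crc32(payload)}
    return json.dumps(header).encode() + b"\n" + payload

def rebuild_index(records):
    """Recreate the backup index from the tape alone, skipping corrupt blocks."""
    index = {}
    for rec in records:
        head, _, payload = rec.partition(b"\n")
        meta = json.loads(head)
        if zlib.crc32(payload) != meta["crc"]:
            continue                        # bad block: skip it, keep the rest
        index.setdefault(meta["object"], []).append(meta["seq"])
    return index
```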
Regardless of where the data movement engine resides, the second question is: How are data movement and its associated metadata managed? Management is clearly a higher-level function--it won't take place in disk-drive firmware. There must be a centralized management point that can aggregate management data from the data movers spread across the SAN fabric and from other storage management applications.
Exploiting the intelligence that is distributed across the storage network requires an architecture that separates data movement from data management. As the distribution of processing capabilities in the storage network becomes clearer, data movement functionality can be migrated into the fabric of the storage network so that data is moved among storage resources without involving production servers. For this to happen, data-sharing issues must be resolved. Data movement will apply to more than just traditional backup; it will also apply to data protection methodologies such as snapshots and continuous versioning.
The first two questions relate to a larger issue: Where does the intelligence, hence the management capability, reside in a SAN? Many of the components of the SAN (including storage controllers for disk subsystems, some tape subsystems, and networking hardware components) will have processing capability. Nearly all will have a view of the network topology, and most will have some sort of management interface. All would like to claim the title "storage network manager."
However, intelligence and management capabilities will likely be widely dispersed in the storage network, requiring a high-level management function that can aggregate features that are spread across the network.
SAN management challenges aside, what about the challenges of managing storage network devices (i.e., the hubs, switches, routers, and bridges that make up the SAN fabric)? Topology and configuration management, performance management, fault isolation, and alert monitoring and management will all be required. In time, industry-standard implementations of SNMP and/or common information model (CIM) protocols will likely emerge. In the meantime, however, the various vendors of SAN interconnect hardware products will provide software that monitors and manages their own products.
The evolution of distributed computing storage environments to SANs will result in significant user benefits, including more efficient use of storage resources and, eventually, greatly improved access to shared data. Deriving benefits from SANs will demand storage management software with a robust, highly scalable architecture.
As SAN architectures develop, backup will be handled within the storage network, and continuous data protection methodologies will be employed that are transparent to production servers. Pulling it all together will be robust policy management and HSM spanning the applications that manage storage, as well as the network and storage devices, in the storage network.
The Role of HSM and Policy Management
In a SAN, "locational transparency" will become increasingly important. Data will be spread across the network, in many cases physically located far from the servers. The connection between physical storage and logical storage will be less obvious. The fact that all the storage on the network is addressable will enable, and create demand for, the most cost-effective use of those storage resources--an ideal environment for hierarchical storage management (HSM).
HSM technology, which migrates infrequently used data to slower secondary storage based upon predefined rules known as policies, has been widely used in mainframe environments for more than a decade. However, it has been slow to catch on in distributed computing environments, for a number of reasons: HSM is fairly complicated and difficult to implement; it can make users and administrators feel as though they're surrendering control of their data; and it requires an operational discipline that many distributed computing environments lack. However, a more fundamental reason is that distributed computing environments do not have a storage hierarchy.
Mainframe environments, which have for years had the kind of connectivity that SANs are bringing to distributed environments, typically have a hierarchy of centralized storage. In these environments it makes sense, and may be an economic necessity, to migrate old, unused data down the storage hierarchy and reserve expensive DASD for heavily used data.
In contrast, most distributed environments today do not have a storage hierarchy. Typically, a server only has disk subsystems attached to it, and the only servers with access to tape are those used for backup. The price of attaching dedicated optical or tape resources to production servers often outweighs the cost benefit of migrating older data to that secondary storage, and migrating data from large production servers to an HSM server over the messaging network can place a prohibitive load on the network.
However, SAN connectivity changes the economic equation. Tape and optical resources are available to all of the servers on the network, creating a storage hierarchy. In this environment, HSM becomes possible and even economically feasible.
One of the enabling technologies for HSM is policy management. However, policy management is applicable to storage environments beyond HSM and is likely to be one of the keys to simplifying management of large SAN environments.
Policies are the standard operating procedures that administrators define across heterogeneous and geographically distributed environments. Policies define functions such as standard client and server configurations, failover policies, and error and usage data collection and reporting.
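To make the idea concrete, a policy of the kind described above can be encoded as a declarative rule that an HSM engine evaluates against each file's last-access age. The rule format and tier names here are hypothetical, chosen only to illustrate how a predefined policy drives migration decisions.

```python
# Illustrative HSM migration policies: declarative rules keyed on how long
# a file has gone unaccessed. Tier names and thresholds are invented.

POLICIES = [
    {"name": "migrate-cold-data", "min_idle_days": 90, "target": "tape"},
    {"name": "migrate-warm-data", "min_idle_days": 30, "target": "optical"},
]

def placement_for(idle_days, policies=POLICIES):
    """Return the storage tier chosen by the strictest matching policy."""
    matches = [p for p in policies if idle_days >= p["min_idle_days"]]
    if not matches:
        return "primary-disk"       # still active: stays on fast storage
    return max(matches, key=lambda p: p["min_idle_days"])["target"]
```

Because the rules are data rather than code, an administrator can define them once and apply them uniformly across a heterogeneous, geographically distributed environment, which is the appeal of policy management beyond HSM alone.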
Scott McIntyre is business line manager for storage networking and Paul Zuhorski is product manager for storage networking at Legato Systems, in Palo Alto, CA.