Demystifying 'server-free' backup

Posted on November 01, 2001


To understand and implement server-free backup, users must be able to coordinate disk, tape, dedicated servers, networking infrastructure, and software.

BY JOHN WEBSTER

On the way toward becoming SAN's killer app, server-free backup has encountered roadblocks: the result of confusion, mismanagement, and missing pieces. Server-free has grown exceedingly complex for users because it's not just a matter of backup, but of recovery, too. Understanding and implementing server-free backup require users to coordinate many parts, including disk, tape, dedicated servers, networking infrastructure, and, last but not least, software.



To untangle the undergrowth of LAN-free and server-free backup and restore, the processes first need to be separated into their components and examined one at a time. But what are the components? What is the state of development of each? And how do they fit together?

Traditionally, the backup data stream has been routed from disk storage through an application server and then onto storage media via either direct or LAN connections. This process necessarily interrupts the application environment with either planned outages or a temporary reduction in available host CPU cycles during backup. The traditional scheme can also suck up available network bandwidth. Reversing the process (restoration) creates the same issues. The liabilities are essentially the same whether the storage is direct-attached (DAS), SAN-attached, or network-attached (NAS).

The state of the art in backup/restore addresses these issues in two ways: first, by creating a point-in-time copy ("snapshot") of the data to serve as a source for the backup data stream, rather than using the primary production data set and, second, by eliminating the routing of the backup data stream through the application server and then over the LAN to tape.

Using a copy of the data frees up production data sets during the backup routine, leaving servers to do what they were originally intended to do: serve data to applications. StorageTek's Snapshot Copy and EMC's TimeFinder are examples of technologies with this type of copy capability; however, both have traditionally been proprietary in nature, in that they only work within the context of solutions from those vendors.
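
Point-in-time copy facilities differ internally (some maintain and then split a full mirror volume; others use copy-on-write pointers), but copy-on-write illustrates why a snapshot can be created in seconds while production continues. The following Python sketch is a minimal illustration of the idea, with hypothetical names; it is not how either vendor's product is actually implemented.

```python
# Minimal copy-on-write snapshot sketch (hypothetical names; for
# illustration only, not how TimeFinder or Snapshot Copy is coded).

class CowVolume:
    """A block device that supports one point-in-time snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live production data
        self.saved = None            # block number -> pre-snapshot contents

    def take_snapshot(self):
        # The snapshot is created instantly: it is only bookkeeping,
        # and no data is copied at this point.
        self.saved = {}

    def write(self, n, data):
        # Preserve the pre-snapshot contents the first time a block
        # is overwritten, then let the production write proceed.
        if self.saved is not None and n not in self.saved:
            self.saved[n] = self.blocks[n]
        self.blocks[n] = data

    def read_snapshot(self, n):
        # The snapshot view: the saved original if the block changed,
        # otherwise the unchanged live block.
        if self.saved is None:
            raise RuntimeError("no snapshot taken")
        return self.saved.get(n, self.blocks[n])

vol = CowVolume([b"A", b"B", b"C"])
vol.take_snapshot()
vol.write(1, b"B2")                  # production keeps running
assert vol.read_snapshot(1) == b"B"  # the backup source is unchanged
assert vol.blocks[1] == b"B2"        # the live volume moved on
```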

Unfortunately, creating internal copies of data only begins the backup process. The copy of data, which typically resides on a disk array, must be moved to archival storage before a backup can be considered complete. (In some cases, if a user prefers to keep the backup copy essentially "online," the backup copy continues to reside on disk but probably in a different physical location.)

Recently, a standard for moving a copy of data from disk to an archival storage platform has been proposed by the Storage Networking Industry Association (SNIA). The proposed methodology, "third-party copy," adds a standard copy command to the SCSI command set. Third-party copy creates a backup data stream from one SCSI device (the source) to another (the target). A SCSI device can act as the data mover that copies data from disk to tape.
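
In schematic terms, the host builds a list of extents and hands it, along with the source and target device addresses, to a copy agent; the agent then moves the blocks itself. The Python sketch below models that division of labor. It is a conceptual illustration only: the names are hypothetical, and the real Extended Copy parameter list is a considerably more involved binary structure of target and segment descriptors.

```python
# Schematic model of third-party copy (hypothetical names; the real
# SCSI Extended Copy parameter list is a packed binary structure).

from dataclasses import dataclass

@dataclass
class Segment:
    src_lba: int   # starting block on the source device
    dst_lba: int   # starting block on the target device
    length: int    # number of blocks to move

def extended_copy(source, target, segments):
    # This loop runs inside the copy agent (a bridge, router, or tape
    # controller, for example), not on the application server.  The
    # host's only involvement was issuing the command and supplying
    # the segment list.
    for seg in segments:
        data = source.read(seg.src_lba, seg.length)
        target.write(seg.dst_lba, data)

class RamDevice:
    """Stand-in for a SCSI block device."""
    def __init__(self, nblocks):
        self.blocks = [b"\x00"] * nblocks
    def read(self, lba, n):
        return self.blocks[lba:lba + n]
    def write(self, lba, data):
        self.blocks[lba:lba + len(data)] = data

disk, tape = RamDevice(8), RamDevice(8)
disk.blocks[0:3] = [b"a", b"b", b"c"]
extended_copy(disk, tape, [Segment(src_lba=0, dst_lba=0, length=3)])
assert tape.blocks[0:3] == [b"a", b"b", b"c"]
```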

The second way to address backup bandwidth and availability concerns is to eliminate the routing of the backup data stream through the application server and then over the LAN to tape. SANs are dedicated networks exclusively for storage traffic; they provide the plumbing needed to take the backup data stream off the LAN and onto a separate network. The third-party copy command can then use that plumbing to move data from disk directly to a SAN-attached tape drive. The payoff is greater application availability and freed LAN bandwidth, but at the cost of building an independent network (the SAN) or, at minimum, a separate LAN dedicated to backup/restore traffic.

When a non-disruptive copy of data is created for use as a source for a backup data stream and the application server is eliminated as a conduit for the backup data stream, the setup is often called "server-free" backup. However, there is a second use of the term "server-free" that is important.

Server-free essentials

To better understand the terminology and concepts at work in this second server-free backup definition, let's look at Legato's Celestra. Industry veterans have been closely monitoring Celestra's progress for many months. Celestra, which Legato acquired along with its creators (Intelliguard) in 1999, is based on two emerging open standards for backup/restore: NDMP and SNIA Extended Copy Command, which is expected to be released sometime next year. Celestra is also supported for use with databases and file systems. By examining the functioning of Celestra, we get an idea of how these standards work together.

A block copy engine is responsible for moving data bidirectionally between the source and backup devices, typically disk and tape, respectively. Backup and restore data streams move through only one intermediary: a block copy module. Controlling this relatively simple process, however, requires coordinating several distinct, concurrent processes with Legato NetWorker, as the following sequence (and the sketch after the list) illustrates:

  • A backup is started from a console within a storage management software application (e.g., Legato NetWorker).
  • NetWorker communicates with a Celestra Manager Module using NDMP. (It is therefore possible for other management applications that support the NDMP command library to use Celestra; however, Legato has chosen to market Celestra only in conjunction with NetWorker.) The Manager Module also coordinates data-set synchronization and initiates both backup and recovery operations.
  • A Sync Module flushes database or file system buffers and quiesces normal operations at the same time that a static point image (or "snapshot") of data is created. According to Legato, this process normally takes no more than a few seconds.
  • The Checkpoint Module that created the snapshot image (a copy of metadata, not the actual data) issues a command to copy data from the source device (disk array) to a cache region within the Block Copy Module.
  • During the copy process, a Write Interceptor Module intercepts write operations to the data set that is being copied and executes them in cache.
  • A Backup Image Generator determines which blocks within a data set actually need to be backed up. Typically, only the blocks that have changed since the last backup operation need to be backed up. Celestra can also distinguish between data blocks and empty space, copying only the data.
    [Clearly, the need to quiesce, snapshot, and intercept writes means Celestra must have intimate knowledge of the underlying file system or database. Currently, Celestra works with Solaris and HP-UX file systems and Oracle (7.3 or 8i) databases.]
  • A Block Copy Module moves data between storage devices. In the case of a restore, where a data set moves from tape back to disk, the Block Copy Module likewise executes the copy command.
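
Read end to end, the sequence amounts to snapshot, intercept, filter, copy. The Python sketch below is our schematic reading of that control flow; every class and method name is hypothetical, and it should not be mistaken for Legato's actual code.

```python
# Schematic of the sequence above (all names hypothetical; this is an
# illustration of the control flow, not Legato's Celestra code).

class Block:
    def __init__(self, no, data, generation, in_use=True):
        self.no, self.data = no, data
        self.generation, self.in_use = generation, in_use

class Dataset:
    """Stand-in for a file system or database on a disk array."""
    def __init__(self, blocks):
        self.blocks = blocks
        self.write_cache = None            # interceptor cache when active

    def quiesce_and_snapshot(self):
        # Sync Module: flush buffers, pause I/O, take a metadata-only
        # point image.  Normally a few seconds end to end.
        return list(self.blocks)

    def write(self, no, data):
        if self.write_cache is not None:   # Write Interceptor active
            self.write_cache[no] = data    # the write lands in cache
        else:
            self.blocks[no].data = data

def changed_blocks(snapshot, last_backup_gen):
    # Backup Image Generator: only blocks modified since the last
    # backup, and only blocks that actually hold data.
    return [b for b in snapshot
            if b.in_use and b.generation > last_backup_gen]

def block_copy(blocks, tape):
    # Block Copy Module: the third-party copy from disk to tape.
    for b in blocks:
        tape[b.no] = b.data

def run_backup(dataset, tape, last_backup_gen):
    # Manager Module: driven over NDMP by, e.g., NetWorker.
    snap = dataset.quiesce_and_snapshot()
    dataset.write_cache = {}               # intercept writes during the copy
    block_copy(changed_blocks(snap, last_backup_gen), tape)
    for no, data in dataset.write_cache.items():
        dataset.blocks[no].data = data     # drain the cache, resume normal I/O
    dataset.write_cache = None

ds = Dataset([Block(0, b"x", 1), Block(1, b"y", 5), Block(2, b"", 0, False)])
tape = {}
run_backup(ds, tape, last_backup_gen=1)
assert tape == {1: b"y"}                   # only the changed, in-use block
```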

In this sequence of processes, the separation of functions between NDMP and SNIA Extended Copy Command ("third-party copy") is clear. NDMP is a standard command and control protocol used by the storage management application to communicate with the processes that are responsible for non-disruptively creating the data copy, moving the data copy to a backup device, and ensuring that the backup copy of data is consistent with the file system or database image as of a given point in time.

NDMP, the control protocol, is distinct from the third-party copy command, which is responsible for creating the copy of live data and moving it to the backup device.
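
The split is easy to see in miniature: short control messages flow between the backup application and the agents, while the bulk data path never touches either of them. Here is a minimal sketch of that separation (the message names are illustrative of NDMP-style control, not taken from the NDMP specification):

```python
# The control/data split in miniature (hypothetical names; real NDMP
# uses XDR-encoded messages over TCP).

class CopyAgent:
    """Third-party copy data path: disk to tape, no server in between."""
    def move(self, source, target):
        target.extend(source)              # stand-in for the block copy

class ControlSession:
    """NDMP-style control path.  The backup application talks to the
    agents here, but bulk data never flows through this session."""
    def __init__(self, copy_agent):
        self.copy_agent = copy_agent
        self.log = []

    def send(self, message, **params):
        self.log.append(message)           # small control messages only
        if message == "START_BACKUP":
            # Hand off to the data path: the copy agent moves the
            # blocks device-to-device over the SAN on its own.
            self.copy_agent.move(params["source"], params["target"])
            return "BACKUP_COMPLETE"

disk, tape = [b"blk0", b"blk1"], []
session = ControlSession(CopyAgent())
status = session.send("START_BACKUP", source=disk, target=tape)
assert status == "BACKUP_COMPLETE" and tape == disk
```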

Another derivation

Currently, Legato implements its third-party copy command logic in a dedicated Celestra Data Mover Workstation (DMW). Herein lies the second derivation of the term "server-free."

The DMW can be thought of as a dedicated backup server insofar as its sole purpose is to act as a conduit between disk and tape for the purposes of backup and recovery, and so it is separate from the application server environment. Nevertheless, it is still a server.

Legato plans to release a version of the DMW logic that resides within a SAN fabric switch rather than a dedicated server. In this case, both the application server and the dedicated backup server have been removed from the backup data stream; hence, a second echo of the term "server-free" backup.

In its current form, the DMW links disk devices directly attached to application servers with tape libraries directly attached to the DMW (see figure, left).


[Figure: Left, data moves directly between the disk, DMW, and tape, bypassing the application servers during both backup and full-image restores. Right, data moves directly between the disk, DMW (via the SAN), and tape, again bypassing the application servers.]

Alternatively, the DMW can attach to the SAN fabric. In that case, the backup data stream flows from a SAN-attached disk array through the SAN fabric to the DMW and then to the DMW-attached tape (see figure, right).
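
Stated as data paths, the two topologies make the claim concrete: in neither case does the application server appear on the backup data stream. A toy illustration, with labels of our own invention:

```python
# The two DMW placements as data paths (a toy illustration; the
# device labels are ours, not Legato's).

paths = {
    "direct-attached DMW": ["app-server disk", "DMW", "DMW-attached tape"],
    "SAN-attached DMW": ["SAN disk array", "SAN fabric", "DMW",
                         "DMW-attached tape"],
}
for name, hops in paths.items():
    # In neither topology does the application server sit on the path.
    assert "application server" not in hops
    print(name + ": " + " -> ".join(hops))
```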

We believe server-free backup remains an untapped opportunity because of its complexity and cost. To deliver an actual cost-benefit, a solution must weave together multiple elements from multiple vendors. In a Fibre Channel SAN environment, that is no trivial task: interoperability issues must be resolved among host operating systems, storage resource management applications, Fibre Channel fabric devices, and, finally, the storage devices themselves. This complexity drives entry-level price tags for server-free backup solutions into the $300,000 realm. Server-free is a long way from free. The dollars and difficulty presently limit server-free backup to high-end enterprise users.

So far, server-free backup has not yet proven to be the "killer app" for Fibre-Channel-based SANs that the industry once imagined it would be. However, it could very well turn out to be a key driving force behind the implementation of IP-based SANs or hybrid FC and IP SANs as an alternative to pure Fibre Channel.

There is also good reason to believe that users will employ iSCSI as a means of reaching servers and workstations within the network that would otherwise be too expensive to reach via Fibre Channel. In that scenario, Fibre Channel becomes the storage networking backbone and IP, the interconnect between that backbone and the rest of an enterprise's server complement. The relatively high cost of connecting each server to Fibre Channel is thereby eliminated as an inhibitor to SAN growth, allowing IT management to extend server-free backup techniques from the data center out to departmental LANs.

The future of server-free backup remains somewhat in doubt, as other backup-and-recovery techniques that emphasize the recovery side of the equation become better known. As SAN virtualization becomes more widespread, faster recovery techniques from copies made directly to disk rather than tape will also become popular. Nevertheless, tape-based backup and recovery will remain the first line of defense for many enterprises for at least the next few years, giving server-free backup a place on the runway. The advent of IP storage and iSCSI may finally help get it aloft and flying.


John Webster is a senior analyst with Illuminata, a research and consulting firm in Nashua, NH. For more information on this subject, visit www.illuminata.com.

