Tape- vs. disk-based backup, part II

In last month's column, you claimed that a well-designed tape system should outperform disk-based backup systems. Can you explain what goes into designing such a tape system?

Let me first clarify the claim that I made in that column (see "Is disk-based backup all that it's cracked up to be?", July 2003, p. 24). I was addressing a reader who wanted to know if staging his backups to disk would improve performance over sending data directly to tape. He had tried disk staging before and got worse performance. I explained that the problem with most backup systems is not that tape drives are slow; rather, the bottlenecks are elsewhere and merely changing the storage medium will not make the problem go away.

Start at the application, not at the storage device—When customers call me looking to beef up their backup systems, their attention is typically focused on tape drives and tape libraries. Most recently, they have been asking about ATA disk arrays. This is the wrong way to approach the backup/restore problem. Instead of focusing on back-end storage devices, you should begin the process by examining the computers you want to back up, the applications running on those computers, and the makeup of the data to be backed up. Once you figure out how to get the data off these computers and how you are going to transfer it to the secondary storage devices, then you are in a position to go shopping for tape drives and disk staging products.

Client-side problems to look for—You will find that the most difficult challenges are on backup clients that have lots of small files or that run CPU-intensive applications. Large numbers of small files require disproportionately more CPU and I/O resources to back up than the same amount of data stored in larger files, because every file carries a fixed overhead (opening it, reading its attributes, and recording it in the backup index) no matter how few bytes it contains. Small files choke the backup client and reduce the data rate to a trickle. You can have the fastest networks, the fastest backup server, and the fastest tape drives, but if your backup client cannot deliver the goods, your performance will not improve. In these cases, a good solution is plenty of RAM (for local file system caching) and a TCP offload engine (TOE) card. TOE cards offload TCP/IP processing from the host CPU and can dramatically improve performance.
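To see why file count matters as much as capacity, here is a minimal cost model, assuming every file pays a fixed per-file overhead on top of the time needed to stream its bytes. The 5 ms overhead and 30 MB/s stream rate are hypothetical figures chosen only for illustration, not measurements of any particular system.

```python
# Hypothetical cost model: each file pays a fixed overhead (open, read
# attributes, index entry, close) plus the time to stream its bytes.
# Both constants are illustrative assumptions, not measured values.
PER_FILE_OVERHEAD_S = 0.005   # assumed 5 ms of fixed work per file
STREAM_RATE_MB_S = 30.0       # assumed raw read/stream rate in MB/s


def effective_rate_mb_s(total_mb: float, file_count: int) -> float:
    """Effective backup rate when total_mb is spread over file_count files."""
    transfer_s = total_mb / STREAM_RATE_MB_S
    overhead_s = file_count * PER_FILE_OVERHEAD_S
    return total_mb / (transfer_s + overhead_s)


if __name__ == "__main__":
    total_mb = 10_000  # the same 10 GB of data in both cases
    for files in (10, 1_000_000):
        rate = effective_rate_mb_s(total_mb, files)
        print(f"{files:>9,} files: {rate:6.1f} MB/s")
```

Under these assumptions, 10GB in ten large files moves at about 30 MB/s, while the same 10GB in a million small files drops to under 2 MB/s, even though the network, server, and tape drives are unchanged.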

The backup server—The most common bottleneck is the backup server itself, which has to receive streams of data, reassemble the streams into files, write records into its indexes, and send the data to the tape media. That's a lot of work. Now imagine that multiple clients are sending data at the same time. The bottleneck is going to be I/O processing on your backup server. You can add all the tape drives you like, but if you cannot deliver the data to the tape drives, you won't see any performance benefit. Similarly, a super-fast disk device is not going to improve performance. If anything, the local file system overhead associated with writing backups to disk might slow your performance even more.
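The point about adding tape drives can be made concrete: a backup path is a serial pipeline, so it can move data no faster than its slowest stage. The sketch below uses hypothetical sustained rates (in MB/s) for each stage; substitute figures measured in your own environment.

```python
def end_to_end_rate(stages: dict[str, float]) -> tuple[str, float]:
    """Return (slowest stage, its rate). A serial pipeline can move data
    no faster than its slowest stage."""
    bottleneck = min(stages, key=lambda name: stages[name])
    return bottleneck, stages[bottleneck]


# Hypothetical sustained rates in MB/s for one backup path.
baseline = {
    "clients (aggregate)": 40.0,
    "LAN": 80.0,
    "backup server I/O": 25.0,
    "tape drives": 2 * 30.0,  # two drives at an assumed 30 MB/s each
}
# Doubling the number of drives raises only that stage's capacity.
more_drives = {**baseline, "tape drives": 4 * 30.0}

if __name__ == "__main__":
    for label, stages in (("2 drives", baseline), ("4 drives", more_drives)):
        name, rate = end_to_end_rate(stages)
        print(f"{label}: {rate:.0f} MB/s, limited by {name}")
```

With these assumed numbers, both configurations top out at 25 MB/s because the backup server, not the tape subsystem, sets the pace.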

You can address the backup server bottleneck in several ways. One is to spend a lot of money on a high-performance server. That is how we did it in the past, but even a big server eventually became a bottleneck, and there are now better and more cost-effective alternatives. Another possibility is to install a TOE card in your backup server, assuming that your server's operating system supports TOEs.

A better way around the backup server bottleneck is to use a storage area network (SAN), and you can use the SAN in two ways: 1) by enabling multiple backup servers to split up the network load while still using a single shared tape library, or 2) by enabling individual hosts to back up directly to a tape drive using the SCSI protocol, rather than TCP/IP. (This latter approach is often referred to as "LAN-free backup.")

I recommend investing in software that allows for centralized metadata (backup system indexes or logs) with distributed I/O processing. In other words, you have a central backup server that orchestrates the backup process while other "slave" servers aid in the collection of data from the LAN and, in turn, send data to a shared tape library over the SAN.
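The division of labor between the central backup server and the "slave" servers can be sketched as follows. This is a hypothetical illustration, not how any particular backup product works: the central server keeps the one catalog and hands each client to the least-loaded media server, placing the largest clients first (a simple greedy balancing heuristic assumed here), while the media servers only move data. The client names, nightly volumes, and server names are made up.

```python
import heapq


def plan_backups(clients: dict[str, float], media_servers: list[str]):
    """Assign each client to the least-loaded media server, largest first.

    Returns (catalog, loads). The catalog is the single, central record of
    which media server handled each client; media servers only move data.
    """
    heap = [(0.0, name) for name in media_servers]  # (assigned GB, server)
    heapq.heapify(heap)
    catalog: dict[str, str] = {}
    loads = {name: 0.0 for name in media_servers}
    # Placing the biggest clients first keeps the final loads more even.
    for client, gb in sorted(clients.items(), key=lambda kv: kv[1], reverse=True):
        load, server = heapq.heappop(heap)
        catalog[client] = server
        loads[server] = load + gb
        heapq.heappush(heap, (load + gb, server))
    return catalog, loads


if __name__ == "__main__":
    nightly_gb = {"db": 300, "files": 150, "mail": 120, "web1": 20, "web2": 25}
    catalog, loads = plan_backups(nightly_gb, ["media1", "media2"])
    print(catalog)
    print(loads)
```

Because the catalog lives in one place, a restore only needs to ask the central server where a client's data went, no matter which media server wrote it.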

Now the challenge is to decide which hosts to back up over the SAN and which to back up over the LAN. The most common mistake is to use the SAN for backup just because it's there. SAN backup has its complexities, not to mention the premium you pay for the software licenses. Sharing a single Fibre Channel host connection between disk traffic and backup traffic is not necessarily good practice either, and it adds complexity of its own. It is very common for us to use the LAN for backup even when the disks are on the SAN. Similarly, it is common for us to use a SAN for backup even when the disk is directly attached.

Use SAN or SCSI connections for systems that regularly back up a lot of data. (100GB or more per night is a good rule of thumb.) Also, use the SAN for systems that are CPU-intensive or that require particularly high restore rates. Use the LAN for everything else. LAN backup is much easier to administer and has far fewer headaches than SAN backup. You will be surprised at how good LAN backup performance can be.
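The rule of thumb above translates directly into a per-host routing check. In this sketch the 100GB threshold comes from the column; the `Host` fields and host names are hypothetical, and "SAN" here stands for a SAN or direct SCSI connection.

```python
from dataclasses import dataclass

NIGHTLY_SAN_THRESHOLD_GB = 100  # the column's rule of thumb


@dataclass
class Host:
    name: str
    nightly_gb: float
    cpu_intensive: bool = False
    needs_fast_restore: bool = False


def backup_path(host: Host) -> str:
    """Return "SAN" (SAN or direct SCSI) or "LAN" for one host."""
    if (host.nightly_gb >= NIGHTLY_SAN_THRESHOLD_GB
            or host.cpu_intensive
            or host.needs_fast_restore):
        return "SAN"
    return "LAN"  # the default: easier to administer


if __name__ == "__main__":
    for h in (Host("oracle-prod", 250),
              Host("web01", 8),
              Host("render01", 30, cpu_intensive=True)):
        print(h.name, backup_path(h))
```

Since hosts change over time, a check like this is worth rerunning periodically rather than applying once.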

Now that you have solved your data delivery problem, you are ready to choose your hardware. If you're making heavy use of the LAN, I would recommend a helical scan tape drive, such as Exabyte's drives or Sony's AIT and SAIT drives. These drives do not "shoeshine" (repeatedly stop, reposition, and restart the tape when data arrives more slowly than the drive can write it) and thus perform well on the LAN, where data often arrives too slowly or unevenly to keep a drive streaming. If you are making heavy use of the SAN, you should consider the fastest drives on the market, such as SDLT, LTO, and SAIT. If you are doing an even mix of LAN and SAN, I would recommend SAIT drives (which perform well under both circumstances) or disk staging of LAN backups before writing to tape.
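The drive recommendations above can be summarized as a small decision table keyed on how much of the nightly volume travels over the SAN. The 30% and 70% cutoffs for "heavy use of the LAN" and "heavy use of the SAN" are assumptions for illustration; the column does not give numbers.

```python
def recommend_drives(lan_gb: float, san_gb: float) -> str:
    """Map the nightly LAN/SAN backup mix to the column's drive advice."""
    total = lan_gb + san_gb
    if total <= 0:
        raise ValueError("no nightly backup volume given")
    san_share = san_gb / total
    if san_share < 0.3:  # assumed cutoff for "heavy use of the LAN"
        return "helical scan (e.g., Exabyte, AIT, SAIT)"
    if san_share > 0.7:  # assumed cutoff for "heavy use of the SAN"
        return "fastest drives available (e.g., SDLT, LTO, SAIT)"
    return "SAIT, or stage LAN backups to disk before tape"


if __name__ == "__main__":
    print(recommend_drives(lan_gb=500, san_gb=50))
    print(recommend_drives(lan_gb=50, san_gb=900))
    print(recommend_drives(lan_gb=400, san_gb=400))
```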

In short, use the right tool for the job, bearing in mind that over time the jobs and the tools might change. It is very possible that a host you back up over the LAN today will need to be switched to the SAN tomorrow, or vice versa.

Jacob Farmer
Cambridge Computer

Jacob Farmer is the CTO of Cambridge Computer. He can be reached at jacobf@cambridgecomputer.com.

If you have a question you would like to ask one of our experts, please e-mail Heidi Biggar at heidib@pennwell.com.

This article was originally published on August 01, 2003