by Heidi Biggar
Tivoli tests show that data movers (i.e., bridges or routers) are less available and less robust than application servers are.
Sabrinath Rao -- Computer Associates
RAO: We've tested the stability and viability of third-party copy, as have our partners, including Brocade, Chaparral, Crossroads, and Pathlight. The solution seems to be well accepted by customers.
McINTYRE: This may or may not be true -- it depends on the bridge or router in question. Some of these SAN devices are enterprise-class devices, which are based on embedded server architectures and use server-class processors and memory.
These SAN devices are quite reliable. However, unlike servers, they cannot typically be clustered for high availability, so fail-over may still be a concern.
Still, the failure of one of these devices is no different from the failure of a tape drive during backup. The backup fails and can be restarted using a different device. Moreover, given that tape drives are electromechanical devices, we expect them to continue to be the single biggest source of backup failures, ahead of bridges or routers.
ADAMS: While Veritas would agree that the functionality residing in a data mover is not that robust in nature, we do not believe third-party copy is any less viable as a technology. In fact, a data mover (i.e., bridge or router) is not a necessary component in our implementation. Our Media Server can act as the data mover. Now, you may say this is not server-less backup, but when you think about it, it really is. You have a single server involved in the backup process rather than multiple servers. This is a significant resource savings -- from a server CPU
According to Tivoli's benchmarks, in terms of doing all the processing required to re-block and re-format data, the third-party-equipped bridges/routers are slower than off-the-shelf Intel or Unix boxes in moving data between disk and tape.
RAO: ARCserve's server-less backup capability leverages our Image Snapshot technology, which bypasses such bottlenecks.
Scott McIntyre -- Legato Systems
McINTYRE: The second issue has to do with the performance of Extended Copy-enabled devices compared to off-the-shelf servers. Again, the answer depends on the bridge or router in question. Remember: These devices are built for the sole purpose of moving data and can be quite efficient for that task. What is important is to understand the performance characteristics of the boxes doing the data movement -- bridges, routers, or servers -- and to design and configure your environment accordingly.
ADAMS: I would like to see the numbers in this case, but speed is not the main benefit associated with this technology. We look at third-party copy as a way to reduce the "backup footprint" (i.e., CPU and I/O resources on the application or database server). When we demonstrated server-free backup at our end-user conference, we found that performance had increased over a traditional backup. While this still needs to be quantified further, backups over a dedicated network like a SAN are not crippled because they are not speedy enough. Our products push tape drives to the maximum transfer rates. Backup performance can also vary greatly due to a host of other factors.
Since the requesting backup application is removed from the data, code that was written to handle error recovery, error handling, etc., is no longer available, putting customer data at risk.
RAO: There are enough checks performed at the device level to ensure data integrity. The SNIA standards require that such checks be performed at the device controlling the data movement.
McINTYRE: I think the answer here is application-dependent and depends on whether the data mover is on a server or is an Extended Copy-enabled device. The Celestra Data Mover, for example, runs as an application on a separate server. Therefore, it can have as much error-handling capability as required to effectively process backup and recovery tasks.
I suspect that in this case, Tivoli's concerns have to do more with Extended Copy-enabled devices. On one hand, Extended Copy is just a SCSI command and returns error information just like any other SCSI command (the SCSI Write command that a traditional host-based backup application would use, for example) so that the application can perform error recovery. On the other hand, error handling and recovery is a critical part of any storage application, so we are paying particular attention to this in our testing of Extended Copy support.
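McIntyre's point that a copy command returns status like any other SCSI command, so the application can still recover from errors, can be sketched in a few lines. This is a minimal illustration, not any vendor's code: the status values GOOD (00h) and CHECK CONDITION (02h) are standard SCSI status codes, but `CopyResult`, `backup_with_failover`, and the mover callables are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Standard SCSI status byte values.
GOOD = 0x00
CHECK_CONDITION = 0x02


@dataclass
class CopyResult:
    """Hypothetical result of one copy request issued to a data mover."""
    status: int
    sense: Optional[str] = None  # simplified stand-in for SCSI sense data


# A mover is modeled as a callable that copies one named segment.
Mover = Callable[[str], CopyResult]


def backup_with_failover(
    segment: str, movers: Sequence[Tuple[str, Mover]]
) -> Tuple[str, List[Tuple[str, Optional[str]]]]:
    """Try each data mover in turn until one reports GOOD status.

    Returns the name of the mover that succeeded and the list of
    (mover name, sense info) pairs for the attempts that failed.
    """
    errors: List[Tuple[str, Optional[str]]] = []
    for name, mover in movers:
        result = mover(segment)
        if result.status == GOOD:
            return name, errors
        # The application still sees the failure and can act on it.
        errors.append((name, result.sense))
    raise RuntimeError(f"all data movers failed for {segment!r}: {errors}")


def failing_router(segment: str) -> CopyResult:
    return CopyResult(CHECK_CONDITION, "hardware error")


def healthy_router(segment: str) -> CopyResult:
    return CopyResult(GOOD)


used, failed = backup_with_failover(
    "vol01/extent-0", [("router-a", failing_router), ("router-b", healthy_router)]
)
```

Here the failed attempt on the first device is recorded and the segment is restarted on the second, which is the restart-on-another-device behavior described earlier in the discussion.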
Mike Adams -- Veritas Software
ADAMS: Veritas is very aware that data errors could be a major issue when it comes to third-party-copy backups. That's why the NetBackup ServerFree agent has built-in intelligence to handle data integrity issues that may occur while data is in flight. The key to data integrity with this backup solution is accurate data mapping. Through integration with a file system (UFS or Veritas File System today), we are able to guarantee that specific data is backed up and recovered. The accuracy of mapping is paramount to achieving reliable off-host backup.
File and volume information needs to be decomposed to physical disk locations. This means a backup and recovery solution must be able to accurately map through the file system and volume manager to get the low-level physical information. Any mapping implementation must be able to compensate for logical re-organization (such as file system or volume re-organization and degraded RAID-5) to guarantee that data is correctly mapped and that the right data is being backed up. Veritas NetBackup ServerFree Agent has this type of intelligence.
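To make the mapping step concrete, the sketch below decomposes a file's logical extents into per-disk block runs for a simple striped volume. It is an illustration under stated assumptions only: the striping layout, the `Extent` and `PhysicalRun` types, and the function names are hypothetical, and a real implementation would also have to handle the re-organization and degraded RAID-5 cases Adams mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Extent:
    """A run of logical blocks on the volume, as reported by the file system."""
    start: int
    length: int


@dataclass(frozen=True)
class PhysicalRun:
    """A run of contiguous blocks on one physical disk."""
    disk: int
    start: int
    length: int


def map_striped(extent: Extent, n_disks: int, stripe_blocks: int) -> List[PhysicalRun]:
    """Translate one volume extent into physical runs on a striped volume."""
    runs: List[PhysicalRun] = []
    block = extent.start
    remaining = extent.length
    while remaining > 0:
        stripe_index = block // stripe_blocks   # which stripe unit overall
        offset = block % stripe_blocks          # position inside that unit
        disk = stripe_index % n_disks           # units rotate across disks
        disk_block = (stripe_index // n_disks) * stripe_blocks + offset
        take = min(stripe_blocks - offset, remaining)
        runs.append(PhysicalRun(disk, disk_block, take))
        block += take
        remaining -= take
    return runs


def map_file(extents: List[Extent], n_disks: int, stripe_blocks: int) -> List[PhysicalRun]:
    """Map every extent of a file, preserving file order."""
    runs: List[PhysicalRun] = []
    for extent in extents:
        runs.extend(map_striped(extent, n_disks, stripe_blocks))
    return runs


runs = map_file([Extent(start=6, length=5)], n_disks=2, stripe_blocks=4)
```

The resulting list of physical runs is the kind of information a data mover needs in order to read the right blocks directly from disk; if the mapping is stale or wrong, the wrong data is backed up, which is why accuracy is treated as paramount.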
To make copy devices work, you have to hold the file system steady while you make a copy. This process is neither as robust as nor as fast as existing replication techniques, making the value-add questionable.
RAO: Not when you use snapshot technology. Our internal tests show otherwise. Tivoli seems to be leaning toward "server-free," not server-less. Server-less requires a lot of expertise in handling image snapshots of file volumes.
McINTYRE: This issue has to do with getting a frozen image of the file system. In the case of Celestra, this is done with a snapshot technique, which is the same as that used in replication products, so we don't share this concern.
ADAMS: Our quiescing of the file system happens in a matter of seconds. Furthermore, we're able to track any changes during that time and can apply them to the application or database once it has been "snapped" and released by the snapshot driver. Why replicate when I can take a point-in-time copy of data very quickly? The value-add is that I can keep the file system online. Downtime is unacceptable in today's IT world.
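The frozen image the respondents describe can be illustrated with a generic copy-on-write snapshot. This is a minimal sketch of the general technique, not the snapshot driver of any product named here; the `CowSnapshot` class and its methods are hypothetical.

```python
from typing import Dict, List


class CowSnapshot:
    """Point-in-time view of a volume that stays online for writes."""

    def __init__(self, volume: List[str]) -> None:
        self.volume = volume              # live blocks, still being written
        self.saved: Dict[int, str] = {}   # block index -> content at snapshot time

    def write(self, index: int, data: str) -> None:
        """Application write: preserve the original block on first overwrite."""
        if index not in self.saved:
            self.saved[index] = self.volume[index]
        self.volume[index] = data

    def read_snapshot(self, index: int) -> str:
        """Return the block as it was when the snapshot was taken."""
        return self.saved.get(index, self.volume[index])

    def backup(self) -> List[str]:
        """Read the whole frozen image, as a backup job would."""
        return [self.read_snapshot(i) for i in range(len(self.volume))]


live = ["a", "b", "c"]
snap = CowSnapshot(live)
snap.write(1, "B")    # application keeps writing during the backup
snap.write(1, "BB")
image = snap.backup()
```

Only the first overwrite of a block costs an extra copy, so the application keeps running against the live volume while the backup reads a consistent image.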