One of the biggest mistakes companies can make when it comes to disaster recovery or business continuity is doing nothing because of budget constraints.
"It is important to keep in mind that data backup and data protection is not a 'one-size-fits-all model,' " writes Paul Mayer, a product manager at Datalink.
Both consultants say users should look at their systems and applications in terms of their overall importance to the organization (is any downtime tolerable?) and how they tie into (or don't tie into) specific business needs.
If an application isn't critical, should it be assigned to high-end storage? Should it be backed up to disk, and, if so, how long should it be kept there? Likewise, if an application is deemed mission-critical, when is it safe to move it off disk and onto tape?
Matching technologies to business needs is not a simple task. Striking the right balance requires careful thought and a lot of know-how about a wide range of technologies.
In the first article, Mayer examines different types of technologies and how they can be used to meet specific business requirements.
In the second, Bullitt walks users through the process of establishing priority levels for various applications and then helps them match those needs to specific disaster-recovery strategies.
If you have any questions for the analysts or integrators who have contributed to this series, drop me an e-mail at email@example.com.
Coming next month: Is it a logical or a physical copy?, by Dianne McAdam, Data Mobility Group. – Heidi Biggar
By Paul Mayer
After you've completed a comprehensive data backup and protection assessment, it's time to map your business requirements to available technologies to create an optimized disaster-recovery and business continuity plan. It is important to keep in mind that data backup and protection is not a one-size-fits-all model, because technologies must be based on the specific needs of your organization. This article discusses some of the backup, recovery, protection, and management technologies available today. (For a full copy of the report from which this article was excerpted, visit www.datalink.com.)
File-level and block-level backups
How are file-level backups done?—During a file-level backup, the host processor makes numerous I/O and system calls to locate and open the header of each file for backup. This data is commonly fragmented across a set of disks, requiring a high frequency of scattered disk I/Os to accomplish the backup. This can cause a number of undesirable effects in environments with large numbers of small files, including server slowdown, poor tape performance, and accelerated wear and tear on tape heads and media due to shoe-shining.
How are block-level backups done?—The block-level backup process creates a bitmap image of file-system storage blocks. Once the image is established, the storage blocks represented by the image are moved to a secondary storage device without further referencing the file system. As data changes occur during the backup process, they are written to the production data volume, while the blocks that are to be backed up are copied to a cache area on the disk drive. When the backup application finds a changed block, it refers to the cache for the accurate point-in-time data. When the backup is complete, the cache is cleared. When applying incremental block-level backup technology, organizations can achieve increases in performance and tape utilization.
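The cache behavior described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the volume is a list held in memory, and the names `Volume`, `run_backup`, and `read_for_backup` are hypothetical rather than taken from any backup product.

```python
class Volume:
    """Toy production volume: a list of blocks held in memory (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.bitmap = None  # one flag per block that belongs to the backup image
        self.cache = {}     # point-in-time contents of blocks changed mid-backup

    def begin_backup(self):
        # Establish the image: every block present now is part of the backup.
        self.bitmap = [True] * len(self.blocks)
        self.cache = {}

    def write(self, index, data):
        # Before the first overwrite during a backup, save the original block.
        in_image = self.bitmap is not None and self.bitmap[index]
        if in_image and index not in self.cache:
            self.cache[index] = self.blocks[index]
        self.blocks[index] = data  # the change itself goes to production

    def read_for_backup(self, index):
        # Changed blocks are read from the cache, unchanged ones from the volume.
        return self.cache.get(index, self.blocks[index])

    def end_backup(self):
        # The backup is complete, so the cache is cleared.
        self.bitmap = None
        self.cache = {}


def run_backup(volume, writes_during_backup):
    """Copy the image to a list while application writes arrive mid-backup."""
    volume.begin_backup()
    # Simulate application writes that land after the image is established.
    for index, data in writes_during_backup.items():
        volume.write(index, data)
    copied = [
        volume.read_for_backup(i)
        for i, in_image in enumerate(volume.bitmap)
        if in_image
    ]
    volume.end_backup()
    return copied
```

Calling `run_backup(Volume([b"a", b"b", b"c"]), {1: b"B"})` returns the three original blocks, while the production volume ends up holding the changed block.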
When to use block-level backup—Environments with large volumes of small to medium-sized files are good candidates for block-level backup since backups can be performed in shorter time periods with less of an effect on critical application and file-server processing. However, while block-level backup can improve backup performance, it can do so at the expense of recovery time when the recovery operation targets a subset of the files contained within the backup data set.
Removable media management
Tape as removable media—Tapes can be kept online or moved off-site for disaster-recovery purposes. For tracking purposes, indices of the removed tapes remain online. Some things to consider when developing a strategy for removing and tracking tapes include the following:
- Optimal tape library sizing—It makes sense to implement a library that accommodates regularly accessed volumes with room for growth, but it is overkill to store an entire archive of tapes that is either unlikely to be accessed or for which manual retrieval of tapes is adequate to meet set recovery time objectives (RTOs).
- Many organizations create a second copy of each tape to protect against failures. This is particularly common for life-cycle management and archive applications for which the data on tape is often the only copy. Software-based vaulting utilities are available to manage and track tapes as they move through their life cycles.
- Tapes can be rotated to off-site storage for disaster-recovery purposes. As a foundation for disaster-recovery capabilities, off-site vaulting of tapes can ensure that data is available to rebuild a data center in the event of a site disaster.
When to use vaulting technology—As data volumes grow and backup operations expand, many organizations experience challenges and spend innumerable hours manually tracking backup tapes. Removable media vaulting simplifies the administration of tape import, export, and life-cycle data management.
Open-file backup software—This type of software uses a caching area on disk to capture changes that occur at the time of backup. It is a good solution for backing up files in high-transaction environments or in environments with a small or non-existent backup window.
With the widespread trend toward 24x7 operation, a common source of backup-and-recovery frustration is that certain files are frequently in use during backups, thereby compromising the integrity of the backup process. Open-file backup capability is integrated into many leading backup applications.
When to use open-file protection technology—Any organization that performs backups on file servers or workstations where there is any potential for files to be in use during the backup process can benefit from this type of technology.
LAN-based backups—LAN-based backup is a good method if your backup activity can be contained within an acceptable backup window and no other applications use the LAN during backups, which prevents network saturation. It is not a good option if your backup window is shrinking or disappearing.
LAN-free dedicated TCP/IP backup network—There are a couple of prevalent strategies to minimize traffic on the LAN during a backup. One strategy is to implement an additional TCP/IP network that is dedicated to the task of transferring backup data to shared backup storage subsystems. Most backup software products can operate transparently in this type of environment.
LAN-free backup over a SAN—Another strategy for minimizing backup-related traffic on the LAN is to implement a storage area network (SAN). In addition to isolating backup traffic and preventing network saturation, SANs generally provide higher data-transfer rates than traditional LAN topologies, which can reduce the time spent backing up data, leaving host systems available when needed.
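Whether a given transport can meet a backup window comes down to simple arithmetic. The sketch below assumes a sustained transfer rate and decimal units (1 GB = 1,000 MB). The function names and the figures in the usage note are hypothetical and are not measurements of any particular LAN or SAN.

```python
def backup_hours(data_gb, throughput_mb_per_s):
    """Hours needed to move data_gb at a sustained rate (1 GB = 1,000 MB)."""
    seconds = data_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


def fits_window(data_gb, throughput_mb_per_s, window_hours):
    """True if the backup finishes within the available window."""
    return backup_hours(data_gb, throughput_mb_per_s) <= window_hours
```

For example, moving 1,800 GB at a sustained 50 MB/s takes 10 hours and misses an 8-hour window, while the same data at 100 MB/s takes 5 hours and fits.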
What about serverless backup?—Serverless backup utilizes a third-party data mover to copy data from disk to tape, thereby relieving the data server of this I/O burden. While many technologists believed that this approach would have become the de facto method for moving backup data by now, it has actually been more frequently positioned as a niche solution for narrowly defined data-recovery requirements. Organizations may want to consider implementing serverless backup technologies for high-volume OLTP environments with extremely large data volumes and impossibly tight backup windows. However, serverless backup is very complex to implement, tune, and administer, and it should be deployed only when more established backup methods do not meet the backup window requirements.
Point-in-time off-host backup
How does disk mirroring technology work?—Many storage infrastructures include component redundancy in the form of disk mirroring (RAID 1) to improve the reliability and availability of primary storage systems. Given the declining costs of disk drives, organizations are opting to use mirroring technology to augment their backup processes. By deploying an additional mirror, a third copy of the data can be separated from the primary and the first mirror for backup operations, providing a point-in-time copy of data that is separate from applications and user data. A separate server can back up this copy without adversely affecting production on the application server.
Third-mirror management—Third-mirror management can be performed by either hardware- or software-based utilities and has similar benefits to backup operations. One advantage of using software-based volume administration to manage the point-in-time copy of data is that it allows organizations to use inexpensive storage media for the third mirrors, which means they can invest in cutting-edge RAID technology for production storage and less-expensive storage subsystems for the third mirrors.
File-system data snapshots
How does snapshot technology work?—Snapshot technology creates parallel read-only file systems that point to a set of data intermingled with live production data. Creating file-system snapshots takes only seconds and has minimal impact on the system. Snapshots are stored as small files on the live file system. The data that exists at the time of the snapshot is protected from being overwritten on the physical disk so that it can be referenced from the snapshots. This enables consistent static access to files at an identified point in time, which offers significant benefits to both backup and recovery processes.
Snapshot data edits, additions, and disk-space requirements—Data edits and additions are written to a new area on the disk, which means snapshots do not require nearly the same incremental disk space required for point-in-time data copies (split mirrors). Disk-space requirements depend on how long the snapshots are kept and the refresh rate of the data. It is important to manage and cycle snapshots so that unneeded disk space can be made available to the live file system.
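The way snapshots share blocks with the live file system, and the way expired snapshots release space, can be illustrated with a toy model. This is a sketch under stated assumptions: each file occupies a single block, everything is held in memory, and the `SnapshotFS` class and its methods are hypothetical names, not the interface of any real file system.

```python
class SnapshotFS:
    """Toy file system in which blocks are never edited in place (hypothetical)."""

    def __init__(self):
        self.blocks = {}     # block id -> contents
        self.live = {}       # file name -> block id (the live file system)
        self.snapshots = {}  # snapshot label -> frozen name-to-block map
        self._next_id = 0

    def write(self, name, data):
        # Every edit or addition lands in a new block; old blocks stay intact.
        self.blocks[self._next_id] = data
        self.live[name] = self._next_id
        self._next_id += 1
        self._reclaim()

    def snapshot(self, label):
        # Only the small pointer map is copied, not the data itself.
        self.snapshots[label] = dict(self.live)

    def read(self, name, label=None):
        # Read from the live file system, or from a named snapshot.
        table = self.live if label is None else self.snapshots[label]
        return self.blocks[table[name]]

    def delete_snapshot(self, label):
        # Cycling out a snapshot frees the blocks that only it referenced.
        del self.snapshots[label]
        self._reclaim()

    def _reclaim(self):
        # Keep blocks referenced by the live map or by any remaining snapshot.
        in_use = set(self.live.values())
        for table in self.snapshots.values():
            in_use.update(table.values())
        for block_id in list(self.blocks):
            if block_id not in in_use:
                del self.blocks[block_id]
```

In this model, a file rewritten after a snapshot occupies two blocks until the snapshot is deleted, which is why disk-space requirements track both the retention period and the rate of change.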
Snapshots used as a part of the backup process—Snapshots of data can be taken to create a consistent point of quick rollback for cases of inadvertent changes, deletions, or corruption, or to establish a solid point-in-time reference to a live data source to assist backup operations. When snapshots are used as part of the backup, a snapshot of the data is taken before the backup process begins. The host then mounts the read-only snapshot file system for backup purposes.
Snapshots used for data recovery—For recovery, snapshot file systems may be referenced to restore files that have been corrupted or inadvertently deleted. In many environments, snapshot technology is used for up to 90% of file recovery. This approach to recovery improves performance, simplifies administration, and complements traditional backup technologies.
When to use snapshot technology—Snapshot technology brings consistency to backups by providing a point-in-time file system that cannot change during backup operations. Snapshots offer significant benefits to data-protection operations at lower price points, particularly on file servers and network-attached storage (NAS) platforms. For complex databases, it is generally preferred to use DBMS tools to establish point-in-time rollback capabilities so that pending transactions can be fully applied. This approach provides a solid rollback point for recovery.
How does disk-to-disk-to-tape (DDT) backup work?—By using disk as part of a data-recovery storage infrastructure, an IT operation can improve its abilities to meet its backup windows and recovery time objectives. There are two general approaches to DDT backup:
- Backup to disk—Traditional disk storage, often ATA-based, can be configured as the target for backup data. While this approach can yield backup performance benefits in some environments, its primary benefits are backup reliability and recovery performance. When backing up to disk, it is generally recommended that a copy of the backup data be written to tape or replicated off-site, for disaster recovery and archive purposes.
- Tape emulation—Another approach to introducing disk into the backup infrastructure is to use an emulation technology as a front-end to the disk system, thereby presenting disk to the backup application in a way that makes it appear as tape. This approach can yield a couple of benefits over traditional backup to disk. First, it enables seamless integration of disk into the backup operations, versus traditional disk, which generally requires some re-engineering of the backup workflow. Also, measurable performance gains can be realized (versus traditional disk) in the backup process as data is transferred to disk sequentially, and in large blocks, without file system overhead and fragmentation typically associated with disk.
Backup to disk can also provide a foundation for performing synthetic full backups, which allow organizations to effectively perform a single full backup followed by ongoing incremental backups. Synthetic full backups are accomplished by merging the full backup with the incremental backups behind the scenes, resulting in an updated full backup.
This eliminates the need to perform regular full backups, which can consume tremendous bandwidth and bring production to a grinding halt.
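The merge behind a synthetic full backup can be sketched as follows. The representation is an assumption made for illustration: each backup is a dictionary mapping a file path to its contents, and an incremental records a deletion as `None`. Real backup applications track this information in their own catalogs, and `synthesize_full` is a hypothetical name.

```python
def synthesize_full(full, incrementals):
    """Merge a full backup with incrementals (oldest first) into a new full.

    Each backup is a dict of path -> contents. An incremental uses None to
    record that a file was deleted since the previous backup.
    """
    merged = dict(full)  # leave the original full backup untouched
    for incremental in incrementals:
        for path, contents in incremental.items():
            if contents is None:
                merged.pop(path, None)   # file was deleted
            else:
                merged[path] = contents  # file was added or changed
    return merged
```

Because the merge reads only existing backup data, it can run on the backup server without touching the production hosts.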
When to use DDT backup—DDT backup can give an organization a greater ability to meet its recovery time objectives, particularly when individual or small file sets are regularly recovered. DDT backup also offers benefits in environments where backup is performed over low-performing or saturated networks, where tape streaming is difficult to achieve, or where interleaving must be used to stream the tape drives. In these environments, the data will be accepted by the backup disk at whatever speed the data is delivered, eliminating the shoe-shining effect that many midrange tape drives would experience, and thereby greatly improving the throughput of the overall backup system. Advantages can also be achieved on the high end of the performance spectrum, where DDT backup offers greater flexibility for how data is configured on the primary storage, such as in database systems, and delivers performance that is comparable to or greater than tape with less ongoing tweaking and tuning once the system has been optimized.
Data replication technology—Data replication creates a secondary copy of data, generally for the purpose of disaster recovery. Software-based and hardware-based replication technologies each have strengths and weaknesses.
By using hardware-based technology for replication, data transfers can be performed without impacting the application or file servers. On the other hand, software-based replication better integrates with the file system and applications, allowing improved data consistency, which is particularly critical for database applications.
Furthermore, the additional host cycles for software-based replication are generally considered nominal for today's high-performance servers.
Data replication and data protection—While data replication provides benefits generally associated with disaster recovery, the replica data can also be leveraged for general data-protection purposes. Some organizations perform regular backups on the replica data, which yields the additional benefit of off-host backup. This approach can also be used to centralize the backup operations of multiple facilities.
When to use replication technology—Replication technologies are largely implemented as components of a business continuance, disaster recovery, or high-availability strategy. Replication can be leveraged for backup purposes by providing faster restore times from storage system failures or providing off-host backup capabilities.
Centralized backup administration
Centralized data protection—In an enterprise-class backup environment, one important consideration is to maximize administrative efficiency and minimize the human-resource time devoted to planning, scheduling, and administering backup systems. One common characteristic of large-enterprise data storage environments is data being housed at multiple physical locations.
Steps can be taken to incorporate all data repositories into a centralized data-protection scheme, whether physically or virtually, to increase IT productivity and efficiency.
How does backup centralization work?—Backups can be physically centralized by moving all the remote data streams to a common backup infrastructure, or virtually by using sophisticated hierarchical software technology for backup administration.
- Physical centralization requires adequate bandwidth to facilitate the data throughput necessary to meet the defined backup window and to meet the recovery time objectives of the remote locations. In the absence of substantial bandwidth for recovery operations, a defined and tested procedure must be implemented to ensure remote recoverability in the centralized backup architecture.
- Virtual centralization software provides a global view of the data-protection infrastructure from a single access point and can manage the distributed backup products throughout the enterprise. In some cases, this approach allows users to assign varying degrees of control to backup administrators and operators based on minimal requirements for localized control in their environments.
When to use centralized backup administration technology—As organizations continue to experience dramatic data growth, centralization is becoming imperative in virtually every organization. Backup vendors offer varying degrees of centralization; therefore, this type of technology should be considered at some level in most organizations that have decentralized data or fragmented backup operations.
Bare-metal restore—As applications and servers evolve, IT personnel often tweak and manipulate them in subtle ways that improve functionality, performance, and stability. Unfortunately, these changes are not always documented fully or clearly in a change control process, despite good intentions on the part of the organization. Virtually every IT group has a horror story about the recovery process of a failed server, where the server had undergone undocumented modification over the course of its useful life.
Bare-metal restore technology can be implemented to reduce the human intervention time required to restore a failed system from a raw drive to a fully restored production system.
The technology reduces the number of steps necessary to restore a server to a usable state and shortens the time needed for each step.
How does bare-metal restore work?—Essentially, bare-metal-restore software tracks key server data (e.g., file-system details, volume configuration, operating-system levels, and kernel information). It rebuilds systems based on scripted bare-metal-restore instructions that fully configure the server environment exactly as it was at the time of its last backup. Once this is accomplished, the payload of applications and data can be recovered from the traditional backup application. Some backup applications have bare-metal-restore modules; others can be augmented with third-party utilities.
When to use bare-metal restore technology—Bare-metal restore provides fast recovery of failed systems with less administrator effort, making it desirable in any enterprise with RTO pressure. The technology reduces the strain on system administration resources and can offer improved consistency for system restorations.
Hierarchical storage management (HSM)
HSM, life-cycle management, and data migration—HSM software manages the migration of data from primary to secondary storage subsystems based on defined thresholds of file-system metrics such as file type, date, size, and last access date. HSM gained popularity in the late 1980s and early 1990s due to the relatively high cost per megabyte of disk storage compared to removable storage media.
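A threshold-based migration policy of this kind might be expressed as shown below. This is a minimal sketch: the `FileInfo` record, the `select_for_migration` function, and the default thresholds (180 idle days, 1 MB minimum size) are all hypothetical values chosen for illustration, not settings from any HSM product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    last_access: date


def select_for_migration(files, today, max_idle_days=180, min_size_bytes=1_000_000):
    """Return paths of files idle longer than max_idle_days and large enough to move."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        f.path
        for f in files
        if f.last_access < cutoff and f.size_bytes >= min_size_bytes
    ]
```

A policy like this leaves small or recently used files on primary storage and nominates only large, idle files for secondary storage.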
Today, the concept of life-cycle management is used to describe the process of managing data throughout its entire cycle of creation, modification, archive, and removal. The ideal architecture provides the optimal storage medium, based on price, performance, longevity, and portability, for each phase in the data's cycle and manages the movement of data across this continuum.
How do they work?—Traditional HSM technology is positioned primarily as a solution to reclaim wasted storage space and rejuvenate file systems that have become bogged down with excessive file content. HSM can also serve as an enhancement to backup and recovery by reducing the amount of data that needs to be backed up and consequently recovered. Second-generation HSM (life-cycle management) is currently emerging and is often integrated with e-mail systems and database management systems to address the management complexities of these rapidly growing data types.
When to use HSM—HSM/life-cycle management technologies can provide relief in backup environments that are overburdened with ongoing backup of static data. HSM technologies generally include their own utilities for backup of migrated data, but this operation is done only once and is less computer-resource-intensive than traditional file-based backup systems.
Paul Mayer is product manager of data-protection solutions at Datalink (www.datalink.com) in Chanhassen, MN.