By Mark Teter
Database recovery is something that should not be tried for the first time in a disaster situation. In fact, the best recovery plan is maintaining your environment so that you never have to perform recovery. The key to this strategy is using a comprehensive availability approach. By implementing a highly available database infrastructure, the likelihood of ever having to perform recovery is greatly reduced. However, in the event of performing database recovery, IT departments are relying on point-in-time (PIT) imaging. PIT technology generates online images of production databases that are used for near-instant recovery.
Database-recovery strategies involve understanding the physical data structures relevant for database operations. Oracle and SQL Server have similar internal database components and are similar in how data recovery is performed. (For a review of Oracle and SQL Server backup and recovery, see sidebar on p. 34). It is just as important to provide a database environment that protects against downtime as it is to provide an environment that provides quick data recovery.
PIT technology comprises solutions that provide either full volume copies or virtual copies. As summarized in the table on p. 36, most vendors provide both types of images. PIT images that comprise full-volume copies come in two varieties. The most popular is known as a third-mirror copy—a complete physical copy of data generated using a mirroring technique. Usually this is done in a RAID-1 configuration, hence the term "third-mirror copy." Data is constantly being copied to the third mirror before the point-in-time, or mirror break-off, occurs.
Third-mirror copies continually receive all updates generated to the primary database, which can cause three I/Os for every one write if the mirroring is performed by the database host. Letting the storage subsystem perform the mirroring alleviates this I/O overhead, with the only burden being additional disk capacity required for the third mirror. Some vendors such as Hitachi provide third-mirroring under RAID-5 volume layouts, which minimizes the additional storage requirement. Once the primary volumes and third-mirror copies are synchronized, they can be split or broken off. Subsequent updates to the primary volumes are not reflected to the PIT image.
PIT clones are similar to third-mirror copies in that there are full-volume copies. However, clones do not use a mirroring solution, but establish the PIT image using a copy-on-demand technique. This technique duplicates the primary volume to the PIT clone, ignoring any updates to the primary volume during the copy. Clones provide instantaneous access to the secondary copy before the actual copying process has completed. If the PIT clone receives a request for a data block that has not yet been copied, the clone will instantly copy that block of data. Once the copying process is complete, an independent PIT image of the primary volume exists.
The alternate method of creating PIT images is virtual copies. Whereas third-mirror copies and clones create exact physical copies of a primary volume, virtual copies create a pseudo volume that comprises pointers to data blocks that exist on the primary volume. Instead of copying the actual data, virtual copies only keep before-image copies of changed data blocks by performing a copy-on-write (COW). The COW technique preserves the PIT image by copying the original contents of the primary volume to the PIT image whenever changes are made to the primary volume. Since virtual copies consist mainly of pointers to data blocks, storage overhead is minimal. Virtual copies require about 2% to 6% of the size of the database depending on the update activity, making it reasonable to have multiple PIT copies exist for a given database volume.
Virtual copies put additional I/O overhead on a primary volume due to the COW, and from the PIT image accessing any unchanged data blocks on the primary. The additional overhead generated from the COW is only incurred on the first update to a data block on the primary volume. Once the block has been copied to the PIT image, subsequent writes to the same block on the primary do not generate the COW since access to the block is now made from the virtual copy. It has been reported the overhead is between 15% and 20% for Oracle running an OLTP workload containing virtual PIT images. Overhead is lower when using direct I/O and can also be mitigated by hardware RAID with large cache configurations.
By using point-in-time imaging technology, there are several ways to provide data-recovery solutions for Oracle and SQL Server environments.
Block-level incremental (BLI) backups are a derivative of virtual copies. Through an application programming interface (API), Oracle provides BLI for third-party backup solutions, such as BakBone NetVault, CommVault Galaxy, Computer Associates BrightStor, Legato NetWorker, and Veritas NetBackup. BLI backups are particularly useful for databases that are hundreds of gigabytes or terabytes in size. Since BLI backups only copy those blocks that have changed since the last backup, fewer tapes and system resources are required. Veritas provides BLI backups in the form of a NetBackup storage checkpoint. These checkpoints are like virtual copies in that they only keep track of the before-image contents of changed blocks. As a result of this space-saving copy, multiple checkpoints can easily exist on a given file system. Any storage checkpoint generated after the one that contains the before-image copy will use the checkpoint's data and not the primary volume data, hence requiring no additional COW.
The most common method of creating database PIT images is to quiesce the database by putting it in "backup mode." Once in backup mode, the database will reach a point of consistency, having all in-flight transactions that are in buffer cache flushed to disk. Backup mode will make data files read-only, causing all transactions to be written to the transaction logs. This is the safest way to generate consistent database PIT images. PIT images can also be created using a brute-force method by just breaking off the PIT image and letting the transaction logs recover/rollback any in-flight transactions during the database start-up sequence. The best method to create database PIT images, however, is to put the database in backup mode for Oracle (ALTER TABLESPACE BEGIN BACKUP) or use the freeze/thaw functions in SQL Server.
Once PIT images have been created, database recovery can be accomplished in a matter of seconds by replacing the primary database volumes with the PIT image. To protect against media errors, PIT images need to be backed up to offline media. As illustrated in the figure, backup servers can mount the PIT copies to perform zero-impact backups for Oracle and SQL Server. Database backups are generally more efficient using on-disk PIT images rather than writing directly to tape from an active database. By using this technique, PIT images can be left intact, mounted on the backup server ready for recovery purposes. If it is necessary to perform recovery, it is possible to manually re-assign, or resynchronize, the PIT image back to the database server. PIT images can have backups made of an entire database or of individual database files.
PIT copies can also be used with database replication. The PIT copy can either be created at the primary or secondary site. In fact, PIT imaging must be used with replicated volumes if they are needed for backup purposes. An active replicated volume is unmountable (even for read-only) and must be broken off to be usable for off-host processing. The main advantage of using PIT images with replication is the provision for automatic off-site media archiving. It is recommended that organizations take backup tapes off-site nightly, and using a replication solution for PIT copies facilitates this activity.
One of the benefits of PIT images is they can easily be re-created if they need to be brought up-to-date with the database volume. Alternatively, PIT copies can be resynchronized back to the database volume. Resynchronization can occur as a disk-to-disk copy (full resynchronization) or as a quick resync. Quick resynchronization can either be accomplished using journals or bitmaps. A journal resynchronization applies all updates that occurred while the PIT image was broken off as a running log, whereas a bitmap resynchronization only resynchronizes those data blocks that have been modified. A journal resync has the advantage of being interrupted without affecting database consistency but bears the burden of replaying all transactions back to the primary. A bitmap resync provides better synchronization performance because it only copies the last update. However, it requires the resynchronization process to run to completion since the order of copying may not coincide with the original write sequence. If a PIT image has been split from the primary for a significant amount of time, performing full resynchronization may be just as efficient as a quick resync.
PIT images provide the ability to perform cold database backups with the benefit of not having to take the database offline. It is recommended to perform cold Oracle and SQL Server backups weekly. PIT images are also useful for doing logical backups where the database schema along with all the data files and data within each data file are dumped to tape. Logical backups are necessary in certain recovery situations where the internal database structure and indexes need to be rebuilt.
Other benefits of using PIT copies include freeing the database server from operational restrictions during the backup process. During backups, SQL Server does not allow any changes to be made to the database size or its indexes or allow any non-logged operations such as a bulk copy (bcp) to load data. SQL Server will not let you start a backup while any of these operations are in progress.
Using PIT images is an effective way to provide maximum data recovery and protection for Oracle and SQL Server. In conjunction with SAN technology, organizations can use PIT images to increase overall database availability. PIT imaging technology has many benefits that every organization should leverage.
Mark Teter is director of enterprise storage solutions at Advanced Systems Group (www.virtual.com), an enterprise computing and storage consulting firm in Denver.
Database backup and recovery: A review
Data protection for databases is accomplished with either physical or logical backups. Physical backups are copies of database tables and data files, whereas logical backups are copies of database data. Logical backups use SQL commands such as export and import to extract the data along with the database schema. The benefit of logical backups is being able to defragment the database. The disadvantage is a longer recovery time since the data must be reloaded back into the database, which subsequently requires all indexes to be rebuilt.
Physical backups are most common and can be done online (hot backup) or offline (cold backup). Backups made with the database online are known as inconsistent backups. An inconsistent backup is a backup of a database that is open and is being modified while the backup is progressing. Hot backups rely on archive logs (Oracle) or transaction logs (SQL Server) to perform database recovery.
When a change is made to a database, the change is first written to a transaction log before the transaction is committed to the actual database or data file. Oracle generates redo logs and SQL Server uses transaction logs to record all changes made to the database. Data is written to the database only when a checkpoint occurs. A checkpoint is a record indicating the point in the log files where all changes have been saved to the actual database. When SQL Server begins a hot backup, it notes the Log Sequence Number (LSN) of the oldest active transaction and performs a "checkpoint," which synchronizes the pages on disk with the pages in the cache memory.
If a hot backup is performed, it is important to back up the log files after the data files have been copied to tape. If this is not done, and a media failure occurs that causes the online log files to be lost, the backup will not be recoverable. As a result, it is not uncommon to perform log-file backups as often as every 15 minutes, depending on database transaction activity.
Cold, or consistent, backups are backups that are consistent to the same point-in-time. Oracle uses System Change Number (SCN) in control files, data-file headers, and redo logs. The SCN is an ever-increasing value that acts as a time stamp for purposes of recovery. The only way to perform a consistent backup is to back up the database while all database files all have the same SCN. The only way to do that is to perform a cold backup, or back up the database while it is offline. If the backup was consistent during backup, then it is not necessary to apply any redo or transaction logs.
Oracle and SQL Server modify data blocks during normal operation and store them in buffer cache in memory. (SQL Server doesn't actually use block I/O, but pathname/byte offset.) Background processes are in charge of writing these buffers to disk. A gap always exists between the time a block is in buffer cache and the time it is written to disk. The more data that accumulates in memory without being written to disk, the longer the recovery time, since a crash or media failure will force the database to apply the log files to recover all the changes.
Oracle maintains one or more rollback segments. Whenever a transaction modifies a data block, a rollback log records an image of the data before the change is actually made. Oracle uses this information during database recovery to undo any uncommitted changes. If a batch job were aborted in the middle of execution, all transactions that were made prior to being written to disk would be rolled back to their original state. SQL Server maintains this information in the transaction logs. These log files are eventually overwritten, truncated, or archived. Archived log files are redo logs that Oracle has copied to another destination and rendered inactive. SQL Server does not automatically archive its transaction logs.
Oracle and SQL Server must also manage metadata about individual database files. Oracle stores this metadata in the control file and SQL Server saves it in the masterdb file. These files contain all the filenames that constitute a database instance. Oracle's control file contains consistency information that is used during recovery, such as the database name, names of all the data files, online and archive redo logs, and checkpoints. Every time Oracle opens up a database, its control file is used to identify the data files and online redo logs that must also be opened. As a result, the control file should be backed up whenever there are structural changes made to the database, such as adding, dropping, or altering data files, table spaces, or online redo logs.
Although backup operations can be complicated and vary from one database application to another, the basic principles are the same:
- Multiplex the online redo or transaction logs for performance and availability reasons;
- In Oracle, run the database in ARCHIVELOG mode and save redo logs to multiple locations. For SQL Server, copy transaction logs to tape or to separate online file systems after checkpoints;
- Maintain multiple copies of the control file (Oracle) or masterdb (SQL Server) and all other database configuration files (init.ora, tnsnames.ora, listener.ora, and msdb);
- Whole database backups should be performed at least once a week. Two whole database backups along with all the associated log files should remain available on tape;
- Consider using full backups when incremental backups are causing more than 30% of the data files to be backed up; and
- Perform backups after any structural changes have been made, including after log files have been reset, after heavy updating, or after unlogged operations have occurred.
A typical backup strategy involves conducting full backups on weekends, differential backups every night, and transaction log backups every couple of hours during the day. If there were a complete database failure on Wednesday at 8:00 AM, the recovery process would include restoring the full backup, restoring the differential backup from the previous night, and finally, applying Wednesday's transaction logs.
Microsoft SQL Server
Three recovery models for SQL Server include simple, full, or bulk-logged. With the simple-recovery model, only full backups are allowed. There is no way to back up only the changes made since the last backup. Consequently, the database cannot be restored to the point of failure or to a specific point-in-time. The benefit of this model is simplicity since database administrators do not have to manage transaction logs. The simple-recovery option truncates the transaction logs after any checkpoint, consequently only allowing the database to be restored to the latest full or differential backup. All non-logged operations such as BULK_INSERT, SELECT INTO, CREATE INDEX are permitted.
The full-recovery model uses database backups along with the transaction logs to provide complete protection against media failure. If a user accidentally deletes all accounts in a database at 8:00 AM, it would be possible to restore the database up to 7:59 AM, right before the deletion occurred. To guarantee this degree of recoverability, all operations, including bulk operations such as BULK_INSERT, SELECT INTO, and CREATE INDEX, are fully logged. Since these operations are all logged, the transaction logs will require more disk space. If it is crucial to capture every change made to the database, the full-recovery model is the most appropriate.
The bulk-logged recovery model provides protection against media failures combined with the best database performance and minimal log space requirement. In a bulk-logged recovery model, the data loss exposure for bulk-copy operations is greater than with the full-recovery model. Bulk-copy operations are useful when loading large amounts of data into the database and are more efficient for the database since it operates in non-logged mode. Instead of logging each insert into the table, SQL Server only logs the minimum necessary. If a bulk-copy operation occurs, point-in-time recovery is not possible.
Oracle has three basic types of recovery: instance, crash, and media. Oracle performs the first two types of recovery automatically at database start-up. Only media recovery requires intervention. Data recovery for Oracle involves applying the contents of the online redo log files to all the table-space files (rolling forward) and then removes any uncommitted transactions that are found in the rollback segments (rolling back). If Oracle cannot obtain the information it requires from the online redo logs, it consults the archived redo logs. If an SQL statement added a row to the database, but it was never committed, Oracle will discover this fact in the rollback segment and undo the change.
Oracle uses the Recovery Manager (RMAN) to automate the backup and recovery of data files, archived logs, and control files. In fact, only RMAN can back up the control file while the database is online; otherwise, the control files can only be backed up when the database is offline. RMAN, however, does not back up any configuration files or online redo logs. The backup server or media manager must back up these files. Online redo logs should never be backed up unless they have been archived to another destination. The danger of backing up online redo logs is that they may accidentally be restored over active redo logs.
RMAN does not manage writing or reading tape devices. To manage this process, RMAN is integrated into third-party media managers by using the SBT (System Backup to Tape) interface. This API frees Oracle from the bulk of I/O created from a database backup and offloads it to media managers. This makes it possible for Oracle to not be directly involved in the reading and writing process. The advantage of using SBT with third-party media managers is reducing the impact of the backup workload on the database server. This capability is known as "proxy copy" because Oracle selects the media manager as a proxy to actually perform the copy of data files to tape.
In order for RMAN to automatically recover a database, it must keep track of what has been backed up and written to media (disk or tape). This backup metadata can be kept in the control file or in a separate database instance called a recovery catalog. It is highly recommended that a recovery catalog be used since control files can grow quite large without one increasing the risk of corruption.
If the control files become corrupted or lost, it would be impossible to use RMAN to recover the database. A recovery catalog must be used if table-space point-in-time recoveries are desired.