Regulatory compliance, rather than cost savings, may be the best justification for considering database archiving.
By Mike Drapeau
The Drapeau Group
Database archiving addresses the issue of how to deal with "aged" data. Several vendors, including IXOS, OuterBay, and Princeton Softech, have solutions in this space, but database archiving requires a different value proposition than earlier file-based archiving approaches, and the total cost of ownership (TCO) argument merits evaluation.
The business case in favor of database archiving is based on the realization that production databases keep growing and that this growth increases costs for the following:
- More CPUs;
- More storage area network (SAN) ports, redo layouts, and disk performance upgrades;
- More storage hardware to accommodate the growth;
- Tape software, hardware, and media to accommodate backups;
- Growth in the size of the cloned or snapped images of the production database;
- Growth in the size of the mirrored images of the production database used for business continuity; and
- Supporting backup windows that are harder to hit because the allowable downtime shrinks and/or the time needed to run the backup grows.
Since most IT shops have from tens to hundreds of production database instances, one would think that the TCO model for database archiving would cover them all, or at least most of them. Not so.
Most database archiving vendors recommend that their solutions be deployed first on the largest databases, specifically, those supporting enterprise resource planning (ERP) software. Why? For archiving software to work, it must have a perfect understanding of the database schema, its relationship to the application, and all dependencies among tablespaces, indexes, and reference keys. This knowledge is what allows the software to preserve referential integrity when it moves records. The archiving vendors can best obtain this architectural information directly from the ERP providers. Consequently, each archive vendor "specializes" in at least one of the main ERP systems (i.e., Oracle Apps for OuterBay, SAP for IXOS, and PeopleSoft for Princeton Softech).
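To illustrate why schema dependencies matter, here is a minimal sketch in Python. The table names and foreign-key map are hypothetical, and this is not how any particular vendor's product works; it only shows that the order in which tables are copied and deleted has to follow the reference keys.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the parent tables it references
# through foreign keys. A real ERP schema has thousands of such links.
FOREIGN_KEYS = {
    "customers": set(),
    "orders": {"customers"},
    "order_lines": {"orders"},
    "shipments": {"orders"},
}


def copy_order(foreign_keys):
    """Parents first, so every copied child row finds its parent in the archive."""
    return list(TopologicalSorter(foreign_keys).static_order())


def delete_order(foreign_keys):
    """Children first, so no production row is left pointing at a deleted parent."""
    return list(reversed(copy_order(foreign_keys)))
```

A circular set of references makes `static_order()` raise `graphlib.CycleError`, which hints at why a schema the vendor does not fully understand is so hard to archive safely.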
Understandably, some database archive vendors avoid implementing their software on proprietary databases or even mainstream commercial databases because this would require considerable customization of the archiving software. Supporting these routines over time would be an even bigger headache. The result is that an archiving TCO model is usually built on the payback from just one ERP system.
From one to many
Before you evaluate the validity of the TCO argument, it is important to note that a successful archiving implementation results in the creation of additional logical systems that are usually deployed to "cheaper" storage:
- An active archive instance to hold archived records;
- A query instance to enable users to submit queries that run across production and archived data; and
- A flat file repository to hold aged records that can be removed from the active archive.
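The three logical systems listed above can be sketched in miniature with Python's built-in sqlite3 module. Everything here is an assumption made for illustration: the table names, the sample rows, and the use of a single in-memory database where a real deployment would have separate instances on separate storage.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE production_orders (id INTEGER PRIMARY KEY, closed_on TEXT, amount REAL);
    CREATE TABLE archive_orders    (id INTEGER PRIMARY KEY, closed_on TEXT, amount REAL);
    -- Stand-in for the query instance: one view spanning live and archived rows.
    CREATE VIEW all_orders AS
        SELECT id, closed_on, amount, 'production' AS source FROM production_orders
        UNION ALL
        SELECT id, closed_on, amount, 'archive' AS source FROM archive_orders;
""")
conn.executemany(
    "INSERT INTO production_orders VALUES (?, ?, ?)",
    [(1, "2001-03-15", 120.0), (2, "2002-11-02", 75.5), (3, "2003-06-30", 310.0)],
)


def archive_aged(conn, cutoff):
    """Move rows closed before `cutoff` (ISO date string) into the active archive."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO archive_orders "
            "SELECT * FROM production_orders WHERE closed_on < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM production_orders WHERE closed_on < ?", (cutoff,)
        )
    return cur.rowcount


def export_to_flat_file(conn, cutoff, out):
    """Write archived rows older than `cutoff` as CSV to `out`, then drop them."""
    rows = conn.execute(
        "SELECT id, closed_on, amount FROM archive_orders "
        "WHERE closed_on < ? ORDER BY id",
        (cutoff,),
    ).fetchall()
    csv.writer(out).writerows(rows)
    with conn:
        conn.execute("DELETE FROM archive_orders WHERE closed_on < ?", (cutoff,))
    return len(rows)


moved = archive_aged(conn, "2003-01-01")
flat_file = io.StringIO()
exported = export_to_flat_file(conn, "2002-01-01", flat_file)
```

After these calls, users who query `all_orders` still see one live row and one archived row, while the oldest record exists only in the flat file. In this toy version the move is a single local transaction; across two real database instances that guarantee is much harder to provide.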
The TCO model for database archiving often assumes that all database instances are on Tier 1 storage and that each production instance generates at least six full copies. These assumptions may be flawed on three counts:
- They risk "double-counting" savings anticipated from similar TCO models for tiered storage. For example, many analysts advocate using Tier 1 storage platforms to support production and QA instances, and using Tier 2 or 3 storage platforms to support other instances (development, staging, training, etc.).
- The TCO models assume that all copies are full, whereas many are not; development and training copies, for example, are typically 10% to 15% of the size of the production instance. Archiving vendors have built-in "subsetter" routines that shrink the size of the production instance when making a copy. Other (non-archiving) software vendors offer this functionality as well, and some organizations have already developed their own subsetter routines because oftentimes production data must be "sanitized" to remove customer or proprietary data.
- These TCO models assume many copies, each of which is physically and logically distinct. However, many shops use disk-based cloning or snapshot software to re-provision full copies for multiple purposes.
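The effect of these assumptions on the projected savings can be worked through with some simple arithmetic. The sketch below uses made-up numbers (a 1,000 GB production instance, 40% of it archived) and a hypothetical mix of copies; only the 10% to 15% subset sizes and the idea of keeping production and QA on Tier 1 come from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    size_fraction: float  # physical size relative to production (1.0 = full copy)
    tier: int             # storage tier the copy lives on


def footprint_gb(production_gb, copies):
    """Physical GB per tier for production (assumed on Tier 1) plus its copies."""
    totals = {1: float(production_gb)}
    for c in copies:
        totals[c.tier] = totals.get(c.tier, 0.0) + production_gb * c.size_fraction
    return totals


def tier1_saving_gb(production_gb, copies, archived_fraction):
    """Tier 1 GB released if archiving shrinks production by `archived_fraction`."""
    before = footprint_gb(production_gb, copies)[1]
    after = footprint_gb(production_gb * (1 - archived_fraction), copies)[1]
    return before - after


# Vendor-style assumption: six full copies, all on Tier 1.
NAIVE = [Copy(f"copy{i}", 1.0, 1) for i in range(1, 7)]

# Adjusted assumption (illustrative): only QA shares Tier 1, development and
# training are subsets, and the remaining copies sit on a cheaper tier.
ADJUSTED = [
    Copy("qa", 1.0, 1),
    Copy("staging", 1.0, 2),
    Copy("development", 0.15, 2),
    Copy("training", 0.10, 2),
    Copy("reporting", 1.0, 2),
    Copy("mirror", 1.0, 2),
]

naive_saving = tier1_saving_gb(1000, NAIVE, 0.4)
adjusted_saving = tier1_saving_gb(1000, ADJUSTED, 0.4)
```

With these inputs the naive model claims about 2,800 GB of Tier 1 relief, while the adjusted model yields about 800 GB. The sketch does not model snapshot-based clones, which share blocks with their source and would shrink the figure further.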
Like any new system, database archiving requires a series of new processes, skills, and systems, which in turn require time, talent, and money—three things in short supply in most data centers.
New processes—Even a simple database archiving installation requires developing, tailoring, and implementing a series of new operational processes that manage all the archiving routines, maintain and configure the archiving software, retrieve individual records from the flat file non-active archive to meet user needs, and retrieve large chunks of records from this archive to determine if they can be restored when the systems environment changes.
New hosts—Whereas before there was one server and one production instance, now there may be three instances and a new flat file. Each may or may not have its own dedicated server, depending on the needs and capacities of the organization. In any case, these environments need to be logically separate but identical to the production instance. Some of the "saved" cost from the decrease in size of the production instance gets spent on these new systems and the various processes supporting them (e.g., backup and anti-virus).
Trapped storage savings—Since these new systems consume storage, although at a reduced cost per megabyte, they require additional capacity, which may or may not be available. Also, the "freed-up" space on the Tier 1 storage platform may not be "saved" at all. It is simply empty, but its cost is still a burden to the enterprise. This implies that an archiving implementation needs to occur at the same time as a storage refresh/consolidation so that the provisioned Tier 1 space is actually recovered. All of this adds cost to the bottom line.
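A back-of-the-envelope sketch makes the "trapped" savings point concrete. The per-gigabyte costs and the `reclaimed_fraction` parameter are hypothetical, and the function assumes the archived data takes up as much space on Tier 2 as it freed on Tier 1.

```python
def net_storage_saving(freed_gb, tier1_cost_per_gb, tier2_cost_per_gb,
                       reclaimed_fraction):
    """Net saving when `freed_gb` of data moves from Tier 1 to Tier 2.

    Only the reclaimed share of the freed Tier 1 space counts as a saving;
    the Tier 2 capacity that holds the archive is a new cost either way.
    """
    if not 0.0 <= reclaimed_fraction <= 1.0:
        raise ValueError("reclaimed_fraction must be between 0 and 1")
    recovered = freed_gb * reclaimed_fraction * tier1_cost_per_gb
    new_cost = freed_gb * tier2_cost_per_gb
    return recovered - new_cost


# Freed space sits empty on the Tier 1 array: the project is a net cost.
stranded = net_storage_saving(400, 50, 10, reclaimed_fraction=0.0)

# Freed space is fully reused, for example during a storage refresh.
reclaimed = net_storage_saving(400, 50, 10, reclaimed_fraction=1.0)
```

Under these illustrative prices, the same 400 GB of archived data swings from a loss to a gain depending solely on whether the Tier 1 space is actually put back to use.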
New skills—Once the new servers and archived instances are installed, organizations need to acquire specific expertise in the archiving software and the various "hooks" it has into the production instance and application, and continue to maintain the custom code developed to accommodate any home-grown implementations. Training costs money.
Business continuity vulnerability—Organizations with robust disaster-recovery configurations that preclude transaction loss should understand that a multi-instance archiving solution adds complexity to their recovery scenario at the very least and may introduce a new area of vulnerability. The potential for trouble comes when the archive routines are busy moving records from the production environment to the active archive environment. This may happen only weekly or monthly, but when it does, referential integrity is at risk should disaster strike while records are being removed. After a disaster, an organization needs to be able to recover both the production and active archive databases and know with certainty that a state change made in one has been reflected in the other. Each archiving vendor offers a different approach to address this risk exposure; some use triggers and others use scripts to "sniff" the redo logs for missing transactions.
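The kind of check needed after such an interruption can be sketched as a simple reconciliation. This is not any vendor's trigger or redo-log mechanism; it assumes a hypothetical manifest of record IDs that the interrupted archive run intended to move, and compares it against what each recovered database actually contains.

```python
def reconcile(batch_ids, production_ids, archive_ids):
    """Classify every record in an interrupted archive batch.

    batch_ids      -- IDs the archive run intended to move (hypothetical manifest)
    production_ids -- IDs found in the recovered production database
    archive_ids    -- IDs found in the recovered active archive
    """
    report = {
        "not_started": [],         # still only in production: safe to retry
        "copied_not_deleted": [],  # in both: finish the delete (or undo the copy)
        "completed": [],           # only in the archive: move finished
        "missing": [],             # in neither: data loss, needs investigation
    }
    for record_id in sorted(batch_ids):
        in_production = record_id in production_ids
        in_archive = record_id in archive_ids
        if in_production and in_archive:
            report["copied_not_deleted"].append(record_id)
        elif in_production:
            report["not_started"].append(record_id)
        elif in_archive:
            report["completed"].append(record_id)
        else:
            report["missing"].append(record_id)
    return report


report = reconcile(
    batch_ids={1, 2, 3, 4},
    production_ids={1, 2, 9},
    archive_ids={2, 3},
)
```

Here record 2 exists in both databases and record 4 in neither, which are exactly the two states a recovery plan has to detect before either system is put back into service.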
It's clear that the TCO arguments and business drivers supporting an archiving solution are worth close scrutiny. A January 2003 META Group report entitled "Archiving: Databases on a Diet" notes that cost savings is not the most likely driver of database archiving and that the solution "is not a trivial process."
However, regulatory compliance is a new driver that may turn out to be the strongest argument in favor of database archiving.
Much has been written about the impact of recent federal regulations (HIPAA, Sarbanes-Oxley section 802, SEC rule 240.17a-4, NASD rule 3010, and DOD Directive 5015.2) that mandate digital data retention. All point to one certainty: Organizations will have to keep data around longer and retain it in a write-once read-many (WORM) format. These regulations may ensure the success of database archiving. By linking database archiving software with WORM storage devices, organizations can kill two birds with one stone—removing inactive records and sending these records to a storage device that provides regulatory compliance that will satisfy auditors.
In the end, it may not matter that the business case for database archiving is difficult to make. The government may drive you there anyway.
Mike Drapeau is president of The Drapeau Group (www.drapeaugroup.com), an Atlanta-based consulting firm specializing in strategic development, platform architecture review, and issues such as regulatory compliance assessment.