By Dave Simpson
Are you running out of database disk space, or paying for expensive disk capacity that you don't really need? Is "database bloat" dragging your applications' performance down to unacceptable levels? If so, you may be a candidate for an emerging technology that some analysts call "database archiving."
Database archiving is a standard practice in mainframe environments and is closely related to hierarchical storage management (HSM). What's new is that a number of vendors are introducing application-specific implementations for non-mainframe, distributed environments that provide more functionality than traditional HSM software.
Players in this space include relatively unknown vendors such as Bitbybit, FileTek, OuterBay Technologies, Princeton Softech, and Xoriant, as well as Legato (via its acquisition of OTG Software). Vendors take different approaches to database archiving and also differ in which databases they support.
According to Carolyn DiCenzo, a senior analyst with the Gartner Dataquest consulting firm in Stamford, CT, the primary force driving end-user adoption of database archiving is the need to reduce total cost of ownership (TCO). The main benefits for end users are reduced primary disk space requirements and better database performance, in the form of faster response times (see figure on p. 8). Depending on how much the database shrinks, archiving can also speed backup and recovery.
For some IT organizations, performance improvements alone may be sufficient to justify an investment in database-archiving software. For example, MidAmerican Energy, in Des Moines, IA, chose database-archiving software from Princeton Softech solely for the potential performance advantages.
"Our customer database [CustomerOne] was growing at 20% per year," explains Dennis Blackwell, supervisor of database administration at MidAmerican, "and we wanted to avoid performance degradation and maintain customer service levels, so we needed some way to automatically remove some of the older data from the database."
MidAmerican runs Princeton Softech's Archive for DB2 in an environment with a 9672 R75 mainframe, EMC Symmetrix disk arrays, and IBM's DB2 database, all connected to Windows NT/2000 clients. Princeton Softech, which generally describes its approach as "active archiving," says its software also runs on non-mainframe platforms (Windows and Solaris) and supports databases such as Oracle, Informix, Sybase, and SQL Server.
"The software allows you to specify different criteria for the removal of data [to secondary storage], including date criteria, field value specifications, or any SQL statement," Blackwell explains. "That's the primary reason we brought in an archiving solution, not to save space." The advantage, he says, is that "day-to-day processes don't have to cycle through [the migrated] data, but the data is still available."
With most database-archiving software, users can have data automatically moved to disk, tape, or optical devices. MidAmerican migrates older data to Symmetrix arrays.
Although MidAmerican first installed the Princeton Softech software about a year ago, Blackwell says it's still too early to quantify performance improvements. "We're still in the infancy stage," he says. "One of the toughest things with archiving is getting customers to define what data they're willing to remove [from the primary database storage space], as well as defining the parameters. Right now, we're only archiving data that's more than four years old."
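To make rule-driven removal more concrete, here is a minimal sketch using Python's standard-library sqlite3 module. It is not Princeton Softech's product or API; the table names (`activity`, `activity_archive`), the view `activity_all`, and the helper `archive_rows` are hypothetical. Rows matching an SQL predicate (here, a rule modeled on MidAmerican's "more than four years old") are copied to an archive table and deleted from the primary table in one transaction, and a view over both tables keeps the archived rows queryable. A real deployment would typically put the archive on cheaper storage; both tables share one in-memory database here for simplicity.

```python
import sqlite3

# Hypothetical schema: a primary table, an archive table with the same columns,
# and a view that presents both so archived rows remain queryable.
SCHEMA = """
CREATE TABLE activity (id INTEGER PRIMARY KEY, customer_id INTEGER, created TEXT);
CREATE TABLE activity_archive (id INTEGER PRIMARY KEY, customer_id INTEGER, created TEXT);
CREATE VIEW activity_all AS
    SELECT * FROM activity UNION ALL SELECT * FROM activity_archive;
"""


def archive_rows(conn: sqlite3.Connection, predicate: str, params: tuple = ()) -> int:
    """Move rows matching an SQL predicate from `activity` to `activity_archive`.

    `predicate` is spliced into the SQL text, so it must come from a trusted
    administrator; values such as dates are passed separately via `params`.
    Returns the number of rows moved.
    """
    with conn:  # one transaction: the copy and the delete commit or roll back together
        conn.execute(
            f"INSERT INTO activity_archive SELECT * FROM activity WHERE {predicate}",
            params,
        )
        cur = conn.execute(f"DELETE FROM activity WHERE {predicate}", params)
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.executemany(
        "INSERT INTO activity VALUES (?, ?, ?)",
        [(1, 100, "1996-05-01"), (2, 100, "1997-12-31"), (3, 200, "2001-08-30")],
    )

# Archive rows created more than four years before an as-of date.
# SQLite's date() modifier computes the cutoff (here 1998-06-01).
moved = archive_rows(conn, "created < date(?, '-4 years')", ("2002-06-01",))
```

Day-to-day queries against `activity` now touch only the recent row, while reports that need history can still read all three rows through `activity_all`.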
Although closely related in terms of functionality, database archiving differs from traditional HSM software. Gartner's DiCenzo defines HSM as "the migration of inactive files from primary storage [usually disk] to less expensive media, and the automatic recall of files to primary storage upon application or user access." She says that traditional HSM packages are often not a good solution for managing relational database growth, in part because they usually migrate entire files or datasets.
Database-archiving tools go beyond HSM in a number of ways, including:
- maintaining referential integrity (e.g., a set of related data from multiple tables is purged/archived and subsequently restored together as a single set) and restoring data in its business context;
- providing a tight coupling with specific applications (usually databases such as DB2 or Oracle); and
- providing application-specific viewing capabilities.
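The referential-integrity point can be illustrated with another hypothetical sqlite3 sketch; the schema and the functions `move_customer_set`, `archive_customer`, and `restore_customer` are assumptions, not any vendor's interface. A customer and all of its orders are moved to archive tables, and later restored, as one unit inside a single transaction, with foreign-key enforcement switched on so that a wrong ordering would raise an error instead of leaving orphaned rows.

```python
import sqlite3

# Hypothetical schema: archive tables mirror the primary tables, including the
# foreign key from orders to customers.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER NOT NULL REFERENCES customers(id),
                     placed TEXT NOT NULL);
CREATE TABLE customers_archive (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_archive (id INTEGER PRIMARY KEY,
                             customer_id INTEGER NOT NULL REFERENCES customers_archive(id),
                             placed TEXT NOT NULL);
"""

PRIMARY = ("customers", "orders")
ARCHIVE = ("customers_archive", "orders_archive")


def move_customer_set(conn: sqlite3.Connection, customer_id: int,
                      src: tuple[str, str], dst: tuple[str, str]) -> None:
    """Move one customer and all of its orders from `src` tables to `dst` tables."""
    src_customers, src_orders = src
    dst_customers, dst_orders = dst
    with conn:  # single transaction: the whole related set moves, or nothing does
        # Insert the parent first so the child rows' foreign keys resolve.
        conn.execute(f"INSERT INTO {dst_customers} SELECT * FROM {src_customers} WHERE id = ?",
                     (customer_id,))
        conn.execute(f"INSERT INTO {dst_orders} SELECT * FROM {src_orders} WHERE customer_id = ?",
                     (customer_id,))
        # Delete children first so no order is left pointing at a missing customer.
        conn.execute(f"DELETE FROM {src_orders} WHERE customer_id = ?", (customer_id,))
        conn.execute(f"DELETE FROM {src_customers} WHERE id = ?", (customer_id,))


def archive_customer(conn: sqlite3.Connection, customer_id: int) -> None:
    move_customer_set(conn, customer_id, PRIMARY, ARCHIVE)


def restore_customer(conn: sqlite3.Connection, customer_id: int) -> None:
    move_customer_set(conn, customer_id, ARCHIVE, PRIMARY)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(10, 1, "1997-03-02"), (11, 1, "1998-07-19"), (12, 2, "2002-01-05")])

archive_customer(conn, 1)  # customer 1 and both of its orders leave the primary tables
archived_orders = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
restore_customer(conn, 1)  # ...and come back together as one set
restored_orders = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1").fetchone()[0]
```

Inserting parents before children and deleting children before parents is what lets each move pass the REFERENCES checks; if any statement fails, the `with conn:` block rolls back the entire set. A file-level HSM package, by contrast, would migrate the whole database file and has no notion of one customer's related rows.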
Gartner Dataquest expects the database-archiving market to grow at a compound annual growth rate (CAGR) of 64% over the next few years.