Reference information to grow at 92% per year
By Heidi Biggar
One of the biggest problems facing end users today is what to do with all the reference information (or fixed-content data) that is beginning to accumulate within their organizations.
According to a "SnapShot Study" conducted by the Enterprise Storage Group (ESG) last year, reference information has been growing at a significantly faster clip than traditional non-reference, or flat, data types (92% CAGR versus 61% CAGR) over the past few years and will continue to do so for the next several years.
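To make the gap between the two growth rates concrete, the short sketch below compounds each rate year over year and finds when reference information would overtake non-reference data. Only the 92% and 61% figures come from the ESG study. The starting volumes and the function names `project` and `crossover_year` are hypothetical, chosen purely for illustration.

```python
REFERENCE_CAGR = 0.92       # from the ESG study
NON_REFERENCE_CAGR = 0.61   # from the ESG study


def project(start, cagr, years):
    """Return the volume after compounding `start` at `cagr` for `years` years."""
    return start * (1 + cagr) ** years


def crossover_year(ref_start, non_ref_start, max_years=50):
    """Return the first whole year in which reference data is at least as
    large as non-reference data, or None if that never happens in range."""
    for year in range(max_years + 1):
        ref = project(ref_start, REFERENCE_CAGR, year)
        non_ref = project(non_ref_start, NON_REFERENCE_CAGR, year)
        if ref >= non_ref:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical starting point: 10 TB of reference data, 30 TB of other data.
    year = crossover_year(10.0, 30.0)
    print(f"Reference data overtakes non-reference data in year {year}")
```

Under these assumed starting volumes, reference data that begins at one-third the size of non-reference data catches up within seven years. Different starting volumes would shift the crossover point.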
ESG interviewed companies of varying sizes from nine different industries, including computer hardware, computer software, Internet, government, healthcare, media/entertainment, print and publishing, and financial services.
For the purposes of the study, ESG defined "reference information" loosely as "any digital asset retained for active reference and value." However, the firm said that reference information can be further distinguished from non-reference data by its file size (generally much larger and faster growing); its access patterns (quick, frequent, and collaborative); its requirements for authenticity, integrity, and security; and its retention periods, which can span decades or longer.
The prospect of petabytes of reference information at hand creates an interesting dilemma for organizations already grappling with capacity and budget issues (see figure).
How should the reference information be stored (e.g., on disk, tape, or optical), how should it be managed (what policies should be put in place), how long should the data be retained (years, decades, or indefinitely), and how can it be best leveraged (and integrated with other applications) throughout the organization?
The answers to these questions depend largely on the overall potential business value of the reference information. Is the information a corporate asset—that is, can it be used to generate revenue for the organization, to lower total cost of ownership, or to help the organization better comply with government regulations for data retention?
And if so, does it make sense to migrate that type of information from traditional disk or tape resources to storage products (e.g., EMC Centera, StorageTek BladeStor, and Persist Technologies AppStor) that have been engineered from the ground up to meet the specific requirements of reference information?
ESG argues that if companies choose storage technologies based on specific data requirements and on the value of the data to the organization—and then create a storage infrastructure that leverages data across information types—they will be in a much better position competitively than those that do not.
"The coming wave of reference information will challenge even the most savvy and experienced storage administrators, IT professionals, and CIOs to manage growth, minimize infrastructure costs, and capitalize on value, or return on investment [ROI] opportunities," explains Peter Gerr, ESG research analyst.
"Those companies that move aggressively in this direction," contends Gerr, "create opportunities for themselves, as well as their customers, suppliers, and partners, to continually leverage information into a sustainable competitive advantage."
For the time being, the majority of reference information that has been digitized resides on tape. Survey respondents said that 64% of their reference information currently resides on tape, while 35% resides on some form of disk (see figure above). ESG believes that these survey findings reflect general trends in the storage market, and that the migration of reference information from tape to disk will grow at a 148% CAGR over the period 2001 to 2006.
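A 148% CAGR is easier to grasp as a cumulative multiple. The sketch below does that arithmetic, assuming that "2001 to 2006" means five annual compounding periods. The function names `growth_multiple` and `implied_cagr` are illustrative and are not part of the ESG study.

```python
def growth_multiple(cagr, years):
    """Cumulative growth factor after compounding `cagr` for `years` years."""
    return (1 + cagr) ** years


def implied_cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    periods = 2006 - 2001  # assumption: five compounding periods
    multiple = growth_multiple(1.48, periods)
    print(f"148% CAGR over {periods} years is about a {multiple:.1f}x increase")
```

Under that assumption, the projected rate works out to roughly a 94-fold increase in the volume of reference information migrating from tape to disk over the period.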
Why such a heavy dependence on tape? Because tape is still considered to be the most cost-effective and stable storage option for this type of data, according to Gerr. "The majority of corporate users do not yet consider reference information to be a mission-critical information type, but that's changing."
Gerr says that the nature of the reference information being stored and the access patterns will necessitate a switch to a technology that is more flexible, higher performing, and more intelligent.
"While tape's lower acquisition and upgrade prices are attractive, especially when you're storing large amounts of infrequently accessed or archived information, quick and timely access to reference assets is critical to unlocking the value they contain and enabling those assets to be repurposed, distributed, and shared," explains Gerr.
In terms of device attributes, study respondents said "indexing of reference assets" was the most critical feature of any reference information storage device. Total cost of ownership, long-term data retention capabilities, performance, scalability, application integration with various enterprise business applications (EBAs), and ease of management were also listed as key features (in descending importance).
ATA disk drive technology, Serial ATA in particular, is expected to be a key driver for reference markets as well as other storage segments (e.g., disk-based backup and recovery). Because of its low cost and competitive performance, it has quickly become the disk of choice among vendors that manufacture fixed-content data repositories.
Says Gerr: "The emergence of technologies such as ATA/IDE-based disk drives, which are positioned for the unique requirements of reference assets, will act as a catalyst to the market's overall growth."
While examples of "reference information" abound, the most common applications today include e-mail and attachments, digital audio and video files, check and document imaging, genomics and drug discovery, medical imaging and patient records, and CAD/CAM designs (see figure below).
ESG expects the growth of reference information to be greatest in industries in which the value of being able to "store, manage, and share" reference information is already seen as a competitive advantage (i.e., a proven ROI) or where federal or state regulations set specific data retention policies.
In particular, analysts believe the Health Insurance Portability and Accountability Act (HIPAA) privacy regulation, which goes into effect next month, SEC Rule 17a-4(f), and the Sarbanes-Oxley Act will propel this market forward, setting new standards for the transmission and retention of healthcare data (HIPAA) and financial data (the SEC rule and Sarbanes-Oxley).
Specifically, HIPAA establishes new guidelines for the transmission and retention of all health data and medical records. The regulation requires all healthcare and related companies to keep records (e.g., policies, patient requests, and complaints) for a minimum of six years. A second, but not yet approved, rule concerning the security of this data is expected to specify electronic record retention periods.
Although neither HIPAA regulation dictates the type of storage medium (e.g., disk, tape, or optical) that an organization must use or where the data must be stored (on-site, off-site, etc.), much of this data is expected to eventually end up on storage devices designed to address reference information requirements.
Similarly, SEC Rule 17a-4(f) dictates digital archiving requirements as they relate to storage (specifically, what type of format must be used, how long the data must be retained, and where duplicate copies of the data must be stored and for how long), and the Sarbanes-Oxley Act dictates how corporations must store financial data, as well as any data used to compute their financial status.
The SEC rule and the Sarbanes-Oxley Act require data to be stored in an "unalterable" format (e.g., on WORM optical, WORM tape, or some reference information disk systems, like EMC's Centera and Persist's AppStor).
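The idea of an "unalterable" format can be illustrated with a toy write-once, read-many (WORM) store. This is only a sketch of the concept: the class `WriteOnceStore` and its methods are hypothetical and do not describe how Centera, AppStor, or any WORM medium actually works. The SHA-256 digest is an added assumption that shows one way a stored record could later be checked for integrity.

```python
import hashlib


class WriteOnceStore:
    """Toy in-memory store: each key can be written once and never changed."""

    def __init__(self):
        self._records = {}  # key -> (data, digest recorded at write time)

    def write(self, key, data):
        """Store `data` under `key`; refuse if the key was already written."""
        if key in self._records:
            raise PermissionError(f"record {key!r} is write-once")
        payload = bytes(data)
        digest = hashlib.sha256(payload).hexdigest()
        self._records[key] = (payload, digest)
        return digest

    def read(self, key):
        """Return the stored bytes for `key`."""
        return self._records[key][0]

    def verify(self, key):
        """Recompute the digest and compare it with the one recorded at write."""
        payload, digest = self._records[key]
        return hashlib.sha256(payload).hexdigest() == digest


if __name__ == "__main__":
    store = WriteOnceStore()
    store.write("check-0001", b"scanned check image")
    try:
        store.write("check-0001", b"altered image")
    except PermissionError as err:
        print("Rejected:", err)
    print("Intact:", store.verify("check-0001"))
```

The store deliberately offers no update or delete method, which is the property the regulations are after: once a record is written, the only operations left are reading it and verifying it.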
The early adopters of storage devices designed specifically for reference information include financial services, healthcare and pharmaceutical, print and publishing, and media and entertainment organizations, as well as various government agencies.
As for the applications that will drive market growth, digital media and e-mail are among the strongest initial targets. When asked which applications they thought would drive reference information growth in the 2004-2005 time frame, 47% of respondents said digital/media asset utilization would have a strong impact, while 20% said both external and internal collaboration would have a strong impact (see figure above). These findings are in line with product positioning statements from the various early market players.
See next month's issue for a look at the applications and platforms that are available for storing reference data and at the partnerships that are being formed between storage and EBA vendors to ensure that reference information is stored and managed in the best—and most efficient—way possible.