Automated data migration software drives information life-cycle management.
By Glenn Rhodes
These days, IT administrators face several competing pressures: cutting the IT budget while end-user data grows 60% or more per year; managing the reference information (or fixed-content data) that is accumulating within the organization; and complying with government regulations on the retention of business documents to avoid fines that can reach into the millions of dollars.
According to a study by the Enterprise Storage Group (ESG) consulting firm, reference information, or unstructured data, is growing at a 92% compound annual growth rate (CAGR), while traditional information such as database and transactional data is growing at only 61%. Reference information includes, but is not limited to, electronic documents such as contracts, e-mail and e-mail attachments, and CAD/CAM designs; digitized information such as medical images; and video and voice data. For instance, to maintain compliance with federal regulations, brokerage firms must save checkbooks, bank statements, canceled checks, cash reconciliations, portfolio analyses, bills receivable, and copies of all communications and contracts, and they must be able to run historical reports for clients to verify that portfolio values in prior years were accurate.
It is no surprise that enterprises today continue to experience an explosion in the growth of online data and in the demand for higher storage capacity. In a period of budget-conscious decision-making and strict government regulations, IT staff, from storage administrators to the CIO, are under intense pressure to implement best practices for managing storage in order to reduce costs and tighten operational efficiency.
This article discusses trends in information life-cycle management and outlines potential end-user benefits of implementing what some analysts call "automated data migration."
Evolving regulatory requirements such as the Health Insurance Portability and Accountability Act (HIPAA), life sciences and pharmaceutical regulations, and SEC regulations are not only changing the type of information being retained, but they are also forcing storage administrators to rethink the way they manage the life cycle of their information assets. The traditional storage media for reference information have been offline media such as tape or optical storage because of their historically lower cost per terabyte.
However, with the continuous reduction in the cost of disk arrays, IT administrators are storing more and more data online to speed access time and take advantage of new low-cost technologies such as ATA-based RAID. For example, an insurance company typically stores up to three years' worth of client data (e.g., applications, correspondence, and claim forms) online or near-line to improve customer service levels. After three years, this information is archived to tape.
A key question for IT managers is: How do they determine which data should reside on one storage class vs. another (e.g., the highest-performing, most highly available storage vs. low-cost storage)? In fact, one of the toughest challenges with information life-cycle management is profiling the relative value or criticality of data and storage resources to the business. Only then can administrators put the right data on the right storage medium at the right time. By properly placing data, IT can more effectively distribute data across multiple resources, which will lead to improved storage utilization and reduced storage acquisition costs. Automate this process and IT can benefit from improved productivity levels as well.
Automated data migration
Automated data migration (ADM) provides the ability to perform data valuation and placement to ensure resource usage is in line with business requirements. According to Nancy Marrone, senior analyst at ESG, "Automated data migration tools automate the migration of data from one storage class to another based on user-defined policies. ADM solutions are essentially a combination of intelligent SRM [storage resource management] at the file-level and HSM [hierarchical storage management] solutions. They enable users to value and profile data and storage resources and implement movement by matching data to the most appropriate storage resources based on the value of the data to the business."
In many respects, ADM combines traditional storage management functionality with relatively new technology. SRM components are used to discover and monitor resources and to evaluate usage patterns, the age of data, and the ownership of data sets. HSM elements are used to automate data movement without impact to users or applications. Unlike traditional approaches, however, ADM relies on the integration of SRM, HSM, and multi-threaded policy engines to place data more intelligently and automatically across multiple heterogeneous storage resources, based on multiple criteria set by user-defined policies.
As a starting point, an ADM solution can be implemented by discovering and scanning each device and volume on the network, as well as its contents, and summarizing the results in reports. Although most SRM tools can monitor at a file, volume, or device level, they typically require installation of server agents. This is not always the case with ADM software, which can offer non-intrusive, agent-free data collection. Administrators can quickly determine total capacity (available and used) by server, directory, volume, or owner. In addition, they can gather information about the changing value of data within the enterprise, such as space consumption by files and directories, file age distribution, data usage patterns, and stale file analysis.
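The kind of scan described above can be illustrated with a minimal sketch. It assumes the volume is reachable as a mounted path and uses only the Python standard library; the function names, the age buckets, and the 180-day stale cutoff are hypothetical choices made for illustration, not the behavior of any particular ADM product.

```python
import os
import time
from collections import Counter

SECONDS_PER_DAY = 86400


def age_bucket(age_days):
    """Map a file age in days to a coarse, hypothetical reporting bucket."""
    if age_days < 30:
        return "under 30 days"
    if age_days <= 180:
        return "30 to 180 days"
    return "over 180 days"


def scan_volume(root, stale_days=180, now=None):
    """Walk a mounted volume and summarize usage without a server agent."""
    now = time.time() if now is None else now
    report = {
        "total_bytes": 0,
        "bytes_by_owner": Counter(),  # keyed by numeric owner id (st_uid)
        "age_buckets": Counter(),     # file counts per modification-age bucket
        "stale_files": [],            # paths not accessed within stale_days
    }
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - st.st_mtime) / SECONDS_PER_DAY
            idle_days = (now - st.st_atime) / SECONDS_PER_DAY
            report["total_bytes"] += st.st_size
            report["bytes_by_owner"][st.st_uid] += st.st_size
            report["age_buckets"][age_bucket(age_days)] += 1
            if idle_days > stale_days:
                report["stale_files"].append(path)
    return report
```

Calling `scan_volume("/mnt/volume1")` (a hypothetical mount point) returns a dictionary that a reporting layer could aggregate across volumes. Access times are only as reliable as the file system that records them, so a real tool would need to account for volumes mounted with access-time updates disabled.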
The next step is to organize, classify, and prioritize all the information assets, across multiple storage volumes, into logical resource groups. The ADM software will first automatically group together common file types, such as office files, image files, or audio/video files. Administrators can also create custom file groups using attributes such as file type, extension, age, size, last modified/accessed date, or owner. Within storage groups, administrators can organize disparate volumes based on similar attributes such as storage type, cost per megabyte, capacity, and make/model. These global storage groups might logically bind together volumes that span multi-vendor storage systems.
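To make the grouping step concrete, here is a rough sketch of default file-type groups plus administrator-defined custom groups. The `FileRecord` fields, the extension lists, and the rule names are all assumptions chosen for illustration; an actual product would define its own attributes and defaults.

```python
import os
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata gathered for one file during a scan."""
    path: str
    size_bytes: int
    owner: str
    days_since_access: int


# Assumed default mapping of extensions to common file-type groups.
DEFAULT_GROUPS = {
    "office": {".doc", ".xls", ".ppt", ".pdf"},
    "image": {".jpg", ".png", ".tif"},
    "audio_video": {".mp3", ".wav", ".avi", ".mpg"},
}


def default_group(record):
    """Return the built-in group for a file based on its extension."""
    ext = os.path.splitext(record.path)[1].lower()
    for name, extensions in DEFAULT_GROUPS.items():
        if ext in extensions:
            return name
    return "other"


def build_groups(records, custom_rules):
    """Place each file in its default group and in every matching custom group.

    custom_rules maps a group name to a predicate over FileRecord, so one
    file can belong to several groups, even across different volumes.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[default_group(rec)].append(rec)
        for name, rule in custom_rules.items():
            if rule(rec):
                groups[name].append(rec)
    return dict(groups)
```

A custom rule is just a function of the file's attributes, for example `{"large_files": lambda r: r.size_bytes > 10_000_000}`. Because groups are keyed by attributes rather than by location, the same group can span files on many volumes.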
The use of global resource groups not only simplifies overall administration; IT managers are also able to assign different levels of value and prioritization to their data and storage assets.
"HSM solutions treat data movement based on access, while ADM tools enable value to be assigned to data and storage resources based on weighting, age, type, size, cost, or other criteria," explains Marrone. For instance, an administrator at a law firm might create a file group that consists of all legal documents not accessed within the last 180 days. Since this data is associated with older cases that are no longer active, the IT manager would assign a "low-value" to this file group. Similarly, the IT manager could create a storage group, such as a high-cost SAN storage group or a low-cost NAS storage group.
A key aspect of ADM software is the ability to incorporate and deploy these global resource groups into central migration policies, even though the files and volumes are spread across many different storage devices on the network. In addition, ADM solutions are dynamic: over time, any new files created by users or new storage purchased by IT are automatically added to the appropriate global resource groups, and any existing migration policies are dynamically updated, which eases overall administration.
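The law-firm example above (legal documents not accessed within 180 days, rated low value, alongside high-cost SAN and low-cost NAS storage groups) can be sketched as follows. The class names, the two-level "low"/"high" rating, and the rule that low-value data targets the cheapest storage group are simplifying assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalDocument:
    """Hypothetical record for one document in the firm's file store."""
    path: str
    days_since_access: int


@dataclass(frozen=True)
class StorageGroup:
    """Hypothetical storage group described by its cost per megabyte."""
    name: str
    cost_per_mb: float


def classify_value(doc, inactive_after_days=180):
    """Rate a document 'low' if it has not been accessed within the window."""
    return "low" if doc.days_since_access > inactive_after_days else "high"


def low_value_group(docs, inactive_after_days=180):
    """Build the file group of documents rated low value."""
    return [d for d in docs if classify_value(d, inactive_after_days) == "low"]


def target_storage(value, storage_groups):
    """Match a value rating to a storage group: low value goes to the
    cheapest group, everything else to the most expensive (assumed fastest)."""
    by_cost = sorted(storage_groups, key=lambda g: g.cost_per_mb)
    return by_cost[0] if value == "low" else by_cost[-1]
```

Because group membership is computed from attributes, re-running `low_value_group` after new documents arrive or old cases go quiet updates the group without any change to the policy itself.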
Once the appropriate file groups and storage groups are selected, the final step is to configure the policy migration engine based on the storage management objectives the administrator is trying to achieve. If the objective is to optimize cost, then the policy migration engine will make the best data placement decisions based on the different levels of cost within the storage network. As an illustration, the administrator would set a migration threshold (e.g., 70%), and if any volume within a high-cost storage group exceeds this threshold, the less-critical data migrates from high-cost storage to the low-cost storage group. If the objective is to optimize utilization across DAS/NAS/SAN environments, the policy engine migrates data from over-utilized volumes to under-utilized volumes until capacity utilization is optimized across the network.
The integration between policy migration and monitoring of usage patterns enables the automation engine to adaptively and intelligently make the best decisions about where to place data on the storage network. Furthermore, the data movement software is integrated into the entire life cycle of data, whether it resides online or offline, or on primary, secondary, or tertiary storage.
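The cost-optimization policy described above can be approximated with a short planning routine. This is a sketch under stated assumptions: the numeric `value` score, the choice to stop migrating once a volume is back at or below the threshold, and the rule of sending each file to the low-cost volume with the most free space are all hypothetical, since the text does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    name: str
    size_mb: int
    value: int  # hypothetical criticality score; lower means less critical


@dataclass
class Volume:
    name: str
    capacity_mb: int
    files: list = field(default_factory=list)

    @property
    def used_mb(self):
        return sum(f.size_mb for f in self.files)

    @property
    def utilization(self):
        return self.used_mb / self.capacity_mb


def migrate_for_cost(high_cost, low_cost, threshold=0.70):
    """Move the least-critical files off any high-cost volume that exceeds
    the threshold, and return the moves as (file, source, target) names."""
    moves = []
    if not low_cost:
        return moves
    for vol in high_cost:
        # sorted() copies the list, so removing from vol.files is safe here.
        for f in sorted(vol.files, key=lambda item: item.value):
            if vol.utilization <= threshold:
                break  # assumed stopping rule: volume is back under threshold
            target = max(low_cost, key=lambda v: v.capacity_mb - v.used_mb)
            if target.capacity_mb - target.used_mb < f.size_mb:
                continue  # no low-cost volume has room for this file
            vol.files.remove(f)
            target.files.append(f)
            moves.append((f.name, vol.name, target.name))
    return moves
```

For a 1,000 MB SAN volume holding 900 MB, the routine moves only as many low-value files as needed to bring utilization to 70% or below, leaving the most critical data in place.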
There are a number of potential benefits to consider when implementing ADM:
Backup optimization—IT departments continually face the problem of expanding backup windows, and as the total amount of data stored online increases, this problem only gets worse. ADM software migrates less frequently accessed and fixed data from primary storage to secondary storage, so that only critical and active data requires aggressive full-system backup policies.
Create online archive—With the increased government scrutiny of documents and their retention policies, there is an urgent need for financial services companies to preserve e-mail and other communications related to brokers and dealers to maintain compliance with federal regulations. ADM uses advanced policy automation to migrate lower value data from primary storage to low-cost NAS, rather than offline tape. Brokerage firms benefit from deferred tape procurement and improved customer service through quicker response times.
Optimize storage utilization—Even the best IT departments often struggle to achieve greater than 30% to 40% utilization across their entire network. ADM automates migration of data from over-utilized volumes into under-utilized volumes across DAS/NAS/SAN environments, which extends the life of primary storage and increases storage utilization.
Tiered online storage—New storage devices are dropping below $0.02 per megabyte, yet most IT departments can't take advantage of them because the process of selecting and moving data to these devices is cost-prohibitive. ADM software enables these companies to automatically and transparently migrate less-critical or inactive data to these lower-cost devices.
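As a rough illustration of the utilization benefit listed above, the sketch below greedily moves files from the most-utilized volume to the least-utilized one until the gap falls within a tolerance. The 5% tolerance, the greedy strategy, and the cap on the number of moves are assumptions for the example; the article does not describe how a policy engine actually decides when utilization is "optimized."

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    capacity_mb: int
    file_sizes_mb: list = field(default_factory=list)

    @property
    def utilization(self):
        return sum(self.file_sizes_mb) / self.capacity_mb


def gap_after_move(src, dst, size_mb):
    """Utilization gap between src and dst if size_mb moved from src to dst."""
    src_util = (sum(src.file_sizes_mb) - size_mb) / src.capacity_mb
    dst_util = (sum(dst.file_sizes_mb) + size_mb) / dst.capacity_mb
    return abs(src_util - dst_util)


def rebalance(volumes, tolerance=0.05, max_moves=1000):
    """Greedy rebalancing; returns moves as (size_mb, source, target)."""
    moves = []
    for _ in range(max_moves):
        busiest = max(volumes, key=lambda v: v.utilization)
        idlest = min(volumes, key=lambda v: v.utilization)
        gap = busiest.utilization - idlest.utilization
        if gap <= tolerance:
            break
        free_mb = idlest.capacity_mb - sum(idlest.file_sizes_mb)
        # Only consider files that fit and that actually narrow the gap.
        candidates = [
            s for s in busiest.file_sizes_mb
            if s <= free_mb and gap_after_move(busiest, idlest, s) < gap
        ]
        if not candidates:
            break  # no single move improves the balance any further
        size = min(candidates, key=lambda s: gap_after_move(busiest, idlest, s))
        busiest.file_sizes_mb.remove(size)
        idlest.file_sizes_mb.append(size)
        moves.append((size, busiest.name, idlest.name))
    return moves
```

With whole files as the unit of movement, a perfect balance is not always reachable, so the loop also stops when no single move would narrow the gap.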
Glenn Rhodes is director of product marketing for Arkivio (www.arkivio.com) in Mountain View, CA.