Hierarchical Storage Management: The Basics

HSM-based system

The rapid expansion of businesses into e-commerce has created a new paradigm in data management. IT shops must now cope with a steady stream of mission-critical data that can swell a company’s storage requirements into the terabyte range. At the same time, IT managers have to provide guaranteed access to data on a 24×7 basis.

Because this data is tied directly to company revenue, there is no margin for error in managing it, and meeting these demands requires rethinking traditional data management methods. Among the many tools available, the familiar technology of hierarchical storage management (HSM) offers a way to regain control of data management.

The primary goals of managing data have not changed. Critical data must be available to users when and where it is needed; the data must be protected from loss; and data that must be kept for long periods must be archived. What has changed is the necessity of performing these tasks in a complex environment that has little tolerance for downtime and that grows more difficult to manage each day.

The old quick-fix answer to managing large amounts of data that must be accessible 24×7 was to increase a server's storage capacity by adding more disk drives, or to add a bigger server. This approach only multiplies the management problems and increases manpower costs and total cost of ownership (TCO).

HSM to the rescue

HSM classifies data for dynamic migration and demigration according to a two-tier or three-tier architecture. In a three-tier configuration, level one is data that must be readily accessible 24×7. Level two is data that is accessed periodically but does not need to be available 24×7. Level three is archived data that is accessed infrequently and is usually kept to comply with business rules on record retention. In hardware terms, level-one data is kept on the network server; level-two data is migrated to near-line storage, such as an optical library or jukebox; and level-three data is archived off-line in a tape library or removed and stored in a vault.
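The three-level scheme can be modeled as a small classification function. This is a minimal sketch, assuming that the time since a file was last accessed is an adequate proxy for how often it is needed; the `Tier` enum, the `classify` function, and the 30-day and 365-day thresholds are hypothetical and not taken from any HSM product.

```python
from enum import Enum


class Tier(Enum):
    """The three HSM levels described above, with their typical hardware."""
    ONLINE = (1, "network server disk")
    NEAR_LINE = (2, "optical library or jukebox")
    OFF_LINE = (3, "tape library or vault")

    def __init__(self, level: int, device: str) -> None:
        self.level = level
        self.device = device


# Hypothetical thresholds; a real site would derive these from its own policy.
ONLINE_MAX_IDLE_DAYS = 30
NEAR_LINE_MAX_IDLE_DAYS = 365


def classify(days_since_access: int) -> Tier:
    """Assign a file to a tier based only on how long it has been idle."""
    if days_since_access <= ONLINE_MAX_IDLE_DAYS:
        return Tier.ONLINE
    if days_since_access <= NEAR_LINE_MAX_IDLE_DAYS:
        return Tier.NEAR_LINE
    return Tier.OFF_LINE
```

For example, `classify(90).device` returns `"optical library or jukebox"`. A real classifier would also weigh retention rules, since level-three data is kept for compliance rather than because it is idle.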

HSM software manages the data on the server according to rule sets defined by the administrator. Because these rules are configurable, they can be tailored to specific business requirements.
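An administrator-defined rule set can be represented as a list of rules, any one of which triggers migration. This sketch assumes rules keyed on file age (measured here from last access, which is itself an assumption) and file type; `MigrationRule`, `should_migrate`, and the example `RULES` are hypothetical illustrations, not the configuration format of any particular HSM package.

```python
import os
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MigrationRule:
    """Hypothetical rule: a file matches if it is at least min_age old and,
    when extensions is non-empty, its type is one of the listed extensions."""
    name: str
    min_age: timedelta
    extensions: frozenset = field(default_factory=frozenset)

    def matches(self, path: str, last_access: datetime, now: datetime) -> bool:
        old_enough = now - last_access >= self.min_age
        if not self.extensions:
            return old_enough
        # File type is taken from the extension, e.g. "scan.TIF" -> "tif".
        ext = os.path.splitext(path)[1].lstrip(".").lower()
        return old_enough and ext in self.extensions


def should_migrate(rules, path: str, last_access: datetime, now: datetime) -> bool:
    """A file is migrated if any rule in the administrator's set matches it."""
    return any(rule.matches(path, last_access, now) for rule in rules)


# Example rule set (hypothetical values).
RULES = [
    MigrationRule("idle-90-days", timedelta(days=90)),
    MigrationRule("old-scans", timedelta(days=30), frozenset({"tif", "jpg"})),
]
```

With these rules, a 45-day-old `invoice.TIF` is migrated by the "old-scans" rule, while a 45-day-old `report.doc` stays on the server until it reaches 90 days.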

Files that meet the defined criteria, such as being older than a certain age or being of a certain type, are migrated automatically from the server to the near-line storage device, with no administrative intervention. In addition, peak loads can be managed by setting a watermark: when volume usage or server load exceeds an administrator-defined level, HSM begins migrating files.
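The watermark behavior can be sketched as a selection routine that stays idle until volume usage crosses the administrator's level and then picks the least recently accessed files. The lower `target` level (which keeps migration from restarting as soon as usage creeps back over the watermark) and the least-recently-accessed ordering are assumptions for illustration; `FileInfo` and `select_for_migration` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size: int           # bytes
    last_access: float  # seconds since the epoch


def select_for_migration(files, capacity, high_water=0.85, target=0.70):
    """Return paths to migrate, least recently accessed first.

    Nothing is selected while usage is at or below high_water * capacity.
    Once that level is exceeded, files are chosen until usage would fall
    to target * capacity or below.
    """
    used = sum(f.size for f in files)
    if used <= high_water * capacity:
        return []
    selected = []
    for f in sorted(files, key=lambda f: f.last_access):
        if used <= target * capacity:
            break
        selected.append(f.path)
        used -= f.size
    return selected
```

On a 100-unit volume holding files of 40, 30, and 20 units (90% full), only the oldest 40-unit file is selected with the default levels, bringing usage to 50%.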

Some HSM software offers file tracking that is embedded in the operating system directory and uses a directory pointer to record each file's location. When a file is migrated or demigrated, the pointer is updated to reflect the new location. The pointer system should be transparent to users, who can see and access a file just as if it were still on the server.
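The directory-pointer idea can be illustrated with a toy directory whose entries record where each file currently lives. This is a simplified model, assuming a flat namespace and string addresses; `Directory` and `Location` are hypothetical and do not reflect how any operating system actually stores these pointers.

```python
from dataclasses import dataclass


@dataclass
class Location:
    tier: str     # "server", "near-line", or "off-line"
    address: str  # hypothetical: a disk path, jukebox slot, or tape label


class Directory:
    """Toy directory whose entries are pointers to file locations."""

    def __init__(self) -> None:
        self._entries: dict[str, Location] = {}

    def add(self, name: str, loc: Location) -> None:
        self._entries[name] = loc

    def move(self, name: str, new_loc: Location) -> None:
        # Migration or demigration only rewrites the pointer; the name is unchanged.
        self._entries[name] = new_loc

    def listing(self) -> list[str]:
        # Users see every file name, wherever its data currently lives.
        return sorted(self._entries)

    def locate(self, name: str) -> Location:
        return self._entries[name]
```

After `move("q3.xls", Location("near-line", "jukebox-1/slot-7"))`, `listing()` still shows `q3.xls`; only `locate` reveals that the data has left the server.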

When users want to access a migrated file, they simply open it as usual. The directory pointer locates the file, and it is automatically demigrated to the server's hard drive. Files that have been migrated to off-line storage also have pointers in the directory, and the process is similar except that it needs an operator: when the user selects the file, the HSM software alerts the administrator to which media must be loaded so that the file can be demigrated to the server.
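The access path described above (automatic recall from near-line storage, an operator alert for off-line media) can be sketched as follows. The pointer table, the `recall` and `alert_admin` callbacks, and the tier strings are hypothetical stand-ins for whatever a real HSM product provides.

```python
from typing import Callable, Optional

# Hypothetical pointer table entry: (tier, address).
Pointer = tuple[str, str]


def open_file(
    pointers: dict[str, Pointer],
    name: str,
    recall: Callable[[str], str],
    alert_admin: Callable[[str, str], None],
) -> Optional[str]:
    """Return the server path of `name`, demigrating it first if needed.

    Near-line files are recalled automatically. For off-line files the
    administrator is told which media to load, and None is returned
    until that happens.
    """
    tier, address = pointers[name]
    if tier == "server":
        return address
    if tier == "near-line":
        server_path = recall(address)             # copy data back to server disk
        pointers[name] = ("server", server_path)  # pointer now shows the new location
        return server_path
    # Off-line: an operator must mount the media before the file can be recalled.
    alert_admin(name, address)
    return None
```

Passing callbacks keeps the sketch independent of any particular storage device; a caller would supply functions that talk to its own jukebox and alerting system.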

The benefits of using HSM to migrate and demigrate data dynamically extend beyond TCO savings. Although HSM is a data-management solution, not a method for data backup or recovery, minimizing the amount of data on servers reduces the labor and downtime involved in backup procedures.

A business's strategy and needs will determine how much data should be migrated off servers to near-line or off-line storage. Generally, the percentage reduction in data held on a server translates directly into the same percentage reduction in backup time. Administrators can also choose not to migrate files until they have been backed up; once migrated, those files no longer have to be backed up again in every backup cycle.
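The proportionality between migrated data and backup time, and the back-up-before-migrating policy, can be expressed as a short calculation. This sketch assumes constant backup throughput, so duration scales linearly with the data remaining on the server; real backups also have fixed overheads. `backup_hours` and `eligible_for_migration` are hypothetical helpers.

```python
def backup_hours(server_gb: float, migrated_fraction: float, gb_per_hour: float) -> float:
    """Estimate a full backup's duration after part of the data is migrated.

    Assumes backup time scales linearly with the data left on the server.
    """
    if not 0.0 <= migrated_fraction <= 1.0:
        raise ValueError("migrated_fraction must be between 0 and 1")
    return server_gb * (1.0 - migrated_fraction) / gb_per_hour


def eligible_for_migration(backed_up: bool, matches_rules: bool) -> bool:
    """Hypothetical policy check: only migrate files that already have a backup."""
    return backed_up and matches_rules
```

With 2,000 GB on the server and 200 GB per hour of throughput, `backup_hours(2000, 0.0, 200)` gives 10.0 hours; migrating a quarter of the data, `backup_hours(2000, 0.25, 200)` gives 7.5 hours, the same 25% reduction.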

Shorter backup time translates directly to less downtime. Downtime can result in lost revenue if the data needed for a transaction is not available because the server is off-line. The longer the downtime, the greater the loss in revenue.

One of the key advantages of HSM is that migration and demigration are seamless to end users. In an ideal environment, data moves between the different levels transparently, creating what is, in effect, a virtual hard drive.