Users should consider software-based technologies such as thin provisioning, storage virtualization, and tiered storage to make primary storage more energy efficient.
By Christine Taylor
The green data center, or rather the lack thereof, is a serious issue. Bloated energy budgets are finally affecting IT departments that spent years ignoring them, overburdened urban power grids are threatening energy supplies, and equipment build-out is exhausting existing facilities and forcing companies to build new data centers, which is never number one on anyone's wish list. The Department of Energy estimates that during 2005, data centers in the US used about 45 billion kilowatt-hours, or about 1.2% of national electricity consumption. That number has grown significantly since then.
So there is a compelling business case for evolving energy-efficient (or "green") data centers. The real questions are: How can businesses get there? How much energy and space can a green technology actually save? There is no one answer, as energy returns differ radically according to the product. For example, incremental changes such as energy-efficient switches or water-cooled racks may yield small savings individually, but together they add up to a good rate of energy savings. However, by far the most dramatic savings come from software technologies that manage data for energy efficiency. The reason is that data storage is a prime offender when it comes to data-center power and space, so significant gains in this area translate to significant progress toward the green data center.
Some of these software approaches are already well-developed in the secondary storage arena. Technologies such as consolidation, virtualization, compression, de-duplication, and massive array of idle disks (MAID) offer energy and space savings on storage archives. But there is an important area that has largely lacked energy-efficient technologies: primary storage. And this translates into some very steep energy bills.
The greening of primary storage
Primary storage is a challenge because it must maintain performance service levels, and it does not have the luxury of 20x reduction ratios that can be achieved on secondary storage. The requirement to maintain high service levels makes it difficult to design primary storage for energy efficiency, which in secondary storage comes from data reduction or disk designs that are unsuitable for production environments. Making matters worse is the all-too-common practice of keeping even large volumes of inactive data on primary storage, because there is no good way to automatically migrate it to more energy-efficient lower storage tiers such as SATA drives. Yet primary storage, with its high-performance disk, produces higher and higher energy and cooling costs and results in data-center build-out as companies put in ever more racks and denser storage.
In this challenging storage arena, innovative software technologies must be able to intelligently manage storage capacity to control energy demands. This capability allows IT to fundamentally optimize usable data capacity on its primary storage systems while preserving data integrity and performance. Three software-based technologies that accomplish this for primary storage are thin provisioning, storage virtualization, and automated storage tiering.
Thin provisioning increases disk utilization, which cuts down on additional disk purchases and attendant energy demands. Thin provisioning works by allocating virtual rather than physical storage capacity to applications. Applications commonly reserve extra physical space to make sure there will be sufficient space to write data as the application grows. This is a good idea in theory, but the reality is that allocation renders large portions of disk unusable until the application needs them. Meanwhile, spinning primary disks draw power and produce heat, and IT must purchase additional disk capacity to store actual data elsewhere. Thin provisioning virtualizes allocation: the application believes it has all its allotted sectors, while physical capacity is actually assigned just in time, as data is written.
For example, with thin provisioning IT can make it appear to an Oracle application that it has 80GB provisioned to it. In reality, IT has run the thin provisioning application's trending feature to see that 8GB of physical allocation will be more than adequate for the next three months. At any time that the application needs more physical space, administrators can provision more capacity on-the-fly. This practice offers much more usable space on the disk, since unused sectors are not sitting idle as part of an allocation land grab.
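The 80GB/8GB example above can be sketched in a few lines of Python. This is a minimal illustration only, not any vendor's implementation: the `ThinVolume` class, its method names, and the 1GB block granularity are all assumptions made for the sketch.

```python
class ThinVolume:
    """Hypothetical thin-provisioned volume: it advertises a large virtual
    size but consumes physical capacity only when a block is first written."""

    def __init__(self, virtual_gb, physical_gb):
        self.virtual_gb = virtual_gb    # size the application believes it has
        self.physical_gb = physical_gb  # capacity actually backed by disk
        self.written = set()            # 1GB blocks that hold real data

    def write(self, block):
        if not 0 <= block < self.virtual_gb:
            raise IndexError("block is outside the advertised virtual size")
        is_new = block not in self.written
        if is_new and len(self.written) >= self.physical_gb:
            raise RuntimeError("physical capacity exhausted; provision more")
        self.written.add(block)

    def grow(self, extra_gb):
        """Add physical capacity on the fly; the virtual size is unchanged."""
        self.physical_gb += extra_gb

    def used_gb(self):
        return len(self.written)


# The application sees 80GB, but only 8GB is physically reserved.
vol = ThinVolume(virtual_gb=80, physical_gb=8)
for block in range(8):
    vol.write(block)
vol.grow(4)      # administrator adds capacity before the next write
vol.write(8)
```

The remaining 72GB that a conventional allocation would have locked up stays available to other applications until it is actually needed.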
Thin provisioning is not a panacea, as some applications will automatically mark all their allocated disk space with metadata to improve performance, making it unusable for other data. Still, thin provisioning can be highly effective for improving disk utilization and cutting down on new disk purchases and associated power, cooling, and space costs. When choosing a thin-provisioning application for primary storage, users should look for products that do not negatively impact performance, provide trending and other analytics, and are not limited to a single LUN.
Virtualizing storage is a key strategy for the green data center. Applications often demand their own storage, once again leaving large chunks of production disk under-utilized. To fulfill demand, IT must purchase yet more disk, making energy needs rise and real estate shrink. Storage virtualization helps to solve this problem by presenting virtualized storage pools to multiple applications. This results in much better disk utilization, which in turn, results in fewer disk purchases and shrinking operational costs. These savings are not just from using less electricity, but also include lower cooling costs, space savings, and optimizing energy investments such as cooling racks.
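A rough calculation shows why pooling improves utilization. The sketch below uses made-up application demands and a made-up disk size, and it deliberately ignores RAID overhead, spares, and performance headroom, so treat the figures as illustrative rather than as sizing guidance.

```python
import math

def disks_dedicated(demands_gb, disk_gb):
    """Each application gets its own disks, rounded up per application."""
    return sum(math.ceil(d / disk_gb) for d in demands_gb)

def disks_pooled(demands_gb, disk_gb):
    """All applications share one virtualized pool, rounded up once."""
    return math.ceil(sum(demands_gb) / disk_gb)

def utilization(demands_gb, disks, disk_gb):
    """Fraction of purchased capacity that holds real data."""
    return sum(demands_gb) / (disks * disk_gb)

# Hypothetical figures: four applications and 300GB disks.
demands = [120, 80, 200, 50]
disk = 300

dedicated = disks_dedicated(demands, disk)   # 4 disks
pooled = disks_pooled(demands, disk)         # 2 disks
print(dedicated, utilization(demands, dedicated, disk))  # 4 0.375
print(pooled, utilization(demands, pooled, disk))        # 2 0.75
```

In this simplified case the pool needs half as many spinning disks for the same data, which is where the power, cooling, and space savings come from.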
As with any energy-efficient technology applied to primary storage, storage virtualization has the added requirement of maintaining high availability and performance service levels. For this demanding environment, IT should purchase storage virtualization products that preserve performance, for example by striping read/write operations across multiple disk spindles. Administration efficiency is also an issue, and virtual volumes should offer a straightforward creation process without the need to allocate drives to specific servers or perform time-consuming performance tuning. Primary storage cannot sacrifice service levels even in the name of energy efficiency, and high-performance storage virtualization means it will not have to.
Automated storage tiering
The strategy known as information lifecycle management (ILM) is made up of several tactical components, chief among them storage tiering. The idea is that IT matches storage specifications to changing data priorities, thus aligning storage resources to relative data value. Classic storage tiering migrates less-active data that should be kept online to cost-effective Tier-2 storage, usually SATA arrays. From there it can be deleted at the end of its retention cycle or can be migrated to Tier-3 storage, usually tape media. This plan has a number of economic advantages, including energy efficiency, as the majority of stored data (80% is a common average) is persistent data that does not have to be stored on primary disk. By moving data off high-performance and energy-hungry primary disk, companies can realize monetary gains from storing data on energy-efficient SATA drives or even on Tier-3 tape. (SATA is energy-efficient, because it offers more capacity behind fewer storage controllers, which decreases energy demands.)
However, there is a large fly in the ointment: the inability to automate the storage tiering process. Manual storage tiering requires that IT identify data according to priority, match that priority to the appropriate storage resource, and then move the data accordingly. This level of manual effort results in corporations not doing storage tiering at all, or doing it on just their most critical data, or spending large amounts of money to get a professional services organization to do it. In terms of energy efficiency alone, the corporation either misses out on the energy savings of SATA disk or tape, or cancels out energy gains by spending money on manual tiering.
This is why automated storage tiering helps corporations to realize serious energy gains by storing less active data on energy-efficient media. This software-driven approach enables IT to automate the storage tiering process, setting policies to automatically control data retention and movement. By using the energy efficiencies of SATA disk or tape, administrators control the expensive energy requirements of primary storage.
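A policy of this kind can be expressed very compactly. The following Python sketch is a hypothetical illustration, not a description of any product: the 90-day and 365-day thresholds, the function names, and the catalog layout are assumptions, and a real product would typically weigh more than last-access time.

```python
from datetime import date, timedelta

# Hypothetical policy thresholds; real policies are set by the business.
TIER2_AFTER = timedelta(days=90)    # idle this long: move to SATA
TIER3_AFTER = timedelta(days=365)   # idle this long: move to tape

def choose_tier(last_access, today):
    """Map how long data has been idle to a target storage tier."""
    idle = today - last_access
    if idle >= TIER3_AFTER:
        return 3
    if idle >= TIER2_AFTER:
        return 2
    return 1

def plan_migrations(catalog, today):
    """Return (name, current_tier, target_tier) for every item whose
    current tier differs from the tier the policy selects."""
    moves = []
    for name, (tier, last_access) in catalog.items():
        target = choose_tier(last_access, today)
        if target != tier:
            moves.append((name, tier, target))
    return moves

# Illustrative catalog: name -> (current tier, last access date).
catalog = {
    "orders_db":    (1, date(2023, 12, 1)),
    "q2_reports":   (1, date(2023, 8, 1)),
    "2021_archive": (2, date(2022, 6, 1)),
}
for move in plan_migrations(catalog, today=date(2024, 1, 1)):
    print(move)
```

Run on a schedule, a planner like this replaces the manual identify, match, and move steps; as written, it also promotes data back to Tier 1 if it becomes active again.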
By far the main offender in storage-related energy costs is the sheer amount of data that corporations store. More data means more arrays, more racks, more power and cooling, and more data-center space. That is why managing storage for energy efficiency must include shrinking and controlling storage within business guidelines. Compliant data deletion policies help, but a good deal of data must remain stored online for drivers like business value, disaster recovery, litigation, and compliance. Companies can efficiently store large amounts of data in lower storage tiers, since this environment has access to 20x reduction technologies and energy-efficient disk with technologies such as data de-duplication, compression, and MAID. However, most of these technologies leave primary storage wanting as energy efficiency must not affect availability or performance on primary systems. Although some vendors are making inroads into primary storage compression, for the most part primary storage best maintains its service levels by optimizing disk. Energy gains come from using fewer disks and therefore less energy and real estate.
Christine Taylor is a research analyst at the Taneja Group. For details on the report, "The Greening of the Data Center: A Four-Part Strategy to Achieve the Energy-Efficient Infrastructure," go to firstname.lastname@example.org.