To cut power and energy costs, IT should consider data de-duplication, thin provisioning, primary data compression, and virtualization.
By Christine Taylor
The constant "green" clarion call is all around us, with vendors from all areas of IT jumping on the bandwagon and marketing themselves and their products as eco-friendly: green packaging, green facilities, green disk drives, green arrays, green switches, and so on.
Of course, energy usage is a real problem because data-center energy costs are skyrocketing. This has spurred both challenges and advances across the IT continuum of servers, storage, networks, and facilities. Energy budgets are swelling as corporations use more and more energy to keep the lights on and the disks spinning. This is a large-scale problem, and the goal of the green data-center movement must be to control energy consumption on the same large scale.
There is no one fix for the problem. Instead, there are ongoing integrated measures designed to control energy consumption. First, there are a variety of software technologies to optimize capacity and compress stored data. In addition to software-based solutions, a data center fully managed for sustainable low-energy usage might also include managing energy usage as SLAs, designing physical plants for highly efficient power and cooling, and improving hardware for maximum energy efficiency.
The centerpiece of the green action plan is the concept that less data requires less energy. It's a deceptively simple statement, since shrinking data without sacrificing value can be tricky. However, there are innovative software technologies that will give you an excellent start. What constitutes an excellent start? Even without integrating energy SLAs, green facilities design, and hardware advances, shrinking stored data can equate to savings of 50% or more without any loss of value. Add in the other three solutions and you can achieve up to 80% energy savings.
Growing storage system capacity is the time-honored method for housing more data. However, energy and cooling costs grow right along with capacity, making for unacceptably high energy costs and data centers that have reached the limit of their expansion. Rather than merely adding storage capacity, IT must optimize usable capacity on its existing storage systems. This alone will provide the data reduction needed to control storage-related energy issues, a significant offender when it comes to data-center energy costs.
Software-based technologies that can reduce energy and power consumption include data de-duplication, thin provisioning, primary data compression, and virtualization. There are other energy-saving technologies on both the software and hardware sides (e.g., tiering, consolidation, and sleepy drives), which we'll address in a future article.
Data de-duplication is a key technology for optimizing storage capacity, particularly for file-based data on NAS appliances and virtual tape libraries (VTLs). It is a green technology because highly optimized storage systems house less data, which pulls less power and generates less heat.
De-duplication can result in massive capacity savings and attendant energy savings. However, vendors approach de-duplication differently. For example, some work only at the file level while others work at the sub-file level for even greater data reduction.
One primary distinction revolves around the optimum location for the de-duplication process, which can be at the server level where the backup stream begins, inline to intercept the backup stream between server and storage, or post-processing at the storage level.
Representative vendors include EMC and Symantec at the source backup stream level—hardly surprising since de-duping adds value to their backup applications. Inline vendors include Data Domain and Diligent, and post-processing vendors include Sepaton and FalconStor. And Quantum has a hybrid appliance that provides both inline and post-processing de-duplication.
Whatever de-duplication method you choose, the space savings can be startling: De-duped backups can produce a 25x reduction in backup data, although the de-duplication ratio depends on a number of factors, and your mileage may vary.
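To make the file-level versus sub-file-level distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's method: the function name dedup_stats, the fixed 512-byte chunk size, the SHA-256 fingerprints, and the three toy "backups" are all hypothetical, and commercial products may chunk and index data quite differently.

```python
import hashlib


def dedup_stats(files, chunk_size=None):
    """Return (logical_bytes, stored_bytes) after de-duplication.

    chunk_size=None de-duplicates whole files; an integer splits each
    file into fixed-size chunks and de-duplicates those instead.
    """
    seen = set()            # fingerprints of data already stored
    logical = stored = 0
    for data in files:
        logical += len(data)
        step = chunk_size or max(len(data), 1)
        for i in range(0, len(data), step):
            chunk = data[i:i + step]
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:      # store only never-seen data
                seen.add(digest)
                stored += len(chunk)
    return logical, stored


# Hypothetical data: eight distinct 512-byte blocks form a 4,096-byte file.
base = b"".join(bytes([k]) * 512 for k in range(8))
# Three "nightly backups": two identical copies, then one with a changed tail.
backups = [base, base, base[:4000] + b"x" * 96]

file_level = dedup_stats(backups)
chunk_level = dedup_stats(backups, chunk_size=512)
print("file-level :", file_level)
print("chunk-level:", chunk_level)
```

On this toy data, file-level de-duplication stores 8,192 of 12,288 logical bytes (1.5x), because the third backup differs from the first by a few bytes and must be kept whole. Chunk-level de-duplication stores only 4,608 bytes (about 2.7x), because just the one changed chunk is new. Real-world ratios depend on the data and the change rate.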
Thin provisioning is another method of achieving energy savings on primary block storage, because it allows IT to allocate more storage space to an application than is physically available. As non-intuitive as this may sound, it is an excellent approach to the all-too-common problem of over-provisioning storage.
The reality of the data center is that users and applications often request much more storage than the application will actually use in the short- to mid-term. The practice is not unreasonable given the corporate budgetary process ("use it or lose it") and the need to constantly provide sufficient storage space for critical applications. However, the usual result of this space grab is rampant over-provisioning, with utilization rates of 20% or less. Meanwhile 80% of usable capacity is simply sucking up power and cooling, as well as data-center real estate.
This is where thin provisioning comes in. Thin provisioning allows IT to allocate a larger virtual amount of capacity than the physical amount actually available. For example, administrators might virtually provision 100GB to an application while physically provisioning only 10GB from the storage pool. As capacity requirements grow, thin provisioning can automatically release additional chunks of physical storage to the application. IT can also set capacity alerts so that additional capacity is automatically provisioned when capacity limits are reached. By decreasing over-provisioning and increasing disk utilization, IT avoids significant energy costs. It is not quite that simple, of course. For example, some applications will grab storage anyway by automatically marking all allocated disk space with metadata to improve performance. Still, thin provisioning is very cost-effective for reducing over-allocation and the accompanying energy costs.
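The mechanics can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real array's interface: the ThinPool class, its method names, and the 80% alert threshold are hypothetical, and the sketch only flags the alert rather than provisioning new capacity itself. The 100GB virtual and 10GB physical figures come from the example above.

```python
class ThinPool:
    """Toy thin-provisioned storage pool; all sizes are in GB."""

    def __init__(self, physical_gb, alert_threshold=0.8):
        self.physical_gb = physical_gb
        self.alert_threshold = alert_threshold  # assumed alert level
        self.virtual = {}   # volume name -> virtual (promised) size
        self.used = {}      # volume name -> physically backed size

    def provision(self, name, virtual_gb):
        # Virtual capacity is only a promise; no physical space is used yet.
        self.virtual[name] = virtual_gb
        self.used[name] = 0

    def physical_used(self):
        return sum(self.used.values())

    def write(self, name, gb):
        # Physical space is drawn from the pool only when data is written.
        if self.used[name] + gb > self.virtual[name]:
            raise ValueError("write exceeds the volume's virtual size")
        if self.physical_used() + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.used[name] += gb
        return self.alert()

    def alert(self):
        # True once pool usage reaches the alert threshold.
        return self.physical_used() / self.physical_gb >= self.alert_threshold

    def oversubscription(self):
        # Ratio of promised virtual capacity to physical capacity.
        return sum(self.virtual.values()) / self.physical_gb


pool = ThinPool(physical_gb=10)
pool.provision("app", virtual_gb=100)
first_alert = pool.write("app", 5)    # pool is half full, no alert
second_alert = pool.write("app", 4)   # pool is 90% full, alert raised
print(pool.oversubscription(), first_alert, second_alert)
```

The application sees a 100GB volume throughout, while only the 9GB actually written consumes powered disk. The alert is the administrator's cue to add physical capacity before the pool runs dry.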
In addition to data de-duplication and thin provisioning, primary data compression and storage virtualization can also reduce energy and cooling requirements. Neither technology is new, but both are maturing to the point of making significant inroads toward cutting down data-center costs.
Data compression in non-production environments has been around a long time. Compressed data saves disk space, speeds up data recovery using fewer backup sources, and saves on storage system and media purchases. And in the case of nearline (secondary) storage, compression uses less disk and thus cuts down on ongoing energy costs from constantly spinning disks.
However, few companies have applied data compression to primary production environments, citing high-performance requirements that did not allow for compression/de-compression cycles. But as interest grew in the benefits of the energy-efficient data center, vendors turned their attention to the high energy costs of primary storage. Most de-duplication and data compression happens at the secondary storage level, meaning that uncompressed data makes primary storage a prime offender when it comes to power usage. In response, compression optimized for primary storage has become another innovative software technique for the green data center. Primary data compression from vendors such as Storwize does not impact performance. And in turn a compressed primary volume shrinks demand for secondary storage capacity and replication bandwidth.
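As a rough illustration of the space-versus-cycles trade-off, the following Python sketch measures a lossless compression ratio using the standard zlib module. It is not how any primary-storage compression product works: the compression_ratio helper and the repetitive log-style sample data are assumptions for illustration, and real-world ratios vary widely with the data.

```python
import zlib


def compression_ratio(data, level=6):
    """Return original size divided by compressed size for lossless zlib."""
    compressed = zlib.compress(data, level)
    # Lossless: decompressing must give back exactly the original bytes.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


# Hypothetical sample: highly repetitive, log-like text compresses very well.
sample = b"INFO request served in 12 ms\n" * 1000
ratio = compression_ratio(sample)
print(f"compression ratio: {ratio:.1f}x")
```

Every read and write of compressed data pays for a compress or decompress step like the one above, which is why performance has been the traditional objection to compressing primary storage.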
Storage virtualization is another key strategy for the green data center. One of the most perplexing problems in creating a green data center is powering a plethora of storage devices, each requiring its own energy source and generating significant heat. Virtualization releases data from this stranglehold by presenting virtualized storage pools as separate devices to the applications.
Storage virtualization also includes best practices around managing large virtual storage pools, such as wide striping across multiple spindles for improved performance. The upshot of virtualization is that IT can efficiently pool physical storage capacity, which in turn improves allocation and shrinks energy and cooling costs.
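Wide striping can be pictured as a simple address mapping. The Python sketch below is a minimal round-robin layout under stated assumptions, not any vendor's implementation: the stripe_location function, the four-disk pool, and the two-block stripe unit are all hypothetical.

```python
from collections import Counter


def stripe_location(block, disks, stripe_blocks=1):
    """Map a logical block number to (disk index, block offset on that disk)."""
    stripe = block // stripe_blocks      # which stripe unit holds the block
    disk = stripe % disks                # stripe units rotate across disks
    offset = (stripe // disks) * stripe_blocks + block % stripe_blocks
    return disk, offset


# Hypothetical pool: 4 disks, 2 blocks per stripe unit, 16 logical blocks.
layout = [stripe_location(b, disks=4, stripe_blocks=2) for b in range(16)]
per_disk = Counter(disk for disk, _ in layout)
print(layout[:4], dict(per_disk))
```

Because consecutive stripe units land on different spindles, a large sequential transfer is spread evenly across the pool instead of queuing on a single disk.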
Achieving a cost-effective and sustainable green data center is a major undertaking. However, reducing energy-related costs and managing data-center build-out is not an all-or-nothing task. Storage represents a large piece of the energy pie, and shrinking storage with innovative software technologies will have an immediate impact on the bottom line. By founding their green data center on this process, companies can realize significant reductions in storage-related energy costs. Then they can proceed to integrate other strategies, including managing power usage as an SLA, building an energy-efficient infrastructure, and engineering facilities for maximum power efficiency.
Christine Taylor is a research analyst at the Taneja Group (www.tanejagroup.com). For details on the report, "The Greening of the Data Center: A Four-Part Strategy to Achieve the Energy-Efficient Infrastructure," go to email@example.com.
Survey reveals 'green' initiatives (or lack thereof)
By Ann Silverthorn
According to an end-user survey, which included a section on "green" initiatives, nearly three-fourths of the respondents have an interest in adopting a green data-center initiative, yet only one in seven has successfully done so. For the purpose of the study, a green data center was defined as having increased efficiencies in energy usage, power consumption, and space utilization, and a reduction in polluting energy sources.
Conducted by Ziff Davis Enterprise on behalf of Symantec, the study surveyed 800 data-center managers across 14 countries, most of them at Global 2000 organizations and other large companies.
In the US, only about one-third of the companies have adopted green policies, compared with 60% in the Asia-Pacific and Japan region and 55% in Europe.
However, many US companies are making progress with the Green Grid, which is a consortium of IT vendors and users seeking to lower the overall consumption of power in data centers. The organization is chartered to develop platform-neutral standards, measurement methods, processes, and new technologies to improve energy efficiency.
Green Grid board members include AMD, APC, Dell, Hewlett-Packard, IBM, Intel, Microsoft, Rackable Systems, SprayCool, Sun, and VMware. Contributing members include nearly 30 companies, including storage vendors such as Brocade, Copan, EMC, Pillar Data Systems, QLogic, Symantec, Verari Systems, and Western Digital. And there are an additional 75 general members participating in the Green Grid.
In the Symantec survey, almost 85% of the respondents said that energy efficiency is at least a moderate priority in their data centers, with 15.5% citing it as a critical priority.
When considering approaches to making data centers greener, IT managers have many choices in both software and hardware. They may even decide on an entire data-center redesign. According to users, technologies such as data de-duplication and tiered storage architectures can dramatically reduce energy consumption.
In addition, there are a variety of projects that constitute green policies. Among the most popular are software-based solutions such as server consolidation and server virtualization. In fact, 68% of the survey respondents said that reducing energy consumption played a role in their decision to implement server consolidation and virtualization.