The constant “green” clarion call is all around us, with vendors from all areas of IT jumping on the bandwagon and marketing themselves and their products as eco-friendly. Green packaging, green facilities, green disk drives, green arrays, etc.
Of course, energy usage is a real problem: data-center energy costs are skyrocketing, spurring both challenges and advances across the IT continuum of servers, storage, networks, and facilities. Energy budgets are swelling as corporations use more and more power to keep the lights on and the disks spinning. This is a large-scale problem, and the goal of the green data-center movement must be to control energy consumption on the same large scale.
There is no one fix for the problem. Instead, there are ongoing integrated measures designed to control energy consumption. First, a variety of software technologies can optimize capacity and compress stored data. Beyond software-based solutions, a data center fully managed for sustainable low-energy usage might also manage energy usage through SLAs, design physical plants for highly efficient power and cooling, and improve hardware for maximum energy efficiency.
The centerpiece of the green action plan is the concept that less data requires less energy. It’s a deceptively simple statement, since shrinking data without sacrificing value can be tricky. However, there are innovative software technologies that will give you an excellent start. What constitutes an excellent start? Even without integrating energy SLAs, green facilities design, and hardware advances, shrinking stored data can equate to a savings of 50% or more without any loss of value. Add in the other three solutions and you can achieve up to 80% energy savings.
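The "less data requires less energy" arithmetic can be sketched with a back-of-the-envelope calculation. All of the figures below (watts per terabyte, electricity rate, capacity) are illustrative assumptions, not measurements from any vendor or data center:

```python
# Back-of-the-envelope sketch: energy cost of spinning disk before and
# after data reduction. Every number here is a hypothetical assumption.

def storage_energy_cost(raw_tb, watts_per_tb, cost_per_kwh, hours=24 * 365):
    """Annual energy cost of keeping raw_tb of disk spinning."""
    kwh = raw_tb * watts_per_tb * hours / 1000
    return kwh * cost_per_kwh

# Assumed: 500TB of primary disk at 10W/TB, $0.12 per kWh.
baseline = storage_energy_cost(raw_tb=500, watts_per_tb=10, cost_per_kwh=0.12)

# Shrinking stored data by half roughly halves spinning-disk energy.
after_reduction = storage_energy_cost(raw_tb=250, watts_per_tb=10, cost_per_kwh=0.12)

print(f"baseline: ${baseline:,.0f}/yr, after 50% reduction: ${after_reduction:,.0f}/yr")
```

The point of the sketch is only that storage energy scales roughly linearly with spinning capacity, so a 50% data reduction flows directly to the energy bill; cooling savings would come on top of this.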
Growing storage system capacity is the time-honored method for housing more data. However, energy and cooling costs grow right along with capacity, making for unacceptably high energy costs and data centers that have reached the limit of their expansion. Instead of merely adding storage capacity, IT must optimize usable capacity on its existing storage systems. This alone will provide the data reduction needed to control storage-related energy issues, a significant offender when it comes to data-center energy costs.
Software-based technologies that can reduce energy and power consumption include data de-duplication, thin provisioning, primary data compression, and virtualization. There are other energy-saving technologies on both the software and hardware sides (e.g., tiering, consolidation, and spin-down "sleepy" drives), which we'll address in a future article.
Data de-duplication is a key technology for optimizing storage capacity, particularly for file-based data on NAS appliances and virtual tape libraries (VTLs). It is a green technology because highly optimized storage systems house less data, which pulls less power and generates less heat.
De-duplication can result in massive capacity savings and attendant energy savings. However, vendors approach de-duplication differently. For example, some work only at the file level while others work at the sub-file level for even greater data reduction.
One main distinction revolves around the optimum location for the de-duplication process, which can be at the server level where the backup stream begins, inline to intercept the backup stream between server and storage, or post-processing at the storage level. Representative vendors include EMC and Symantec at the source backup stream level. Inline vendors include Data Domain and Diligent, and post-processing vendors include Sepaton and FalconStor. And Quantum has a hybrid appliance that provides inline and post-processing de-duplication.
Whatever de-duplication method you choose, the space savings can be startling: De-duped backups can produce a 25x reduction in backup data, although the de-duplication ratio depends on a number of factors, and your mileage may vary.
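The core of sub-file de-duplication is simple to illustrate: fingerprint each chunk of the backup stream, store each unique chunk once, and keep a "recipe" of fingerprints to reconstruct the original. This is a minimal sketch, not any vendor's implementation, and the tiny chunks are stand-ins for the fixed- or variable-size blocks a real product would use:

```python
import hashlib

def dedupe(chunks):
    """Store each unique chunk once, keyed by its SHA-256 fingerprint.
    Returns the chunk store and a per-chunk recipe for reconstruction."""
    store = {}
    recipe = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # only first copy is stored
        recipe.append(digest)             # every position is recorded
    return store, recipe

# Nightly backups repeat mostly unchanged data, so duplicates abound.
backup = [b"block-A", b"block-B", b"block-A", b"block-A", b"block-B"]
store, recipe = dedupe(backup)

ratio = len(backup) / len(store)  # 5 logical chunks, 2 unique stored
original = b"".join(store[d] for d in recipe)  # lossless reconstruction
```

The de-duplication ratio falls out directly: logical chunks divided by unique chunks stored. Real-world ratios depend on chunk size, data change rate, and retention policy, which is why mileage varies so widely.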
Thin provisioning is another method of achieving energy savings on primary block storage, because it allows IT to allocate more storage space to an application than is physically available. Counterintuitive as this may sound, it is an excellent approach to the all-too-common problem of over-provisioning storage.
The reality of the data center is that users and applications often request much more storage than the application will actually use in the short- to mid-term. The practice is not unreasonable given the corporate budgetary process (“use it or lose it”) and the need to constantly provide sufficient storage space for critical applications. But, the usual result of this space grab is over-provisioning, with utilization rates of 20% or less. Meanwhile 80% of usable capacity is simply sucking up power and cooling, as well as data-center real estate.
This is where thin provisioning comes in. Thin provisioning allows IT to allocate a larger virtual amount of capacity than the actual physical amount available. For example, administrators might virtually provision 100GB to an application while in fact physically provisioning only 10GB in the physical storage pool. As the application actually writes data, thin provisioning automatically allocates additional physical chunks from the pool.
IT can set capacity alerts so that if capacity limits are reached, additional capacity is automatically provisioned. By decreasing over-provisioning and increasing disk utilization, IT avoids significant energy costs. Of course, it is not quite that simple. For example, some applications will grab storage anyway by writing metadata across all allocated disk space to improve performance. Still, thin provisioning is very cost-effective for reducing over-allocation and accompanying energy costs.
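The mechanics described above can be sketched in a few lines: a volume promises a large virtual size, but physical chunks are consumed from a shared pool only as data is written, with an alert threshold to trigger adding real disk. The class and parameter names are hypothetical, and a real array's allocator is far more involved:

```python
class PhysicalPool:
    """Shared physical capacity backing many thin volumes (a sketch)."""
    def __init__(self, total_gb, alert_pct=80):
        self.total_gb = total_gb
        self.used_gb = 0
        self.alert_pct = alert_pct  # threshold for "add more disk"

    def allocate(self, gb):
        if self.used_gb + gb > self.total_gb:
            raise RuntimeError("pool exhausted: provision physical capacity")
        self.used_gb += gb
        if self.used_gb / self.total_gb * 100 >= self.alert_pct:
            print("ALERT: pool utilization high, provision more disk")

class ThinVolume:
    """Promises virtual_gb up front; consumes physical space on write."""
    def __init__(self, virtual_gb, pool):
        self.virtual_gb = virtual_gb  # promised to the application
        self.pool = pool
        self.written_gb = 0

    def write(self, gb):
        self.pool.allocate(gb)        # backed on demand, not up front
        self.written_gb += gb

pool = PhysicalPool(total_gb=10)
vol = ThinVolume(virtual_gb=100, pool=pool)  # 100GB promised, 10GB real
vol.write(4)                                  # only 4GB actually consumed
```

Note how this mirrors the article's example: 100GB is virtually provisioned against a 10GB physical pool, and the energy-relevant number is the 4GB actually spinning, not the 100GB promised.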
Primary data compression and storage virtualization can also reduce energy and cooling requirements. Neither is new, but both are making significant inroads toward cutting down data-center costs.
Data compression in non-production environments has been around a long time. Compressed data saves disk space, speeds data recovery by drawing on fewer backup sources, and reduces storage-system and media purchases. And in the case of nearline (secondary) storage, compression uses less disk and thus cuts down on the ongoing energy costs of constantly spinning disks. However, few companies have applied data compression to primary production environments, citing high-performance requirements that did not allow for compression/de-compression cycles.
But as interest grew in the benefits of the energy-efficient data center, vendors turned their attention to the high energy costs of primary storage. Most de-duplication and data compression happens at the secondary storage level, meaning that uncompressed data makes primary storage a prime offender when it comes to power usage. In response, compression optimized for primary storage has become another innovative software technique for the green data center. Primary data compression from vendors such as Storwize is designed to work in real time without impacting performance. In turn, a compressed primary volume shrinks demand for secondary storage capacity and replication bandwidth.
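The capacity effect of compression is easy to demonstrate with a standard library. The snippet below uses Python's zlib on a synthetic, highly repetitive dataset; the data and the resulting ratio are illustrative only, and production workloads (databases, media, already-compressed files) compress far less predictably:

```python
import zlib

# Hypothetical, highly repetitive primary data (e.g., log-like records).
record = b"2008-01-01,ORDER,cust-042,SKU-9,qty=1\n"
data = record * 5000

compressed = zlib.compress(data, level=6)   # standard DEFLATE compression
ratio = len(data) / len(compressed)

print(f"{len(data)} bytes -> {len(compressed)} bytes ({ratio:.0f}x)")
```

The key property for primary storage is that the round trip is lossless and fast enough to sit in the data path; the ratio itself is entirely workload-dependent.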
Storage virtualization is another key strategy for the green data center. One of the most perplexing problems in creating a green data center is powering a plethora of storage devices, each requiring its own energy source and generating significant heat. Virtualization releases data from this stranglehold by pooling many physical devices and presenting them to applications as virtual volumes.
Storage virtualization also includes best practices around managing large virtual storage pools, such as wide striping across multiple spindles for improved performance. The upshot of virtualization is that IT can efficiently pool physical storage capacity, which in turn improves allocation and shrinks energy and cooling costs.
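Wide striping, mentioned above, is just a placement function: consecutive runs of logical blocks are dealt round-robin across all the spindles in the pool so every volume benefits from every disk. This is a minimal sketch of the mapping, with a hypothetical function name and a fixed stripe depth; real arrays layer RAID, caching, and rebalancing on top:

```python
def wide_stripe(logical_block, spindles, depth=8):
    """Map a logical block number to (spindle, local_block) by striping
    `depth` consecutive blocks per spindle, round-robin across the pool."""
    stripe, within = divmod(logical_block, depth)
    spindle = stripe % spindles                     # rotate across disks
    local = (stripe // spindles) * depth + within   # position on that disk
    return spindle, local

# Blocks 0-7 land on spindle 0, blocks 8-15 on spindle 1, and so on,
# so a busy volume's I/O is spread over every spindle in the pool.
placements = [wide_stripe(b, spindles=4) for b in range(32)]
```

Because a sequential workload touches all four spindles instead of one, the pool delivers aggregate performance without dedicating (and powering) separate devices per application.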
Achieving a cost-effective and sustainable green data center is a major undertaking. However, reducing energy-related costs and managing data-center build-out is not an all-or-nothing task. Storage represents a large piece of the energy pie, and shrinking storage with innovative software technologies will have an immediate impact on the bottom line. By founding their green data center on this process, companies can realize significant reductions in storage-related energy costs. Then they can proceed to integrate other strategies, including managing power usage as an SLA, building an energy-efficient infrastructure, and engineering facilities for maximum power efficiency.