BY GREG SCHULZ
There are many approaches to addressing power, cooling, floor space (or footprint), and environmental (PCFE), aka "green," issues pertaining to data storage. A key consideration for enabling a green and virtual data center storage environment is achieving economic and energy efficiency, including a balance among performance, availability, capacity, and energy (PACE) at a given quality of service (QoS) level. This article briefly discusses several of the applicable approaches, techniques, and technologies.
General steps to doing more with your storage-related resources without negatively impacting application service availability, capacity, or performance include:
- Assess and gain insight as to what you have and how it is being used;
- Develop a strategy and plan (near-term and long-term) for deployment;
- Use energy-effective storage solutions (both hardware and software);
- Optimize data and storage management functions;
- Shift usage habits to allocate and use storage more effectively;
- Reduce your data footprint and the subsequent impact on data protection;
- Measure, reassess, and adjust the capacity management process.
Approaches to improve storage PCFE efficiency include:
- Spin down and power off hard disk drives (HDDs) when not in use;
- Reduce power consumption by putting HDDs into a slower mode;
- Do more work and store more data with less power;
- Use flash, RAM, and solid-state disk (SSD) drives;
- Consolidate to higher-capacity storage devices and systems;
- Use different RAID levels and tiered storage to maximize resource usage;
- Leverage management tools and software to balance resource usage;
- Reduce your data footprint via archiving, compression, and data de-duplication;
- Mask or move data to cloud-based or managed service providers.
Tiered storage includes different types of HDDs, such as high-performance Fibre Channel and SAS drives/arrays and lower-performing, high-capacity SATA, SAS, or Fibre Channel drives/arrays. Other types of tiered storage devices include magnetic tape, optical media (CDs, DVDs, magneto-optical) and flash-based SSD drives (see figure, "Tiered storage options").
Given that data storage spans categories from active online and primary data to offline and infrequently accessed archive data, different types of storage media addressing different value propositions can be found in a single storage solution. For example, to address high-performance active data, the emphasis is on work per unit of energy at a given cost, capacity, and physical footprint. For offline or secondary data not requiring high performance, the focus shifts from energy efficiency (doing more work per unit of energy) to capacity density per cost, unit of energy, and physical footprint.
Compare different tiered storage media based on what applications and types of data access they will be supporting, while considering cost and physical footprint. Also consider the performance, availability, capacity, and effective energy efficiency for the usage case, such as active or idle data. As an example, a 3.5-inch, 146GB, 15,000rpm, 4Gbps Fibre Channel or SAS HDD consumes the same, if not less, power than a 3.5-inch, 750GB, 7,200rpm SATA or SAS HDD. For active online data, the 15,000rpm HDD delivers more performance per unit of energy than the larger-capacity SATA HDD. For capacity-intensive applications that do not need high performance, however, the SATA drive has better density per unit of energy in the same physical footprint as the faster 146GB HDD. Which drive to use depends on the application; increasingly, a mix of high-speed Fibre Channel or SAS drives are configured in storage systems with some lower-performing, high-capacity (or "FAT") HDDs for a tiered storage solution in a single box.
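The comparison above can be sketched as simple IOPS-per-watt and GB-per-watt arithmetic. The drive power and IOPS figures below are hypothetical ballpark numbers chosen for illustration, not vendor specifications:

```python
# Illustrative comparison of two drive classes on activity per watt
# (IOPS/W) and capacity per watt (GB/W). All figures are assumed
# ballpark values, not measured or vendor-published specifications.

def iops_per_watt(iops, watts):
    return iops / watts

def gb_per_watt(capacity_gb, watts):
    return capacity_gb / watts

# Hypothetical drives: 146GB 15,000rpm FC/SAS vs. 750GB 7,200rpm SATA,
# both assumed to draw roughly the same power under load.
fc_15k  = {"capacity_gb": 146, "iops": 180, "watts": 15.0}
sata_7k = {"capacity_gb": 750, "iops": 80,  "watts": 15.0}

for name, d in [("15K FC/SAS", fc_15k), ("7.2K SATA", sata_7k)]:
    print(f"{name}: {iops_per_watt(d['iops'], d['watts']):.1f} IOPS/W, "
          f"{gb_per_watt(d['capacity_gb'], d['watts']):.1f} GB/W")
```

With these assumed numbers, the 15K drive wins on IOPS per watt while the SATA drive wins on GB per watt, which is the trade-off driving mixed-drive tiered configurations.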
The need for more effective I/O performance is linked to the decade-old (and still growing) gap between server and storage performance: HDD performance has not kept pace with performance increases in servers.
Reducing energy consumption is important for many IT data centers. Although there is discussion about reducing energy by doing less work or powering down storage to reduce energy use, the trend is toward doing more with less power per unit of work. This includes intelligent power management when power consumption can be reduced without compromising application performance or availability, as well as doing more I/Os per second (IOPS) or bandwidth per watt of energy.
Flash is relatively low-cost and persistent memory that does not lose its content when power is turned off. USB thumb drives are a common example. DDR/RAM is dynamic memory that is very fast but is not persistent (i.e., data is lost when power is removed). DDR/RAM is also more expensive than flash memory. Hybrid approaches combine flash for persistency, high capacity, and low cost with DDR/RAM for performance.
There is a myth that SSDs are only for databases and that SSDs do not work well with file-based data. The reality is that, in the past, given the cost of DRAM-based solutions, specific database tables or files, indices, log, or journal files, or other transient performance-intensive data, were put on SSDs. If the database was small enough or the budget large enough, the entire database may have been put on SSDs. Given the cost of DRAM and flash, however, many new applications and usage scenarios are leveraging SSD technologies. For example, NFS filer data access can be boosted using caching I/O accelerator appliances or adapter cards.
The main barrier to wider SSD adoption is perceived cost; the thought has been that SSDs in general cost too much compared to HDDs. When compared strictly on a cost-per-gigabyte basis, HDDs are cheaper. If compared on the ability to process I/Os, however, and on the number of HDDs, interfaces, controllers, and enclosures necessary to achieve the same level of IOPS or bandwidth, then SSDs may be more cost-effective for a given capacity usage case. The downside to RAM-based SSDs compared to HDDs is that electrical power is needed to preserve data.
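The cost-per-gigabyte versus cost-per-IOPS argument can be made concrete with a small sizing sketch. The workload, per-drive performance, and unit prices below are all illustrative assumptions:

```python
import math

def drives_needed(target_iops, target_gb, iops_per_drive, gb_per_drive):
    """Drives required to satisfy both an IOPS target and a capacity target."""
    return max(math.ceil(target_iops / iops_per_drive),
               math.ceil(target_gb / gb_per_drive))

# Hypothetical workload: 50,000 IOPS against 2TB of hot data.
hdd_count = drives_needed(50_000, 2_000, iops_per_drive=180,    gb_per_drive=146)
ssd_count = drives_needed(50_000, 2_000, iops_per_drive=20_000, gb_per_drive=100)

# Hypothetical unit prices: HDDs cheap per GB, SSDs cheap per IOPS.
hdd_cost = hdd_count * 300    # assumed $300 per HDD
ssd_cost = ssd_count * 2_000  # assumed $2,000 per SSD

print(f"HDDs: {hdd_count} drives, ${hdd_cost:,}")
print(f"SSDs: {ssd_count} drives, ${ssd_cost:,}")
```

Under these assumptions the HDD configuration is IOPS-bound (hundreds of spindles just to hit the I/O target), so the nominally more expensive SSDs come out cheaper for the workload as a whole.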
RAM-based SSDs have addressed data persistence issues with battery-backed cache or in-the-cabinet uninterruptible power supply (UPS) devices to maintain power to memory when primary power is turned off. SSDs have also combined battery backup with internal HDDs, where the HDDs are stand-alone, mirrored, or parity-protected and powered by a battery to enable DRAM to be flushed (destaged) to the HDDs in the event of a power failure or shutdown. While DRAM-based SSDs can exhibit significant performance advantages over HDD-based systems, SSDs still require electrical power for internal HDDs, DRAM, battery (charger), and controllers.
Flash-based memory has risen in popularity because of its low cost per capacity and because no power is required to preserve the data on the medium. For example, flash memory has become widespread in low-end USB thumb drives and MP3 players.
The downsides to flash are that its performance, particularly on writes, is not as good as that of DRAM and, historically, it has a limited duty cycle in terms of how many times the memory cells can be rewritten or updated. In current enterprise-class flash memory devices, however, the duty cycles are much longer than in consumer-based flash products.
The best of both worlds may be to use RAM as a cache in a shared storage system combined with caching algorithms to maximize cache effectiveness and optimize read-ahead, write-behind, and parity technology to boost performance. Flash-based storage systems have data persistence as well as lower power consumption and improved performance compared to all-HDD storage systems.
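The write-behind (deferred destage) idea mentioned above can be sketched with a toy cache. This is a minimal illustration under assumed semantics, not how any particular storage controller implements it:

```python
from collections import OrderedDict

class WriteBehindCache:
    """Minimal sketch: LRU read cache with write-behind (deferred destage)."""

    def __init__(self, backing, capacity=4):
        self.backing = backing       # dict standing in for HDD storage
        self.capacity = capacity
        self.cache = OrderedDict()   # block -> data, in LRU order
        self.dirty = set()           # blocks written but not yet destaged

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)   # cache hit: refresh LRU position
            return self.cache[block]
        data = self.backing.get(block)      # miss: fetch from "disk"
        self._insert(block, data)
        return data

    def write(self, block, data):
        self._insert(block, data)
        self.dirty.add(block)               # acknowledge now, destage later

    def _insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.capacity:
            old, old_data = self.cache.popitem(last=False)  # evict LRU block
            if old in self.dirty:           # dirty blocks destage on eviction
                self.backing[old] = old_data
                self.dirty.discard(old)

    def flush(self):
        """Destage all dirty blocks, e.g., on shutdown or power warning."""
        for block in list(self.dirty):
            self.backing[block] = self.cache[block]
        self.dirty.clear()

backing = {}
cache = WriteBehindCache(backing)
cache.write("blk0", b"data")
print("destaged yet?", "blk0" in backing)   # write was deferred
cache.flush()
print("after flush:", backing["blk0"])
```

The same pattern is what makes battery backup matter: acknowledged writes live only in memory until they are destaged.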
Even with the continuing drop in prices and increases in capacity of DDR/RAM and flash-based SSDs, for most IT data centers and applications, there will continue to be a need to leverage tiered storage, including HDD-based storage systems. This means, for instance, that a balance of SSDs for low-latency, high-performance hotspots along with high-performance HDDs in the 146GB and 300GB, 15,000rpm class are a good fit with 500GB, 750GB, and 1TB HDDs for capacity-driven workloads.
For active storage scenarios that do not require the ultra-low latency of SSDs but need high performance and large amounts of affordable capacity, energy-efficient 15,000rpm Fibre Channel and SAS HDDs provide a good balance between activity per watt (e.g., IOPS per watt and bandwidth per watt) and capacity, as long as the entire capacity of the drive is used to store active data. For dormant data and ultra-high capacity environments with a tolerance for low performance, higher-capacity 750GB and 2TB "FAT" HDDs that trade I/O performance for greater capacity provide good capacity-per-watt metrics.
IPM, APM and MAID
Intelligent power management (IPM)—also called adaptive power management (APM) and adaptive voltage scaling (AVS)—applies to how electrical power consumption and, consequently, heat generation can be varied depending on usage patterns. Similar to laptops or PC workstations with energy-saving modes, one way to save on energy consumption in large storage systems is to power down HDDs when they are not in use. That is the basic premise of MAID, which stands for massive (or monolithic or misunderstood) array of idle (or inactive) disks.
MAID-enabled devices are evolving from first-generation MAID 1.0, in which HDDs are either on or off, to a second generation (MAID 2.0) implementing IPM. MAID 2.0 leverages IPM to align storage performance and energy consumption to match the required level of storage service, and is being implemented in traditional storage systems. With IPM and MAID 2.0, instead of an HDD being either on or off, there can be multiple power-saving modes to balance energy consumption/savings with performance requirements.
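A MAID 2.0-style policy can be sketched as a table of idle-time thresholds mapping to progressively deeper power states. The state names, thresholds, and wattages below are illustrative assumptions, not figures from any specific product:

```python
# Sketch of a MAID 2.0 / IPM power policy: instead of drives being only
# on or off (MAID 1.0), idle time steps them through intermediate states.
# Thresholds and wattages are assumed for illustration.

POWER_STATES = [   # (minimum idle seconds, state name, approx. watts)
    (0,     "active",       15.0),
    (300,   "reduced_rpm",   8.0),  # e.g., heads unloaded or slower spin
    (1800,  "standby",       1.5),  # spun down, electronics still alive
    (7200,  "off",           0.0),
]

def power_state(idle_seconds):
    """Return the deepest power-saving state this idle period allows."""
    state, watts = POWER_STATES[0][1], POWER_STATES[0][2]
    for threshold, name, w in POWER_STATES:
        if idle_seconds >= threshold:
            state, watts = name, w  # keep stepping down while thresholds pass
    return state, watts

print(power_state(0))       # freshly accessed drive stays active
print(power_state(600))     # ten minutes idle: reduced-rpm mode
print(power_state(10_000))  # hours idle: fully powered off
```

The policy trades energy for latency: the deeper the state, the longer the spin-up penalty when data on that drive is requested again.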
By more effectively managing the data footprint across different applications and tiers of storage, it is possible to enhance application service delivery and responsiveness as well as facilitate more timely data protection to meet compliance and business objectives. To realize the full benefits of data footprint reduction, look beyond backup and offline data improvements and include online and active data. Data footprint reduction techniques include archiving (of both compliance and non-compliance data); compression, both real-time online and streaming offline; and data de-duplication.
Data de-duplication is getting a lot of attention and is being adopted primarily for use with backup and other static data in entry-level and mid-range environments. De-duplication works well on data patterns that are seen as recurring over time—that is, the longer a de-duplication solution can look at data streams to compare what it has seen before with what it is currently seeing, the more effective the de-duplication ratio can be. Consequently, it should be no surprise that backup in small to medium-size environments is an initial market "sweet spot" for data de-duplication, given the high degree of recurring data in backups.
The challenge with de-duplication is the heavy "thinking" needed to look at incoming data or data that is being ingested, and determine if it has been seen before, which requires more time, intelligence, or processing activity than traditional real-time compression techniques. While the compression ratios can be larger with de-duplication across recurring data than with traditional compression techniques, the heavy thinking and associated latency or performance impacts can be larger as well.
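The "have we seen this before?" check is typically done by fingerprinting chunks of the data stream with a content hash. The chunking and sample data below are simplified assumptions (real products vary chunk sizes and indexing strategies):

```python
import hashlib

def dedupe(chunks):
    """Store each unique chunk once, keyed by its content fingerprint."""
    store = {}   # fingerprint -> chunk data (stored once)
    refs = []    # the original stream, expressed as fingerprints
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:      # the "heavy thinking": seen before or not?
            store[fp] = chunk
        refs.append(fp)
    return store, refs

# A repetitive, backup-like stream: the same blocks recur night after night.
stream = [b"blockA", b"blockB", b"blockA", b"blockA", b"blockC", b"blockB"]
store, refs = dedupe(stream)
ratio = len(refs) / len(store)
print(f"{len(refs)} chunks in, {len(store)} stored, ratio {ratio:.1f}:1")
```

The more recurring data the solution observes over time, the higher the ratio climbs, which is why repeated full backups are such a good fit; the per-chunk hashing and index lookup are also where the latency cost comes from.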
A hybrid approach for data de-duplication is policy-based de-duplication, which combines the best of both worlds and provides IT organizations with the ability to tune de-duplication to quality of service (QoS) needs. Policy-based data de-duplication solutions provide the flexibility to operate in different modes, including reducing duplicate data during ingestion, on a deferred or scheduled basis, or selectively turning de-duplication off and on as needed.
The benefit of policy-based de-duplication is flexibility to align performance, availability, and capacity to meet different QoS requirements. For example, for performance- and time-sensitive backup jobs that must complete data movement in a given timeframe, policy-based de-duplication can be enabled to reduce backup windows and avoid performance penalties.
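A policy engine for the modes described above could look something like the following sketch. The decision rules are invented for illustration; real products expose their own policy knobs:

```python
from enum import Enum

class DedupeMode(Enum):
    INLINE   = "reduce duplicate data during ingestion"
    DEFERRED = "reduce on a deferred or scheduled basis"
    OFF      = "de-duplication turned off for this job"

def choose_mode(window_hours, ingest_hours_with_dedupe, capacity_pressure):
    """Pick a per-job de-duplication mode from QoS inputs (assumed rules)."""
    if ingest_hours_with_dedupe <= window_hours:
        return DedupeMode.INLINE     # dedupe overhead fits the backup window
    if capacity_pressure:
        return DedupeMode.DEFERRED   # still reclaim space, just after hours
    return DedupeMode.OFF            # protect the window; skip reduction

print(choose_mode(8, 6, True))    # fits the window: inline
print(choose_mode(8, 10, True))   # too slow inline, space is tight: deferred
print(choose_mode(8, 10, False))  # too slow inline, space is fine: off
```

The point of the sketch is simply that the mode is a per-job policy decision driven by the backup window and capacity pressure, not a global on/off switch.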
Thin provisioning can be described as similar to airlines overbooking a flight based on history and traffic patterns; like an overbooked flight, however, it can result in a sudden demand for more real physical storage than is available. In essence, thin provisioning, as shown in the figure on page 29, allows the space from multiple servers that have storage allocated (but not actually used) to be shared and used more effectively to minimize disruptions associated with expanding and adding new storage.
In the figure (page 29), servers think that they have, for example, 10TB allocated, yet many of the servers are only using 10% (about 1TB) of that storage. Instead of deploying 5 × 10TB (50TB) of underutilized storage, a smaller amount of physical storage can be thinly provisioned, with more physical storage added as needed. The result is that less unused storage needs to be installed and consume power, cooling, and floor space until it is actually needed. The downside is that thin provisioning works best in stable or predictable environments, where growth and activity patterns are well understood or good management insight tools on usage patterns are available.
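The overcommit arithmetic from the example can be written down directly. The 80% alert threshold below is an assumed policy value, not a standard:

```python
def thin_pool_status(allocated_tb, used_tb, physical_tb):
    """Overcommit ratio and headroom for a thinly provisioned pool."""
    allocated = sum(allocated_tb)
    used = sum(used_tb)
    return {
        "allocated_tb": allocated,
        "used_tb": used,
        "overcommit_ratio": allocated / physical_tb,
        "headroom_tb": physical_tb - used,
        # Assumed alert policy: warn when real usage passes 80% of physical.
        "at_risk": used > 0.8 * physical_tb,
    }

# Five servers each allocated 10TB but using only ~1TB,
# backed by 15TB of real physical storage (illustrative numbers).
status = thin_pool_status([10] * 5, [1] * 5, physical_tb=15)
print(status)
```

Here 50TB of promises sit on 15TB of physical disk (a 3.3:1 overcommit), which is exactly the exposure the airline analogy warns about: the `at_risk` flag is the monitoring that tells you to buy more capacity before the promises come due.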
There are many approaches to addressing PCFE, or green, storage issues. Keep performance, availability, capacity, and energy (PACE) in balance to meet application service requirements, and avoid introducing performance bottlenecks in your quest to reduce energy use and get more out of your existing IT resources, including power and cooling.
GREG SCHULZ is founder of the StorageIO Group (www.storageio.com), an IT industry analyst and consulting firm, as well as author of Resilient Storage Network (Elsevier) and The Green and Virtual Data Center (CRC; www.thegreenandvirtualdatacenter.com).