By Heidi Biggar
About five years ago, I wrote an article in the print version of InfoStor exposing one of IT's now-infamous dirty little secrets. Backup and recovery, as it was done then, was grossly inefficient, and IT was fully aware of it.
IT managers knew there were potentially significant gaps in their data-protection strategies. They knew their organizations' data wasn't being adequately protected, and they knew that if they were asked to recover data in an outage or disaster situation, their secret would likely be revealed. At best, recovering data was a time-consuming process; at worst, it was an exercise in futility.
The problem was that IT administrators didn't know which way the pendulum would swing—and aside from scrolling manually through pages and pages of backup logs, which were growing daily as backup volumes increased, they had no way of knowing just how much at risk they were. IT departments resorted to backing up the same data over and over (to tape) with the hope that doing so would decrease their overall risk. But as data volumes grew and the inherent value of this data became more visible within organizations, the problem worsened.
Fortunately, due to the pervasiveness of disk-based backup, improvements in data management and backup reporting capabilities, and a variety of other new technologies (e.g., data de-duplication), data protection is very different today—and significantly better.
IT administrators today have much greater visibility into, and control of, their backup-and-recovery environments than they did just a few years ago. They not only can see what's going on in their data-protection environments and ensure backup jobs are completed in allotted windows, but they can also preemptively avoid backup failures by identifying problems within the network and on disk drives (in disk-based backup systems).
Beyond that, disk-based backup technologies and associated software have evolved to the point where IT administrators can choose from a wide array of data-protection technologies (sometimes from a single vendor) to ensure the appropriate level of protection is applied to data according to its importance and/or rate of change. IT administrators might opt to protect frequently changing, mission-critical data (that has an RPO and RTO of zero) with continuous data protection (CDP) technology and less-critical data with a virtual tape library (VTL) or nearline disk appliance, for example. Applying CDP-level protection to all backup data regardless of its value is overkill, and it is costly from a capacity and dollar standpoint.
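As a rough illustration of that kind of policy, the sketch below maps a data set to a protection technology based on its recovery objectives. The `Dataset` class, the `protection_tier` function, and the example data sets are hypothetical names invented for this example; a real policy would weigh more factors, such as rate of change and retention requirements.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a data set and its recovery objectives."""
    name: str
    rpo_seconds: int  # recovery point objective: tolerable data loss
    rto_seconds: int  # recovery time objective: tolerable downtime


def protection_tier(ds: Dataset) -> str:
    """Pick a protection technology from the two options named in the text."""
    # Zero RPO and zero RTO is the case described for continuous data protection.
    if ds.rpo_seconds == 0 and ds.rto_seconds == 0:
        return "CDP"
    # Anything less demanding goes to a lower-cost disk-based target.
    return "VTL or nearline disk"


if __name__ == "__main__":
    # Illustrative data sets only; the names and objectives are assumptions.
    for ds in (Dataset("orders-db", 0, 0), Dataset("file-share", 86400, 14400)):
        print(ds.name, "->", protection_tier(ds))
```

The point of the sketch is only that the decision is driven by recovery objectives per data set, not by a single blanket policy.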
Our conversations with end users and ESG Research's studies of disk-based backup adoption show that IT administrators understand the value of implementing tiered data-protection environments. But what's interesting is that many of these administrators haven't applied this concept at the primary storage level. They've got a lot of data sitting on primary storage systems that shouldn't be. This is IT's new "dirty little secret."
ESG estimates that 60% to 80% or more of the data on primary storage systems today is static (or persistent). In other words, this data is not accessed at all once 90 days or more have passed since its creation. It is non- or post-transactional, doesn't belong on Tier-1 storage, and doesn't need to be continually backed up.
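One way to check how a given file system compares with that estimate is to measure how much of its capacity sits in files that have not been touched recently. The following is a minimal sketch, not a substitute for a storage resource management tool. It assumes that "static" can be approximated as "neither read nor modified in the last 90 days," and the function name `static_data_share` is hypothetical. Access times are unreliable on volumes mounted with options such as `noatime`, so treat the result as a rough indicator.

```python
import os
import stat
import time

STATIC_AGE_DAYS = 90  # threshold taken from the definition of static data above


def static_data_share(root, age_days=STATIC_AGE_DAYS, now=None):
    """Return (static_bytes, total_bytes) for regular files under root."""
    now = time.time() if now is None else now
    cutoff = now - age_days * 86400
    static_bytes = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            total_bytes += st.st_size
            # Use the later of last access and last modification as "last touched".
            if max(st.st_atime, st.st_mtime) < cutoff:
                static_bytes += st.st_size
    return static_bytes, total_bytes
```

Calling `static_data_share("/some/volume")` and dividing the first value by the second gives the fraction of capacity that looks static under this assumption; the path is a placeholder.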
By moving fixed content off expensive primary storage systems and onto lower-cost secondary storage, organizations can significantly reduce capital and operational costs while still ensuring high availability, security, and quick access to data.
The capital-savings potential comes from leveraging lower-cost storage systems and freeing up primary disk capacity for true primary application data (and hence eliminating or postponing future storage system purchases). The operational savings stem from lower management costs, achieved by applying different data management processes to Tier-2 storage than to Tier-1 storage.
The idea is also to take business-vital persistent data that may traditionally have ended up on tape and store it on appropriate secondary storage tier devices, where it is readily accessible in regulatory, corporate, or e-discovery situations. This tier could be almost anything, including SAN, NAS, CAS, MAID, or even a disk-based backup appliance (NAS or VTL). This tier could also be optical-based, or a combination of several technologies. You should consider using what ESG refers to as Enhanced Tiered Storage (ETS), which encompasses storage systems that provide capacity-efficient technologies such as thin provisioning, data de-duplication, logical snapshots, and writeable snapshots.
We all know that recovering data from tape is at best a lengthy process with potentially costly ramifications, and at worst impossible with potentially devastating business consequences. From a media management perspective, secondary storage technologies are also much less expensive than tape-based alternatives. In this type of environment, there are no tapes to keep track of.
From a pure power and cooling standpoint, there are many efficiencies to be gained by moving data off primary disk resources that keep disks spinning continuously and onto more-efficient secondary storage systems. With electricity running at an average cost of 9.28 cents per kilowatt-hour, that can quickly translate into significant cost savings. Factor in data de-duplication, and the savings are magnified.
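To put the electricity rate in concrete terms, the sketch below converts a continuous power draw into an annual cost at 9.28 cents per kilowatt-hour. Only the rate comes from the text above; the function name `annual_energy_cost` and the 1,000-watt load are illustrative assumptions, and cooling overhead is left out.

```python
RATE_CENTS_PER_KWH = 9.28  # average electricity cost cited above
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_cost(watts, rate_cents_per_kwh=RATE_CENTS_PER_KWH):
    """Annual cost in dollars of a load drawing `watts` continuously."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * rate_cents_per_kwh / 100


if __name__ == "__main__":
    # Assumed load: 1,000 W of always-spinning disk, a made-up figure.
    print(f"${annual_energy_cost(1000):,.2f} per year")  # prints "$812.93 per year"
```

Every kilowatt of continuously spinning disk that can be retired or powered down is worth roughly $800 a year in electricity alone under these assumptions, before cooling is counted.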
Heidi Biggar is an analyst with the Enterprise Strategy Group.
Statistics tell the story
If you're unsure about the need for building a tiered storage environment, consider the following statistics:
- 56% of data recovered is less than two days old. (Source: ESG Research, The Evolution of Enterprise Data Protection)
- 90% of data created is static 90 days after its creation. (Source: Storage Networking Industry Association)
- Digital archive capacity will increase nearly 10-fold between 2005 and 2010 to more than 25,000 PB. (Source: ESG Research, Digital Archiving: End-User Survey & Market Forecast, 2006-2010)
- 60% to 80% or more of the data on production arrays is non-transactional, or post-transactional, and at least half of this non-transactional data is a replica of non-transactional data. (Source: ESG)
- Data-center electricity consumption increased 97% from 2000 to 2005 and now accounts for 1% to 2% of all US electricity consumption. (Source: Estimating Total Power Consumption by Servers in the U.S. and the World, Jonathan G. Koomey, Ph.D.)