InfoStor Online Article
The benefits of data footprint reduction

Users should consider archiving, data compression, and data de-duplication.

By Greg Schulz

August 15, 2007

Organizations of all sizes are generating, and depending on, ever larger amounts of data that must be readily available and easily accessible. This growth results in an ever-increasing data footprint: more data is being generated, copied, and stored for longer periods of time. Consequently, IT organizations have to manage more infrastructure resources, including servers, networks, and storage, to ensure that data is protected in a timely manner while providing adequate performance and capacity and keeping data secure yet accessible when needed.

Your data footprint is the total storage capacity needed to support your various business applications and information needs. Your data footprint may, in fact, be larger than the amount of storage you actually have or, as in the following example, you may have more aggregate storage capacity allocated than actual data.

As an example, say you have 2TB of Oracle database instances and associated data, 1TB of Microsoft SQL Server data, 2TB of Exchange e-mail data, and 4TB of shared NFS and CIFS file-sharing storage, for a total of 9TB of data. Your actual data footprint, however, could be much larger. The 9TB simply represents the known data, or how storage is allocated to different applications and functions. If the Oracle databases are only 50% populated, for example, only 1TB of Oracle data actually exists while it occupies 2TB of storage capacity.

Assuming for now that the capacity sizes in the example are a fair reflection of actual data size (based on how much data is backed up during a full backup), your data footprint would include the 9TB of data as well as the online (primary), nearline (secondary), and offline (tertiary) storage configured to meet your specific data protection and availability service requirements.
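The gap between allocated capacity and data that actually exists can be made concrete with a few lines of Python. This is a minimal sketch: the application names and sizes come from the example, the 50% fill ratio applies only to Oracle as in the text, and every other application is assumed (hypothetically) to be 100% full.

```python
# Allocations from the example above, in TB.
ALLOCATED_TB = {
    "Oracle": 2.0,
    "SQL Server": 1.0,
    "Exchange": 2.0,
    "NFS/CIFS shares": 4.0,
}

# Fraction of each allocation that holds real data. Only the 50% Oracle
# figure comes from the example; unlisted applications are assumed full.
FILL_RATIO = {"Oracle": 0.5}


def total_allocated_tb(allocated):
    """Capacity carved out for applications: the 'known' data."""
    return sum(allocated.values())


def actual_data_tb(allocated, fill_ratio):
    """Data that really exists per application (missing ratios default to 1.0)."""
    return {app: size * fill_ratio.get(app, 1.0) for app, size in allocated.items()}


if __name__ == "__main__":
    actual = actual_data_tb(ALLOCATED_TB, FILL_RATIO)
    print(f"Allocated: {total_allocated_tb(ALLOCATED_TB):.1f} TB")
    print(f"Actual:    {sum(actual.values()):.1f} TB")
```

Run as a script, it reports 9.0 TB allocated against 8.0 TB of actual data. Feeding in measured fill ratios, for instance from how much each full backup actually copies, would make the comparison more realistic.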
For example, suppose you use RAID-1 mirroring for data availability and accessibility, replicate your data asynchronously to a second site where it is protected on a RAID-5 volume with write cache, and perform a weekly full backup. Your data footprint would then be at least 37TB: (9TB x 2 for RAID 1) + (9TB + 1TB for RAID-5 parity) + (9TB for the full backup) = 18TB + 10TB + 9TB. The footprint would be even higher if you also perform daily incremental backups or periodic snapshots, and if you count the extra storage needed for application software, temporary work space, operating system files, and so on. As this example shows, 9TB of actual or assumed data can rapidly expand into a much larger data footprint.

Note that this scenario is rather simplistic. It does not factor in how many duplicate copies of data are being made, backup retention, snapshot size, free-space requirements, and other items that contribute to the expansion of your data footprint.

Reducing the data footprint

Reducing your data footprint can help reduce costs and allow you to defer upgrades that expand server, storage, and network capacity, along with the associated software license and maintenance fees. Maximizing what you already have through data footprint reduction techniques extends the effectiveness of your existing IT resources, including power, cooling, capacity, network bandwidth, replication, backup, archiving, and software licenses. From a network perspective, reducing your data footprint, or its impact, also eases the load on SAN, LAN, MAN, and WAN bandwidth for data replication, remote backup, and data access, letting you move more data with existing bandwidth. Additional benefits of maximizing the usage of your existing IT resources include
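Returning to the 37TB scenario, the figure can be reproduced by adding up each protection layer. The sketch below assumes the article's "9+1" RAID-5 term means a 10-drive group with one drive's worth of parity; the `full_backups_kept` and `extra_tb` parameters are hypothetical knobs for the retention, snapshot, and overhead items the example leaves out.

```python
def raid1_tb(data_tb):
    """RAID-1 mirroring keeps two full copies of every block."""
    return data_tb * 2


def raid5_tb(data_tb, drives_per_group=10):
    """RAID-5 adds one drive's worth of parity per group.

    Assumption: '9+1' is read as a 10-drive (9 data + 1 parity) group,
    so parity overhead is data_tb / (drives_per_group - 1).
    """
    return data_tb * drives_per_group / (drives_per_group - 1)


def data_footprint_tb(data_tb, drives_per_group=10, full_backups_kept=1, extra_tb=0.0):
    """Local mirror + remote RAID-5 replica + retained full backups + extras.

    extra_tb is a hypothetical placeholder for snapshots, incrementals,
    operating system files, and scratch space.
    """
    return (
        raid1_tb(data_tb)
        + raid5_tb(data_tb, drives_per_group)
        + data_tb * full_backups_kept
        + extra_tb
    )


if __name__ == "__main__":
    print(data_footprint_tb(9.0))                       # the article's scenario
    print(data_footprint_tb(9.0, full_backups_kept=4))  # hypothetical: a month of weekly fulls
```

`data_footprint_tb(9.0)` returns 37.0; retaining four weekly fulls instead of one raises it to 64.0, which shows how quickly retention alone inflates the footprint.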
IT organizations have taken different approaches to the challenges of a growing data footprint. They must balance service delivery (performance, availability, capacity, compliance) against cost, including operating expense (OPEX) and capital expense (CAPEX), while ensuring that compliance and business continuance (BC) or disaster-recovery (DR) requirements are met. While DR and compliance have been in the news recently, along with data security, another topic gaining attention is "green" storage and IT infrastructure, specifically reducing power and cooling costs.

For some organizations, the solution to reducing the data footprint involves restricting the use of storage. Examples include limiting database size and/or placing restrictions on e-mail box size and user disk space quotas. While limits and quotas have their place, their implementation should not hinder users' productivity.

Another approach is simply to add more hardware. After all, disk prices continue to drop rapidly. Bear in mind, however, that while disk hardware can be relatively inexpensive, it still requires software and management, including backup and other functions, which result in personnel and other "soft" costs.

Three approaches to reducing the data footprint are archiving, data de-duplication, and data compression.

Archiving unused data

A challenge with archiving is having the time and tools available to identify what data should be archived and what data can be securely destroyed when no longer needed. Further complicating archiving is that knowledge of the data's value may also be needed; this may well include legal issues, such as who is responsible for deciding what data to keep or discard.
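Identifying archive candidates is often the first, tool-driven step. A minimal sketch, assuming a simple age-based policy: files under a directory that have not been modified in a hypothetical 180 days are flagged and their sizes totaled. It does not address the harder questions raised above about data value, legal responsibility, or who decides; the threshold and function names are illustrative only.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86_400


def archive_candidates(root, max_age_days=180, now=None):
    """Yield (path, size_in_bytes) for files under root not modified in max_age_days.

    Uses modification time rather than access time, because many file systems
    are mounted with access-time updates disabled or relaxed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # vanished or unreadable: skip rather than abort the scan
            if st.st_mtime < cutoff:
                yield path, st.st_size


def reclaimable_bytes(root, max_age_days=180, now=None):
    """Total size of archive candidates: an upper bound on primary space freed."""
    return sum(size for _path, size in archive_candidates(root, max_age_days, now))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "q1_report.csv")
        new = os.path.join(root, "today.csv")
        for path in (old, new):
            with open(path, "wb") as f:
                f.write(b"x" * 1024)
        year_ago = time.time() - 365 * SECONDS_PER_DAY
        os.utime(old, (year_ago, year_ago))  # back-date one file so it qualifies
        print(list(archive_candidates(root)))
        print(reclaimable_bytes(root), "bytes")
```

A report like this only proposes candidates; the keep, archive, or destroy decision still rests with whoever owns the data's value and legal status.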
If you can invest the time and software tools needed to identify which data to archive and support an effective archive strategy, the returns can be very positive: you reduce your data footprint without limiting the amount of information available to your business.

SIS and data de-dupe

The benefits of pointer-based point-in-time (PIT) snapshots are speed of data protection and less storage required for rapid retrieval of data. Single-instance storage (SIS) approaches trade the processing time needed to ingest data and eliminate duplicates for savings in the storage capacity needed to hold backed-up data. This assumes a high degree of commonality, with repeated data and files being backed up. Consequently, SIS and data de-duplication solutions perform best when deployed in support of backup operations, and to a lesser degree for archiving. Data de-duplication may not be practical for online applications today. Some SIS-enabled solutions, such as virtual tape libraries (VTLs), also add data compression to further reduce data footprint requirements.

Data compression

Some data de-duplication solutions boast spectacular data-reduction ratios in specific scenarios, such as backups of repetitive files, while providing little value across a broader range of applications. Data compression approaches, by contrast, provide lower, yet more predictable and consistent, data-reduction ratios over more types of data and applications, including primary storage. For example, in environments with few or no common or repetitive data files, data de-duplication will have little to no impact, while data compression will generally yield some data footprint reduction across almost all types of data. Some data de-duplication vendors have already added, or have announced plans to add, compression techniques.

Data footprint reduction

There are many different attributes to consider when evaluating data footprint reduction technologies.
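One way to see the contrast between de-duplication and compression on different kinds of data is a small experiment. The sketch below pairs a toy single-instance store, which keys each file body by its SHA-256 digest and keeps only one copy, with a zlib compression ratio. The class and helper names are hypothetical, and whole-file SIS is a simplification; commercial de-dupe products typically also work on sub-file chunks.

```python
import hashlib
import zlib


class SingleInstanceStore:
    """Toy single-instance store (SIS): each unique body is kept once.

    Logical names are pointers to a SHA-256 digest of the content. A real
    product would also guard against hash collisions and persist its index.
    """

    def __init__(self):
        self.blobs = {}    # digest -> bytes, one copy per unique content
        self.catalog = {}  # logical name -> digest

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # keep only the first copy
        self.catalog[name] = digest

    def get(self, name):
        return self.blobs[self.catalog[name]]

    def logical_bytes(self):
        """Size the backups appear to have."""
        return sum(len(self.blobs[d]) for d in self.catalog.values())

    def stored_bytes(self):
        """Capacity actually consumed."""
        return sum(len(b) for b in self.blobs.values())


def compression_ratio(data):
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


def make_unique_log(i):
    """Hypothetical log file: distinct per host, but internally repetitive."""
    return "".join(f"host{i} event {j}: status=ok\n" for j in range(200)).encode()


if __name__ == "__main__":
    files = {"a.txt": b"alpha " * 200, "b.txt": b"beta " * 300, "c.txt": b"gamma " * 100}
    weekly = SingleInstanceStore()
    for week in range(4):  # four weekly "full backups" of the same files
        for name, body in files.items():
            weekly.put(f"week{week}/{name}", body)
    print("SIS ratio, repeated fulls:", weekly.logical_bytes() / weekly.stored_bytes())

    logs = SingleInstanceStore()
    for i in range(4):  # four distinct files: nothing for SIS to remove
        logs.put(f"log{i}", make_unique_log(i))
    print("SIS ratio, unique logs:", logs.logical_bytes() / logs.stored_bytes())
    print("zlib ratio, one log:", round(compression_ratio(make_unique_log(0)), 1))
```

Four weekly fulls of the same three files give a 4:1 SIS ratio, while four distinct log files give 1:1 from SIS yet still compress well with zlib, mirroring the contrast drawn above between de-duplication and compression.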
Which features are most important to you will depend on your environment and requirements. One issue to consider is how much delay or resource consumption you can afford in exchange for a given level of data footprint reduction. For example, as you move from coarse (traditional compression) to granular (data de-duplication) technologies, more intelligence, more processing power, or offline post-processing is needed to examine larger patterns of data and eliminate duplicates. Similarly, understand what delays may occur when SIS-based data footprint reduction techniques are used during large-scale bulk data restorations.

You may want to consider a data footprint reduction strategy that combines various technologies to address specific applications as well as your overall environment, including online storage, nearline backup, and offline archiving. Following are some general recommendations and suggestions to help address your growing data footprint, all of which depend on the size and scope of your particular environment, applications, and service requirements.
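The combined strategy, and the restore cost that comes with it, can be sketched as a small pipeline: split each backup into fixed-size chunks, keep one compressed copy per unique chunk, and rebuild a backup by looking up and decompressing its chunks in order. This is an illustrative sketch, not any vendor's design; fixed-size chunking and zlib are assumed for simplicity, and real products differ (for example, by using variable-size chunk boundaries). The function names are hypothetical.

```python
import hashlib
import zlib


def dedupe_and_compress(streams, chunk_size=4096):
    """Fixed-size chunk de-duplication followed by compression of unique chunks.

    streams maps a backup name to its bytes. Returns (chunk_store, recipes):
    chunk_store maps SHA-256 digest -> zlib-compressed chunk, and recipes maps
    each backup name to the ordered digests needed to rebuild it.
    """
    chunk_store = {}
    recipes = {}
    for name, data in streams.items():
        recipe = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in chunk_store:  # new content: compress and keep it
                chunk_store[digest] = zlib.compress(chunk)
            recipe.append(digest)          # a duplicate costs only a pointer
        recipes[name] = recipe
    return chunk_store, recipes


def restore(name, chunk_store, recipes):
    """Rebuild one backup: one index lookup plus one decompression per chunk.

    This per-chunk work is where bulk-restore delays come from.
    """
    return b"".join(zlib.decompress(chunk_store[d]) for d in recipes[name])


def stored_bytes(chunk_store):
    """Capacity consumed by the unique, compressed chunks."""
    return sum(len(c) for c in chunk_store.values())
```

With two 64KB backups that differ by 8 bytes, 4KB chunks keep 17 unique chunks instead of 32; 1KB chunks keep 65 out of 128 and isolate the change more tightly, but need four times as many index entries per backup. That is the coarse-versus-granular trade-off in miniature.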
Several techniques can be used individually to address specific data footprint reduction issues, or in various combinations to implement a more comprehensive and effective data footprint reduction strategy. The benefit of a broader, more holistic strategy is that it addresses your overall environment, including all applications that generate and use data as well as the overhead functions that affect your data footprint.

Reducing your data footprint has many benefits, including maximizing the usage of IT infrastructure resources such as power and cooling, storage capacity, and network bandwidth, while enhancing application service delivery in the form of timely backup, BC/DR, performance, and availability. Look to combine technologies and techniques to address your various data footprint challenges, and to maximize your IT resources while reducing management costs and complexity.

Greg Schulz is founder and senior analyst of the StorageIO Group and author of the book Resilient Storage Networks: Designing Flexible Scalable Data Infrastructures (Elsevier Digital Press).