InfoStor Online Article
Data de-duplication: Questions and answers

Eight questions that every IT organisation should ask about data de-duplication before it deploys or upgrades.

By Heidi Biggar

Data de-duplication is arguably one of the most important new technologies to hit the storage market in years, and it’s a game-changing technology that can have an immediate impact on end-user environments. By reducing the amount of physical disk capacity that is needed to store information, data de-duplication allows organisations to keep more information on disk-based systems, making it more accessible to the people and applications that need it.

For organisations that are already de-duplicating their secondary data, de-duplication immediately paves the way for wider application use inside the data centre and, equally important, at remote sites (see figure).

While there is no doubt that data de-duplication will play a large role across all classes of storage moving forward, some important technological and business considerations remain when you’re evaluating potential products. Addressing these factors will go a long way towards ensuring a “best fit.” This article identifies eight technology-related questions every organisation needs to ask.

1. What data de-duplication ratios can I expect?

While ratios of 50:1, 100:1, 200:1, and higher are possible, we’ve found through conversations with end users, recent ESG Research, and hands-on ESG Lab testing that ratios of 10:1 to 20:1 are more typical (see figure on p. 15). Data reduction ratios depend on a number of variables, including the type of data being backed up or stored, retention periods, the frequency of full backups, and the specific data de-duplication technology being used.

To get an idea of the ratios you can expect in your environment, we encourage organisations to provide potential vendors with detailed information about their environments, backup processes, applications, retention SLAs, and data types.

2. Will data de-duplication affect my existing backup-and-restore performance? And if so, how?

This is an important question to ask, especially when you consider that one of the primary objectives of implementing disk-based backup solutions is to improve overall backup-and-restore/recovery performance. In many cases, performance will depend on factors such as the backup software that is used, as well as the systems and networks that support it, so it’s important to ask a couple of follow-up questions:
Data de-duplication can provide significant efficiencies within the backup environment, but it is not a panacea. If the backup environment is “broken,” it is unlikely that data de-duplication alone will fix it. Existing system and network capabilities, as well as bottlenecks, must be factored in. Data de-duplication may allow more backup data to go to disk rather than tape, but it won’t fix a poor overall design or implementation.

As for any performance impact from the de-duplication solution itself, some performance degradation can be expected with inline approaches, since data is de-duplicated in the data path as it is being ingested. The actual impact depends on a number of variables, including the de-duplication technology itself, the size of the backup volume, the granularity of the backup process, the aggregate throughput of the architecture, and the scalability of the solution.
Of course, there are also trade-offs to performing de-duplication post-process (that is, out of the data path, after the data has been ingested), notably the capacity reserve that is initially needed to store the full backup job before it is de-duplicated. There are other “performance-related” issues as well, including the disaster-recovery (DR) window (see “De-duplication trade-offs: Recap,” below). However, performance issues of any kind are only relevant if they actually occur. Users won’t get any benefit from paying more for a faster solution when a less costly one handles the job just fine.

3. How will data de-duplication impact my DR window?

The effect that de-duplication can have on an IT organisation’s DR window is an important consideration, one that can have significant implications depending on the specific environment. De-duplication benefits, such as increased retention and lower tape costs, are important, but their value can quickly erode if the de-duplication process is difficult to use or if it impacts DR readiness.

This is where “time to protection” (T2P) comes into play. T2P refers to the time it takes to get application data backed up and moved off-site for DR. The length of this process, from start to finish, depends on the data de-duplication approach (inline versus post-process) and the speed of the de-duplication architecture, as well as the DR method (e.g., is the data written or exported to tape, or is it de-duplicated and then replicated over a WAN to an off-site location?).

It’s important to sketch out this process, assigning a time value to each leg. Doing so will help ensure organisations aren’t exposed in the event of a disaster. Reclaiming disk space is great, but it shouldn’t come at the expense of T2P.

4. Is de-duplicated remote replication supported?

Support for de-duplicated remote replication will become more and more important over time (see “De-duplicated replication: A hidden jewel,” p. 16).
Minimising the amount of redundant data moved over the WAN reduces overall network traffic, allowing users to enable, improve, or even expand disaster recovery.

Today, remote replication means different things to different people. As a rule of thumb, any product that supports “multi-site de-duplicated remote replication” should be able to de-duplicate data across the entire storage environment, i.e., at each remote site and again at the central site. This type of functionality is not widely supported by disk-based backup vendors today, so if it is a requirement for your organisation and your vendor does not currently support it, make sure it is at least on the road map.

5. Is it easy to implement and use data de-duplication?

One of the compelling things about de-duplication is that it is easy, or at least it should be, and this should hold true for both small- and large-scale installations. It should be invisible to the backup-and-recovery process, and it should be combined with disk backup solutions (e.g., purpose-built disk backup appliances or virtual tape libraries, VTLs) that are also easy to use and implement. IT organisations should also have the flexibility to turn de-duplication “on” or “off” depending on network demands, user environments, data types, etc. Make sure to ask vendors for references!

6. How am I protected from data loss or corruption?

This is a very important question on a couple of different levels. It applies to both the disk backup system itself and the de-duplication technology.

Second, if the system de-duplicates data, then you need to find out what the system does if the source data becomes corrupt or inaccessible for some reason. After all, there may be 1,000 backup images that rely on a single copy of source data.

7. How scalable is the solution?

Again, this question applies to the disk backup solution itself, as well as the de-duplication technology. Make sure that you size your environment to meet current capacity and performance requirements, but also consider future demands. Choose a vendor that will make it easy for you to scale in terms of technology and cost. Also, make sure to ask vendors about any performance considerations for both their systems and de-duplication technology as their

8. What types of applications are supported?

Flexible application support may not seem like a big deal initially, but as environments scale and more data types and sites are added, it becomes increasingly beneficial, if not critical, for de-duplication solutions to support multiple applications. In particular, it’s important that these solutions support multiple backup applications and preferably have the capability to de-duplicate and store other types of persistent data in the same system. The greater the flexibility of these systems, the more consolidation is possible using less physical infrastructure. This in turn reduces cost in terms of management, purchasing, and energy consumption.

The above questions are important, but they only cover technology considerations. There is also the business side to consider. One of the greatest attributes of data de-duplication is that its value is easy to quantify. It is relatively easy to put a dollar amount on the cost savings of reducing the amount of capacity needed to store backup data by 10:1, 20:1, or greater. While these numbers can be significant and may be enough for some organisations to move forward, they only tell part of the data de-duplication cost-savings story.
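The hard-cost arithmetic behind those ratios can be sketched in a few lines of Python. This is only an illustration: the 100TB of logical backup data and the cost per terabyte are hypothetical figures, not numbers from ESG or any vendor, and the function names are invented for this example.

```python
def physical_capacity_tb(logical_tb: float, ratio: float) -> float:
    """Disk needed to hold logical_tb of backup data at a ratio:1 reduction."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return logical_tb / ratio


def hard_cost_savings(logical_tb: float, ratio: float, cost_per_tb: float) -> float:
    """Disk spend avoided versus storing the same data without de-duplication."""
    saved_tb = logical_tb - physical_capacity_tb(logical_tb, ratio)
    return saved_tb * cost_per_tb


if __name__ == "__main__":
    LOGICAL_TB = 100.0    # hypothetical amount of backup data
    COST_PER_TB = 1000.0  # hypothetical disk cost per terabyte
    for ratio in (10, 20, 50):
        physical = physical_capacity_tb(LOGICAL_TB, ratio)
        saved_pct = 100.0 * (1 - physical / LOGICAL_TB)
        savings = hard_cost_savings(LOGICAL_TB, ratio, COST_PER_TB)
        print(f"{ratio}:1 -> {physical:.1f}TB on disk, "
              f"{saved_pct:.0f}% less capacity, {savings:,.0f} saved")
```

Note the diminishing returns: moving from 10:1 to 20:1 lifts the capacity reduction only from 90% to 95%, which is one reason the hard numbers alone do not tell the whole story.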
A complete return-on-investment (ROI) analysis should include both the hard and soft cost savings of deploying de-duplication. In fact, the soft costs alone (the value of increased retention, operational efficiencies, and time to protection) can be very compelling.

Finally, you should remember that while you may gain a 50:1 advantage on Day 1, new data will be added over time, and sooner or later you’ll be right back where you started in terms of capacity under management. Data growth is the primary cause of many of the issues IT professionals face, and it causes downstream issues at every layer.

Data protection is the easiest area in which to justify deploying de-duplication since it affects only “copied” information. However, de-duplication will eventually play a role at every point of the data lifecycle, as the benefits of “less” are clear at each level. The sooner you start implementing de-duplication…

Heidi Biggar is an analyst with the Enterprise Strategy Group research and consulting firm (www.enterprisestrategygroup.com).
So, while data de-duplication’s initial foothold is within the walls of the data centre, it’s a natural—and easy—progression to roll it out remotely over time, as comfort levels with the technology increase. This has a number of potential significant benefits for end users, including the following:
From a technology standpoint, it’s the same premise: A data de-duplication engine analyses and removes redundant data blocks before they are moved over the network. Again, the level of de-duplication users can expect to see depends on the implementation as well as the flexibility of the replication process (i.e., whether or not it does true “multi-site” de-duplication).
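The premise can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the fixed 4KB block size, the SHA-256 fingerprints, and the `RemoteSite` class standing in for the central site are all assumptions made for the example (some products use variable-size chunks and other fingerprinting schemes).

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut the data stream into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class RemoteSite:
    """Hypothetical stand-in for the block store at the central site."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block contents

    def has(self, fingerprint: str) -> bool:
        return fingerprint in self.blocks

    def put(self, fingerprint: str, block: bytes) -> None:
        self.blocks[fingerprint] = block


def replicate(data: bytes, remote: RemoteSite):
    """Send only blocks the remote site lacks; return (recipe, bytes_sent)."""
    recipe, bytes_sent = [], 0
    for block in split_blocks(data):
        fingerprint = hashlib.sha256(block).hexdigest()
        if not remote.has(fingerprint):
            remote.put(fingerprint, block)  # only new blocks cross the network
            bytes_sent += len(block)
        recipe.append(fingerprint)  # ordered fingerprints rebuild the image
    return recipe, bytes_sent


def restore(recipe: list, remote: RemoteSite) -> bytes:
    """Rebuild a backup image at the remote site from its recipe."""
    return b"".join(remote.blocks[fp] for fp in recipe)
```

In this sketch, replicating a second backup that differs from the first by one block sends only that block, while `restore` still returns the full image at the remote site.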