The Benefits of Data Replication/Sharing
Data replication/sharing provides a number of benefits to large enterprises, but there are a number of hurdles to clear.
By George Mele
Recent advances in applications, and their rapid deployment, have increased database system use and raised new issues in implementing database management systems. The growing popularity of these systems is driving the growth and complexity of storage across MVS, UNIX, and NT, as are numerous applications that further complicate storage and information management between central offices and remote sites. Common examples include:
– The need by business units for new applications to be in place at ever-increasing speeds
– Regional support personnel in home offices hooked up via WANs or T1/T3/E3 connections
– Explosion of data marts and data warehouses and the need for real-time information
– Increasing need for recovery sites and procedures and reduced backup windows (e.g., backing up source database through replication to target databases while on-line database users experience no downtime)
– Mergers and acquisitions combining different databases and applications
Traditional database management systems (DBMSs) were not designed with replication and data sharing in mind. A typical DBMS requires direct access to the database to read, write, and update data. However, a growing number of knowledge workers do not have continuous direct access to corporate databases to make business-critical decisions.
For example, an executive traveling in California cannot run a DB2-based application if the files are on a server in Boston. Likewise, a salesperson often cannot connect to the corporate database from a customer site to provide up-to-date product information and pricing. In some situations, wide-area networks (WANs) provide access, but in many cases they are too slow, too costly, or simply not available or convenient.
To add complexity to the problem, applications may need to exchange information with more than one type of database. Today, there are more than 20 different third-party relational or hierarchical database systems, as well as a practically unlimited number of flat-file and proprietary database systems.
A true solution should provide access to necessary systems and should be capable of exchanging data among diverse databases and storage devices, thereby enabling knowledge workers to make fast, accurate decisions with immediate benefits to the business unit.
An effective solution to these problems is a technique called data replication/sharing, which uses an open architecture so that companies can:
– Copy data from a source database to a target database.
– Create separate client databases that can be read and modified by third-party or proprietary applications.
– Migrate changes made in the separate client databases to the source database and vice versa, as often as required.
– Implement an easy-to-use GUI that makes data sharing simple, safe, and cost-effective, while fitting within an organization's current infrastructure.
Data replication/sharing not only empowers knowledge workers, but also drives the need for additional storage at both the source and target databases.
Despite the additional investment, more and more companies are turning to on-line data replication/sharing because the cost of storage is falling by more than 25% per year.
What Is It?
When using data replication/sharing, all or part of one or more source databases is replicated according to the data requirements of specific users or user groups. For example, a sales force might need customer names and addresses from one database and product names and prices from another. Other information in the source databases, such as customer payment terms or product pricing lists, need not be propagated (replicated) if the salespeople do not require it.
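The subsetting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular replication product works: two in-memory SQLite databases stand in for the source and target, and the table name, column names, and the helper `replicate_subset` are all hypothetical.

```python
import sqlite3


def replicate_subset(source, target, table, columns):
    """Copy only the named columns of one table from source to target."""
    col_list = ", ".join(columns)
    rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
    # Rebuild the target copy from scratch (a full refresh, not an incremental one).
    target.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    target.execute(f"DELETE FROM {table}")
    placeholders = ", ".join("?" for _ in columns)
    target.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    target.commit()
    return len(rows)


# Hypothetical source data: payment terms stay at the corporate site.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (name TEXT, address TEXT, payment_terms TEXT)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Acme", "1 Main St", "net 30"), ("Globex", "9 Elm Ave", "net 60")],
)

target = sqlite3.connect(":memory:")
copied = replicate_subset(source, target, "customers", ["name", "address"])
```

After the call, the target holds names and addresses only; the `payment_terms` column never leaves the source.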
Users access this information via remote and local computers, reading and modifying data using commercially available or custom applications. Data replication/sharing allows users to migrate changes at any time from their targets to the source database or from the source database to their targets. This form of bi-directional replication is becoming more prevalent, driving the need for reliable and robust storage.
Data replication/sharing does not take the place of transaction monitors and two-phase commits. But it does make those tools more powerful by placing the most frequently used information closer to the user, which increases performance and lowers network bandwidth consumption.
Data replication/sharing takes advantage of the fact that, in many applications, direct access to the source database is not permitted for a variety of reasons.
For example, salespersons could read the source databases every morning to create up-to-date sales information. During the day, they could quote prices to customers and enter orders. At the end of the day, they could update the source databases with new orders and pick up any price changes that may have occurred at the corporate site during the day. This is fine for users who do not require instantaneous response times or do not continuously depend on corporate data throughout the day.
Other types of users may require more or less frequent access to source databases. Data replication/sharing lets system and database administrators control the frequency of updates between the targets and the source databases. Equally important, data replication/sharing specifies how changes are to be propagated between source databases and target databases in order to balance frequency, performance, availability, and functionality.
Heterogeneous Database Formats
Data replication/sharing does not require the format of the subset (target) database to be the same as that of the source database. In fact, in some cases, this isn't even possible.
If the source is an IBM DB2 database, for example, users may not have access to OS/390 machines or to the same RDBMS. If a branch office requires certain information from the corporate database, replication is the only solution that can provide the security and response times to satisfy that knowledge worker.
Of course, replication requires additional storage for storing source copies prior to the replication and for the target(s) specified by the system or database administrator.
In cases where the systems are co-resident (in the same data center), a shared storage architecture may be used, but data conversions must still be done by intelligent middleware, whether or not the data is host- or storage-based.
A host-based approach is attractive because it is platform- and storage-independent, which allows users to build the best solution using best-of-breed components. Sharing information between MVS, UNIX, and NT across different storage devices that may have varying performance, availability, and pricing characteristics places the user in control and tends to be more cost-effective.
Choosing a storage-based sharing method may make the buying and installation process less complex, but it locks the company to that vendor, which can lead to premium prices for years if warranty, support, and upgrades are included.
Be aware of the short- and long-term benefits. Some companies may need a combination of host-based and storage-based replication/sharing capabilities.
Data Conversion Procedure
One problem in dealing with diverse database formats from numerous vendors is data type incompatibility. Proper data replication/sharing ensures compatibility in data types between different database formats.
The right data replication/sharing engine automatically converts data from one format to another as necessary. Data integrity and, in some cases, encryption are important, especially when replicating outside the firewall.
This conversion drives storage requirements at both the source and the target level. The amount of additional storage needed depends on the frequency and the amount of data required to replicate/share.
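As a rough illustration of automatic type conversion, the Python sketch below maps source type names to converter functions and applies them field by field. Everything here is an assumption made for the example: the type names, the sample schema, the date layout (`YYYYMMDD`), and the helper `convert_row` are hypothetical and do not describe any specific engine.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical mapping: source type name -> function producing the target value.
CONVERTERS = {
    "CHAR": lambda v: v.rstrip(),        # drop fixed-width padding
    "DECIMAL": lambda v: Decimal(v),     # exact numeric, avoids float rounding
    "DATE": lambda v: datetime.strptime(v, "%Y%m%d").date(),
    "INTEGER": int,
}


def convert_row(row, schema):
    """Convert each field of row using the source type named in schema."""
    converted = {}
    for column, source_type in schema.items():
        if source_type not in CONVERTERS:
            # Fail loudly rather than silently copying an unconverted value.
            raise ValueError(f"no conversion defined for type {source_type!r}")
        converted[column] = CONVERTERS[source_type](row[column])
    return converted


# Assumed sample record as it might be extracted from a source system.
schema = {"product": "CHAR", "price": "DECIMAL", "effective": "DATE", "units": "INTEGER"}
row = {"product": "WIDGET    ", "price": "19.95", "effective": "19980601", "units": "42"}
result = convert_row(row, schema)
```

Rejecting unknown types, rather than passing values through untouched, is one simple way to protect data integrity when a new column type appears at the source.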
Synchronizing Source/Target Databases
The most difficult problem handled by data replication/sharing is synchronizing source and target databases. Synchronization is the process of updating the source database to reflect changes to the target database and updating the target database with changes made to the source database.
In the simplest situation, only one of the databases (the target or the source) changes between the creation of the target and its subsequent synchronization with the source database. In such cases, synchronization is a simple matter of overwriting the unchanged values with the changed values.
However, most situations are more complex, since the data in both databases may change. In these situations, synchronization must resolve conflicts that arise when data items in both databases have changed since the last synchronization. The particular implementation used by data replication/sharing specifies how these data synchronization conflicts should be resolved for specific data items in each database. The implementation must therefore work correctly with each database, and the applications that use those databases must not be affected.
To ensure information integrity, proprietary storage-based implementations must be extremely careful to avoid conflict with either the databases or the application that depends on those databases. Host-based implementations face the same issues, but are typically integrated closer to the operating system, databases, and applications.
For example, if a specific order amount was modified in the target database, the target definition might specify that the changed order amount replaces the original order amount in the source database.
In contrast, if the number of units in inventory was decreased in the target database, the inventory count in the source database should be decreased by the same amount rather than replaced with the target value. Increasing or decreasing the number of units in inventory, rather than overwriting each new value, would allow the changes made by different users to cumulatively affect the source database.
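The two resolution styles in the order-amount and inventory examples can be sketched as per-field rules. This is a minimal Python illustration, assuming the engine keeps a baseline copy of each record as of the last synchronization; the rule names `replace` and `delta`, the field names, and the function `synchronize` are hypothetical.

```python
def synchronize(source, target, baseline, rules):
    """Merge target changes into source using one rule per field.

    baseline holds the values both sides agreed on at the last synchronization.
    """
    merged = dict(source)
    for field, rule in rules.items():
        if target[field] == baseline[field]:
            continue  # the target did not change this field
        if rule == "replace":
            # The target value overwrites the source value.
            merged[field] = target[field]
        elif rule == "delta":
            # Apply the size of the target's change on top of the source value.
            merged[field] = source[field] + (target[field] - baseline[field])
        else:
            raise ValueError(f"unknown rule {rule!r} for field {field!r}")
    return merged


rules = {"order_amount": "replace", "units_in_inventory": "delta"}
baseline = {"order_amount": 500, "units_in_inventory": 100}
source = {"order_amount": 500, "units_in_inventory": 90}   # another user sold 10
target = {"order_amount": 650, "units_in_inventory": 95}   # this user sold 5
merged = synchronize(source, target, baseline, rules)
```

Here the order amount becomes 650, while inventory ends at 85: both the 10 units sold elsewhere and the 5 units sold at the target are reflected.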
Data replication/sharing does not limit synchronization to modified data. The data replication/sharing engine must also reconcile any new or deleted data in the target and source databases.
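Reconciling new and deleted rows can be sketched by comparing key sets. The Python below is a simplified, one-directional illustration (target changes applied to the source); it assumes each row has a unique key and that the engine remembers which keys existed at the last synchronization. The function name and sample data are hypothetical.

```python
def reconcile_rows(source, target, baseline_keys):
    """Apply the target's inserts and deletes to a copy of the source rows."""
    target_keys = set(target)
    inserted = target_keys - baseline_keys   # rows created at the target
    deleted = baseline_keys - target_keys    # rows removed at the target
    result = {key: row for key, row in source.items() if key not in deleted}
    for key in inserted:
        result[key] = target[key]
    return result


baseline_keys = {101, 102}
source = {101: "order A", 102: "order B", 104: "order D"}  # 104 added at corporate
target = {101: "order A", 103: "order C"}                  # 102 deleted, 103 added
reconciled = reconcile_rows(source, target, baseline_keys)
```

Row 104 survives because it was never part of the baseline, so its absence from the target is not treated as a deletion.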
Asynchronous Updates
While it is possible to choose a single point in time when the changes to the source database are propagated to the target database and vice versa, this is not the only requirement for effective data replication/sharing. Changes from the target database might be applied to the source database frequently, while changes from the source database might be applied to the target database less frequently.
For example, orders may be sent into the corporate office daily, while changes to product prices might occur only once a week.
When updating asynchronously, the data replication/sharing engine must ensure that no change causes an unresolved collision.
For example, if a price change in the source database is propagated to the target database, that change is distinguished from target database changes made by the salesperson. The data replication/sharing engine delivers the salesperson's changes back to the source database, without replicating the price change that originated at the source database.
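One common way to distinguish such changes is to record where each change originated and to skip changes that would simply travel back to their origin. The Python sketch below assumes a simple change log with an `origin` field; the `Change` class, the site labels, and `changes_to_send` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    key: str
    value: object
    origin: str  # site where the change was first made


def changes_to_send(change_log, destination):
    """Return changes to deliver to destination, skipping those that began there."""
    return [change for change in change_log if change.origin != destination]


# Change log held at the target (the salesperson's database).
target_log = [
    Change("price:widget", 21.50, origin="source"),      # arrived from corporate
    Change("order:1007", {"units": 5}, origin="target"),  # entered by salesperson
]
outbound = changes_to_send(target_log, destination="source")
```

Only the order is sent back to the source; the price change is filtered out because it originated there.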
Communication Decisions
When synchronizing databases, it is generally not possible to send the entire database from one site to another to determine the data that has changed, especially if those sites require acceptable availability and performance/response times.
The more efficient solution used by data replication/sharing is to send only the changed data. Sending only the differences can result in a large savings in communication costs.
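Sending only the differences can be illustrated with a small Python sketch. It assumes the sending site keeps a snapshot of the data as of the last synchronization and compares it with the current state; the function names and the delta layout (`upserts` and `deletes`) are hypothetical.

```python
def compute_delta(previous, current):
    """Return only the rows that were added, changed, or deleted."""
    upserts = {
        key: row
        for key, row in current.items()
        if key not in previous or previous[key] != row
    }
    deletes = [key for key in previous if key not in current]
    return {"upserts": upserts, "deletes": deletes}


def apply_delta(replica, delta):
    """Bring a replica up to date using a delta produced by compute_delta."""
    updated = dict(replica)
    updated.update(delta["upserts"])
    for key in delta["deletes"]:
        updated.pop(key, None)
    return updated


previous = {i: f"row {i}" for i in range(1000)}
current = dict(previous)
current[7] = "row 7 (edited)"   # one changed row
current[1000] = "new row"       # one inserted row
del current[999]                # one deleted row

delta = compute_delta(previous, current)
refreshed = apply_delta(previous, delta)
```

In this example the delta carries three items instead of a thousand rows, yet applying it to the old copy reproduces the current data.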
In those cases where the changes are handled within the subsystem or through a dedicated connection such as ESCON or Fibre Channel to the target servers, performance may improve. It should be noted, however, that most of the latency occurs in the actual database extraction and load procedures.
Communication issues highlight the advantages of using data replication/sharing in mixed mobile, LAN-, and WAN-based environments. The constant communication required by traditional client/server architectures is eliminated by an effective data replication/sharing implementation.
Instead, communication between central and client databases takes place only at synchronization time, greatly reducing network traffic and associated costs.
All replication models require additional storage, but reduced network traffic, improved performance, and enhanced employee productivity more than compensate for the additional cost.
Responding to Changing Business Drivers
Because data replication/sharing offers a single solution to a variety of data usage scenarios, companies are able to quickly respond to changes in usage without modifying applications. Companies can develop applications for their specific needs or use third-party products without writing code for database replication and synchronization. The data replication/sharing engine handles all of these needs.
The Ultimate Solution
Some organizations have designed and developed custom software for their database replication and synchronization needs. Typically, the development process is lengthy (a minimum of one man-year), which can significantly affect application development and deployment schedules.
Some commercially available applications use the data replication/sharing approach to data replication and synchronization. When looking toward data replication and sharing, examine internal and external alternatives.
A good starting point: nonproprietary, off-the-shelf solutions (so they work with all your existing hardware and infrastructure) that are year 2000-compliant, support RDBMS products, and are easy to set up and use.
Now the work begins for you and your information partner.
George Mele is director of storage marketing at Amdahl Corp., in Sunnyvale, CA.