Data Sharing Provides Universal Data Access
Five vendors. Five solutions. One goal: to provide access from heterogeneous platforms to a common pool of data.
By Charles T. Clark
Dramatic improvements in some areas of computing have created an acute need for new technology solutions in other areas. One of the major examples of this trend is data sharing, or information sharing, which has resulted from the need to move large amounts of data among disparate environments.
For example, the explosive growth of data warehouses has created a need for a rapid and efficient means of transferring information from operational databases housed on mainframes to data warehouses residing on Unix servers. Other applications that benefit from data sharing include centralized backup and restore, data migration from different environments, and the retrieval of data from a mainframe environment by Web servers.
Five companies--Amdahl, EMC, Hitachi Data Systems (HDS), IBM, and Sun Microsystems, together with their partners--offer data-sharing solutions, but their implementations vary widely.
According to independent storage analysts, the two front-runners in the data-sharing arena are EMC and Sun, with Amdahl, HDS, and IBM hot on their trail. Sun's data-sharing expertise came from its acquisition of Encore Computer, the developer of DataShare technology. EMC's data-sharing technology (which the company calls "information sharing") is in large part possible because of its partnership with database software specialist BMC Software. EMC and BMC co-developed the DataReach product.
By purchasing Encore, Sun went from being a car dealer to a car maker (to use one of CEO Scott McNealy's favorite analogies) in the world of storage. For years, Sun had been a storage reseller, but it has recently moved aggressively into the circle of storage manufacturers.
According to Kathleen Holmgren, vice president of Sun's storage division, one of the competitive advantages that Sun acquired from Encore was its DataShare technology, which Sun calls a "true" data-sharing solution. Says Holmgren: "With DataShare running on the A7000 storage platform, you have a volume of data and you actually can read it and use it rather than copying it. We look at competitive versions of data sharing more as data transfer, or copy transfer, facilities versus true data sharing."
Sun's DataShare allows a Unix server to directly read QSAM data (not VSAM) from a mainframe, according to Amy Lynch, product manager for DataShare. "The mainframe writes the data to the A7000," says Lynch, "and it is directly accessible by the open-systems server, meaning that we don't have to make another copy of it." The mainframe data is presented to the Unix host as a standard CD-ROM (ISO 9660) file system so that the data can be presented to multiple, disparate Unix platforms--HP, IBM, Sun, etc.
John Young, vice president of enterprise planning at The Clipper Group, a consulting firm in Wellesley, MA, contends that Sun has an edge (for now) over EMC. "Sun is prepared to take the lead over EMC. EMC's DataReach is a good product. It tries to achieve the same effect, and it does produce some of the same savings in that it takes the load off the network, provides faster communication between heterogeneous systems, enables you to store different kinds of data types in the same box, and so forth. But the fundamental difference is that Sun's A7000 does all of this with a single copy of the data that doesn't have to be replicated or mirrored."
David Vellante, senior vice president, systems software and storage, at International Data Corp., a market research firm in Framingham, MA, disagrees. Vellante contends that EMC offers a more elegant solution than Sun. "EMC's approach [to data sharing] involves a lot more of what I would call 'the real deal.' It's much more complex, and more limited in scope."
Doug Fiero, EMC's director of product marketing, disputes Sun's claim to a performance advantage. Fiero says that, although EMC's DataReach software does require replication, it transports data via ESCON channels and not over the network, thereby saving both network bandwidth and processor cycles.
With EMC's DataReach, the usual scenario is to have an MVS DB2 operational system with the DB2 data stored on a Symmetrix disk array. The mainframe connects to the Symmetrix array over an ESCON channel, and the open-systems server connects to the same array via a SCSI channel. At this point, DataReach can read the mainframe DB2 image: the software does a physical read of the CKD device, and it knows how to read the DB2 metadata. Next, DataReach uses the DB2 metadata to extract rows and columns, and then, using a native database loader, loads the rows and columns into, say, an Oracle or Sybase RDBMS.
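The extract-and-load flow described above can be pictured with a deliberately simplified sketch. It is not DataReach code and does not reflect the real DB2 on-disk format, which is far more involved; the column layout, the record length, and the function names are all hypothetical. It only illustrates the general idea of using column metadata to slice mainframe (EBCDIC) records into rows and then writing them out in a delimited form that a database loader could consume.

```python
import csv
import io

# Hypothetical column metadata: (name, byte offset, byte length).
# A real product would read this from the DB2 catalog, not hard-code it.
COLUMNS = [("cust_id", 0, 6), ("name", 6, 10), ("state", 16, 2)]
RECORD_LEN = 18  # assumed fixed-length records, for illustration only


def extract_rows(image: bytes, columns=COLUMNS, record_len=RECORD_LEN):
    """Slice a raw EBCDIC image into rows using the column metadata."""
    rows = []
    for start in range(0, len(image) - record_len + 1, record_len):
        record = image[start:start + record_len]
        # cp037 is a common EBCDIC code page shipped with Python.
        row = [record[off:off + length].decode("cp037").strip()
               for _, off, length in columns]
        rows.append(row)
    return rows


def to_loader_file(rows, columns=COLUMNS) -> str:
    """Render rows as comma-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([name for name, _, _ in columns])
    writer.writerows(rows)
    return buf.getvalue()


# Build a small fake "mainframe image" so the sketch is self-contained.
sample_image = (
    ("000042" + "ACME".ljust(10) + "MA").encode("cp037")
    + ("000043" + "GLOBEX".ljust(10) + "PA").encode("cp037")
)
sample_rows = extract_rows(sample_image)
loader_text = to_loader_file(sample_rows)
```

In this toy version the character translation and the row extraction happen on the open-systems side, which mirrors the article's point that the work is kept off the mainframe and off the network.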
Some analysts contend that without the mirroring technology provided by EMC's TimeFinder software, the DB2 database has to be down; otherwise, users must be content with a "fuzzy copy" (a possibly inaccurate copy at the time of the extract) of the data. Using TimeFinder eliminates the need to take the database down, but the entire database then needs to be mirrored when the extract takes place. The use of mirroring software, critics argue, requires a resynchronization that could be time consuming if the entire database has to be copied.
Other analysts contend that DataReach comes closer to what users require than other solutions, including Sun's DataShare. It appears that the debate will continue (see sidebar "DataShare vs. DataReach").
Despite the plaudits that DataReach has received from some analysts, EMC and BMC are not resting on their laurels. The two companies are working on the next phase of their "data propagation" strategy, according to Russ Donovan, DataReach product manager at BMC. The first product will be a change capture facility to complement DataReach. Says Donovan: "There's tremendous interest among our customers to be able to capture only what changed in the last, say, 24 hours, or even in the last 10 minutes, and move that across, as opposed to moving entire tables." The two companies were expected to release the change capture facility this month.
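The idea behind change capture, moving only recently changed rows rather than entire tables, can be illustrated with a small sketch. This is an assumption-laden illustration, not a description of how the EMC/BMC facility is implemented: the row structure, the "modified" timestamp field, and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def changed_since(rows, now, window):
    """Return only the rows modified within `window` before `now`."""
    cutoff = now - window
    return [row for row in rows if row["modified"] >= cutoff]


# Hypothetical table contents; each row carries a last-modified timestamp.
now = datetime(2000, 1, 2, 12, 0)
rows = [
    {"id": 1, "modified": datetime(2000, 1, 2, 11, 55)},
    {"id": 2, "modified": datetime(2000, 1, 1, 18, 0)},
    {"id": 3, "modified": datetime(1999, 12, 20, 9, 0)},
]

last_day = changed_since(rows, now, timedelta(hours=24))
last_ten_minutes = changed_since(rows, now, timedelta(minutes=10))
```

The shorter the window, the fewer rows cross the channel, which is the appeal Donovan describes.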
IBM Paints a Seascape
IBM has drawn skepticism from analysts because of its delays in delivering products based on the Seascape architecture. However, IBM recently began delivering products that it contends implement the architecture. Bill Pinkerton, IBM's director of marketing, admits to some delays, but says they resulted from skipping a generation of products.
For example, IBM recently introduced InfoSpeed, which implements the Seascape architecture. The InfoSpeed data-sharing solution allows rapid movement of large data files between an S/390 mainframe environment and open-systems servers, providing universal access to the data. IBM contends that because InfoSpeed uses ESCON channels and SCSI bus connections to transport data, users can move data from a mainframe to a client/server environment cost effectively. Unlike some data-sharing solutions, InfoSpeed does not affect existing networks. The heart of the solution is the InfoSpeed Data Gateway, a multiplexor that switches SCSI port connections, which improves connectivity to Unix and Windows NT systems.
Pinkerton contends that InfoSpeed has advantages over the competition. One advantage, he claims, is that it doesn't require a proprietary storage system, as do EMC's and Sun's approaches. Instead, data is pumped from server to server, so any vendor's storage subsystems can be used. Additionally, Pinkerton says InfoSpeed allows users to do program-to-program file transfers, eliminating the need to send an entire file from the mainframe to the open-systems server.
HDS has yet another variation on the data-sharing theme. Hitachi's Multi-Platform Data Sharing Facility is essentially software that is designed to move data among disparate platforms, according to Ray Cosyn, product marketing manager. The software comes in two flavors: one that runs on the HDS 7700 mainframe storage platform, and one that runs on the HDS 5500 open-systems storage platform.
The Data Sharing Facility permits transactions on, say, a mainframe OLTP system to be written across a data channel (ESCON on the mainframe side and SCSI on the open-systems side) to an open-systems server, which could be supporting a data warehouse.
Hitachi is on the verge of introducing major enhancements to its data-sharing offering. The company is working with an independent software vendor that is writing application code and building intelligent agents that will reside on the mainframe and open-systems platforms. This software, according to Cosyn, will allow users "to do backup and recovery from either the client side or the agent side."
Amdahl: Nothing But Net
Amdahl offers a suite of data movement products, called Global Information Sharing (GIS), that allows users to share data across different computing environments and databases within the enterprise. GIS is based on technology licensed from Praxis International and, unlike the other four solutions, uses a standard network to move data from mainframe to open-systems environments.
The GIS framework can be used on any storage hardware platform and consists of five components: InfoDirector, Enterprise Information Sharing Model, InfoLoader, InfoReplicator, and InfoCopy.
InfoDirector is a Windows-based user interface that serves as the command center for GIS and defines the Enterprise Information Sharing Model (EISM), a repository containing metadata for defining the data-sharing environment. The EISM metadata answers such questions as: What data needs to be moved? Where does the data need to go? What transformations need to take place?
InfoReplicator allows asynchronous replication of data between heterogeneous DBMSs and delivers "near real-time" updates to replicated databases. The InfoCopy component of the GIS suite distributes snapshot copies of a single table, groups of tables, or all tables on a designated server to selected targets on the network. The InfoLoader component simplifies the copying of large tables by permitting cross-DBMS extract and load processes. Together, the GIS components provide an end-to-end data movement solution.
The Bottom Line
Data sharing, or information sharing, is still in the early stages of development. Clearly, it is an important technology given the current focus on data-warehousing applications. But vendors have a lot of work to do.
Sun may have the edge in certain situations; EMC in others. Both vendors are expected to enhance their products. Sun will tap its expertise in Java, and EMC will benefit from its alliance with BMC.
However, do not count Amdahl, HDS and IBM out of this race. All three companies have improvements in the works that will allow them to close the gap with Sun and EMC.
But don't expect data sharing to become ubiquitous anytime soon. It's simply too expensive for most users. Data sharing will continue to be used in select applications, such as moving data from operational databases to data warehouses, where the benefits more than justify the expense.
Data Sharing: User Case Studies
Marketing Communications Services (MCS) in Ivyland, PA, has built custom marketing databases for more than 12 years. When the company was searching for technology that would allow them to move data from a mainframe to a Unix server, the IT staff initially looked at three solutions. But all three were much too slow, says Leon Roomberg, director of database development at MCS.
But then MCS evaluated Encore's DataShare technology. After extensive testing, the company purchased an Encore SP40 system running the DataShare software. (Encore was subsequently acquired by Sun Microsystems.)
MCS has used DataShare to solve some difficult data mining problems over the past two years, says Roomberg. "We were doing a lot of merge/purge, sorting, duplication, and file hygiene on the mainframe and needed higher DASD performance under Oracle 5. The multiple pathways--NetWare for SAA, Microsoft's SNA Server, and an Eicon gateway--between the mainframe and the client/server environment were much too slow."
Roomberg says that the major benefits that his company has derived from the DataShare facility running on Sun's A7000 storage array are sustained high speed, no impact on mainframe MIPS, and the fact that no translation is required. Of these, he singles out the sustained high speed as the most important advantage. MCS has experienced a sustained transfer speed of 4GB to 4.5GB per hour from the mainframe to an Oracle database. While other vendors quote higher speeds, Roomberg contends that those rates are burst rates, not sustained rates. He adds that he can increase the transfer rate even further by adding more SCSI, ESCON, or bus-and-tag channels.
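To put the quoted sustained rates in more familiar terms, the short calculation below converts gigabytes per hour to megabytes per second and estimates how long a given load would take. It assumes decimal units (1GB = 1,000MB), and the 100GB load size is a hypothetical figure chosen only for illustration. Under that assumption, 4GB to 4.5GB per hour works out to roughly 1.1MB to 1.25MB per second.

```python
def sustained_mb_per_s(gb_per_hour: float) -> float:
    """Convert a GB/hour rate to MB/s, assuming 1 GB = 1,000 MB."""
    return gb_per_hour * 1000 / 3600


def hours_to_move(total_gb: float, gb_per_hour: float) -> float:
    """Hours needed to move total_gb at a sustained gb_per_hour rate."""
    return total_gb / gb_per_hour


low_rate = sustained_mb_per_s(4.0)    # lower end of the quoted range
high_rate = sustained_mb_per_s(4.5)   # upper end of the quoted range

# Hypothetical 100GB warehouse load at the lower quoted rate.
load_hours = hours_to_move(100, 4.0)
```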
MCS's storage configuration consists of one terabyte on an EMC Symmetrix 5500, a half terabyte of RAID on Falcon and Symbios arrays, and a Sun A7000 storage array connected by two sets of bus-and-tag channels on the mainframe side and one SCSI connection to the Unix server. Computing resources consist of an IBM mainframe, Sun and Hewlett-Packard Unix servers, and DB2 and Oracle databases.
For other sites, EMC's DataReach software solves the problem of heterogeneous platform access to common storage. One example is Apertus Carleton, in Eden Prairie, MN, which provides data integration solutions and specializes in data warehousing. David Haggerty, vice president of professional services at Apertus Carleton, was one of the first users of EMC's DataReach data-sharing software, which was co-developed with BMC Software.
Haggerty has worked extensively with DataReach, and is pleased with the product. "It enables you to do relational-like queries from the source that you're pulling data from. This capability is very convenient because many times you want to filter the data; you don't want to take all of it."
DataShare vs. DataReach
Sun's DataShare and EMC's DataReach software represent two different approaches to the same problem: accessing data from heterogeneous environments.
Sun's DataShare software emulates 3380/3390 DASD on Sun's A7000 shared storage system, allowing a mainframe to write a sequential data set directly onto the shared disk. A Unix system can then read these data sets without copying the data for transfer to the Unix environment.
Conversely, open-systems files can be stored on the shared disk and read by mainframes. The mainframe sees the A7000 as a channel-attached disk subsystem. The advantage of the DataShare approach is that files don't have to be copied, resulting in potential speed advantages. However, Sun's DataShare currently delivers only flat files to Unix environments.
EMC's DataReach moves data from a DB2 database on an MVS mainframe to a Unix server by copying the data to a Symmetrix storage array that is shared by both systems. The data is then extracted, translated, and moved to an RDBMS in the Unix environment. DataReach moves the data through ESCON and SCSI channels, without consuming network bandwidth or mainframe processor cycles. In addition to flat file output, DataReach can move data directly into a database loader such as the Oracle SQL loader.
Which solution is best? It depends on the application, according to David Floyer, research director, enterprise systems, at International Data Corp. (IDC), a market research firm in Framingham, MA. "The ideal would be the best of both [DataShare and DataReach], which doesn't exist. But they're different solutions for different problems. For example, if a user has a classic end-of-the-day problem, say, stop an application and then have the batch start immediately, then DataShare solves the problem: You can go straight into the batch without any delay. On the other hand, if a user has to make a copy of the data, which is often the case, DataReach would be a superior solution."
"From a functional point of view," adds Floyer, "Sun's solution is certainly more advanced. From a practical point of view, EMC is doing things that customers want to have done, the most important of which is to make an extra copy of the data."
Without data sharing, a traditional data warehouse load in an OLTP application requires separate, dedicated storage subsystems. With data sharing--in this case, Sun's DataShare software running on the A7000 array--the separate storage subsystems can be consolidated in one array.
EMC's DataReach software, running on a Symmetrix array, loads and extracts data without using mainframe CPU cycles or network resources.
Charles T. Clark is a freelance writer in Haverhill, MA, and a regular contributor to InfoStor.