Data Warehouses to Benefit from SANs
Potential advantages include scalability, performance, availability, and flexibility.
By Mark Kincaid and Joyce Albert
Competition in business was dynamic in the last decade, is extremely intense today, and promises to become downright fierce as we enter the next millennium. This competitiveness has, in part, been fueled by deregulation in the telecommunications and transportation industries and by market globalization.
Meanwhile, users are demanding higher levels of quality and service. Since it is often more expensive for companies to sign on new customers than it is for them to retain existing ones, their commitment to customer satisfaction has become stronger than ever before. "Mass customization"--i.e., the tailoring of products and services to the needs of specific market segments--has become commonplace.
To remain competitive, companies are demanding detailed market information. The data warehouse has emerged as a mechanism through which market data can be extrapolated. Data warehouses consolidate current and historical transactional data--no matter how disparate--from a company's operational systems into a consistent database that can provide useful business information. Data warehouses can assist companies with customer segmentation, "micro-marketing," and trend analysis and help them identify potential problems, forecast business opportunities, and increase profitability.
Data warehousing is a multifaceted concept. A central data warehouse consists of a single physical database that contains data from multiple operational systems. A data mart is a specialized small-scale data warehouse, usually implemented at the departmental level.
There are two types of data marts: independent and dependent. Independent data marts extract data directly from operational systems; dependent data marts are subsets of larger data warehouses. And then, there are data malls, which are integrated systems of multiple data marts.
Data warehouses are used in a variety of industries, including retail, financial services, healthcare, telecommunications, manufacturing, transportation, and government. Within an organization, multiple departments and many different people use data warehouses for specific applications.
Data warehouses complement but do not replace on-line transaction processing (OLTP) databases. The two differ in a number of important ways. OLTP databases are operational systems for day-to-day activities, while data warehouses are informational repository systems used for planning, forecasting, and managing operations. Data warehouses contain fixed data, typically covering a 5- to 10-year period, and provide historical records. Data in a warehouse is not updated or changed unless it was entered incorrectly. OLTP databases, on the other hand, contain operational data that is constantly updated for up-to-the-minute accuracy. OLTP databases often contain only 60 to 90 days of information.
Rapidly Growing Market
In 1995, only 3% of data warehouses were over 500GB in size, and only seven companies (including such mega-corporations as Sears, MasterCard, MCI, Tandy, and Wal-Mart) had 1TB warehouses. Today, the average warehouse contains more than 100GB of data and doubles in size every 18 months.
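To put the doubling rate in perspective, the short Python sketch below projects warehouse size over time. It assumes steady exponential growth with an 18-month doubling period and a 100GB starting point, as cited above; the function names are illustrative only, and real warehouses will not grow this smoothly.

```python
import math

def projected_size_gb(initial_gb, months, doubling_months=18.0):
    """Size after `months`, assuming the warehouse doubles every `doubling_months`."""
    return initial_gb * 2 ** (months / doubling_months)

def months_to_reach(initial_gb, target_gb, doubling_months=18.0):
    """Months needed to grow from initial_gb to target_gb at the same doubling rate."""
    return doubling_months * math.log2(target_gb / initial_gb)

if __name__ == "__main__":
    for years in (1.5, 3, 5):
        size = projected_size_gb(100, years * 12)
        print(f"after {years} years: about {size:.0f}GB")
    print(f"100GB to 1TB: about {months_to_reach(100, 1000):.0f} months")
```

Under these assumptions, an average 100GB warehouse would reach 1TB (taken here as 1,000GB) in roughly five years.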
Meanwhile, requests to set up new data warehouses and to expand access to existing warehouses continue to increase. Today, 90% of the world's 2000 largest companies (the Global 2000) reportedly implement data warehouses, while the market itself is growing at an estimated rate of 65% per year. And by 2000, related software expenditures are expected to reach $2 billion.
In response to this growth, vendors are emphasizing the importance of building scalable warehouses that adapt to a company's changing needs without requiring it to reload data. Configuration tends to be modular to allow the seamless addition of storage capacity, processing power, and new technologies.
As storage costs drop, the number of companies implementing data warehouses increases. Multi-host configurations help lower overall costs since multiple servers can be connected to a single storage pool; costs can also be minimized by redeploying existing storage as server technology is updated. Flexibility and lower buy-in and maintenance costs are particularly important when you consider that storage represents the largest single investment in large data warehouses.
Managing Storage Growth
The explosive growth in storage has created new data-management challenges for companies. An emerging technology--storage-area networks (SANs)--should help lead the way to the future.
SANs are based on the networking architecture of the local-area network (LAN). The architecture is often implemented using Fibre Channel (Fibre Channel-Arbitrated Loop or a fabric configuration) or SCSI (Ultra or Low-Voltage Differential), linking clients and servers to a common backbone for access to storage.
SANs provide a number of potential benefits, including:
- Scalability. Storage components can be added as needed to support future capacity requirements. As a result, data warehouses can grow without significantly disrupting processing.
- Performance. FC-AL supports 100MBps per loop; UltraSCSI, 40MBps per channel; and LVD SCSI, 80MBps per channel. These bandwidth levels help resolve performance bottlenecks in traditional server-based implementations.
- High availability. RAID controllers attached to a SAN support various RAID levels (with hot spares) and FC-AL (in a dual-loop configuration), offering redundancy at the channel level. In addition, Fibre Channel hubs and switches support optical fiber cables for long-distance and remote storage.
- Flexibility. FC-AL and SCSI can be linked to a fiber backbone that supports fabric switching for dynamic allocation of storage. Fibre Channel hubs support point-to-point connectivity to storage. Fibre Channel/SCSI bridges link SCSI technology to the fiber backbone. Additionally, SCSI switches support similar connectivity to Fibre Channel in a pure SCSI environment. Any server attached to a SAN supports tape libraries on Fibre Channel and SCSI backbones, providing multi-server accessibility. In the future, SANs will support Internet Protocol (IP) traffic, enabling server-to-server communications and direct connections to storage.
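The bandwidth figures in the performance item above translate directly into bulk-transfer times. The following Python sketch computes how long it would take to move a given volume of data at each interface's quoted peak rate. It assumes decimal units (1GB = 1,000MB) and sustained peak throughput with no protocol or disk overhead, so the results are best-case lower bounds, not measured performance. The names used are illustrative.

```python
# Peak rates quoted in the article, in MB per second.
PEAK_MBPS = {
    "FC-AL (per loop)": 100,
    "UltraSCSI (per channel)": 40,
    "LVD SCSI (per channel)": 80,
}

def transfer_seconds(size_gb, rate_mbps):
    """Best-case seconds to move size_gb at rate_mbps (assumes 1GB = 1,000MB)."""
    return size_gb * 1000 / rate_mbps

if __name__ == "__main__":
    for name, rate in PEAK_MBPS.items():
        minutes = transfer_seconds(100, rate) / 60
        print(f"{name}: 100GB in about {minutes:.1f} minutes")
```

At these peak rates, a 100GB warehouse would take 1,000 seconds to move over a single FC-AL loop, 1,250 seconds over an LVD SCSI channel, and 2,500 seconds over an UltraSCSI channel.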
Data Warehouses and Clustering
Perhaps the easiest way to understand SANs is to look at clustering. Clustering technology provides redundancy at the server level and supports distributed data access among servers using software or hardware lock managers. (Lock management is the ability for two or more servers to access the same data resource in a SAN while maintaining data integrity.) Clustering also supports continuous access to applications and data, should a server fail. The workload of the failed server is either distributed across the remaining servers in the cluster or taken over by a server designated for that purpose.
Consider a configuration in which two servers are linked through heartbeat (conventional LAN) connections (see figure). Both servers are in constant contact with each other, communicating the status and condition of the environments they manage. If one server fails, the loss of the heartbeat triggers the other server to take over the failed server's workload. Because both servers are connected to the SAN, the storage and data applications continue to process, with only minor interruptions to applications.
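The idea behind lock management can be illustrated with a small Python sketch. It is a simplified, single-process model and not any vendor's implementation: the LockManager class, its method names, and the server and volume names are all hypothetical. It grants exclusive access to a named storage resource to one server at a time and frees a failed server's locks so that the surviving servers can proceed.

```python
import threading

class LockManager:
    """Toy exclusive-lock table: at most one server holds each resource."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the table itself
        self._owners = {}                # resource name -> server name

    def acquire(self, resource, server):
        """Grant the lock if the resource is free or already held by this server."""
        with self._guard:
            holder = self._owners.get(resource)
            if holder is None or holder == server:
                self._owners[resource] = server
                return True
            return False

    def release(self, resource, server):
        """Release the lock only if this server actually holds it."""
        with self._guard:
            if self._owners.get(resource) == server:
                del self._owners[resource]
                return True
            return False

    def release_all(self, server):
        """Free every lock held by a (presumably failed) server."""
        with self._guard:
            freed = [r for r, s in self._owners.items() if s == server]
            for r in freed:
                del self._owners[r]
            return freed

if __name__ == "__main__":
    locks = LockManager()
    print(locks.acquire("volume1", "serverA"))   # serverA gets the lock
    print(locks.acquire("volume1", "serverB"))   # serverB is refused
    print(locks.release_all("serverA"))          # serverA fails; locks freed
    print(locks.acquire("volume1", "serverB"))   # serverB can now proceed
```

A production lock manager would also have to handle shared (read) locks, lock queues, and communication among servers, all of which are omitted here.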
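The heartbeat mechanism can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not a description of any particular clustering product: the HeartbeatMonitor class, the three-second timeout, and the server and workload names are all hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat from each server and flags silent ones."""

    def __init__(self, timeout_s=3.0):
        self.timeout_s = timeout_s       # assumed silence threshold, in seconds
        self.last_seen = {}              # server name -> time of last heartbeat

    def beat(self, server, now):
        """Record a heartbeat received from `server` at time `now`."""
        self.last_seen[server] = now

    def failed(self, now):
        """Servers whose last heartbeat is older than the timeout."""
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout_s)

def take_over(workloads, failed_server, survivor):
    """Return a new workload map with the failed server's work moved to the survivor."""
    return {w: (survivor if owner == failed_server else owner)
            for w, owner in workloads.items()}

if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=3.0)
    workloads = {"sales_mart": "serverA", "finance_mart": "serverB"}
    monitor.beat("serverA", now=0.0)
    monitor.beat("serverB", now=0.0)
    monitor.beat("serverB", now=5.0)     # serverA has gone silent
    for dead in monitor.failed(now=5.0):
        workloads = take_over(workloads, dead, "serverB")
    print(workloads)
```

Because both servers reach the same storage through the SAN, the takeover in this model only reassigns ownership of the workload; no data has to be copied between servers.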
SANs, clustering, and lock management offer an array of advantages that benefit data warehousing. The greatest obstacle facing IT professionals is the heterogeneous server environment. Although many of the tools and utilities can be shared, IT departments encounter difficulty in migrating data to the warehouse from the different servers and databases. Future improvements in performance, scalability, and flexibility will be based on the gradual globalization of the file systems that make up data warehouses, which will enable true server clustering and lock management.
In a cluster configuration, two or more servers can be connected to a storage-area network.
Mark T. Kincaid is vice president of strategic programs and Joyce Albert is a market research manager, both at ANDATACO in San Diego.