Deploying cost-effective database storage

Tips on configuring and optimizing Oracle and SQL Server databases for maximum performance in OLTP and decision support systems.


End users are looking for cost-effective database storage solutions that maximize online transactions, provide good backup-and-recovery performance, minimize overall storage costs, and provide reasonable fault tolerance. In addition, users need high-performance storage to support Oracle and SQL Server databases, which typically have demanding I/O requirements.

The scalability of databases is often a direct function of the scalability of the underlying storage. Oracle and SQL Server storage must provide I/O load balancing across all database files, minimize hot spots resulting from uneven data distribution, and eliminate I/O contention from uneven data access.

Configuring and optimizing database storage is a complex process. Database storage requires special consideration based on workload, capacity, and data movement. Deploying storage for Oracle and SQL Server requires an understanding of best practices for data placement and data transport. Data transport requirements involve the storage fabrics and protocols being used; Fibre Channel and Ethernet are essential for data transport in database storage infrastructures.

Figure 1: Fibre Channel can provide a back-end, SCSI-packet network for centralized Oracle and SQL Server resources. IP and SCSI-FCP networks are key to allocating storage in the data center. iSCSI and NAS provide low-cost block-level and file-level architectures.

As illustrated in Figure 1, Fibre Channel Protocol (FCP) can provide a back-end SCSI-packet network for centralized Oracle and SQL Server resources. IP and SCSI-FCP networks are key to allocating storage in the data center. Data transport also involves connecting the data-center storage area network (SAN) to other departments in the organization.

Cost-effective transport options include incorporating iSCSI devices in order to use the existing IP network. Workstations deployed in areas such as manufacturing, finance, human resources, and marketing often don't warrant the cost of a SAN but require block-level and file-level access to storage resources. iSCSI and network-attached storage (NAS) provide a low-cost solution with reasonable performance for these users. IP-based storage will play an important role with Oracle and SQL Server storage infrastructures (see InfoStor, June 2001, p. 58).

Database feng shui

In most cases, Oracle and SQL Server are constrained first by the storage system, then by memory, and finally, by the CPU. Data placement issues are a principal concern for database storage solutions.


Data placement relates to the type of database application being supported. Oracle and SQL Server applications are either online transaction processing (OLTP) or decision support systems (DSS). OLTP uses databases that consist primarily of retrievals and updates based on random queries from a large number of online users. Examples of OLTP systems include reservation systems, order-entry applications, accounting, and manufacturing applications.

DSS systems, on the other hand, consist mostly of ad hoc queries that have no significant updating from relatively few online users running long, complex queries. DSS systems support online analytical processing (OLAP) applications and import data from data warehouses and data marts.

The table outlines the components of Oracle and SQL Server along with their specific I/O characteristics. In OLTP systems, the heaviest I/O load generally falls on the redo or transaction logs. This stems from write-ahead logging, which ensures integrity against both media and non-media failures.
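Write-ahead logging is why the redo log sees such heavy, sequential write traffic: every commit forces a synchronous append to the log before the data change is applied. The following is a minimal sketch of that ordering, not Oracle or SQL Server internals; the file name and record format are purely illustrative.

```python
import os
import tempfile

# Write-ahead logging sketch: the log record is forced to stable storage
# (sequential append + fsync) BEFORE the data change is applied, so a crash
# between the two steps can always be replayed from the log.
def commit(log_path, data, record):
    with open(log_path, "a") as log:
        log.write(record + "\n")   # sequential append to the redo log
        log.flush()
        os.fsync(log.fileno())     # force the record to disk first
    data.append(record)            # only now apply the change to the data

log_file = os.path.join(tempfile.mkdtemp(), "redo.log")
table = []
commit(log_file, table, "INSERT row 1")
commit(log_file, table, "INSERT row 2")
```

Because every commit pays this synchronous, sequential write, the log device dominates OLTP I/O, which is why the article later recommends isolating redo logs on their own spindles.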

DSS systems have two areas of heavy use. The first is temp space (tempdb for SQL Server), since it is the "work space" for all aggregate functions such as sorts, order-bys, group-bys, and any other temporary data storage that is needed, all of which is read/write-intensive. The second area of concern is large tables that are scanned or that have significant amounts of read/write activity.

An important consideration for database storage is making the database block size a multiple of the operating system block size. On Solaris, Unix supports large I/O sizes via the maxphys variable in /etc/system. For SQL Server, I/O block size is based on the disk sector size, which is configurable in Windows 2000.

Depending on the exact I/O requirements, it is generally recommended to increase the default block size (usually to no more than 1MB). The stripe width should also be a multiple of the operating system block size. If the database uses raw devices, only the operating system block size needs to be configured. If the database uses file systems, then the file system block size should be considered. Using a 1MB operating system block size, an eight-drive RAID array would have an 8MB stripe width and a 1MB stripe depth.
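The arithmetic behind these recommendations is simple enough to sketch. The figures below are the article's own examples (an 8KB-multiple database block, a 1MB stripe depth on an eight-drive array), not tuned recommendations for any particular system.

```python
# Block-size check from the text: the database block size should be a
# multiple of the operating system block size (figures illustrative).
OS_BLOCK = 8 * 1024               # 8KB operating-system block size
DB_BLOCK = 4 * OS_BLOCK           # hypothetical database block size
assert DB_BLOCK % OS_BLOCK == 0   # DB block is a clean multiple

# The article's stripe example: 1MB stripe depth across eight drives.
stripe_depth = 1024 * 1024        # bytes written per disk before moving on
drives = 8
stripe_width = stripe_depth * drives
print(stripe_width // (1024 * 1024))  # stripe width in MB
```

The same relationship (stripe width = stripe depth × number of drives) holds for any array size, which is why the stripe width should also be a multiple of the operating system block size.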

Databases run on either raw devices or file systems. Raw devices add management complexity without much performance benefit and are only necessary for Oracle Parallel Server. For most database storage environments, advanced file systems (jfs, vxfs, qfs, xfs) should be implemented to provide support for kernel asynchronous I/O and direct I/O. Asynchronous I/O support allows the database to manage file locking for concurrent access rather than leaving it to the operating system. Direct I/O allows data to be written from the database buffers directly to disk without being cached in the file system; it lowers CPU utilization and frees up memory by letting database I/O bypass the file system buffer cache.

Figure 2: Data placement and database storage layouts involve a combination of RAID levels. Shown: A 20-drive RAID array partitioned for Oracle.

There are many options to consider in laying out database files. Depending on whether the application is OLTP or DSS, options include isolating redo logs, data tables, indexes, temp space, and archive devices from one another, while configuring each data volume with a specific RAID level, stripe depth, stripe width, and appropriate I/O block size. Since DSS systems involve parallel index and table range scans, physically storing tables and indexes on the same set of disks will degrade I/O performance. With very active OLTP systems, it is recommended to alternate the log files between two or more disks so that writing log information to one file does not interfere with reading the completed log file being archived. And since redo logs are written every time a transaction commits, OLTP systems should physically isolate the logs from all other database files.

RAID helps minimize database hot spots. (For a review of RAID technology, see "Cure for the I/O blues: A RAID review," p. 46.) Writes to redo log files are sequential, so using RAID 1 minimizes seek times because the disk head remains at the location of the last write. Sort and temporary tables are transitory, so RAID 0 is a viable option for temp and tempdb. Database storage layouts should also account for dump devices holding user data extracts, backup files, and online snapshots; RAID 5 is a good choice for these files. Data placement issues are illustrated in Figure 2 with a 20-drive RAID array configured for an OLTP-oriented Oracle application.

Other methods can be used to build Oracle and SQL Server storage systems. Oracle has published a new approach known as the "Stripe And Mirror Everything" (SAME) methodology. The goal behind this methodology is to minimize the complexity and skill level required to lay out database storage environments.

The four rules of SAME include striping all files across all disks using 1MB stripe depth; mirroring data for high availability; placing frequently accessed data on the outer edge of drives; and subsetting data by partition, not by disk.

Striping database files over many disks is a simple but effective way to minimize I/O contention. Large stripe widths deliver good random I/O performance, and 1MB stripe depths are efficient for sequential I/O activity. In fact, as disk drive technology continues to improve, stripe depths could be made larger, since the formatted transfer rate of disks (in MBps) is increasing faster than the positioning time associated with seek and rotational delay. As a result, storage should be purchased based on database I/O performance requirements rather than on capacity: extra disk drives should be deployed for I/O bandwidth, not storage capacity.
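The "buy for bandwidth, not capacity" point can be made concrete: size the stripe set both ways and take the larger. All of the figures below are illustrative assumptions (per-drive rates roughly in line with drives of this era), not vendor specifications.

```python
import math

# Size a stripe set by I/O bandwidth as well as capacity; the larger of the
# two counts wins. All figures are illustrative assumptions.
required_mbps = 400        # database's sequential throughput requirement
required_gb = 500          # raw capacity requirement
drive_mbps = 35            # assumed per-drive formatted transfer rate
drive_gb = 73              # assumed per-drive capacity

for_bandwidth = math.ceil(required_mbps / drive_mbps)
for_capacity = math.ceil(required_gb / drive_gb)
drives_needed = max(for_bandwidth, for_capacity)
print(for_bandwidth, for_capacity, drives_needed)
```

Here bandwidth demands 12 drives while capacity needs only 7, so the I/O requirement, not the gigabyte count, sets the purchase.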

SAME recommends mirroring for high availability and suggests striping mirrored disks (RAID 1+0) over mirroring striped disks (RAID 0+1) for multi-disk failure protection. SAME recommends placing data on the outer edges of disk drives; transfer rates are increased due to the outer sectors moving faster than the inner sectors. Seek times are reduced using the outer edge since more than half the capacity resides on the outer edges of the disk platter.

The last rule is to subset by partition, not by disk. This simply means it is better to stripe multiple volumes over many disks rather than create separate, isolated volumes striped over few disks.

The SAME methodology is appropriate for many reasons. SAME leverages performance improvements with today's volume managers, RAID technology, and disk storage solutions and minimizes the complexity associated with database storage configurations.

The SAME methodology is ideal for many database storage environments, but caution should be used applying this methodology to large, multi-user database environments.

The simplest approach to deploying SAME is to build multiple disk volumes striped across all available disks. To account for recoverability, volumes are mirrored.

Database files that have high I/O activity, such as redo logs, archive logs, or temp space, can be placed on volumes closer to the outer edges of the disk platters, while files such as system or rollback segments can be placed on volumes near the middle. When using SAME, the stripe depth should be larger than the maximum I/O size for frequent I/O operations.

Figure 3: Diagram illustrates data placement using the SAME methodology in a mirrored, nine-drive array configured for SQL Server.

Figure 3 illustrates how to deploy SAME with a mirrored nine-drive RAID array configured for SQL Server. Oracle and SQL Server running on frame-based arrays (such as the Hitachi Data Systems 9900, EMC Symmetrix, or IBM ESS) can use this methodology. One approach is to build multiple RAID groups (or LUNs) across multiple controllers that are then concatenated together through host-based volume management. The RAID groups can be RAID 5 (using large cache configurations) or RAID 1. This layout is easy to manage and performs well under most database workloads.

Database performance tuning is based on understanding the underlying I/O activity. With volume management tools such as vxbench from Veritas, you can gain valuable insight into database storage bottlenecks. If the storage environment is not able to support the database I/O rate, then files with I/O contention must be isolated. Following basic principles of data transport and data placement is the first step to deploying cost-effective Oracle and SQL Server storage. Whether the database application is OLTP or DSS, the storage infrastructure needs to be configured for the I/O activity.

Mark Teter is director of enterprise storage solutions at Advanced Systems Group (www.virtual.com), an enterprise computing and storage consulting firm in Denver, CO.

Cure for the I/O blues: A RAID review

Host-based and controller-based RAID is designed to address performance and reliability. RAID 0 is a high-performance, low-availability configuration that provides simple disk striping. Striping is based on the linear power of incrementally adding disks to a volume to increase size and I/O bandwidth. Unless an online spare is available, if one disk in a RAID 0 system fails, all data in the array group is unavailable. The amount of data written to each disk before moving to the next block is known as stripe depth or chunk. The stripe depth is to the host-based volume manager what chunk is to the controller-based RAID manager. The group of physical drives being striped across is known as the stripe set, while the number of logical devices, or volumes included in the stripe set, is known as stripe width.

Striping can be either horizontal or vertical. Horizontal striping spreads I/O across controllers, and vertical striping spreads I/O across disks. Striping should be done to make use of multiple controllers and to attain parallel data access. Given two controllers and a four-member stripe set, an operating system and database block size of 8KB would generate a minimum of 16KB down each controller for every read and write. This would be adequate for a pure OLTP system. For DSS systems that predominantly exhibit sequential I/O behavior, a 1MB stripe size should be used. Most databases exhibit some kind of sequential access, whether from batch reporting, index builds, restore operations, or imports/exports, and would generally benefit from larger stripe sizes. Since the actual I/O profile may vary from day to day, the selection of stripe depth is somewhat of a compromise.
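The stripe depth and stripe set determine exactly where each logical block lands. A minimal sketch of that round-robin address mapping for a plain RAID 0 stripe set follows; the 1MB depth and four-member set are illustrative.

```python
# Map a logical byte offset to (disk index, byte offset on that disk) in a
# plain RAID 0 stripe set. Depth and member count are illustrative.
def raid0_map(offset, stripe_depth, members):
    chunk = offset // stripe_depth              # which chunk, from zero
    disk = chunk % members                      # round-robin over the set
    stripe = chunk // members                   # which full stripe on disk
    return disk, stripe * stripe_depth + offset % stripe_depth

DEPTH = 1024 * 1024                             # 1MB stripe depth
print(raid0_map(0, DEPTH, 4))                   # first chunk -> disk 0
print(raid0_map(5 * DEPTH + 10, DEPTH, 4))      # chunk 5 -> disk 1
```

Consecutive chunks land on consecutive disks, which is what lets a large sequential scan keep every spindle (and, with horizontal striping, every controller) busy at once.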


A RAID 1 array group consists of a pair of disk drives in a mirrored configuration that duplicates all writes to each disk. Read requests can be satisfied by whichever disk in the pair responds first, improving overall read performance: a single read is not performed any faster, but multiple reads perform better because they are serviced simultaneously. RAID 1 issues write I/O sequentially to the disk drives, and the physical writes are then performed in parallel. As a result, host-based RAID 1 is nearly as fast as controller-based RAID 1.

RAID 0+1 configurations improve on RAID 1 by layering mirroring on top of striping. Most mirroring implementations use striping to improve sequential and random read performance through the advantages of striping across multiple controllers and disk spindles. In a RAID 1+0 configuration, two drives are mirrored together, and then the mirrors are striped. In both cases (0+1 or 1+0), the loss of a single drive does not cause failure. However, if one more drive is lost from either the primary or the mirror side, RAID 0+1 will fail. In RAID 1+0, a single disk can fail in every mirror pair and the RAID group still remains online. RAID 1+0 greatly improves overall availability as well as performance with sequential reads. RAID 0+1 or 1+0 provides good performance with large scans as well as random reads due to the round-robin effect with the mirrored side of the volume.
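The difference in multi-disk failure tolerance can be counted exhaustively. The sketch below models an eight-drive layout under assumed pairings (1+0 as four mirror pairs that are striped; 0+1 as two four-disk stripes mirrored against each other) and checks every two-disk failure combination.

```python
from itertools import combinations

# Failure-tolerance sketch for an 8-drive layout (pairings are assumptions).
def survives_1plus0(failed, pairs=4):
    # 1+0: mirror pairs (0,1), (2,3), ... are striped; the volume fails
    # only if BOTH halves of some one pair are lost.
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(pairs))

def survives_0plus1(failed, width=4):
    # 0+1: stripe A is disks 0..3, its mirror stripe B is disks 4..7; the
    # volume survives only while at least one whole stripe is intact.
    side_a = any(d < width for d in failed)
    side_b = any(d >= width for d in failed)
    return not (side_a and side_b)

two_disk = [set(f) for f in combinations(range(8), 2)]
surv_10 = sum(survives_1plus0(f) for f in two_disk)
surv_01 = sum(survives_0plus1(f) for f in two_disk)
print(surv_10, surv_01, len(two_disk))
```

Of the 28 possible two-disk failures, 1+0 survives 24 while 0+1 survives only 12, which is the arithmetic behind SAME's preference for striping mirrored disks over mirroring striped disks.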

RAID 5 stripes data and parity across all disks in the stripe set. RAID 5 performance is good for read-intensive environments due to the effects of striping across multiple spindles. However, write-intensive environments suffer due to the read-modify-write process for parity calculations. This is especially true with host-based RAID; however, frame-based disk arrays and most modular disk arrays provide controller-based RAID 5 that uses cache and special hardware to offset the performance penalty from parity calculations. (See March 2001, p. 44, for a discussion on frame-based versus modular disk arrays.) Controller-based RAID 5 performance in write-intensive environments can be comparable to host-based RAID 0+1.
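The read-modify-write penalty comes straight from the parity arithmetic: updating one data block requires reading the old data and old parity, then writing the new data and new parity. A minimal sketch, using XOR over small integers to stand in for block-level XOR:

```python
# RAID 5 small-write sketch: one data-block update costs two reads and two
# writes, because new parity = old parity XOR old data XOR new data.
def rmw_update(data_blocks, parity, index, new_value):
    old = data_blocks[index]           # read old data block
    parity = parity ^ old ^ new_value  # read old parity, compute new parity
    data_blocks[index] = new_value     # write new data block
    return parity                      # write new parity

blocks = [0b1010, 0b0110, 0b0001]
parity = blocks[0] ^ blocks[1] ^ blocks[2]
parity = rmw_update(blocks, parity, 1, 0b1111)
print(bin(parity))
```

Controller-based RAID 5 hides those four I/Os behind cache and dedicated XOR hardware, which is why it can approach host-based 0+1 on writes while host-based RAID 5 cannot.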

With the reliability of today's disk drives (more than one million hours MTBF), RAID 0+1 or 1+0 volumes generally do not need hot-spare disk drives. RAID 5 volumes should always have a hot spare available to help alleviate RAID reconstruction bottlenecks. RAID 0, 0+1, 1+0, and 5 all have clear advantages and disadvantages (see table).

This article was originally published on September 01, 2001