How to improve SAP Raid performance
SAP R/3 is one of the most popular integrated software suites. However, it has proven to be a large, complicated application requiring considerable configuration and programming expertise. System administration of a SAP installation is similarly complex, requiring knowledge of the underlying database, experience in managing large servers with multiple gigabytes or terabytes of disk storage, and new methods for protecting critical data.
SAP environments consist of several key components, including applications, host computer and operating system, databases, and I/O subsystems. This article examines database issues and I/O subsystem configurations using RAID to optimize performance and data protection.
At the heart of a SAP system is an underlying database--either Oracle or Informix. The principles are similar for both, but this article uses Oracle for all examples. Oracle 7.x limits file systems and raw devices to 4GB each. Thus, a 1TB subsystem would require 250 4GB partitions. Many SAP installations are between 100GB and 600GB, requiring 25 to 150 partitions. Arranging these partitions is one of the most important configuration steps in a successful SAP implementation.
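The partition counts above follow directly from the 4GB limit. A minimal sketch of that arithmetic (the function name is purely illustrative, and 1TB is taken as 1,000GB to match the article's figures):

```python
def partitions_needed(capacity_gb, max_partition_gb=4):
    # Ceiling division: any remainder still needs one more partition.
    return -(-capacity_gb // max_partition_gb)

print(partitions_needed(1000))  # 1TB (1,000GB) -> 250 partitions
print(partitions_needed(100))   # -> 25
print(partitions_needed(600))   # -> 150
```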
SAP employs a naming convention for identifying data tables and index tables. For instance, the tablespaces PSAPSTABD and PSAPSTABI indicate that they are a data tablespace and an index tablespace, respectively (the "D" and the "I" identify the function).
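The suffix convention can be read off mechanically from the last character of the tablespace name. A small sketch, with a hypothetical helper name:

```python
def tablespace_kind(name):
    # Trailing "D" marks a data tablespace, trailing "I" an index tablespace.
    return {"D": "data", "I": "index"}.get(name[-1].upper(), "unknown")

print(tablespace_kind("PSAPSTABD"))  # data
print(tablespace_kind("PSAPSTABI"))  # index
```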
Data tablespaces and index tablespaces should be separated as much as possible to reduce I/O contention at the disk and I/O channel level. At a minimum, they should be located on separate logical unit numbers (LUNs) or disks. The optimum I/O model places these two tablespaces on separate RAID array modules and I/O channels.
Which RAID level?
RAID arrays can be configured in various ways to affect application performance. Some applications are storage-intensive, requiring large amounts of disk space but relatively little I/O. Others are I/O-intensive and may not need to access large amounts of storage. Still others mix I/O and storage demands with other attributes, including reads vs. writes, sequential vs. random access, and small vs. large I/Os.
RAID levels and specific vendor implementations are optimized for different applications. RAID0 (striping) can yield the best performance because I/Os and data are spread across several drives. However, RAID0 provides no data protection and generally should not be used for frequently changing data without some additional level of protection, such as parity (as in RAID5, striping with parity) or mirroring (combining RAID0 and RAID1).
RAID1 (mirroring) protects data by duplicating it on one or more additional drives so that an exact copy exists on multiple drives. RAID1 can provide performance gains on reads by allowing a data request to be handled by the most available drive. However, RAID1 is not optimized for write-intensive applications and can be expensive due to increased drive, adapter, and management costs. Combining RAID0 and RAID1 (often referred to as RAID0+1 or RAID10) yields the highest performance and good availability, albeit at the highest cost in terms of hardware.
RAID5 (striped parity) performs well in read-intensive applications with multiple concurrent users in either sequential or random environments. However, early RAID5 implementations were slow in write-intensive applications because of a performance penalty caused by the RAID5 parity protection scheme, which requires multiple disk I/Os to safeguard the data.
More recent RAID5 arrays with write gathering and mirrored write-back cache are changing traditional thinking that RAID5 is too slow for database applications. Performance can be significantly enhanced by placing data and indices on RAID5, while storing transaction, journal, and other I/O-intensive objects or files on RAID0+1 or solid-state disks.
The examples described in the following sections show several different configurations, including a small database with a single RAID array, a medium-size 500GB database, and a large 1TB system using four RAID arrays. In these examples, each RAID array consists of 16 3.5-inch 7,200rpm 18GB disk drives with redundant RAID controllers and mirrored write-back cache.
The RAID arrays in the examples are configured for maximum storage and performance by using RAID5 with write-gathering cache and two LUNs per array. By configuring two hardware or SCSI LUNs per array, each LUN can be assigned and accessed via a dedicated controller on the array.
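The usable capacity of these LUNs follows from the RAID5 layout: one drive's worth of each LUN is consumed by parity. A quick sketch of that arithmetic for the example arrays (16 drives per array, two 8-drive LUNs, 18GB drives):

```python
def raid5_usable_gb(drives, drive_gb):
    # In RAID5, parity consumes the equivalent of one drive per LUN.
    return (drives - 1) * drive_gb

lun_gb = raid5_usable_gb(8, 18)  # each 8-drive LUN
array_gb = 2 * lun_gb            # two LUNs per 16-drive array
print(lun_gb, array_gb)          # 126 252
```

This is where the 126GB-per-LUN figure used later in the article comes from.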
Write-gathering technology, available from several RAID vendors, enables the RAID controller to group multiple write operations to eliminate the RAID5 parity-update penalty while maximizing the use of cache. The RAID5 write penalty occurs when an update or write causes multiple I/Os to the affected data and parity disks in a stripe.
To perform an update in RAID5, if the old data and/or old parity are not in cache, they are read from disk. The controller then calculates the new parity that corresponds to the new data, then writes the new data and parity to disk. This is referred to as the read-modify-write penalty and is illustrated in Figure 1.
Thus, it is possible for a single disk write to result in up to four actual disk I/Os. This gets worse as more writes are issued from the host. New data is placed in cache until it can be written to disk. With higher levels of writes, more cache is needed to store the pending writes. RAID arrays with fast back-end disk I/O subsystems using Fast Wide SCSI (20MBps) or Ultra SCSI (40MBps) drives on multiple buses combined with write-gathering can overcome these problems. The write-gathering process is illustrated in Figure 2.
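The parity mechanics behind both figures reduce to XOR. In the read-modify-write path, the new parity is old parity XOR old data XOR new data, which accounts for the up-to-four I/Os (read old data, read old parity, write new data, write new parity); write gathering instead accumulates a full stripe in cache and computes parity from the gathered data with no preliminary reads. A minimal sketch using small integers as stand-in blocks:

```python
from functools import reduce

def stripe_parity(blocks):
    # Parity of a full stripe is the XOR of all its data blocks.
    return reduce(lambda a, b: a ^ b, blocks)

stripe = [0b1010, 0b0110, 0b1100]   # three data blocks in one stripe
parity = stripe_parity(stripe)

# Small-write (read-modify-write) path: 2 reads + 2 writes.
new_block = 0b0001
new_parity = parity ^ stripe[1] ^ new_block
stripe[1] = new_block

# Write gathering would compute the same parity directly from the
# full stripe held in cache, with no preliminary disk reads.
assert new_parity == stripe_parity(stripe)
```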
Storage subsystem configuration
This section examines some details involved in creating the logical volume structure and reviews the physical and logical layers that make up a tablespace in Oracle and SAP.
Figure 3 shows a single RAID array configured with two RAID5 LUNs, each with eight 18GB Ultra SCSI drives. The host computer and its operating system would see two SCSI devices, each with 126GB usable capacity. In this figure, the "c0" and "c1" next to the host computer indicate which host controller or I/O adapter is attached to the RAID array. In this example, dual I/O adapter cards are being used for better performance and redundancy. The "t0" and "t1" next to the RAID array indicate the SCSI target address of the two RAID controllers in the array. The disk drive symbols to the left and right of the RAID array indicate the two LUNs.
RAID array configuration
For optimum performance, two LUNs are created on each array and assigned to one of the RAID controllers on each array. Each LUN is configured with a stripe depth of 32KB for optimum read-ahead and write-caching operations. The stripe depth refers to how much data will be written to each drive in the stripe, involving as many drives as possible in each host I/O operation. As an example, if 128KB of data were being written from a host system, 32KB would be written to the first disk, then 32KB to the second disk, 32KB to the third, and 32KB to the fourth. With write gathering and caching combined with multiple back-end SCSI buses, a RAID array can perform these I/Os in parallel, further improving performance.
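The 128KB example above can be sketched as a simple offset-to-drive mapping (a sketch only; real controllers add stripe rotation for parity placement):

```python
STRIPE_DEPTH_KB = 32

def drive_for_offset(offset_kb, drives=4):
    # Each 32KB chunk goes to the next drive in the stripe, round-robin.
    return (offset_kb // STRIPE_DEPTH_KB) % drives

chunks = [drive_for_offset(kb) for kb in range(0, 128, 32)]
print(chunks)  # [0, 1, 2, 3] -- a 128KB write lands 32KB on each of 4 drives
```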
Write-gathering technology used with RAID5 eliminates the need for updating the parity with each disk I/O, thus overcoming the RAID5 read-modify-write penalty. The parity is calculated when all data in a stripe is present in cache or after all four disk I/Os have occurred. This protects the data while maintaining high performance. A stripe depth of 32KB helps on subsequent reads by increasing the read-ahead performance so that when disk I/Os are required, data is pre-fetched in anticipation of the next I/O operation to boost performance. RAID management software is useful when configuring RAID levels and stripe depth, as well as for performance monitoring.
For Oracle 7.x, raw disk devices or UNIX files can only be up to 4GB in size. Thus, a 126GB LUN would have to be subdivided into logical volumes of an appropriate size. LUN configuration is performed with the logical volume manager.
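Carving a 126GB LUN into 4GB logical volumes is again simple arithmetic, sketched here:

```python
import math

LUN_GB, VOL_GB = 126, 4
full, leftover = divmod(LUN_GB, VOL_GB)
print(full, leftover)              # 31 full 4GB volumes, 2GB left over
print(math.ceil(LUN_GB / VOL_GB))  # 32 volumes if the remainder is also used
```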
In Figure 3, a single RAID array is used for a 200GB system. The RAID array would be configured with two SCSI LUNs (not to be confused with LVM logical volumes or partitions). Physical LUN 0 would be assigned to support data tablespaces/dbspaces, and LUN 1 would be assigned all index tablespaces/dbspaces. This would provide the best I/O performance and redundancy using a single RAID array with redundant controllers and redundant host I/O buses.
Figure 4 shows a 500GB SAP system with indices and data tables spread over two RAID arrays with four LUNs to avoid contention.
Figure 5 shows a 1TB solution configured for performance using four Ultra SCSI host I/O adapters for about 140MBps I/O bandwidth. In this example, data and indices are spread across the four arrays and eight LUNs to avoid I/O contention.
RAID arrays with hardware disk striping appear to the host as fewer, larger, and faster LUNs or storage devices, resulting in simplified storage management. Non-RAID, or JBOD, configurations require more management and, possibly, hardware to create mirror and stripe sets, load balance, and perform tuning of the storage subsystem.
Fig. 1: In some RAID5 implementations, the read-modify-write penalty negatively impacts write performance.
Fig. 2: Write gathering enables the RAID controller to group together multiple write operations to eliminate the RAID5 parity-update penalty.
Fig. 3: Example shows a single RAID array configured with two RAID5 LUNs, each with eight 18GB Ultra SCSI drives.
Fig. 4: Example shows a 500GB SAP system with indices and data tables spread over two RAID arrays with four LUNs to avoid contention.
Fig. 5: A large SAP system (e.g., 1TB or more) can be configured for performance using four Ultra SCSI host adapters, with data and indices spread across the four arrays and eight LUNs to avoid I/O contention.
Greg Schulz is eastern regional director of systems engineering at MTI, in Anaheim, CA.