Performance boosts from solid state disk
For SQL Server I/O performance, streaming throughput is irrelevant.
Enterprise storage is virtually synonymous with database storage. This makes database I/O requirements the defining elements of a corporation's storage requirements. In this light, the changes in the architecture of SQL Server 7.0 to simplify the lives of end users and database administrators are destined to have a significant impact at many sites.
The importance of SQL Server 7.0 is that it attempts to commoditize specialized database functions into a general-purpose database. At the same time that end-user functionality is increasing exponentially, complexity for the database administrator is decreasing logarithmically. In particular, Microsoft has made SQL Server's database engine largely self-configuring, self-tuning, and self-managing. The goal, of course, is to dramatically increase the use of SQL Server. To that end, Microsoft will even be integrating SQL Server 7.0 with the Premium Edition of Office 2000 (see BackOffice CTO, February 1999, page 53).
With the long-term direction of making SQL Server a self-healing service, Max Async IO is about the only parameter that will need to be manually tuned. The parameter is set on the master database by executing sp_configure 'max async io'. This sets the maximum number of simultaneous disk I/O requests that SQL Server can submit to Windows NT during a checkpoint operation. The default setting is 32 (32 read and 32 write operations).
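In command form the change follows the standard sp_configure pattern. The sketch below uses an illustrative value of 64; the right value for a given site depends on the disk subsystem, and the new setting may not take effect until the server is restarted:

```sql
-- Raise the ceiling on simultaneous checkpoint I/O requests.
-- 'max async io' is an advanced option, so advanced options must
-- be visible first. The value 64 is illustrative only.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE WITH OVERRIDE
EXEC sp_configure 'max async io', 64
RECONFIGURE WITH OVERRIDE
```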
During a checkpoint operation, SQL Server writes to disk any buffer cache pages that have been modified since being brought into the buffer cache. To increase efficiency, SQL Server sorts the data pages to be flushed in the order that the data is stored on disk and then asynchronously submits that data in 8KB packets. In this way, SQL Server is able to finish submitting required disk I/O requests without having to wait for the disk subsystem to physically write the data to disk.
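Although SQL Server schedules checkpoints on its own, the same flush of modified buffer cache pages can be triggered on demand with the T-SQL CHECKPOINT statement; a minimal sketch (the database name is hypothetical):

```sql
-- Flush all dirty buffer cache pages of the current database to disk.
USE datamart   -- hypothetical database name
CHECKPOINT
```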
This is an efficient scheme for a housekeeping function, as long as SQL Server--or any application--doesn't dump far more data on the OS than the disk subsystem can effectively handle. If too much data is dumped at once, the overhead of cleaning up the mess can be prodigious.
In particular, SQL Server must ensure that any time data is read, that data is correct (i.e., the most current data). As a result, SQL Server reads data from disk synchronously. This makes the blocking of a SQL Server read a costly operation: locks must be taken out, and all the associated overhead processing must be handled to deal correctly with the stalled thread. In short, an entire Windows NT system can be brought to its knees if SQL Server is the cause of excessive disk queuing.
Just how likely is the occurrence of such a catastrophic event? Unfortunately, the odds are not in your favor. New efficiencies in SQL Server 7.0 increase the likelihood of a blocked read in the face of an excessive number of write requests. To make SQL Server 7.0 more self-configuring, the Read-Ahead Manager is tightly integrated with the Query Processor. Whenever the Query Processor determines that a query would benefit from performing one or more 64KB read-ahead operations--typically when doing a large table or index range scan--it notifies the Read-Ahead Manager to read more than just the required data.
Cache is king
The fundamental challenge confronting any I/O subsystem in a SQL Server scenario reduces to the avoidance of disk queuing. Checkpoint operations can run at full speed only to the extent that the I/O requests they issue can be dispatched and immediately satisfied. Otherwise, these operations need to be throttled, so that the underlying I/O requests are doled out over a longer period of time. That is precisely the function of the Max Async IO parameter.
The only way to increase the speed at which SQL Server executes checkpoints is to ensure that the disk subsystem is powerful enough to handle a higher Max Async IO setting without queuing up disk requests, which bottleneck the system. With this in mind, CTO Labs examined an I/O subsystem that could guarantee meeting the I/O-request demands of SQL Server under any performance load: a solid-state--a.k.a. RAM--disk.
We tested a SCSI-attached Excellerator RAM disk from Solid Data Systems, which offers a wide array of configurations, including rackmount and tower systems. Smaller configurations are typically packed with 1GB to 3GB of DRAM. The complete subsystem includes a single-ended or differential SCSI-2 controller from Adaptec, internal battery backup, and an internal SCSI-2 disk drive, which serves as a backing store. When the RAM disk is backed up to the hard disk, the process is done on a block-by-block image basis.
We conducted our tests of the Excellerator subsystem on a Gateway ALR 9200 server. Our test system was configured as a two-way SMP system with Pentium II Xeon processors pegged at 400MHz and 512KB of L2 cache per processor. Main memory was 512MB. As a source of data, we used our BackOffice CTO data mart.
For our tests, we were able to keep our entire data mart, which currently consumes about .5GB, resident on the Excellerator. Most large-scale databases, however, will have storage requirements that are one to two orders of magnitude greater than our modest test case. In such situations, the Excellerator can be used to hold particular hot database elements, such as entire tables or at least nonclustered indexes that are associated with tables that have a lot of query or write activity. Other good alternatives are Tempdb and transaction log files.
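Isolating a hot index this way is done with ordinary filegroup placement. A hedged sketch, assuming a drive letter E: mapped to the RAM disk and hypothetical database, table, and file names:

```sql
-- Add a filegroup backed by a file on the RAM disk (E: is assumed).
ALTER DATABASE datamart ADD FILEGROUP ramdisk_fg
ALTER DATABASE datamart
    ADD FILE (NAME = ramdisk_dat, FILENAME = 'E:\ramdisk_dat.ndf', SIZE = 200MB)
    TO FILEGROUP ramdisk_fg

-- Place a heavily queried nonclustered index on that filegroup.
CREATE NONCLUSTERED INDEX ix_sales_custid
    ON sales (customer_id)
    ON ramdisk_fg
```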
In particular, Tempdb is a scratch area created by SQL Server and rebuilt every time SQL Server is started. SQL Server uses this database as a shared working area for activities that generate temporary worktables, such as sorting, aggregation, and hash joins. Naturally, Tempdb is very update (write) intensive, which makes RAID 5 a bad choice for Tempdb.
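Relocating Tempdb follows a similar pattern. A sketch assuming the default logical file names (tempdev and templog) and the same E: RAM disk; the move takes effect the next time SQL Server starts:

```sql
-- Point Tempdb's data and log files at the RAM disk.
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'E:\tempdb.mdf')
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'E:\templog.ldf')
```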
Only a RAM disk can absolutely guarantee overcoming all of the underlying mechanical constraints that a hard disk puts on the I/O requirements of SQL Server. At today's rotation speeds of 7,200rpm to 10,000rpm for Ultra ATA and Ultra SCSI disks, a 64KB data transfer takes only a couple of milliseconds. The problem is getting there. Moving the actuator arm to the correct location on disk can take 10 to 20 times as long. As a result, a typical SCSI or ATA disk can handle only about 100 random 8KB reads per second. This pegs peak throughput on 8KB reads at only about 800KB per second--a far cry from the 40MB per second theoretical throughput.
The only device comparable to a RAM disk is a RAID-10 subsystem with a large cache and a very good--make that exceptional--set of caching algorithms.
These constraints place a significant share of the burden for overall performance on the I/O subsystem. With this in mind, we used a heavily cached FlashDisk subsystem from Winchester Systems and a cacheless RAIDzone subsystem from Consensys as our reference standards.
When we ran our standard disk I/O benchmarks, DiskPerf and Load, we were immediately struck by the incredible similarity in performance of Excellerator and FlashDisk. Our FlashDisk unit was configured with a 100MHz Intel 486 DX4 CPU and 64MB of EDO DRAM for caching, along with 12 Cheetah SCSI drives from Seagate, which have a capacity of 9GB and spin at 10,000rpm. For maximum write performance, we configured our FlashDisk for RAID 10, which yields an effective storage capacity of 54GB.
For both the Excellerator and the FlashDisk, there was virtually no difference in performance between reading and writing data. What's more, the difference in performance between Excellerator and FlashDisk was typically within 15% to 20%.
RAIDzone, the Consensys Ultra ATA RAID subsystem, provided a distinctly different profile. Throughput on sequential reads was far greater than either the Excellerator or FlashDisk. However, throughput on writes was far lower than the other systems.
On random 8KB reads, the behaviors of Excellerator and FlashDisk were again remarkably similar. Both were able to sustain more than 5,000 I/Os per second with only a couple of I/O processes accessing the disk. By contrast, it took 20 simultaneous processes accessing the RAIDzone subsystem to equal what one process could do on the Excellerator.
We then moved our database onto the Excellerator. Simply moving the database from a single hard disk to the Excellerator reduced the time required to build the MOLAP cube associated with our database by 25%. We then raised the Max Async IO parameter from its default value of 32 to its maximum value of 255 and once again reduced processing time, this time by 10%.
The most conspicuous outward manifestation of the new settings was a distinct drop in the overall time that disk activity was taking place. Monitoring the process with Perfmon at one-second intervals, the maximum number of I/Os taking place per second increased by a factor of four and peak I/O throughput rates nearly doubled.
Plotting the results of the CTO Labs load benchmark in a semi-log format highlights the performance similarities of the FlashDisk and Excellerator subsystems. Both systems are distinguished by an ability to satisfy large numbers of I/O requests with only a handful of processes.
I/O operations were fairly continuous while processing our MOLAP cube in the baseline case of hosting the database on a single disk. With the database resident on the Excellerator disk and the Max Async IO parameter set to 255, time to process the cube was cut substantially. Periods of I/O were also reduced dramatically, as four times the number of I/O operations were being processed at a time.
For SQL Server 7.0, I/O performance is characterized by complex queries generating random 8KB reads and internal operations, which involve large (64KB) sequential reads and writes. This makes the ability to process a maximum number of operations in a minimum amount of time the key to peak I/O performance. Unlike a streaming-video application, for which performance is governed by how fast it is possible to spiral down a disk in one sequential sweep, database performance is a function of how many times a disk can be accessed.
Solid Data Systems
Price as tested: $30,000, includes 1GB of RAM, a SCSI-2 controller, a SCSI-2 disk and internal battery backup.
For large enterprise databases, Excellerator can be used to hold particular hot database elements, such as entire tables or at least nonclustered indexes associated with tables that have a lot of query or write activity. Other good alternatives are Tempdb and transaction log files. Excellerator's exceptional ability to deliver 2,000 I/Os per second from a single process enables a DBA to set the Max Async IO parameter to its highest value.