The goal is to gain the advantages of Fibre Channel with low-cost Serial ATA (SATA) disk drives in nearline enterprise applications as well as mainstream SMB applications.
By Ram Gopalan
With the rapidly increasing storage requirements in enterprise data centers and the regulatory implications of Sarbanes-Oxley, HIPAA, and SEC rules, large amounts of “business-important” information that is backed up and accessed only periodically must still be readily available (auditable). This information forms an integral part of information life-cycle management (ILM) strategies. IT managers are under pressure to find storage solutions that deliver more capacity for less money without sacrificing reliability or data availability. Although this “secondary storage” can have lower performance than the native Fibre Channel “primary storage” systems running production applications, it must still be reliable and provide robust functionality.
To address this requirement, nearline enterprise disk arrays are emerging that retain as much of the Fibre Channel infrastructure as possible while using lower-cost Serial ATA (SATA) disk drives, without sacrificing availability, manageability, data integrity, and functionality. SATA-based storage systems are ideally suited for the workload, capacity, and cost requirements of secondary storage. Their ability to inexpensively store large amounts of information online (such as reference data, fixed content, and backup data) brings tremendous value. SATA disk-to-disk (D2D) subsystems also enable data that would otherwise be archived on tape to be brought online cost-effectively, improving access rates, reliability, and service levels.
Many enterprise-class primary storage systems and production applications are best served by Fibre Channel disk drives, even at a higher price. Primary storage systems store business-critical information, the data with the highest value and importance. This data requires continuous availability and typically has high-performance requirements. Business-critical data will continue to be stored on Fibre Channel-based disk arrays.
Secondary storage stores business-important information: data that needs to be online but is accessed only periodically, and that can often tolerate lower performance and less than 24×7 availability. Secondary storage represents a large percentage of a company’s data and is an ideal fit for SATA technology. Using SATA drives, at about one-third the cost per gigabyte of Fibre Channel drives (although pricing varies widely), can reduce the cost of a storage system by up to 50%.
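The arithmetic behind that 50% figure can be sketched roughly as follows. The share of total system cost attributable to disk drives is an assumption introduced here for illustration (the article does not give one), as is the simplification that capacity stays constant and all non-drive costs are unchanged. The function name is hypothetical.

```python
def system_cost_reduction(drive_share: float, sata_to_fc_cost_ratio: float) -> float:
    """Fractional drop in total system cost when FC drives are replaced by SATA.

    drive_share: fraction of the FC system's cost that is disk drives (ASSUMED).
    sata_to_fc_cost_ratio: SATA cost per gigabyte divided by FC cost per gigabyte.
    Capacity is held constant and non-drive costs are assumed not to change.
    """
    if not 0.0 <= drive_share <= 1.0:
        raise ValueError("drive_share must be between 0 and 1")
    if sata_to_fc_cost_ratio < 0.0:
        raise ValueError("sata_to_fc_cost_ratio must be non-negative")
    return drive_share * (1.0 - sata_to_fc_cost_ratio)


if __name__ == "__main__":
    # One-third cost per gigabyte comes from the text; the shares are illustrative.
    for share in (0.50, 0.60, 0.75):
        saving = system_cost_reduction(share, 1.0 / 3.0)
        print(f"drives = {share:.0%} of system cost -> system saving = {saving:.0%}")
```

Under these assumptions, a 50% system-level saving corresponds to drives making up about three-quarters of the original system cost; with a smaller drive share, the saving is proportionally lower.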
With the emergence of SATA drives with higher rotation rates (10,000rpm+) and MTBF ratings (two million hours or more), as well as enterprise-class features, storage arrays with SATA drives are starting to get reasonably close to Fibre Channel-based systems, without the cost premium. This trend is driving the acceptance of SATA arrays as primary storage devices in the small to medium-sized business (SMB) market segment where cost is a key factor and some compromise on performance is acceptable.
SATA vs. Fibre Channel
Applications are typically classified as having random or sequential data access patterns. Random data access performance is measured in I/Os per second (IOPS) and is essential for transaction-based applications, such as OLTP and databases, with random, small-block I/O. Sequential data access performance is measured in megabytes per second (MBps) and is crucial for bandwidth-intensive applications (e.g., rich media streaming and high-performance computing) with sequential, large-block I/O.
These two very different application access patterns place unique demands on storage systems. And while controllers and firmware are critical to overall storage system performance, disk drives play a significant role as well.
SATA and Fibre Channel drives have very different performance profiles. Fibre Channel drives were designed for the highest levels of IOPS and MBps performance, integrating advanced technologies to maximize disk rotation speeds and data-transfer rates while lowering seek times and latency. In addition, the Fibre Channel interface provides robust functionality to concurrently process multiple I/O operations of varying sizes bidirectionally.
SATA’s slower drive mechanisms and limited interface functionality result in lower IOPS and MBps performance compared to Fibre Channel. And while these limitations have traditionally made SATA unsuitable for many transaction-based (IOPS) applications in enterprise environments, they have much less of a bearing on bandwidth-intensive (MBps) applications.
Outlined below are the different demands transaction-based (IOPS) and bandwidth-intensive (MBps throughput) applications place on disk drives, and how SATA and Fibre Channel drives generally compare.
IOPS: To achieve optimal IOPS performance, disk drives need to reach data quickly. Transaction-based applications access data that is widely distributed across the drive, so the seek time and latency needed to locate data on the drive become critical. Generally, Fibre Channel drives, with significantly lower seek times and latencies, can get to data twice as fast as SATA drives.
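A common back-of-the-envelope estimate makes the link between seek time, rotational latency, and IOPS concrete. The sketch below assumes that each random I/O costs one average seek plus half a revolution, and it ignores transfer time, caching, and queuing. The drive parameters are hypothetical values chosen only for illustration, not measurements from the article.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: half of one revolution."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 0.5 * 60_000.0 / rpm


def estimated_random_iops(avg_seek_ms: float, rpm: float) -> float:
    """Rough per-drive random IOPS, ignoring transfer time, cache, and queuing."""
    service_time_ms = avg_seek_ms + average_rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # HYPOTHETICAL drive parameters, used only to illustrate the calculation.
    drives = {
        "FC-class (15,000rpm, 3.5ms seek)": (3.5, 15_000),
        "SATA-class (7,200rpm, 8.5ms seek)": (8.5, 7_200),
    }
    for name, (seek_ms, rpm) in drives.items():
        print(f"{name}: ~{estimated_random_iops(seek_ms, rpm):.0f} IOPS")
```

With these illustrative numbers the faster drive services a random request in a little under half the time, which is in line with the "twice as fast" rule of thumb above.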
Command queuing: Transaction-based applications typically generate large numbers of small I/Os. The ability to receive multiple commands at once and reorder them to minimize overall execution time provides a significant performance advantage. Bandwidth-driven applications read and write I/Os sequentially. Fibre Channel disk drives can receive multiple commands at once and can read or write a subsequent I/O immediately after processing its predecessor. Fibre Channel supports extensive command queuing functionality; however, the latest SATA drives also support reasonable levels of native command queuing (NCQ), which mitigates the performance impact.
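Why reordering queued commands helps can be illustrated with a toy model. The sketch below compares total head travel when requests are serviced in arrival order against a greedy shortest-seek-first ordering. This is a textbook simplification, not the algorithm used by any drive's firmware (real queuing logic also weighs rotational position), and the track numbers are hypothetical.

```python
from typing import List


def head_travel(start: int, order: List[int]) -> int:
    """Total distance the head moves when servicing requests in the given order."""
    travel, position = 0, start
    for target in order:
        travel += abs(target - position)
        position = target
    return travel


def shortest_seek_first(start: int, pending: List[int]) -> List[int]:
    """Greedy reordering: always service the queued request nearest the head."""
    remaining = list(pending)
    order: List[int] = []
    position = start
    while remaining:
        nearest = min(remaining, key=lambda target: abs(target - position))
        remaining.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical track numbers
    start = 53  # hypothetical starting head position
    print("arrival order travel:", head_travel(start, queue))
    print("reordered travel:    ", head_travel(start, shortest_seek_first(start, queue)))
```

For this queue the reordered schedule moves the head 236 tracks instead of 640, which is the kind of saving a drive can only achieve if it is allowed to hold several commands at once.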
Full-duplex communications: As transaction-based applications typically generate large amounts of I/O, the ability of the drive to communicate back and forth with the controller (simultaneously transmitting and receiving data) is critical. Fibre Channel drives support full-duplex communication, while SATA drives operate in half-duplex mode (alternately transmitting or receiving data). Implementing dual-ported capability on SATA drives helps to alleviate this to some extent.
Data-transfer rate: To achieve optimal MBps performance, bandwidth-driven applications require high-performance data streaming off the drives. SATA drives are comparable to Fibre Channel drives on disk reads, delivering about 85% of the sustained transfer rate of Fibre Channel drives at 10,000rpm. SATA’s sustained data-transfer rates on writes, however, are significantly slower than those of Fibre Channel.
Maximum I/O size: Bandwidth-driven applications typically generate large files and large I/O block sizes for controllers and drives. Traditionally, since SATA could only process one command at a time, its maximum I/O size of 128KB significantly hindered performance compared to Fibre Channel. The maximum I/O block size for Fibre Channel is dictated by the controller’s maximum segment size.
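To see why a 128KB ceiling matters for large-block workloads, consider how a single large request has to be broken up before it reaches the drive. The sketch below is a minimal illustration with a hypothetical helper name; it is not taken from any controller's firmware. It simply counts the commands needed when no single command may exceed the limit.

```python
from typing import List, Tuple

MAX_SATA_IO_BYTES = 128 * 1024  # the 128KB limit cited in the text


def split_io(offset: int, length: int,
             max_io: int = MAX_SATA_IO_BYTES) -> List[Tuple[int, int]]:
    """Split one request into (offset, length) commands no larger than max_io."""
    if length < 0 or max_io <= 0:
        raise ValueError("length must be >= 0 and max_io must be > 0")
    commands = []
    end = offset + length
    while offset < end:
        chunk = min(max_io, end - offset)
        commands.append((offset, chunk))
        offset += chunk
    return commands


if __name__ == "__main__":
    one_megabyte = 1024 * 1024
    commands = split_io(0, one_megabyte)
    print(f"A 1MB request becomes {len(commands)} commands of up to 128KB each")
```

A 1MB request becomes eight separate commands. If the drive can accept only one command at a time, each of those carries its own command overhead and must complete before the next is issued.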
Based on tests conducted in the lab, although manufacturers’ NCQ-enabled SATA drives may differ in performance, it is generally safe to conclude the following:
- For sequential reads: Overall SATA drive performance is close to that of an FC drive with increasing queue depth;
- For random reads: SATA drive performance seems to get worse with increasing queue depth as compared to FC drives;
- For sequential writes: Most FC enclosures today disable the drive’s write cache. While FC drives handle this configuration reasonably well, ATA drives are optimized to handle “write cache enabled.” Hence, with write cache enabled, SATA drives perform much better than FC drives; without write cache enabled, the overall performance of SATA drives gets worse with increasing queue depth; and
- For random writes: Overall SATA drive performance is close to that of an FC drive with increasing queue depth.
While SATA drives have traditionally supported larger capacities, they are not designed for fast access to data or for handling large amounts of random I/O. As a result, SATA drives may deliver about one-quarter the IOPS performance of Fibre Channel drives, assuming the same number of drives, and may be unacceptable for applications that require maximum IOPS performance.
However, SATA drives that support NCQ are a good fit for many bandwidth-intensive applications because they can still provide enough throughput to maximize a controller’s MBps performance. Keep in mind that in small to medium-sized configurations it may take up to twice as many SATA drives to match the MBps performance of Fibre Channel drives. But once the storage system’s maximum bandwidth is reached, overall throughput can be sustained with either SATA or Fibre Channel drives.
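The point about controller saturation can be sketched with a simple model in which aggregate throughput is the lesser of what the drives can stream and what the controller can move. The per-drive rates and controller limit below are hypothetical values chosen only to illustrate the roughly two-to-one drive count; they are not measurements.

```python
import math


def system_throughput_mbps(drive_count: int, per_drive_mbps: float,
                           controller_limit_mbps: float) -> float:
    """Aggregate MBps: drives scale linearly until the controller saturates."""
    return min(drive_count * per_drive_mbps, controller_limit_mbps)


def drives_to_saturate(per_drive_mbps: float, controller_limit_mbps: float) -> int:
    """Smallest number of drives whose combined rate reaches the controller limit."""
    if per_drive_mbps <= 0:
        raise ValueError("per_drive_mbps must be positive")
    return math.ceil(controller_limit_mbps / per_drive_mbps)


if __name__ == "__main__":
    controller_limit = 800.0  # HYPOTHETICAL controller bandwidth in MBps
    fc_rate, sata_rate = 60.0, 30.0  # HYPOTHETICAL sustained per-drive MBps
    print("FC drives needed:  ", drives_to_saturate(fc_rate, controller_limit))
    print("SATA drives needed:", drives_to_saturate(sata_rate, controller_limit))
    # Beyond saturation, extra drives of either type add capacity, not throughput.
    print("40 SATA drives:", system_throughput_mbps(40, sata_rate, controller_limit), "MBps")
```

In this model, once either drive type is present in sufficient numbers, the controller rather than the drive becomes the bottleneck, which is why throughput looks the same with SATA or Fibre Channel drives at that point.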
In addition to random and sequential access patterns, another consideration is access frequency and its relationship with secondary storage. Some secondary storage applications generate random data access, which on the surface does not fit SATA’s performance profile. But these applications (e.g., fixed-content and reference data) will have sporadic access activity on large quantities of data and will therefore primarily be measured by cost per gigabyte and not by performance. Secondary storage applications that fit SATA’s performance profile include backup and replication, which generate sequential I/O, thus playing into the performance strength of SATA.
When properly implemented into a fully featured, enterprise-class storage system, SATA technology enables large amounts of information to cost-effectively be stored online, bringing real value to the data center. As the foundation of a cost-effective solution, SATA technology is ideally suited for the specific workload, capacity, and cost requirements of nearline storage in enterprise environments.
The availability of SATA drives with better seek times, higher rotation rates, higher MTBF ratings, and dual-ported capability will put these drives on par with Fibre Channel drives, without the cost premium. These improved drives will also fuel acceptance of SATA-based storage arrays for primary storage applications in small to medium-sized business environments. Of course, the availability of storage systems based on Serial Attached SCSI (SAS) drives later this year will complicate the evaluation process. Since SAS uses the same transport mechanism and can tunnel the SATA protocol, SAS disk arrays can mix and match SAS and SATA drives, possibly resulting in the panacea that IT administrators may be looking for.
Ram Gopalan advises early-stage companies and venture capitalists in the storage and semiconductor arena. He can be reached at email@example.com.