The MPC DataFRAME 440 leverages enhanced virtualization software, SATA-II drives, and iSCSI to push SAN technology into new arenas, including the desktop.
By Jack Fegreus
In terms of storage market buzz, virtualization remains at center stage. Managing logical devices, which are free of physical constraints, is much simpler than managing physical ones and thereby incurs dramatically lower overhead costs.
That’s why extending the use of a SAN, which connects systems to logical block storage devices using the SCSI command set, is considered a priority at most sites. In that context, NAS is seen as an economical means of making a SAN available to PC desktop systems. On a superficial level, that pits NAS against iSCSI.
Like Fibre Channel, iSCSI is a transport-layer protocol, and it runs the SCSI-3 command set over a TCP/IP network. Because iSCSI works locally over an Ethernet LAN, it is commonly thought of as a competitor to NAS, but the two differ profoundly in the details.
Locally, iSCSI presents host computers with data blocks from logical devices over an Ethernet LAN. Typically, a logical partition of a true physical RAID array is presented to a host system as a local storage device. This is the most basic level of storage virtualization.
On the other hand, a NAS server shares file data, not raw block data, with clients. The use of a diskless NAS server to export logical SAN disks to desktop systems using NAS file sharing (a scheme often marketed as NAS-SAN fusion) actually adds another layer of systems management overhead to a site’s workload. NAS is intended to share files collaboratively among many users: Using NAS to extend exclusive use of a logical volume is the equivalent of using a jumbo jet for a commuter shuttle.
As a storage device targeted at the IP-SAN market, the DataFRAME 440 storage server from MPC supports all types of storage network connectivity, from Fibre Channel to TCP/IP. Nonetheless, it is for use as an IP-SAN storage server that the DataFRAME is optimally configured. To that end, the storage server adds a protocol, dubbed Ethernet Block Storage Device (EBSD). This protocol is touted to improve on iSCSI with respect to multi-path I/O (MPIO), which handles data-path failover and is an important part of SAN resiliency.
Virtualization is key
The value proposition of a SAN is derived from device virtualization, which leads to cost savings through simplifying storage management using logical drives and increasing the percentage of resource utilization. With Fibre Channel connectivity, the DataFRAME array can be used with servers that need a tiered hierarchy of storage devices based on performance and cost. More importantly, with IP connectivity, the DataFRAME can extend a SAN to the desktop inexpensively and open the door to untapped savings.
The MPC DataFRAME 440 is an intriguing product on a number of dimensions. First, it is built from the ground up as a powerful Linux server. The base hardware, dubbed a storage system module (SSM), sports dual Xeon CPUs, dual Gigabit Ethernet ports, and SATA-II RAID controllers and drives. Running on the SSM is Intel’s Storage Server Console, which is intended to simplify the issues that increase the complexity of SAN fabrics. This software is underpinned by LeftHand Networks’ Java-based SAN/iQ software package, which brings the DataFRAME tantalizingly close to the utopian vision of the perfect SAN device.
SAN justification rests on reducing the complexity and cost of storage management. In the best of worlds, a SAN provides the means to manage all physical storage volumes from a single management interface. While a single DataFRAME 440 can be set up on a SAN, the real strength of the bundled software is its ability to aggregate multiple storage servers to be managed as a single logical storage resource. In fact, when only one DataFRAME is installed, an administrator must still follow the steps required for multiple servers and create a management group and a cluster of one.
The Storage Server Console sees the world of SAN storage as existing within a distinct hierarchy. Under this scheme, a management group integrates a collection of distributed SSMs and makes the group appear as a single, large virtualized storage resource. SSMs on the network can be dynamically added to or deleted from a management group, which creates a federated approach to data integration, pooling, and sharing. Administrators can consolidate storage into one or more groups, dubbed clusters, which hide all of the complexities of data location, local ownership, and infrastructure.
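The containment hierarchy described above can be pictured as a simple data model: management groups hold clusters, and clusters pool SSMs. The sketch below is our own illustration of that structure, not SAN/iQ's actual data model; all class and method names, and the 200GB-per-drive figure, are hypothetical.

```python
# Illustrative containment model for the Storage Server Console hierarchy:
# a management group holds clusters, and each cluster pools SSMs.
# All names here are our own, not SAN/iQ's actual data model.

class SSM:
    """One storage system module (a single DataFRAME chassis)."""
    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb

class Cluster:
    """A storage pool that hides which member SSM holds which blocks."""
    def __init__(self, name):
        self.name = name
        self.ssms = []

    def add_ssm(self, ssm):
        # SSMs can be added to (or removed from) the pool dynamically.
        self.ssms.append(ssm)

    def capacity_gb(self):
        return sum(s.capacity_gb for s in self.ssms)

class ManagementGroup:
    """Presents its clusters as one virtualized storage resource."""
    def __init__(self, name):
        self.name = name
        self.clusters = []

    def add_cluster(self, cluster):
        self.clusters.append(cluster)

    def capacity_gb(self):
        return sum(c.capacity_gb() for c in self.clusters)

group = ManagementGroup("datacenter")
pool = Cluster("tier2-pool")
pool.add_ssm(SSM("dataframe-1", 3200))   # e.g., 16 SATA-II drives at 200GB each
pool.add_ssm(SSM("dataframe-2", 3200))
group.add_cluster(pool)
print(group.capacity_gb())               # 6400 -- both SSMs appear as one pool
```

The point of the model is the single point of control: an administrator addresses the group, never the individual SSMs.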
The underlying construct of network-centric grids and Web services is a Service Oriented Architecture (SOA). In an SOA, applications are built using distributed components dubbed services. The functionality of each service is exposed via a standards-based interface.
For the DataFRAME 440, the Storage Server Console is the means by which services residing on the SSMs are discovered, accessed, and orchestrated. System administrators can install this package on a workstation running either Linux or Windows. Via the Storage Server Console, an administrator can log into a management group without logging into the individual SSMs and gain access to all of the global configuration parameters, including the configuration of each cluster or storage pool.
When provisioning storage, the level of assistance provided by the Storage Server Console can initially be quite distracting. That process begins with the setting of RAID levels for disk arrays on the DataFRAME. Minimalism is the rule. Administrators are given an explicit set of RAID choices based on the configuration of the drives in the SSM chassis.
Choices may include RAID 5/50, RAID 1/10, and RAID 0; however, only RAID levels that can be supported by the disks present will be in the menu. Given this level of automation, it is not surprising to find that there are no options available for explicit tuning of the RAID architecture. The goal of the Storage Server Console is to minimize all administrator-level overhead.
Nonetheless, what is lost in low-level configuration management is more than compensated for in high-level functionality. Few host operating systems allow disk drives to grow larger when they run out of space. As a result, it is necessary to provision a host system with excess storage capacity when using physical volumes. On the other hand, an administrator can create logical volumes on the DataFRAME that exceed, either individually or in aggregate, the total capacity of the physical drives.
Provide now, provision later
In a provide-now, provision-later scenario, a system administrator uses the Storage Server Console to present each host with logical volumes that have the greatest capacities needed for future growth. With the host operating systems future-proofed, it is only necessary to provision the DataFRAME with enough storage capacity for current use. It will then dynamically grow each logical volume from a pool of physical storage.
For dynamic growth, a data-volume threshold is set on each logical drive. When a drive’s actual storage usage reaches its threshold, the DataFRAME array automatically maps additional physical storage blocks to support real usage growth on the logical drive. From a storage management perspective, it is now only necessary to manage and provision the single pool of disks from which physical blocks are being allocated to logical blocks.
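The threshold-triggered growth described above amounts to a simple allocation loop. The sketch below is our own minimal model of that mechanism, not SAN/iQ code; the class names, the 80% trigger, and the 10GB growth increment are all illustrative assumptions.

```python
# Hypothetical sketch of threshold-triggered thin provisioning.
# Names and parameters are illustrative, not the SAN/iQ implementation.

class StoragePool:
    """The single pool of physical disks from which blocks are allocated."""
    def __init__(self, free_gb):
        self.free_gb = free_gb

class LogicalVolume:
    def __init__(self, logical_size_gb, initial_alloc_gb,
                 threshold=0.8, increment_gb=10):
        self.logical_size_gb = logical_size_gb   # the size the host sees
        self.allocated_gb = initial_alloc_gb     # physical blocks mapped so far
        self.used_gb = 0.0                       # blocks actually written
        self.threshold = threshold               # trigger (fraction of allocation)
        self.increment_gb = increment_gb         # growth step drawn from the pool

    def write(self, gb, pool):
        self.used_gb += gb
        # When usage crosses the threshold, map more physical storage
        # from the pool -- the host never sees the volume change size.
        while (self.used_gb > self.threshold * self.allocated_gb
               and self.allocated_gb < self.logical_size_gb):
            grow = min(self.increment_gb,
                       self.logical_size_gb - self.allocated_gb,
                       pool.free_gb)
            if grow <= 0:
                raise RuntimeError("physical pool exhausted")
            pool.free_gb -= grow
            self.allocated_gb += grow

pool = StoragePool(free_gb=500)
vol = LogicalVolume(logical_size_gb=1000, initial_alloc_gb=50)
vol.write(45, pool)          # crosses the 80% threshold; allocation grows
print(vol.allocated_gb)      # 60 -- one 10GB increment was mapped
print(pool.free_gb)          # 490
```

Note that the host was promised 1,000GB while only 500GB exists in the pool; the administrator manages growth at the pool, not per volume.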
There are, however, additional services for ensuring reliability, accessibility, and scalability (RAS) that should be equally intriguing for IT managers looking to consolidate storage. These added services are delivered through three option packs: the Scalability Pack, Configurable Snapshot Pack, and Remote Data Protection Pack.
The Scalability and Remote Data Protection Packs help lower total cost of ownership (TCO) by simplifying management tasks via resource virtualization. Using the Scalability Pack, an administrator can create clusters of SSMs within a management group. Each cluster can then provide a unified RAID storage pool across its member SSMs. For the system administrator, this means the storage on multiple SSMs can be managed as a single logical resource.
For enhanced RAS, logical disks can be replicated among two or three SSMs. With such replication, if one DataFRAME 440 were to fail, the data would still be available on the network. This replication can also be used to fine-tune requirements for disk resources and performance. With a single SSM, data redundancy will be defined by the RAID configuration of that SSM. With multiple SSMs and replication, data that needs a higher performance level can be placed on a logical volume built on a RAID-0 array, for example, and then replicated to a logical volume built on a RAID-5 array.
The Configurable Snapshot Pack enables another IT best practice: the creation of snapshots, which provide a key disaster-recovery mechanism. Snapshots create read-only copies of logical volumes at specific points in time for data recovery due to malicious or accidental data corruption. By adding the Remote Data Protection Pack, administrators can further maximize business continuity by pushing any data copy operation, such as a snapshot, over an IP WAN to a volume on an SSM located at a remote site.
Although advanced RAS features are extremely important, basic storage virtualization remains the foundation of SAN management services. Host operating systems were never designed to share storage devices and will assume exclusive ownership of any block-level device that they discover. That leads to total chaos when multiple systems are allowed to discover and mount the same volumes. Each system will have an incorrect local view of the file system with respect to the data structure on the logical disk, which can eventually make it impossible to mount the device.
To avoid that situation, the Storage Server Console provides a means to control the presentation of virtualized volumes to hosts via every access method supported by the DataFRAME: Fibre Channel, iSCSI, and EBSD. The Storage Server Console scheme centers on the creation of Authorization Groups, which are defined by rules specific to each method of access.
Creating virtualization rules is a relatively simple process; however, an administrator will need external utilities to gather all of the information needed to create authorization rules for Fibre Channel and iSCSI access. The Storage Server Console GUI does not provide a means to browse for this information.
Formulating a rule for Fibre Channel access requires the ID of the port on the Fibre Channel HBA (WWPN) through which the logical disk will be accessed. This data is readily available from fabric-monitoring utilities bundled with Fibre Channel switches. To create an iSCSI Authorization Group, an administrator will need to provide the ID of the iSCSI initiator used by the host server or workstation. The Microsoft iSCSI Name Server (iSNS) utility provides a list of IDs for both iSCSI targets (mountable storage volumes) and initiators (iSCSI-enabled workstations and servers).
Formulation of IP-SAN authorization rules is simplified using the DataFRAME’s EBSD protocol. MPC provides drivers for both Linux and Windows systems. Virtualization of EBSD volumes is done using the IP address of the host’s Ethernet NIC. By using static IP addresses on the hosts that will access EBSD storage, the virtualization of storage volumes is a very simple matter. In addition, the use of IP address ranges makes it very easy to provide read-only access to a number of systems in an Authorization Group.
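IP-based rules of this kind reduce to simple address and range matching. The sketch below illustrates the idea using Python's standard ipaddress module; the AuthorizationGroup class and its rule format are our own invention, not MPC's interface.

```python
# Illustrative sketch of IP-based access rules like those used for EBSD
# authorization groups; the rule format here is hypothetical, not MPC's.
import ipaddress

class AuthorizationGroup:
    def __init__(self, name):
        self.name = name
        self.rules = []   # (network, access) pairs, checked in order

    def allow(self, cidr, access="rw"):
        self.rules.append((ipaddress.ip_network(cidr), access))

    def access_for(self, host_ip):
        addr = ipaddress.ip_address(host_ip)
        for network, access in self.rules:
            if addr in network:
                return access     # first matching rule wins
        return None               # host not authorized

group = AuthorizationGroup("engineering")
group.allow("10.1.5.17/32", access="rw")   # one workstation gets read/write
group.allow("10.1.5.0/24", access="ro")    # rest of the subnet is read-only

print(group.access_for("10.1.5.17"))   # rw
print(group.access_for("10.1.5.99"))   # ro
print(group.access_for("10.2.0.1"))    # None
```

Because the hosts use static IP addresses, a single /24 rule can grant read-only access to an entire department in one line.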
iSCSI: SAN to the desktop
Both iSCSI and EBSD play well with the notion of extending the reach of a SAN out to the desktop. As with all consolidation projects, the burning issues for network edge devices are the efficient utilization of resources and business continuity, with special attention given to data security and recoverability. On all of these counts, desktop PCs, along with special-purpose departmental servers, have been very difficult for IT to encompass within consolidation projects.
That difficulty helps explain the appeal of iSCSI. The simplicity of encapsulating SCSI commands and data in TCP/IP packets and transmitting them over Ethernet networks plays perfectly with the drive for greater IT efficiency. The minimal investment needed for the NICs, switches, and cables to set up a working Gigabit Ethernet fabric is a fraction of the cost associated with Fibre Channel components.
What’s more, the demands on performance are far less exacting from the desktop perspective. Single ATA and SATA drives dominate corporate desktop systems. Small SATA RAID arrays can be found on low-end servers. Moreover, Windows XP and Windows Server dominate the operating system environments for these systems. As a result, most I/O operations will involve small 8KB data blocks.
For desktop I/O, we benchmarked direct-attached ATA drives at about 14MBps when delivering 8KB read requests. That puts the throughput bar roughly within reach of Fast (100Mbps) Ethernet, whose raw ceiling is 12.5MBps. With SATA drives, desktop read throughput jumps significantly. In particular, we found 8KB reads to be on the order of 35MBps. Sustaining multiple desktops at that level of throughput performance will require the use of Gigabit Ethernet.
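The arithmetic behind that conclusion is simple back-of-envelope division. The sketch below uses raw link rates and ignores TCP/IP and iSCSI protocol overhead, so real-world numbers will be somewhat lower than it suggests.

```python
# Back-of-envelope: how many desktops, each reading at the measured SATA
# rate, can a single Ethernet link sustain? Raw link rates only -- TCP/IP
# and iSCSI protocol overhead are ignored.

def desktops_per_link(link_mbps, per_desktop_mbytes_per_s):
    link_mbytes_per_s = link_mbps / 8            # bits to bytes
    return link_mbytes_per_s / per_desktop_mbytes_per_s

# Fast Ethernet's 12.5MBps raw ceiling barely covers one ATA desktop
# at 14MBps and falls well short of a SATA desktop at 35MBps.
print(f"{desktops_per_link(100, 35):.2f}")    # 0.36 desktops per Fast Ethernet link
print(f"{desktops_per_link(1000, 35):.2f}")   # 3.57 desktops per Gigabit link
```

At roughly three SATA-class desktops per Gigabit link, consolidation at the network edge remains affordable with commodity switches.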
We then ran our throughput benchmark on a logical volume that was exported from a RAID-5 storage pool. We ran our tests from a quad-processor HP DL580 G3 server running SuSE Linux Enterprise Server (SLES) 10. In these tests, we used the new iSCSI capabilities found in SLES 10 over Gigabit Ethernet and 2Gbps Fibre Channel.
On all of our I/O benchmarks, throughput over 2Gbps Fibre Channel and Gigabit Ethernet exhibited little or no difference.
As expected, write throughput on the RAID-5 array was measurably slower. One of the factors limiting the performance of SATA drives is the lack of support for command queuing. Unlike SCSI and Fibre Channel drives, most SATA drives cannot re-order I/O commands to minimize the movement of the disk’s actuator arm. We pegged writes at a sustained 30MBps.
Nonetheless, the level of throughput on all of our tests was equal to or better than the throughput measured on all other ATA- or SATA-based SAN storage servers that we have tested. As a result, the DataFRAME 440 could easily be used to provide virtual storage volumes on diskless desktop systems and to provide second- or third-tier storage for servers on enterprise-class Fibre Channel SANs.
SAN infrastructure costs have historically presented a significant hurdle to SAN adoption and expansion. As a result, the benefits of SANs have not been spread beyond servers in computer centers. The key to changing this perception of SANs lies in Ethernet technology.
With the functional capabilities of software, such as LeftHand Networks’ SAN/iQ, SANs can now be easily pushed out to the desktop over existing Ethernet infrastructure. That significantly extends the cost benefits of virtualization, especially with regard to lower TCO through simplified management. Also, low-cost SATA technology can be applied more efficiently in a SAN to provide respectable performance at lower costs and greater functionality related to business continuity.
Jack Fegreus is technology director at Strategic Communications (www.stratcomm.com). He can be reached at jfegreus.com.
openBench Labs Scenario
IP/FC SAN storage array
WHAT WE TESTED
Two MPC DataFRAME 440 disk arrays
- Linux OS
- Dual 3GHz Intel Xeon processors
- Up to 12GB ECC memory (cache)
- Dual Gigabit Ethernet IP-SAN ports
- Optional QLogic Fibre Channel HBA
- Fast Ethernet management port
- 16 SATA-II drives
- Java-based Storage Server Console
- RAID levels 0, 1/10, 5/50
HOW WE TESTED
HP ProLiant DL580 G3 Server
SuSE Linux Enterprise Server 10
- oblDisk v3.0
- oblWinDisk v3.0
WHAT WE FOUND
- Dynamic support for expanding and restructuring arrays
- The DataFRAME automatically maps additional physical storage blocks to support real usage growth on a logical drive.
- Actual throughput over 2Gbps Fibre Channel and Gigabit Ethernet exhibited little or no difference.
- On desktop systems using iSCSI, throughput on reads averaged 60MBps, which is 50% greater than a single desktop SATA drive.