Network-attached storage benefits include improved performance, faster data access, and shared access to files.
By Mark Zimmer
Implementation of a network-attached storage (NAS) system has helped increase the productivity of 75 geophysicists by an estimated 70%. The higher-capacity data storage system now being used by the Minerals Management Service (MMS) branch of the U.S. Department of the Interior eliminates the need for these geoscientists to shuffle large data sets back and forth between slower server-attached storage and 8mm tape.
Higher-capacity network storage also provides the flexibility for geoscientist teams to work simultaneously on multiple seismic data sets. Other NAS benefits include efficient data access to and from all clients, a single point of backup, and minimal points of potential failure.
The net result is that the geophysicists spend far less time dealing with system issues or waiting for seismic data to be loaded from tape, and much more time analyzing the value of mineral deposits. Availability now exceeds 99.55% uptime.
Affiliated Computer Services' (ACS) Government Solutions Group led the design of the current system and now helps manage the storage systems.
The MMS is responsible for managing the nation's natural gas, oil, and other mineral resources on the outer continental shelf outside state coastal waters. It also collects, accounts for, and disburses revenues from onshore and offshore mineral leases on Federal and Indian lands. Federal mineral leases generate more than $4 billion annually and are one of the government's greatest sources of non-tax revenue. The outer continental shelf currently accounts for about 27% of the nation's domestic natural gas production and about 20% of its domestic oil production.
The geoscientists work in the MMS offshore regional office in New Orleans, where their program analyzes geologic, geophysical, and other data to support outer continental shelf management decisions.
Federal regulations require that oil companies provide copies of seismic data to the MMS at a cost limited to reimbursement for reproducing the data. The MMS also purchases data from independent seismic operators. Conventional 2D seismic data sets typically range from 10GB to 30GB, while newer 3D data sets can exceed 100GB.
In the past, data was stored on server-attached storage arrays and on two RAID arrays of up to 20GB each that were attached to the local Sun SPARC workstations used for data interpretation. The workstations were recently upgraded to higher-performance Sun Ultra 60s with 360MHz CPUs. The geoscientists also have PCs running Windows 98 for e-mail and other desktop applications.
The biggest problem with this approach was that each workstation's RAID array was capable of handling only one or two small data sets. Larger data sets had to be reduced before they could be downloaded to local storage, or MMS staff spent an enormous amount of time spinning 8mm tapes to load and archive data sets. Geoscientists at MMS normally work on more than one project or data set at a time, so they frequently had to download new data sets from slower storage or tape and archive older data sets back to tape. The result was huge I/O bottlenecks and lost productivity while waiting for seismic data to load.
Data transfers delay analysis
It often took 20+ hours, or even days, to load a typical data set, so a considerable portion of each geoscientist's day was spent waiting for data or working with limited data. The existing infrastructure also lacked the storage capacity for the backlog of seismic surveys waiting to be analyzed. Geoscientists could usually perform other tasks in the background while data was loading, but they were prevented from performing their core functions.
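Some back-of-envelope arithmetic shows why load times dominated the old workflow. The throughput figures below are illustrative assumptions, not measured MMS numbers: roughly 1MB/s sustained for an 8mm tape drive versus roughly 10MB/s for a full-duplex 100BaseT link.

```python
# Rough load-time estimates under assumed sustained throughputs
# (1MB/s for 8mm tape, 10MB/s for full-duplex 100BaseT).

def load_time_hours(dataset_gb, throughput_mb_per_s):
    """Hours to move a data set at a given sustained throughput."""
    return dataset_gb * 1024 / throughput_mb_per_s / 3600

# A conventional 2D data set at the top of the 10-30GB range:
print(round(load_time_hours(30, 1.0), 1))   # hours from 8mm tape
print(round(load_time_hours(30, 10.0), 1))  # hours over the network
```

Even these optimistic figures put a tape load at most of a working day; with mount, seek, and tape-swap overhead, the 20+ hour loads the geoscientists actually experienced are unsurprising.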
MMS had tried loading data sets from the previous server-attached storage but found that performance was far too slow and disk space was lacking. Another problem arose from the fact that geoscientists frequently work in teams to analyze data sets. This was difficult in the past because only one person could access a data set at a time, and the time required to transfer the data set to another machine was prohibitive. The previous system also presented archiving problems and was doing double duty as both a compute server and an NFS server.
Affiliated Computer Services worked closely with MMS to evaluate potential solutions to these problems. ACS developed a list of performance criteria, including storage capacity, data transfer rate, cost per GB, and reliability.
A team of IT professionals consisting of ACS contractors and MMS geoscientists evaluated a wide range of systems and narrowed the field down to two network-attached systems and one server-attached storage system. They obtained demonstration systems from a number of manufacturers and performed a month-long series of tests. These included performance testing with real seismic data, as well as a wide range of reliability and serviceability tests. For example, the team tried pulling a disk drive out of the system while it was running to see if the system would crash. Primarily for performance and capacity reasons, the team chose NS7000 NAS servers from Auspex Systems.
Dedicated processor architecture
The key to the I/O performance of the NAS system is its dedicated processor architecture, which is optimized for moving file data from disk to network, and vice versa. The I/O node is the fundamental building block of the architecture. Each node contains an Intel dual-processor motherboard that has logically separate processing functions. The network processor is used to process network protocols and to manage associated caches. The file and storage processor is dedicated to managing the file systems and associated storage hardware.
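The split described above can be pictured as two independent workers connected by queues. The sketch below is a toy software model of that idea, not the NS7000's actual firmware: one thread plays the network processor, another plays the file and storage processor, and neither blocks on the other's work.

```python
import queue
import threading

# Toy model (illustrative only) of a split I/O path: a "network
# processor" thread decodes client requests while a separate
# "file/storage processor" thread services them from the disk array.

DISK = {"/seismic/line42": b"trace data"}   # stands in for the disk array

requests = queue.Queue()   # network processor -> file/storage processor
replies = queue.Queue()    # file/storage processor -> network processor

def network_processor(raw_requests):
    for raw in raw_requests:        # pretend protocol decoding
        requests.put(raw.strip())
    requests.put(None)              # shutdown sentinel

def file_storage_processor():
    while True:
        path = requests.get()
        if path is None:
            break
        replies.put(DISK.get(path, b""))   # pretend disk read

net = threading.Thread(target=network_processor,
                       args=(["/seismic/line42\n"],))
fsp = threading.Thread(target=file_storage_processor)
net.start(); fsp.start()
net.join(); fsp.join()

result = replies.get()   # the answer the client would receive
```

The design point is the same in the toy and the real hardware: because protocol handling and storage handling run on separate processors, a burst of network traffic does not starve disk servicing, and vice versa.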
Each of the three NS7000 NAS servers has 209 SCSI slots that accept a variety of peripheral devices. Each server is configured with five storage processors, one boot drive, one backup boot drive and a CD-ROM, five DLT 7000 tape slots, and 201 disk slots.
MMS currently has these slots fully populated, mainly with 9GB and 18GB drives, providing a total of about 9TB of data storage. Each system has five dedicated network interfaces, each of which is connected to a Layer 3 switch via a full duplex 100BaseT Ethernet link. Four of the interfaces are dedicated to delivering data to the Sun Ultra 60 platforms, and the fifth interface is dedicated to a data loading subnet.
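The ~9TB figure checks out against the slot counts. The average drive size below is an assumption, since the article says only that the slots hold "mainly" 9GB and 18GB drives without giving the exact mix.

```python
# Sanity check of the quoted "about 9TB" total, assuming an even mix
# of 9GB and 18GB drives (the exact mix is not stated).
servers = 3
disk_slots_per_server = 201
avg_drive_gb = (9 + 18) / 2        # 13.5GB -- assumed even mix

total_gb = servers * disk_slots_per_server * avg_drive_gb
print(total_gb)                    # 8140.5GB, in the ballpark of 9TB
```

A mix weighted toward the 18GB drives would push the total closer to the quoted 9TB.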
The NAS server provided immediate, and significant, performance improvements. The combination of the higher data transfer rates of the server, a faster network, and additional storage makes it possible to load data sets directly from NAS without taking a performance hit and reduces the time the geoscientists wait for data. It also eliminates the need to transfer data between local machines or to spin tape 24 hours a day to archive and re-load seismic data sets.
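From the workstation side, "loading directly from NAS" simply means mounting the server's exported file systems over NFS. The article does not give MMS's actual host or export names, so the entry below is purely hypothetical:

```
# Hypothetical /etc/fstab entry on a Sun Ultra 60 client
# (host, export, and mount-point names are illustrative):
ns7000-1:/export/seismic  /seismic  nfs  rw,hard,intr  0  0
```

With a mount like this in place, a seismic data set on the NAS server appears as ordinary files under /seismic, with no local copy step required before interpretation work can begin.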
Another advantage of the NAS configuration is that the 75 geoscientists can easily work together on the same data sets. Management estimates that productivity has risen 60% to 70%.
The increased storage capacity and performance of the NAS servers have improved the productivity of the geophysicists and eliminated I/O bottlenecks, allowing them to spend a much higher percentage of their time doing their jobs rather than waiting on data.
Reliability was also improved by the addition of NAS. While the geophysicists all work during the day, some of the number crunching and much of the data loading takes place at all hours. The NAS servers operate on a 24x7 basis, and unscheduled downtime has been practically eliminated.
Mark Zimmer is an information management consultant with Affiliated Computer Services' Government Solutions Group, in New Orleans, LA.