By Drew Robb
A member of the Association of Storage Networking Professionals (www.asnp.org) who manages storage for a large North American retailer gathered the information in this case study. Due to rapid growth, this Fortune 500 company needed a major storage upgrade. Incremental tape backups were done nightly and sent off-site, and full backups were completed weekly and monthly and also sent off-site.
The retailer had a collection of isolated, direct-attached storage (DAS) arrays, each inaccessible to other Unix or Windows hosts. Complex configuration required highly skilled storage experts and excessive time to expand systems. This resulted in high administrative overhead due to numerous hosts and storage arrays with independent interfaces.
Data was unprotected and was not mirrored to a remote location for disaster recovery or business continuity. In addition, multiple potential points of failure existed in the network due to the DAS connections.
The DAS architecture created several issues, including the following:
- An excessive backup window, and decentralized backups;
- A 30% restore rate, with no time for tape verification; and
- File restore times of three to four hours per request.
Meanwhile, storage demands continued to soar. E-mail, new applications, acquisitions, and international expansion fueled the storage growth. Also, the addition of multimedia and full-color digital imagery consumed a lot more space. One branch of the organization used a high-performance Oracle OLTP database, with more than 500 agents typing in orders all day long.
To stay competitive, the company realized it needed to update its storage architecture, implement better data-archiving and restore capabilities, and add robust disaster-recovery capabilities. The goal was to be able to restore core business functionality within four hours of a disaster at the primary data-processing site. The plan called for a hot standby storage subsystem located at least 1,000 miles away. It would receive asynchronous incremental updates from the primary storage system every 15 minutes and be able to run core production systems within four hours of a failure.
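The two recovery targets described above can be expressed as simple checks. The following Python sketch is illustrative only: the function names and the sample timeline are hypothetical, and it assumes that the data at risk is whatever was written after the last completed update, ignoring transfer time across the WAN.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=15)  # asynchronous update cadence from the article
RECOVERY_TARGET = timedelta(hours=4)     # time allowed to restore core functions


def unreplicated_window(last_update: datetime, failure: datetime) -> timedelta:
    """Span of writes made after the last completed update, and so at risk."""
    return failure - last_update


def meets_recovery_targets(last_update: datetime, failure: datetime,
                           restored: datetime) -> bool:
    """True when both the data-exposure and the restore-time targets hold."""
    data_ok = unreplicated_window(last_update, failure) <= UPDATE_INTERVAL
    time_ok = (restored - failure) <= RECOVERY_TARGET
    return data_ok and time_ok


# Hypothetical timeline, for illustration only.
last_update = datetime(2004, 1, 1, 2, 0)
failure = datetime(2004, 1, 1, 2, 11)
restored = datetime(2004, 1, 1, 5, 30)

print(unreplicated_window(last_update, failure))
print(meets_recovery_targets(last_update, failure, restored))
```

In this made-up timeline, 11 minutes of writes are exposed and core systems return 3 hours 19 minutes after the failure, so both targets are met.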
Another objective was to minimize total cost of ownership (TCO) by keeping things simple, finding a single-vendor solution to avoid finger pointing, maximizing use of the existing knowledge base (especially TCP/IP, NFS, and CIFS), and adopting a flexible architecture that could accommodate NAS or a SAN.
In addition, the environment had to be highly available (fully mirrored, fully clustered), with minimal subsystem components and a simple Web-based storage management interface to maximize staff productivity.
Finally, the storage architecture had to be able to quickly replicate data across a WAN, using standard protocols.
For data archiving and restore, the company’s goals were to reduce the backup window on production systems from eight hours to less than five minutes; reduce data retrieval and restore time from hours to minutes; integrate backup systems with disaster-recovery systems; and deliver 99.9999% data integrity for all restored files. This was to be accomplished by moving to disk-based backup/recovery and reserving tape backup for periodic archiving of the disk-based system.
SAN vs. NAS
As the company evaluated potential storage architectures, it ran into the old spin: In a heavy OLTP environment, the overhead associated with NAS protocols would significantly impact the performance of the database and would not be a reliable and scalable solution, so a SAN must be used to avoid this condition. However, the SAN vendors were not able to provide independent test data to support this theory.
As a result, the retailer decided to test whether operating a high-volume OLTP database on NAS would yield acceptable performance. If this proved out, it would also permit the company to
- Obtain any-to-any connectivity (e.g., any host could mount to any volume);
- Maximize the use of existing networking staff with expertise in TCP/IP, Ethernet, NFS, CIFS, and routing and reduce the need for additional storage administrators; and
- Gain more efficiency and lower TCO through simplification and not having to add new components, such as SAN switches and adapters, into the environment.
To achieve its goals, the company hired a third party to develop test routines based on existing internal code to simulate high-volume OLTP processing. These tests stressed all database operations. All testing on various vendors’ hardware was done using test scripts running against a copy of the company’s live production databases.
A criteria filter is a simple method of measuring a vendor’s performance against a pre-defined metric. In this retailer’s case, for example, the first cost filter was an acquisition cost of less than $1.5 million, the second was a five-year TCO of $1.9 million, and so on. Only vendors that met these criteria were considered. Five major SAN and NAS vendors were evaluated. The primary criteria filters were cost of acquisition, five-year TCO, system performance, management interface, ease of use, simplicity of system architecture, and disaster-recovery capabilities.
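The criteria-filter approach can be sketched in a few lines of Python. This is a minimal illustration, not the retailer's actual evaluation tool: the vendor names and quoted figures are invented, only the two cost filters are modeled, and treating the $1.9 million TCO figure as an upper limit is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VendorQuote:
    name: str
    acquisition_usd: float
    five_year_tco_usd: float


# Thresholds come from the article; reading the TCO figure as a ceiling is an assumption.
CRITERIA_FILTERS = {
    "acquisition cost": lambda q: q.acquisition_usd < 1_500_000,
    "five-year TCO": lambda q: q.five_year_tco_usd <= 1_900_000,
}


def failed_filters(quote: VendorQuote) -> list[str]:
    """Names of the filters this quote does not pass."""
    return [name for name, passes in CRITERIA_FILTERS.items() if not passes(quote)]


def shortlist(quotes: list[VendorQuote]) -> list[str]:
    """Vendors that pass every filter; failing any one filter eliminates a vendor."""
    return [q.name for q in quotes if not failed_filters(q)]


# Hypothetical quotes, for illustration only.
quotes = [
    VendorQuote("Vendor A", 1_400_000, 1_850_000),
    VendorQuote("Vendor B", 1_450_000, 2_300_000),
    VendorQuote("Vendor C", 1_700_000, 1_800_000),
]

print(shortlist(quotes))
for q in quotes:
    print(q.name, failed_filters(q))
```

Each filter is pass/fail rather than a weighted score, so a low price cannot make up for a missed TCO target. The remaining filters (performance, management interface, and so on) would be added as further entries in the same table.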
Using this test scenario, Network Appliance passed all of the criteria filters. For disaster recovery, the NetApp solution offered a 24TB clustered and mirrored NAS system in the primary data center, performing 15-minute asynchronous updates across the WAN to a remote 24TB hot standby disaster-recovery site. For backup and restore, it offered 24TB of nearline R150 storage for disk-based backup/restore, as well as 15-minute asynchronous updates to a nearline R150 for disaster recovery. File restores now take less than three minutes, compared with the three to four hours they took previously.
At this company, DAS had become too labor-intensive and cost-prohibitive. To manage a huge Oracle OLTP database more simply, the retailer adopted NAS, which enabled IT to leverage in-house networking expertise. While some vendors told the retailer that NAS would burn up too much CPU time, this did not turn out to be true. In-house testing showed that while there is increased overhead in running NAS protocols, the current generation of processors and low cost of the NAS solution made this a non-issue.
Running OLTP database applications on NAS provided an excellent cost/performance ratio. As a result, three divisions of the company now run high-performance OLTP on NAS.
In terms of cost, the NAS solution proved to be about 65% of the cost of a SAN solution. And that doesn’t include intangible savings such as not having to train or hire staff with SAN expertise. And, in this case, the performance of NAS has turned out to be as good as or better than DAS.
Drew Robb is a freelance writer in the Los Angeles area.
Q&A with Daniel Delshad, chairman of the ASNP
What is the Association of Storage Networking Professionals?
ASNP provides an open forum for members to discuss real-world problems and solutions related to storage networking. Through its regional chapters and annual conference, ASNP offers educational training and networking opportunities. Members also have exclusive access to the association’s online portal, which features training and certification resources, newsletters, product reviews, member and vendor directories, and discussion forums. For more information, visit www.asnp.org.
What is the mission of the ASNP?
Our mission is to educate and empower members by providing them with educational resources, member meetings, and other professional development opportunities. Our vision is to build a worldwide community of storage networking users and to act as an advocate for users’ needs.
How many members and chapters do you have?
We have more than 2,000 members and 27 chapters around the world.
What does the membership base consist of?
Members include IT managers, systems administrators, CIOs/CTOs with storage responsibilities, consultants, and university professors and students interested in storage networking. The amount of storage managed covers the spectrum. For example, 77.5% of our members manage anywhere from 1TB to 99TB, while 16.2% manage less than 1TB.