SAN advice from the front lines

An interview with Edward Mann, a storage architect who has built storage area networks (SANs) for some of the largest government agencies.

Every storage area network (SAN) environment is different, and there doesn't seem to be an out-of-the-box SAN solution. These two conclusions come from Edward Mann, a storage architect who has worked at StorageTek and Veritas and has consulted on the design of SANs for the NASA Kennedy Space Center, the U.S. Coast Guard, the U.S. Department of Agriculture, and the National Institutes of Health. He has a solid command of complex, heterogeneous hardware and software environments, and of SANs in particular.

Mann's storage consulting firm, Mann Data Storage Consultants (www.datastorageconsultants.com), often takes him around the world. He recently worked on building a complex 400TB SAN to handle specific vertical applications such as medical imaging. Mann also teaches SAN design and management courses for Marcus Evans Ltd. in Europe. Because he knows the challenges of making everything in a SAN work together, his answers should give readers technical tips for building SANs.

Could you discuss a SAN installation in which you ran into a situation where the networking components just didn't work together? How did you get around the problem?

The U.S. Coast Guard's SAN consisted of various types of HP-UX servers and a few Windows NT servers. The SAN had a DLT library for backup with a SCSI connection, so we had to put in bridges to convert SCSI to Fibre Channel. However, the bridges kept resetting.

We also found that HP-UX needs a product called QuickLoop in order to log into a fabric network. Brocade developed QuickLoop especially for Hewlett-Packard. QuickLoop creates an arbitrated loop environment and puts the tape device into the arbitrated loop. (The downside with QuickLoop is that it's only good for two switches.) A blip usually occurs, causing you to lose one of the servers. This situation, in turn, causes the arbitrated loop to redo itself. When this happened, the loop was dropping the bridges. Why? The bridges kept dropping their worldwide name, which is created by the switched environment.

When the loop reset and went through a blip, the system no longer recognized where it was. We solved the problem by hard-coding the worldwide name into the bridges so they wouldn't reset.

This situation also contributed to a SCSI "high" condition, as opposed to a SCSI "low" condition. A SCSI "high" condition occurs when the data transmission never gets to say "I'm through sending data." Since the bus remains "high," it needs to be completely reset.

What did you learn as a result of building the Coast Guard's SAN?

You're better off staying away from arbitrated loops because they go through blips. On the other hand, arbitrated loops enable an organization to build a SAN for less money than a switched network. But you need to be aware of what can happen.

For example, if you're using a "dumb" hub, then everything that goes into that hub is in the same arbitrated loop. So, every device shares bandwidth with everything else on the loop. Say you start out with a 100MBps connection, for instance. If you have five servers in the loop, all of a sudden your 100MBps is now 20MBps per server because you have to share the arbitrated loop. This situation is similar to what happened with QuickLoop, which kept resetting whenever it went through a blip.
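The bandwidth arithmetic in this answer can be worked through in a few lines of Python. This is only a rough sketch that assumes the loop's bandwidth is divided evenly among all active servers. Real arbitrated loops arbitrate access per transfer, so actual per-server throughput will vary. The function name `per_server_bandwidth` is made up for this illustration.

```python
def per_server_bandwidth(loop_mbps: float, servers: int) -> float:
    """Idealized share of a shared arbitrated loop, in MBps per server.

    Assumes an even split among active servers (an illustrative
    simplification, not a model of real loop arbitration).
    """
    if servers < 1:
        raise ValueError("servers must be at least 1")
    return loop_mbps / servers


if __name__ == "__main__":
    # Mann's example: a 100MBps loop shared by a growing number of servers.
    for count in (1, 2, 5, 10):
        share = per_server_bandwidth(100, count)
        print(f"{count} server(s): {share:.1f} MBps each")
```

With five servers, the function returns the 20MBps figure cited above. In a fully switched fabric, by contrast, each port gets its own link rather than a share of one loop.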

How would you minimize networking issues in SANs, and what advice would you give to IT professionals so they can avoid some of these problems?

I recommend building a SAN with a totally switched environment based on components you know will absolutely work together. For example, Fibre Channel tape devices that don't rely on SCSI-to-Fibre Channel bridges will make for a more seamless and less-complex environment.

Right now, it's difficult for IT professionals to find components that really work well together. They get a lot of different information from vendors touting that their products will work with everyone else's. I know what works and what doesn't work only because I've spent so much time trying to correct what didn't work. I strongly recommend that IT professionals stay with the key players.

Is there anything on the disk array side that's important for IT professionals to be aware of?

Yes. Make sure any storage device has the capability of logging into a fabric. Specifically, make sure the RAID controller will give up a worldwide name that the switch can use. You also should avoid having to do arbitrated loop work between the switch and the RAID controller.

Some trade press articles allude to incorporating network-attached storage (NAS) devices into a SAN. Can you comment on how you would do this?

Some vendors are trying to sell both NAS and Fibre Channel solutions: NAS as a file server and Fibre Channel for databases and archival data. One of the advantages of Fibre Channel SANs is that they remove data transfers from your primary network. If your primary network has massive amounts of traffic on it, it bogs down, regardless of how good a network it is.

NAS has some shortcomings. For one thing, it uses network bandwidth. The second problem is that NAS requires an appliance. Instead of using out-of-box components, a normal operating system, or whatever volume management system or file system you select, you wind up with the NAS vendor's file system doing the translation. It puts a piece of software on every component on the network.

On the other hand, NAS costs less than building a SAN. NAS requires fewer components, is pretty much plug-and-play, and does away with the cost of training. However, you get locked into a proprietary approach from which you can't easily get out.

Elizabeth M. Ferrarini is a freelance writer in Boston. She can be contacted at iswive@aol.com.

This article was originally published on October 01, 2001