By Heidi Biggar
When it comes to managing data, IT administrators at digital media, oil and gas, seismic, and chip simulation companies face a common challenge: making sure that authorized users—who can number in the hundreds or even thousands—have quick, shared access to data files.
In the past, the only real answer to this data management problem lay with network-attached storage (NAS)—and a lot of it. However, technology has evolved, and a variety of new options have come to market that can help IT administrators better deal with the ongoing deluge of digitized media, making data more concurrently accessible to users throughout their organizations.
Technology (e.g., distributed locking mechanisms) is also in the works that will enable multiple users to read and write to files concurrently. Today, concurrent file sharing is often limited to read-only access because of the security and data-corruption risks that accompany concurrent writes.
For example, Network Appliance is working on ways to address these issues by creating a lock manager that will control access to files. NetApp is also reportedly developing technology (acquired through its recent acquisition of Spinnaker Networks) that will allow it to distribute a single instance of its file system across multiple NAS filers.
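The basic idea behind such a lock manager can be sketched in a few lines. The class below is a hypothetical, single-node illustration (not NetApp's actual implementation, whose details the company has not disclosed): any number of readers may hold a file at once, but a writer must wait for exclusive access, which is the coordination a distributed lock manager enforces across filer heads.

```python
import threading
from collections import defaultdict

class FileLockManager:
    """Illustrative sketch: many concurrent readers per file,
    but writers get exclusive access. A real distributed lock
    manager coordinates this across cluster nodes."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = defaultdict(int)   # path -> active reader count
        self._writers = set()              # paths held by a writer

    def acquire_read(self, path):
        with self._cond:
            # Readers only wait while a writer holds the file
            while path in self._writers:
                self._cond.wait()
            self._readers[path] += 1

    def release_read(self, path):
        with self._cond:
            self._readers[path] -= 1
            if self._readers[path] == 0:
                del self._readers[path]
                self._cond.notify_all()

    def acquire_write(self, path):
        with self._cond:
            # Writers wait for all readers and any other writer
            while path in self._writers or self._readers.get(path, 0) > 0:
                self._cond.wait()
            self._writers.add(path)

    def release_write(self, path):
        with self._cond:
            self._writers.discard(path)
            self._cond.notify_all()
```

Even in this toy form, the trade-off the vendors face is visible: every write must serialize behind the lock, which is why concurrent sharing has historically been restricted to reads.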
Though there is much debate over the new types of file-sharing options available to users—specifically, how they should be labeled and defined—industry experts typically break them into three categories: shared file systems, distributed file systems, and global file systems.
While definitions vary, IBM's SAN File System, ADIC's StorNext, and SGI's CXFS are generally considered to be examples of shared file systems. Isilon's OneFS and Panasas' ActiveScale File System are examples of distributed file systems, and Microsoft's DFS is typically referred to as a global file system.
It is important to note another distinction among these product categories, however. According to Jon Toigo, principal of the Toigo Productions consulting firm, the real difference is whether a file system is "clustered" or not.
Toigo says that true clustered file systems create a single storage pool of both data and associated metadata (e.g., file name, security and permission information, file creation date, etc.), which, among other things, makes it easier to scale the environment. Traditional NAS filers, in contrast, run into both horizontal (as NAS systems are added) and vertical (as trays of disk drives are added) scaling limits as data demands increase, according to Toigo.
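The structure Toigo describes, one shared catalog of metadata that is separate from the data blocks themselves, can be sketched as follows. This is a hypothetical illustration of the concept, not any vendor's implementation; the `MetadataPool` and `FileMetadata` names are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class FileMetadata:
    # The per-file attributes Toigo lists: name, security/permission
    # information, and creation date
    name: str
    owner: str
    mode: int
    created: datetime

class MetadataPool:
    """Single shared catalog for every file in the cluster.
    Any node can resolve a path here, regardless of which
    storage device actually holds the file's blocks."""

    def __init__(self):
        self._catalog = {}    # path -> FileMetadata
        self._placement = {}  # path -> node holding the data blocks

    def register(self, path, meta, node):
        self._catalog[path] = meta
        self._placement[path] = node

    def lookup(self, path):
        return self._catalog[path], self._placement[path]
```

Because the catalog is decoupled from placement, adding a storage node only adds entries to `_placement`; it does not fragment the namespace, which is the scaling advantage Toigo attributes to clustered designs.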
Nonetheless, NAS filers from vendors such as BlueArc, EMC, and Network Appliance continue to play a significant role in this market. In fact, these types of NAS products have the strongest foothold in many data-intensive markets, having been the only file-sharing option available to users for years.
"The positive thing about NAS is that NFS is extremely well understood and deployed," says Suresh Vasudevan, vice president of product marketing at Network Appliance. "The challenge going forward is finding a way to scale performance."
Until recently, there has been no explicit rule of thumb for dealing with very large files and demanding user access requirements. "Organizations traditionally used a lot of NAS systems to deal with the problem, but now they're moving into SANs [storage area networks]," says William Hurley, senior analyst with the Enterprise Application Group, a division of the Enterprise Storage Group (ESG) consulting firm.
The problem is that there hasn't been a practical way of storing, moving, and managing these data types, says Jim Farney, senior marketing manager for media industries at SGI. By implementing a shared file system, users can boost performance as well as gain better control of their storage environment, he says.
Additionally, these types of file systems allow users to attach various types of storage devices, unlike traditional NAS appliances and distributed file systems that are built on NAS-based architectures.
"If you own the file system, you can control where a file is located, how it is created, and how it is moved around," says Nick Tabellion, CTO at Softek. "Plus, you can put a lot of neat software into it that can make things like information life-cycle management [ILM] much easier."
In addition to enabling ILM, a shared file system can serve as a foundation or "trigger" for a variety of other proprietary or third-party storage services or functions, including backup and archival, hierarchical storage management (HSM)-like data movement among storage tiers, etc., according to Tabellion.
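The HSM-like data movement Tabellion mentions typically starts with a policy decision: which files have gone cold enough to migrate off primary disk? A minimal sketch of such a policy, under the assumption that the file system exposes last-access times, might look like this (the function name and input shape are invented for the example):

```python
import time

def select_for_migration(files, idle_days=90, now=None):
    """Illustrative HSM-style policy: pick files untouched for
    `idle_days` as candidates to move from primary disk to a
    nearline tier. `files` maps path -> last-access POSIX timestamp."""
    now = time.time() if now is None else now
    cutoff = now - idle_days * 86_400  # seconds in a day
    return [path for path, atime in files.items() if atime < cutoff]
```

Owning the file system, as Tabellion notes, is what makes this practical: the migration service can read access times and relocate blocks without breaking the paths users see.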
"It's when you begin to do this that the advantages of a shared file system begin to make sense," says Tabellion. "But if you really want to increase efficiency and utilization, you need to break down the [proprietary] walls of shared file systems."
One company that appears to be going in this direction is IBM. Its TotalStorage SAN File System (a.k.a. StorageTank) supports open standards and includes a Common Information Model (CIM) agent; however, the file system currently lacks broad operating system and disk array support (see "IBM delivers SAN file system," InfoStor, November 2003, p. 1).
But this type of support and functionality is months, if not years, away. So, for now, when you compare NAS and shared file systems head-on, it really boils down to performance.
"Both allow you to share data; the difference is how you access it," says Softek's Tabellion. Because NAS access takes place at a very high level, performance can be multiple times slower than direct block access, he adds.
And in collaborative environments like multimedia or certain scientific or seismology settings, performance is paramount: Users need to be able to share files at top speeds.
"That's the bigger problem," says Bill Yaman, vice president of software at ADIC. IT administrators want to know how to get data from this machine to the next most effectively, and they want to know how to deal with increasing amounts of high-resolution digitized data, he explains.
"Users are looking for a file system that allows them to have one contiguous SAN [i.e., one common storage back-end] so that they don't have to copy files," says ESG's Hurley. Making unnecessary copies of files not only creates management issues for IT administrators, but it also drives overall storage costs skyward due to the sheer volume of data being generated on a daily basis by these industries.
Compounding this problem in industries such as multimedia is the move to higher-resolution special effects, which has capacity, speed, and access implications.
The bottom line is that whether you're talking NAS, shared file systems, distributed file systems, or something else along those lines, no solution is perfect for every environment. And which option, or combination of options, you choose ultimately depends on particular data requirements and infrastructure preferences.
For Weta Digital, a New Zealand-based post-production studio responsible for the special effects in the Lord of the Rings trilogy, Network Appliance NAS filers were more than suitable for its storage needs. The company used three NetApp F840s and five F880s with one NearStore R100 system as a buffer store for first scans, as well as SGI Origin 2000 systems with 3TB of storage and tape libraries. More than 140 graphic artists had access to these scans, although the company limited the amount of data the artists could access at any given time to 20TB.
For others, perhaps those with SANs already in place, implementing a shared file system or even a combination environment may make sense. Warner Brothers Studio, for example, needed high-speed access and cross-platform support: shifting data patterns (and higher resolutions) and new workflow requirements made it necessary for files to move between SGI and Linux platforms. ADIC's StorNext shared file system wound up being a good option.
"We were able to bring together their data without having to push the data out to other platforms," explains ADIC's Yaman. "And we were also able to bring in an ILM component [migration] by adding in nearline storage."
However, for other digital media outfits like FotoKem, a mixed environment may be necessary. FotoKem implemented an ADIC StorNext-based SAN a little more than a year ago followed by an Isilon IQ distributed file system in November.
According to Paul Chapman, vice president of technology at FotoKem, the decision to implement the StorNext file system instead of other technologies, including traditional NAS, came down to two factors: performance and scalability. Chapman needed to move data among SGI, Windows, and Linux clients.
"We were looking to move pretty large files around [12MB to 13MB per file] and a lot of them [about 1,440 files per minute], and StorNext could keep things running at 300MBps. We couldn't do that with NAS," says Chapman.
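Chapman's numbers are internally consistent, as a quick back-of-the-envelope check shows (using the midpoint of the 12MB-13MB file-size range he cites):

```python
# Back-of-the-envelope check of FotoKem's workload figures
files_per_minute = 1_440
avg_file_mb = 12.5  # midpoint of the 12MB-13MB range cited
throughput_mbps = files_per_minute * avg_file_mb / 60  # MB per second
print(round(throughput_mbps))  # 300 -- matches the sustained 300MBps figure
```

That sustained 300MBps is the kind of aggregate bandwidth that block-level SAN access delivers more readily than file-level NAS protocols.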
Chapman says that the decision to implement Isilon's IQ system six months later was necessitated not by any shortcomings with the StorNext file system (in fact, he views the two systems as complementary) but, rather, by the need to support more client workstations and additional file types (e.g., PDF and Mac files).
Additionally, the Isilon IQ system will accommodate FotoKem's plans to build a rendering farm without jacking up overall storage costs. "It doesn't cost you anything to add clients to NAS, but you have to pay a client license each time you upgrade in a SAN," says Chapman.