Industrial Light & Magic uses high-speed NAS servers with a distributed file system, 10Gbps Ethernet, and a 5,000-node render farm to store and move 170TB of content.
By Barbara Robertson
When George Lucas moved a large part of his filmmaking empire from San Rafael, CA, a small town north of San Francisco, into a state-of-the-art, four-building complex on 17 acres of parkland in San Francisco’s Presidio, he spared no detail. Lawrence Halprin, the renowned landscape architect, even rearranged individual rocks in the babbling brook that rambles through the campus to achieve the most pleasing sound.
Similarly, the technical team left no stone unturned when it developed the infrastructure that powers Industrial Light & Magic (ILM), Lucas’ award-winning visual-effects facility, and the LucasArts game-development division. “When we went from San Rafael to the Presidio, we had a 10x increase in network bandwidth,” says systems developer Michael Thompson. “We knew it would be coming, so we designed a system that could handle a massive jump in network throughput.”
At the new Lucas Digital Arts Center (LDAC) in the Presidio, a 10Gbps Ethernet backbone feeds data into 1Gbps pipes that run to the desktops. About 600 miles of fiber-optic cable thread through 865,000 square feet of building space; the network is designed to accommodate 4K images via 300 10Gbps and 1,500 1Gbps Ethernet ports.
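A quick back-of-the-envelope calculation shows why those port speeds matter for 4K work. The sketch below assumes uncompressed 16-bit RGB frames, an assumption not stated in the article; ILM's actual file formats may differ:

```python
# Back-of-the-envelope: why 4K work stresses a 1Gbps desktop link.
# Assumes uncompressed 16-bit (2-byte) RGB frames; actual formats may differ.

def frame_bytes(width, height, channels=3, bytes_per_channel=2):
    """Size of one uncompressed frame in bytes."""
    return width * height * channels * bytes_per_channel

FOUR_K = frame_bytes(4096, 4096)     # ~96 MiB per frame
GBPS = 1_000_000_000                 # bits per second

# Time to move one 4K frame over a 1Gbps and a 10Gbps port
# (ignoring protocol overhead)
t_1g = FOUR_K * 8 / GBPS
t_10g = FOUR_K * 8 / (10 * GBPS)

# Aggregate edge capacity of the quoted port counts
aggregate_gbps = 300 * 10 + 1500 * 1

print(f"4K frame: {FOUR_K / 2**20:.0f} MiB")
print(f"1Gbps transfer time:  {t_1g:.2f} s/frame")
print(f"10Gbps transfer time: {t_10g:.3f} s/frame")
print(f"Aggregate edge bandwidth: {aggregate_gbps} Gbps")
```

Even ignoring overhead, a single uncompressed 4K frame takes the better part of a second to cross a 1Gbps desktop link, which is why the heavy lifting stays on the 10Gbps backbone.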
A 13,500-sq.-ft. data center houses the render farm, file servers, and storage systems; the data center’s render farm comprises 3,000 AMD processors and expands to 5,000 after hours by enlisting desktop machines.
“All these render nodes constantly need data,” says Thompson. “At ILM, and probably at most visual-effects studios, there is an ongoing war between the render farm and storage. Currently, we have about half-a-dozen major motion picture projects underway. Keeping everyone happy requires feeding a phenomenal amount of data to those render nodes.”
How much data? “The whole [storage] system holds about 170TB, and we are 90% full,” says Thompson.
In a visual-effects-laden film such as Star Wars, nearly every minute of the 140-minute running time included work by ILM. For the film Jarhead, which is not considered a visual-effects film, ILM created about 40 minutes of effects. With that in mind, consider this: ILM currently renders most visual-effects shots at around 2K x 2K resolution; however, some productions are moving to 4K x 4K resolution. A shot is an arbitrary number of frames; film is projected at 24 frames per second and video at 30 frames per second. To produce the final shots, compositors combine several layers of rendered elements for each frame. A 100-layer shot is not unusual; most shots include at least 20 layers. It took 6,598,928 hours of aggregate render time to produce the shots in Star Wars: Episode III-Revenge of the Sith.
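Those figures can be turned into a rough storage estimate. The sketch below is illustrative only; the frame size assumes uncompressed 16-bit RGB, and the 10-second shot length is a hypothetical, not a figure from the article:

```python
# Rough storage footprint of one composited shot, from the figures in the text.
# Assumptions (mine, not the article's): uncompressed 16-bit RGB frames and a
# 10-second shot; real element layers are often smaller (mattes, crops).

def frame_bytes(width, height, channels=3, bytes_per_channel=2):
    return width * height * channels * bytes_per_channel

FPS = 24              # film projection rate (from the article)
SHOT_SECONDS = 10     # hypothetical shot length
LAYERS = 20           # "most shots include at least 20 layers"

frames = FPS * SHOT_SECONDS               # 240 frames
per_frame_2k = frame_bytes(2048, 2048)    # ~24 MiB at 2K x 2K
shot_bytes = frames * LAYERS * per_frame_2k

print(f"2K frame: {per_frame_2k / 2**20:.0f} MiB")
print(f"20-layer, 10 s shot at 2K: {shot_bytes / 2**30:.1f} GiB")
# Moving to 4K x 4K quadruples every number above.
```

Under these assumptions a single modest shot already runs to roughly a hundred gigabytes of element data, which puts the 170TB facility total, and the "ongoing war between the render farm and storage," in perspective.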
Lucas Digital Arts Center
The IT team began looking for a new storage system about three years ago when Lucas was beginning work on Revenge of the Sith. They chose SpinServer NAS hardware and the SpinFS distributed file system from start-up Spinnaker Networks.
“The system had all the attributes we needed to go forward,” says Thompson. “We knew we’d have major scaling issues, and it could scale well. And it has good data management features and a unified naming space [a.k.a. global namespace].”
But, shortly after ILM purchased the system, Network Appliance bought Spinnaker. “It was spooky for us,” says Thompson. “We didn’t know if they would deep-six the technology. But it turned out to be a good deal. For the past two-and-a-half years we’ve been prototyping NetApp’s Data ONTAP NG [Next Generation] software, which includes the Spinnaker software.”
ILM now uses 20 Linux-based SpinServer NAS systems and about 3,000 disks from Network Appliance. “In six to nine months, we’ll swap the SpinServers for Network Appliance hardware, but will still run the same software stack,” says Thompson. “Our system is a weird hybrid: It has all the features of a SAN, but it does NAS as well.”
Linux-based render boxes at ILM talk to the disk storage systems via the NFS protocol. Brocade Fibre Channel switches handle data transfer between the SpinServers and two types of disks: high-speed production disks and slower nearline disks used for archiving data before it goes off-line to an ADIC Scalar 10K tape library. Couriers deliver final shots to production studios on FireWire drives.
“One of the nice things about our storage system is that it allows you to run the disks very full,” claims Thompson. “The 3,000 disks are divvied up into 20 stacks, and as they fill up, the data moves from one to the next. However, the users can still get to all their data via normal paths. They don’t know we’re moving data around behind the scenes.”
Because the Spinnaker system has one unified naming space, all the disk drives look like one giant disk to the users, whether the data is on the fast production disks or on the slower nearline disks. This means the studio can organize its file systems into a tidy hierarchy. Before, people working on shots had to keep track of which servers had the elements they needed. “Now, it looks like one giant disk and they can keep everything for one movie in one area instead of on 14 different servers,” Thompson explains. “And, because the system spreads the data across the servers so that it’s evenly balanced, we can add servers as we need them.”
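The idea behind a global namespace can be sketched in a few lines. The toy model below is not SpinFS internals (which are proprietary); the server names are invented for illustration. Clients see one logical path tree, while a placement table maps each subtree to whichever server currently holds it, so migrating data only updates the table:

```python
# Toy illustration of a global namespace: one logical path tree for clients,
# with a placement table mapping subtrees to backend servers. Server names
# and paths are invented for the example; this is not SpinFS itself.

class GlobalNamespace:
    def __init__(self):
        # logical prefix -> backend server currently holding that subtree
        self.placement = {
            "/shows/episode3/shots": "spinserver-07",
            "/shows/episode3/elements": "spinserver-12",
            "/archive": "nearline-02",
        }

    def resolve(self, path):
        """Return the server holding the longest matching prefix of `path`."""
        best = max((p for p in self.placement if path.startswith(p)),
                   key=len, default=None)
        if best is None:
            raise FileNotFoundError(path)
        return self.placement[best]

    def migrate(self, prefix, new_server):
        """Move a subtree to another server; logical paths never change."""
        self.placement[prefix] = new_server

ns = GlobalNamespace()
print(ns.resolve("/shows/episode3/shots/sc042/frame0001.exr"))  # spinserver-07
ns.migrate("/shows/episode3/shots", "spinserver-19")
print(ns.resolve("/shows/episode3/shots/sc042/frame0001.exr"))  # spinserver-19
```

This is exactly the property Thompson describes: data moves from stack to stack behind the scenes while users keep using the same paths.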
In fact, during the move from San Rafael to San Francisco, the two facilities acted as one. “We had people on both sides of the Golden Gate Bridge accessing the data and moving it around without losing access,” says Thompson. The studio leased a fiber-optic cable that ran from San Rafael to Berkeley and then across the Oakland Bay Bridge to San Francisco to link the SpinServers in San Rafael to those in San Francisco. “All the data still showed up as one virtual disk,” says Thompson.
Because they could run the two facilities as if they were one, ILM could move people from one location to the other in waves; it was never necessary for anyone to stop working in order to move. “Without this system, we would have had to completely shut down the whole facility,” says Thompson. “Our burn rate was around $50,000 a day for downtime. It would have cost millions of dollars, and that doesn’t take into account delays.”
Now, Thompson is looking at ways to implement a similar system between Singapore, where Lucas has opened an animation studio, and Lucas’ headquarters at Skywalker Ranch north of San Francisco. He installed 20TB of storage on Network Appliance hardware running the Data ONTAP NG software in each location, but the problem is WAN latency.
“Data access over fiber between San Rafael and San Francisco was very fast, but when you’re shooting packets to Singapore and introducing millisecond delays, the computers start bogging down,” says Thompson. “It’s not the throughput; it’s the round-trip time. We’re looking at Network Appliance, Hewlett-Packard, and a lot of start-up companies that deal with these WAN issues for a solution.”
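A short calculation shows why round-trip time, not bandwidth, is the bottleneck Thompson describes. The numbers below are my assumptions, not figures from the article: roughly 180 ms RTT between San Francisco and Singapore, a 64 KiB NFS read size, and one request in flight at a time:

```python
# Why RTT, not bandwidth, caps single-stream NFS over a WAN.
# Assumptions (mine, not the article's): ~180 ms RTT to Singapore,
# 64 KiB NFS read size, one request outstanding at a time.

READ_SIZE = 64 * 1024   # bytes per NFS READ
RTT_WAN = 0.180         # seconds, San Francisco <-> Singapore (assumed)
RTT_LOCAL = 0.001       # seconds, over the leased regional fiber (assumed)

# With one outstanding request, each READ costs one full round trip:
wan_bps = READ_SIZE * 8 / RTT_WAN
local_bps = READ_SIZE * 8 / RTT_LOCAL

print(f"Max single-stream throughput to Singapore: {wan_bps / 1e6:.1f} Mbps")
print(f"Same workload over regional fiber: {local_bps / 1e6:.0f} Mbps")
# WAN accelerators attack this by pipelining and prefetching many
# requests at once, hiding the round trips.
```

Under these assumptions a single synchronous NFS stream to Singapore manages only a few Mbps no matter how fat the pipe is, which is why ILM is evaluating WAN-acceleration products rather than simply buying more bandwidth.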
Meanwhile, back at ILM, Thompson wants to try playing high-performance, 600Mbps HD video off the core storage. Currently, the studio uses custom-designed, dedicated HD video servers. “When you’re streaming uncompressed HD video to the desktop, the throughput is astronomical,” says Thompson. “So we have home-grown HD servers. There’s a feature in the new ONTAP NG software, though, that we think we can use to stream HD video to the desktop for the whole facility. Each server would do 1/20th of the load and when they’re combined we could play at warp speed.”
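The striping arithmetic behind "each server would do 1/20th of the load" is straightforward. The stream rate and server count come from the article; the 1 Gbps per-server port speed in the second step is my assumption:

```python
# The arithmetic behind "each server would do 1/20th of the load".
# Figures from the article: 600 Mbps uncompressed HD streams, 20 servers.

STREAM_MBPS = 600
SERVERS = 20

per_server = STREAM_MBPS / SERVERS   # share of one stream per server
print(f"Per-server share of one stream: {per_server:.0f} Mbps")

# How many simultaneous 600 Mbps streams could the cluster feed if each
# server has a 1 Gbps network port? (Assumed port speed, not from the text.)
PORT_MBPS = 1000
streams = (SERVERS * PORT_MBPS) // STREAM_MBPS
print(f"Streams sustainable from 20 x 1 Gbps ports: {streams}")
```

Striping drops each server's share of a stream to a modest 30 Mbps, which is what would let general-purpose file servers replace the dedicated HD video servers.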
Would that imply more data storage? “I’ve been doing storage here for six years now, and I’ve found that people will use up whatever you put out there,” says Thompson. “We’ll probably be buying more disks this year. But, at least now adding more storage to the system takes only a couple of hours.”
Barbara Robertson is a freelance writer and a contributing editor for Computer Graphics World. She can be reached at BarbaraRR@comcast.net.