VFX requires high speed and capacity

Studios such as Pixar, The Orphanage, DNA, and Tweak Films leveraged a variety of new storage technologies to meet the high-end requirements of recent feature films and animation projects.

By Barbara Robertson

All animation studios rely on superhighways to transport the huge amounts of data needed to fill 24 frames a second in a 90-minute feature film. But Disney/Pixar’s animated blockbuster Cars pushed that studio’s system to the limit.

At Pixar, a 3,000-CPU render farm comprising 64-bit Intel Nocona-based “pizza boxes” reads data in, runs the algorithms, and generates the new data: the rendered frames.

Behind the sophisticated effects in Cars, produced by Disney/Pixar, is a 3,000-processor render farm and Ibrix’s Fusion parallel file system.

The data comes from a model farm that’s typically 3TB to 4TB in size. “That’s our most valuable data,” says John Kirkman, Pixar’s director of systems infrastructure. “It’s where we store the hand-built models, the shaders, the textures created by the technical directors, and the animation data.” In other words, the model farm is where the characters live. So, when Lightning McQueen, the star of Cars, appears in a scene, the render farm needs to read his data, and herein lies the potential traffic jam: All the characters in Cars are vehicles: race cars, transport trucks, family sedans, sports cars, and even tractors.

As McQueen screeches around a curve during the Piston Cup Championship race in the beginning of the film, stadium lights strobe off his flashy red paint job. The camera follows his battle to the finish line while thousands of cars in the crowd cheer and lightbulbs pop; thousands of points of light bounce off fenders and hoods.

“If McQueen is in most of the frames, you have to read his data across all 3,000 CPUs,” says Kirkman. “The challenge is providing data for 3,000 CPUs all trying to go after the same piece of data.”

For previous films, Pixar relied on Network Appliance’s NetCache technology in front of its NetApp filers, which worked well. But to reproduce the reflections bouncing off the chrome, glass, and steel bodies of Cars’ stars, Pixar used a ray-tracing method of rendering that simulates the paths of light rays hitting an object from various sources and angles to reproduce the effect of real light in a scene.

“With Cars, because we were doing ray tracing, the number of reads needed to calculate a frame increased dramatically,” Kirkman explains. “When you’re tracing rays of light, sometimes you’re reading data that’s not in the frame. You’re reading light hitting a mailbox a mile down the road before it hits McQueen’s fender.”

The data needed for the render farm to do its work at any point in time is usually between 100GB and 200GB. With the previous technology, Pixar was limited to 1.5GB of internal memory. But by switching to Ibrix’s Fusion parallel file system software, Pixar could pull more data out of RAM. “We’re very sensitive to having to wait for data,” Kirkman says. “We would much prefer to get data out of memory than off a disk drive.”

Pixar installed a 12-node Ibrix cluster. Eight servers fed the render farm, all talking to the same SAN storage device to get data, and four servers maintained the metadata for the file system. Each of the eight “heads” had 32GB of memory. That meant the working set of data could fit in RAM.
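A quick back-of-the-envelope check of the claim that the working set fit in RAM, using only the article’s figures: eight heads with 32GB each, against a working set of 100GB to 200GB. The sketch assumes, as a simplification, that the heads’ memory behaves like one pooled cache with no duplicated data and no operating-system or file-system overhead; real usable cache would be somewhat smaller.

```python
# Figures from the article: eight Ibrix "heads" with 32GB of RAM each,
# and a render-farm working set of roughly 100GB to 200GB.
HEADS = 8
RAM_PER_HEAD_GB = 32
WORKING_SET_GB = (100, 200)


def aggregate_cache_gb(heads: int, ram_per_head_gb: float) -> float:
    """Total RAM across the file-serving heads (ignores OS and metadata overhead)."""
    return heads * ram_per_head_gb


def fits_in_ram(working_set_gb: float, cache_gb: float) -> bool:
    """True if the working set could, in principle, be served entirely from RAM."""
    return working_set_gb <= cache_gb


cache = aggregate_cache_gb(HEADS, RAM_PER_HEAD_GB)
for ws in WORKING_SET_GB:
    print(f"{ws}GB working set fits in {cache}GB of RAM: {fits_in_ram(ws, cache)}")
```

Even at the top of the range there is headroom, whereas the earlier 1.5GB limit could hold under 2% of a 100GB working set.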

“We got a huge multiplier from being able to serve data out of RAM,” says Kirkman. “We expect 100% utilization of our CPUs at all times. If we’re waiting on I/O, we see the difference between the machine time and the wall clock. If something we think should take one hour to render takes three hours, we know we’re wasting time waiting for I/O. Before [we installed] the Ibrix system, the wall clock time was six to ten times what we expected because we didn’t have enough memory. With Ibrix, we reduced that to 15%. We want to complete reads in less than half a millisecond, and we were achieving that as long as we could get data out of RAM.”
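Kirkman’s yardstick can be written down directly: compare the compute time a job should need with the wall-clock time it actually took. The sketch below treats each render as single-threaded per CPU and reads his “15%” as wall-clock time 15% above the expected time; both are interpretations, not figures Pixar supplied.

```python
def io_overhead(expected_hours: float, wall_hours: float) -> float:
    """Extra wall-clock time as a fraction of the expected (pure compute) time."""
    if expected_hours <= 0:
        raise ValueError("expected_hours must be positive")
    return (wall_hours - expected_hours) / expected_hours


def cpu_utilization(expected_hours: float, wall_hours: float) -> float:
    """Share of wall-clock time spent computing rather than waiting on I/O."""
    return expected_hours / wall_hours


# (label, expected hours, wall-clock hours)
cases = [
    ("example from the quote", 1.0, 3.0),
    ("before Ibrix, low end", 1.0, 6.0),
    ("before Ibrix, high end", 1.0, 10.0),
    ("with Ibrix (15% read as overhead)", 1.0, 1.15),
]
for label, expected, wall in cases:
    print(f"{label}: overhead {io_overhead(expected, wall):.0%}, "
          f"CPU busy {cpu_utilization(expected, wall):.0%}")
```

Under these assumptions, a job that takes six times its expected time keeps its CPU busy only about 17% of the time; at 15% overhead the figure rises to roughly 87%.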

For Pixar’s next film, Ratatouille, scheduled for release in June 2007, the studio is installing a second Ibrix system, this one a 16-node cluster. “We’re recycling parts of the Cars system, but we’re going to end up with two [clusters] that we rotate between our current and next films,” says Kirkman. The Ratatouille system has eight servers feeding data to the render farm, four serving users, and four managing metadata.

“The thing we find most attractive is that Ibrix is a software solution,” Kirkman says. “With traditional NAS, you have to buy a big box. If you’re only interested in adding memory, you’re paying for other stuff you don’t need. But the cool thing with Ibrix is that we can scale everything independently, whether we want more CPUs, memory, networks, or spindles.”

In addition to the parallel file system (a.k.a. a segmented file system), Ibrix’s Fusion software includes a logical volume manager and high-availability features. The software allows users to build file systems that can scale up to 16 petabytes of capacity in a single namespace. Fusion runs independently of specific hardware and/or network platforms and supports the CIFS and NFS protocols. Ibrix claims aggregate performance of as much as 1TBps.

The Orphanage accelerates workflow

Superman might fly faster than a speeding bullet, but when The Orphanage needed to aim a bullet right at the man of steel’s baby blues, the studio’s need for speed sent it looking for a new storage solution.

“Our artists and our render-farm machines were starved for file system I/O,” says Dan McNamara, vice president of technology at the San Francisco-based visual effects studio. In addition to the bullet shot, The Orphanage handled a bank job and a wild car chase. “The complex scenes for Superman Returns really tested [the system]. We had lots of complex elements: lots of pieces that had to be woven together.”

The Orphanage’s work on Superman Returns involved more than 11.5TB of SAN-based data behind a BlueArc Titan storage system.

At the same time that The Orphanage artists were leaping over tall data requirements for Warner Bros.’ Superman Returns, a second effects film, the South Korean monster movie The Host, had its own set of fiendish requirements. The Orphanage created the film’s Han River mutant, a 45-foot-long digital creature that looks like a cross between a T. rex and a fish. The film, which received rave reviews at the Cannes Film Festival and broke box-office records in South Korea, made its North American debut at the Toronto Film Festival this month.

“It was intense,” says McNamara of the visual effects work. “We had complex scenes with people firing weapons at the CG creature, and the shots were really long. We wanted to make sure the large files the artists required loaded as fast as possible.”

Now the studio’s 11.5TB of data sits behind a BlueArc Titan 2000 series storage system. The Titan system’s open SAN back-end views the studio’s existing SAN storage as a shared network resource.

McNamara says the studio is getting phenomenal performance-340MBps to 360MBps throughput. “We haven’t had to add storage; we just move the data faster to the users [artists].”

Although The Orphanage had planned to have a storage system bakeoff during which they’d evaluate several systems, BlueArc’s Titan was the first system they tried. “I wish I could tell you we evaluated several systems and here’s all our raw numbers,” says McNamara, “but the Titan exceeded our expectations. It supports CIFS natively [as well as NFS], so that was fine. It met our needs.”

The studio hasn’t regretted the decision. “When you really push some storage systems, you hit a cliff and fall off,” says McNamara. “With this system, you don’t have issues when you really push it. It really changed the artists’ workflow for the better.”

As The Orphanage’s needs grow, McNamara expects they’ll purchase a second Titan storage server and evaluate the new clustering software BlueArc is developing.

“Our biggest thing is giving the artists interactivity,” says McNamara. “We don’t want them to have to wait for scenes to load. This business is about creativity, and we want to make sure our artists are happy.”

BlueArc’s software that runs on the Titan storage servers includes a file system with a cluster namespace for a unified directory structure and global access to data for CIFS and/or NFS clients. The object-based file system supports up to 512TB of data in a single pool. The disk array can be configured with high-performance Fibre Channel and/or low-cost, high-capacity Serial ATA (SATA) disk drives to create a tiered storage architecture. BlueArc claims performance of up to 10Gbps.

DNA meets the need for speed

When DNA Productions moved from creating the episodic animated TV show, “The Adventures of Jimmy Neutron,” to the full-length animated feature film, The Ant Bully, everything changed. In Warner Bros.’ The Ant Bully, a boy takes out his frustration on some ants. The ants fight back by shrinking the boy to ant size and teaching him the ways of the ants. Ultimately, the ants rely on the boy to help save the colony. Creating a few CG characters for a full-length feature, plus backgrounds and props, is difficult enough, but creating an entire colony of characters would tax most server/storage systems.

Rendering for The Ant Bully feature film required a 1,400-processor render farm and a 42-node clustered storage system and software from Isilon Systems.

“We changed our whole infrastructure,” says Rich Himeise, director of network operations at DNA. “We had to upgrade everything to go from the TV show to a movie.” That upgrade included converting from a Windows-based system to a Linux-based system, buying a 1,400-processor render farm, and installing a new 42-node Isilon Systems’ clustered storage system that provides 80TB of raw storage capacity.

“We run our entire production on the Isilon IQ systems,” says Himeise. “We render to the system, and our assets live on the system. The entire movie lives on the system.” With each frame of the animated film requiring from 1MB to 10MB of data (some even more), throughput and load balancing were critical.
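To put the per-frame figure in perspective, the sketch below multiplies it out for one copy of every final frame. It assumes a nominal 90-minute running time (the article’s generic figure, not The Ant Bully’s exact length) and decimal units; intermediate render passes, re-renders, and the asset library are left out, which is likely where most of the 80TB goes.

```python
FPS = 24
RUNTIME_MIN = 90            # assumption: a nominal 90-minute feature
MB_PER_FRAME = (1, 10)      # per-frame range quoted by DNA


def frame_count(fps: int, minutes: int) -> int:
    """Total frames in a film of the given length."""
    return fps * 60 * minutes


def footprint_tb(frames: int, mb_per_frame: float) -> float:
    """Storage for one copy of every frame, in decimal TB (1TB = 1,000,000MB)."""
    return frames * mb_per_frame / 1_000_000


frames = frame_count(FPS, RUNTIME_MIN)
for mb in MB_PER_FRAME:
    print(f"{frames:,} frames x {mb}MB = {footprint_tb(frames, mb):.2f}TB")
```

A single pass over the finished film comes to roughly 0.13TB to 1.3TB, so the cluster’s size is driven far more by everything behind the final frames than by the frames themselves.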

Isilon’s OneFS clustered, distributed file system spread the load across the 42 nodes. “The clients, the render farm, and the artist workstations all mount across that cluster,” Himeise explains. “Mounting the clients across the cluster increases the throughput to the system.”

Each node has a processor, 4GB of memory, and a Gigabit Ethernet connection. That, in effect, gave the studio a 42-processor computer with 168GB of memory and a 42Gbps connection to the file system. “You can think of the cluster as one big, robust machine,” says Himeise, “with 42 Gigabit [Ethernet] pipes into the cluster.”

The processors handle the file transactions like a typical server, moving data to hard drives. At the back-end of the cluster, InfiniBand switches tie all the nodes together. “Node number one knows what’s on the hard disk of node 42 and what’s in the memory cache of 42, and 42 knows what’s in node one and everywhere else,” Himeise says. “It’s the glue that ties the cluster together.”

Each node handled between 25 and 30 clients. Because DNA used Isilon IQ’s SmartConnect feature, the number of nodes assigned to artists and to the render farm could be changed as needed. “We could dedicate 30 nodes to our render farm when it became very busy and the remaining 12 nodes to the artists, and then, if the artists complained, we could give them more nodes,” says Himeise. “It was easy. Just took a couple of clicks.”
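The pool arrangement Himeise describes can be modeled as two groups of nodes, with clients spread evenly within each group. The sketch below is a simplified round-robin model with hypothetical node and client names; it is not SmartConnect’s interface, and Isilon’s actual balancing policies may differ. With a hypothetical 840 render clients on 30 nodes, each node gets 28, inside the 25 to 30 range cited above.

```python
from collections import Counter
from itertools import cycle

# Hypothetical node names; the article gives only the count (42).
NODES = [f"node{i:02d}" for i in range(1, 43)]


def split_pools(nodes: list[str], render_nodes: int) -> tuple[list[str], list[str]]:
    """Divide the cluster into a render-farm pool and an artist pool."""
    if not 0 <= render_nodes <= len(nodes):
        raise ValueError("render_nodes out of range")
    return nodes[:render_nodes], nodes[render_nodes:]


def assign_round_robin(clients: list[str], pool: list[str]) -> dict[str, str]:
    """Map each client to a node in its pool, cycling through the nodes in order."""
    if not pool:
        raise ValueError("pool is empty")
    nodes = cycle(pool)
    return {client: next(nodes) for client in clients}


render_pool, artist_pool = split_pools(NODES, 30)
render_clients = [f"render{i:03d}" for i in range(840)]  # hypothetical count
mapping = assign_round_robin(render_clients, render_pool)
load = Counter(mapping.values())
print(len(render_pool), len(artist_pool), min(load.values()), max(load.values()))
```

Giving the artists more nodes is then just a different split, for example `split_pools(NODES, 26)`, which leaves 16 nodes in the artist pool.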

The power failed in DNA’s building twice during the production of The Ant Bully. “One time, we could shut down safely,” says Himeise, “but the other time the system went down hard. With a cluster this size, it was a nerve-racking experience, but the system came back up with no problem. We didn’t lose a file.”

Even with the new system in place, by the end of production the studio began running out of storage space.

“Isilon delivered 15TB of storage to get us through the final months and when we were done, we shipped them back,” says Himeise.

Isilon’s IQ series of storage arrays uses Gigabit Ethernet for front-end connections and can be configured with either Gigabit Ethernet or InfiniBand connections for intra-cluster communications. Storage nodes come in several models (IQ 1920, 3000, 4800, and 6000), letting users match capacity and performance requirements. The OneFS distributed file system creates a single, shared global namespace and supports NFS and CIFS. Isilon’s SyncIQ replication software distributes data between clusters.

Small shop, big jobs

Cutting-edge technology developed by two-time technical Academy Award winner Jim Hourihan helps Tweak Films, a San Francisco-based visual effects studio Hourihan recently co-founded, compete with larger, well-established studios.

Some small effects studios survive on scraps thrown to them by the major studios: easy wire-removal shots, paint “fix-its,” etc. Not Tweak. This studio gets the hard shots: water simulations, rigid body simulations, fire, and smoke.

For example, Tweak Films created a tidal wave that surged through the streets of New York in the film The Day After Tomorrow, which won the Visual Effects Society’s award for “Best Single Visual Effect of the Year.” The studio also crashed military tanks on an aircraft carrier deck for a sequence in xXx: State of the Union, helped destroy Barad-dûr in The Lord of the Rings: The Return of the King, and, more recently, worked on water simulation shots for Superman Returns and Monster House.

Tweak Films, which has done work on films such as The Day After Tomorrow, uses Apple’s Xserve RAID array for most of its storage needs.

Thus, even though the studio is small, the shots are big. Simulating nature takes huge amounts of data and processing power. And that means the small studio needs smart storage solutions. “When you’re dealing with images, the data adds up really fast,” says Mike Root, a compositing supervisor and software engineer at Tweak Films, “but when you’re a small shop, you can’t buy massive network bandwidth.” A standard film frame, he explains, is about 12MB; an IMAX frame, at 5464 x 4096 resolution and rendered with 10 bits per color channel rather than 8, requires approximately 100MB.
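Both per-frame figures follow from resolution times bytes per pixel. A common way to store 10-bit-per-channel RGB (for example, in DPX files) packs the three 10-bit values into one 32-bit word, i.e., 4 bytes per pixel. The sketch assumes that packing, plus a 2048 x 1556 “2K” resolution for the standard film frame, which the article does not state.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Uncompressed frame size, assuming 10-bit RGB packed into 32 bits per pixel."""
    return width * height * bytes_per_pixel


film_2k = frame_bytes(2048, 1556)   # assumption: 2K full-aperture film scan
imax = frame_bytes(5464, 4096)      # resolution quoted in the article

print(f"2K film frame: {film_2k / 1e6:.1f}MB")
print(f"IMAX frame:    {imax / 1e6:.1f}MB")
```

The results, about 12.7MB and 89.5MB, sit close to the 12MB and roughly 100MB Root quotes; file headers or extra channels would push real files somewhat higher.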

For centralized storage, Tweak uses an Apple Xserve RAID array with 5TB of capacity. “We also have a hard drive on each render node and on the desktop machines,” Root says. “In some of the work we did for The Day After Tomorrow, for example, the textures and geometry added up to gigabytes of data. If we had 50 machines trying to suck from one server all at once, we would have had a giant bottleneck. So we synch render data to all of our render nodes.”

Each of the Linux-based render nodes has a processor, 4GB of memory, and an 80GB or 160GB hard drive. The goal is to have each render node access its local drive rather than access data on the server.

“Say we have a 5GB data set of texture maps for rendering New York City that all the rendering nodes need to access,” explains Root. “Rather than having all 50 machines try to suck that data all at once, each machine gets its own copy.” That speeds the rendering process. It also means the data that rendering nodes access isn’t “precious”; it’s only a copy.
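The arithmetic behind Root’s bottleneck is simple: if 50 nodes each pull the same 5GB through one server link, the link carries 250GB, whereas local copies let every node read in parallel. The link and disk speeds below (about 100MB/s usable on one Gigabit Ethernet link, about 60MB/s per local drive) are illustrative assumptions, not Tweak’s measurements.

```python
def shared_pull_seconds(clients: int, dataset_gb: float, link_mb_s: float) -> float:
    """Time for every client to pull the full data set through one server link."""
    return clients * dataset_gb * 1000 / link_mb_s


def local_read_seconds(dataset_gb: float, disk_mb_s: float) -> float:
    """Time for one node to read its own local copy; all nodes do this in parallel."""
    return dataset_gb * 1000 / disk_mb_s


# Assumptions: ~100MB/s usable on one Gigabit Ethernet link, ~60MB/s per local disk.
print(f"shared server: {shared_pull_seconds(50, 5, 100) / 60:.0f} min")
print(f"local copies:  {local_read_seconds(5, 60) / 60:.1f} min")
```

Syncing still moves the data once per node, but that happens ahead of rendering, and afterward only changes need to travel; repeated reads during a render then hit the local disk instead of the shared link.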

Root uses rsync, an open source utility, to manage the file transfers. “Rsync checks on the server and local drives of the render nodes,” he says. “During the process of rendering, it picks up local information off the local drives. If anything has changed, it copies and moves only the changed part.”
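The “copy only what changed” idea can be sketched in a few lines. Note that this is a simplification: rsync itself works below the file level, using a rolling checksum to send only the changed blocks of a modified file, while the sketch below compares whole files (size first, then a SHA-256 hash) and recopies any file that differs. In practice a studio would simply run rsync (for example, `rsync -a` in archive mode); the directory and file names here are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1MiB chunks so large textures don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(a: Path, b: Path) -> bool:
    """Cheap size check first, then a content hash."""
    return a.stat().st_size == b.stat().st_size and file_digest(a) == file_digest(b)


def sync_tree(src: Path, dst: Path) -> list[str]:
    """Copy files that are missing or changed in dst; return their relative paths."""
    copied = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        if dst_file.exists() and same_content(src_file, dst_file):
            continue  # unchanged: nothing to transfer
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(rel.as_posix())
    return copied


# Usage: a temporary "server" directory synced to a temporary "render node".
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    server, node = Path(a), Path(b)
    (server / "textures").mkdir()
    (server / "textures" / "street.tex").write_bytes(b"version 1")
    print(sync_tree(server, node))  # ['textures/street.tex']
    print(sync_tree(server, node))  # []
    (server / "textures" / "street.tex").write_bytes(b"version 2")
    print(sync_tree(server, node))  # ['textures/street.tex']
```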

For distributing the render jobs, the studio uses Condor, a queuing system developed for academic and scientific computing at the University of Wisconsin-Madison. “It gives us fine-grain controls for selecting which machines to run on,” Root says. “When we have a big job to render, we turn all the desktop machines into render machines as well.”
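The kind of machine selection Root describes can be illustrated with a small filter-and-rank function. In Condor itself these choices are written as Requirements and Rank expressions in a job’s ClassAd; the Python below only mimics the idea, with hypothetical machine names, a hypothetical 4GB memory threshold, and a simple rule that desktops join the pool only when the job allows it and nobody is using them.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    memory_gb: int
    is_desktop: bool
    user_active: bool = False  # only meaningful for desktops


def eligible(m: Machine, min_memory_gb: int, use_desktops: bool) -> bool:
    """Enough memory and, for desktops, permission to use them and no active user."""
    if m.memory_gb < min_memory_gb:
        return False
    if m.is_desktop:
        return use_desktops and not m.user_active
    return True


def select(machines: list[Machine], min_memory_gb: int = 4,
           use_desktops: bool = False) -> list[str]:
    """Names of eligible machines, dedicated render nodes ranked ahead of desktops."""
    ok = [m for m in machines if eligible(m, min_memory_gb, use_desktops)]
    return [m.name for m in sorted(ok, key=lambda m: (m.is_desktop, m.name))]


pool = [
    Machine("render01", 4, False),
    Machine("render02", 4, False),
    Machine("desk-anna", 4, True),
    Machine("desk-ben", 2, True),
    Machine("desk-carl", 4, True, user_active=True),
]
print(select(pool))                     # ['render01', 'render02']
print(select(pool, use_desktops=True))  # ['render01', 'render02', 'desk-anna']
```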

Eventually, Tweak Films plans to move to a SAN, hooking multiple servers to more Xserve RAID arrays. “Then, our render nodes and desktop machines would all talk to the servers,” Root explains. “We’ll still have the same philosophy: Rather than having all of our machines talk to one server, we’d have one server group for render nodes, another for our desktops, and so forth, and all those servers would talk to the same data storage on RAID with extremely high bandwidth. We’ll still be as efficient as we can.”

Apple’s Xserve RAID arrays hold up to 14 Ultra ATA disk drives, for a total capacity of up to 7TB, and use Fibre Channel for external connections. Pricing is typically less than $2 per gigabyte.

Barbara Robertson is a freelance writer in northern California.

This article was originally published on September 01, 2006