By Michele Hope
By now, most enterprise IT groups have undergone some form of server and storage consolidation aimed at fixing backup problems, simplifying storage provisioning, or reducing storage management burdens.
In an effort to put a familiar face on this trend and learn what’s working and what’s not, we contacted a few IT managers who recently completed their own consolidation efforts. Interestingly, the two profiled companies both ended up with IP-based storage solutions.
Ending routine fire drills
When you manage IT systems at one of the largest employee benefits service providers in southern California, the last thing you want is to be directly or indirectly responsible for employee or system downtime. Yet, that’s the unenviable position TRI-AD IT director Stephen Greenlee found himself in prior to consolidating the company’s storage on an iSCSI SAN.
Running out of locally attached storage was an all-too-common occurrence on primary servers supporting the majority of TRI-AD’s e-mail, database, and file/print services. Backups of server data were another pain point, exacerbated by the use of multiple tape drives and tape formats, not to mention the fact that data entry occurred around the clock, leaving IT with no viable backup window that wouldn’t affect employee productivity.
The situation, in Greenlee’s words, was “a mess.” The company’s Lotus Notes/Domino server was just one example of the problem. With only 30GB of storage attached to the server, TRI-AD used to experience at least a monthly fire drill requiring Greenlee and his team to scramble to free up more capacity. “It was a horrendous situation for us,” he says. “We had to routinely go around and tell people to archive their e-mail and get rid of attachments.”
Greenlee knew early on that he wanted to move to a SAN solution, preferably one that would allow him to take snapshots of any of the company’s live file systems, then back them up offline. Although the iSCSI protocol was still in its infancy, Greenlee was attracted to the familiar Ethernet functionality he might achieve with an iSCSI SAN, not to mention its purported lower cost compared to a Fibre Channel SAN.
During his initial research phase, Greenlee stumbled across a demo of an iSCSI-based EqualLogic Peer Storage array. He liked how the product plugged into the existing network, how easy it was to expand without requiring any “forklift upgrades,” and the fact that it came with advanced features built in (e.g., virtualization, snapshots, replication) at no additional cost.
Greenlee’s team has since installed a PS100E array, which comes in a 3U chassis and holds up to 24 Serial ATA (SATA) disk drives, for a total of 3TB of raw storage capacity. Of that, TRI-AD currently uses 1.6TB, 653GB of which houses data from primary servers while about 700GB is reserved for snapshots. To help get control of backups, the company now takes a daily snapshot of the server and does a full weekly backup that goes off-site. If TRI-AD needs to restore data, it can leave the server up and mount the snapshot to another server. Provisioning storage is also much less painful, requiring no system downtime.
In all, Greenlee estimates that consolidating to the PS100E-based iSCSI SAN has saved his team three to four hours a week they used to spend trying to fix storage problems, not to mention saving on lost productivity for the 30 to 40 employees who were often left waiting for an application due to storage problems. Reporting jobs, data dumps, and exports also now take only about a half hour instead of several hours to complete.
Curbing storage costs
When Jay Brummett joined Ogden City government as CTO in late 2002, his mandate from the mayor and city council seemed straightforward: to take the city’s IT systems to the next level. Judging by Ogden’s recent first-place win among cities of its size in the Center for Digital Government’s 2004 Digital Cities Survey, it appears Brummett was able to accomplish much of this mandate. But the recently completed IT transformation did not come without growing pains and back-end reconstruction, especially at the storage, server, and staffing levels.
As part of its effort to prepare for hosting key portions of the 2002 Olympic Games, Ogden City had already rapidly deployed a number of wholesale upgrades to its IT systems. These included a new 911 police records system that still required consolidation across approximately 17 jurisdictions. The city had also undergone large upgrades to its utility billing system and had recently selected an ERP vendor and a new public safety vendor.
Due in large part to the speed at which these IT projects had been deployed, Brummett found that he had inherited a sizable deficit in the city’s IT internal service fund. “We were upside down nearly $2 million on a net-net $3.5 million budget,” he said. “So, we had to look at doing some radical restructuring.”
Brummett’s first order of business was to reduce IT staff from 29 to about 14. His next step was to determine how they could cut down on the inordinate amount of time existing IT staff spent managing about 80 HP servers they had in operation, each with its own direct-attached storage (DAS). The server infrastructure consisted of a variety of aging servers, many with incompatible spindles and a mixture of drive models that had been swapped out over time with newer models. Ogden was also struggling with islands of storage that required the city to buy another server when one server ran out of room to add more drive canisters.
“It became readily apparent that we needed to do both a huge server consolidation and a huge storage consolidation along with it,” says Brummett. “We needed to do something about storage because we couldn’t continue to manage the amount of data or servers we needed to manage unless we made some radical changes.”
Brummett started looking for a solution that would allow Ogden to pool its storage and consolidate its servers, all while delivering immediate savings in staff management time and lowering the cost per megabyte of storage. Brummett’s team initially evaluated a number of SAN solutions from a variety of vendors. The final choice, however, was an IP-based SAN from LeftHand Networks.
The dramatic drop in cost per megabyte was key to Ogden’s choice. “[The IP SAN] allowed us to drop our cost per megabyte from somewhere around $0.80 to $1.25 to under $0.25,” says Brummett.
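To put those per-megabyte figures in proportion, the short calculation below is a rough sketch rather than anything Ogden's team published. It assumes the quoted "under $0.25" can be treated as an upper bound on the new cost; the function and variable names are illustrative only.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Figures quoted by Brummett, in dollars per megabyte.
old_low, old_high = 0.80, 1.25
new_ceiling = 0.25  # assumption: "under $0.25" treated as an upper bound

low_cut = percent_reduction(old_low, new_ceiling)
high_cut = percent_reduction(old_high, new_ceiling)

print(f"Cost per megabyte fell by at least {low_cut:.0f}% to {high_cut:.0f}%")
```

Because the new cost is only known to be below $0.25, the actual savings would be somewhat larger than these figures suggest.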
Ogden has since consolidated from 80 servers to about 50 servers and has migrated more than 1TB of data from DAS drives into the LeftHand IP SAN. The total amount of consolidated storage space in use on the SAN today is about 4.5TB, with new database projects and extra data-protection safeguards accounting for the increase in storage capacity.
Brummett’s consolidation move resulted in reductions in the backup windows and significant improvements in service levels. When viewed in conjunction with a handful of other initiatives undertaken to re-engineer the way the group delivered IT services, Brummett estimates that his team’s storage and server consolidation directly contributed to his ability to eliminate Ogden’s $2 million IT budget deficit in just under 20 months.
What did the consolidation mean for Brummett’s IT staff and operations? “We were able to maintain or increase service levels at the same time that we reduced headcount by almost half,” he reports. As a before-and-after example, Brummett recalls a failure of the Microsoft Exchange server shortly after he arrived on the scene. It took 36 hours and a bare-metal restore to correct the problem. After the consolidation, the team had another Microsoft Exchange failure. This time, they were able to recover the system in about an hour.
“A lot of that had to do with the fact that we had our Exchange stores mounted on pooled storage on a SAN,” Brummett explains. The current “rapid-restore” process now involves restoring a prior drive image (snapshot) to a repaired server or a bare-metal server that is replacing the damaged server. This allows rapid access to volumes containing the Exchange stores and the backup Exchange stores.
Brummett estimates that about 90% of the city’s location-based and financials data is now stored on the SAN. He is currently exploring a few new storage tricks with his SAN that he thinks may allow Ogden to avoid writing costly APIs to share data between systems. One example is the city’s extensive GIS database, which would need to remain accessible to public safety officials and a variety of related applications if a disaster took the primary data offline.
Some of the storage tricks currently involve using the SAN’s replication functionality to mount read-only, replicated versions of the city’s GIS database that are updated (via synchronous replication over several miles) in real time or near real time. Other applications aware of the table layout in the GIS database can then access the data.
“It would have taken us many hours to deal with a disaster before,” says Brummett. “We would have pulled our tapes and [would then have] had to reload systems. Now, we can provide the real-time data access tools and have the ability to come up at a hot site in our public safety facility with the full analysis tools we need to maintain our business.”
Michele Hope is a freelance writer and owner of TheStorageWriter.com. She can be reached at firstname.lastname@example.org.