By Tony Hobart
November 23, 2010 -- The most important skill for today’s storage administrator or engineer is balance. That’s because all storage management is a balancing act.
It’s not managing capacity. Many organizations wind up with underperforming SANs even though they have a significant amount of unused capacity. It is not fan-in. It is not the number or type of spindles.
The primary determinant of success for a storage administrator is managing resources to keep capacity and utilization in balance, thus reducing bottlenecks to a minimum and keeping them there.
Why balance is important
Let’s start with the basics. A storage administrator needs to manage shared resources: cache, CPU, front-end ports, back-end ports, etc. At some point administrators realize that they can’t just be managing these resources for a single disk; they need to consider the whole array. That’s a much more complex task. Storage administrators need to manage around the so-called “70% cliff.” Below 70% utilization, storage performance works fine. At 71%, performance may become terrible. Therefore, the one thing a storage administrator can do to stay off the unemployment line is to always balance resources to keep each disk below 70% utilization.
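One way to see why performance can fall apart so quickly near 70% is a simple queueing approximation. The sketch below assumes a single-server M/M/1 queue, which is a textbook model and not something the article specifies; real arrays with cache and multiple spindles only roughly follow it. The function name and the 5 ms service time are illustrative assumptions.

```python
def response_time_ms(service_time_ms: float, utilization: float) -> float:
    """Approximate response time for an M/M/1 queue: R = S / (1 - U).

    service_time_ms: time to service one I/O with no queueing (assumed value).
    utilization: fraction of time the resource is busy, 0.0 up to (not including) 1.0.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Show how the estimate grows as the resource gets busier.
    for pct in (50, 65, 70, 80, 90, 95):
        estimate = response_time_ms(5.0, pct / 100)
        print(f"{pct}% busy -> about {estimate:.1f} ms per I/O")
```

Under this model the curve is smooth rather than a literal cliff, but each additional point of utilization costs more response time than the last, which is consistent with a system that feels fine at 65% and struggles shortly afterward.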
The biggest impediment to meeting this challenge is that storage administrators simply don’t know when they’re approaching that limit. Your systems show no visible signs of whether or not they’re getting close to having a problem.
At 62% or 65% the system runs fine with no noticeable trouble. However, even the most innocent task can send a well-performing system into a tailspin.
Consider a system that is at the 65% point today. Now imagine one DBA somewhere in the organization decides to do one seemingly small and isolated task—creating a copy of a database on the SAN. All of a sudden, that one, small, routine and isolated task has thrown the company’s storage into disarray. (As an aside, how many times have we seen this happen at 3 or 4 in the morning, and the storage administrator gets called out of bed?)
Many firms today don’t realize the risks they face. They wait to upgrade until they are close to (or at) that dangerous 70% level.
The danger with upgrades
Now consider what happens when the firm that is keeping utilization low decides to upgrade. One of the steps is to take down half the storage capacity. A firm that was running fine at 50% utilization takes out half its storage. Now they’re at 100%. This becomes a big problem. Many firms specify “live” upgrades, but have no significant remediation program. Again, the key question is: How do you know where you are in terms of utilization?
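The upgrade arithmetic above can be checked before the maintenance window. This is a minimal sketch, assuming the existing load spreads evenly over whatever capacity remains online; the function names are hypothetical and the 70% ceiling is the article's rule of thumb.

```python
def utilization_after_removal(current_utilization: float, fraction_removed: float) -> float:
    """Utilization once part of the capacity is taken offline.

    Assumes the same load is spread evenly over the remaining capacity.
    Both arguments are fractions, e.g. 0.5 for 50%.
    """
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in the range [0, 1)")
    return current_utilization / (1.0 - fraction_removed)


def safe_pre_upgrade_utilization(ceiling: float, fraction_removed: float) -> float:
    """Highest starting utilization that stays at or below the ceiling during the upgrade."""
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in the range [0, 1)")
    return ceiling * (1.0 - fraction_removed)


if __name__ == "__main__":
    during = utilization_after_removal(0.50, 0.50)
    print(f"During the upgrade: {during:.0%} utilization")
    limit = safe_pre_upgrade_utilization(0.70, 0.50)
    print(f"Start at or below {limit:.0%} to stay under 70%")
```

With half the capacity offline, a firm would need to start at or below 35% utilization to stay under the 70% level for the duration of the upgrade.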
Finally, let’s go back to our DBA. Even if a company is somehow managing capacity, it can still wind up with problems. Our DBA does a database copy from a fast spindle to a slow spindle. We now have a situation where the target disk cannot keep up with the source disk, or overall SAN performance degrades as the array tries to keep pace with the slower drive.
This brings us back to the fundamental issue: It’s not capacity, it’s balance. Do you have any way to monitor resources? Laurus Technologies’ research indicates that 80% of firms have no way to monitor utilization on a routine basis. They could have a serious problem and are completely unaware of it.
First, you must have storage resource monitoring tools. In order to keep your storage in balance, you need to regularly monitor several important aspects of storage utilization, and you’ll need tools that give you that access. Sometimes the tools are provided by the manufacturer, but often storage administrators resort to looking through history files in Microsoft Excel for any abnormal data patterns. Many firms simply don’t have the tools they need.
In addition, it is important to understand the tools you have at your disposal. The tools you use will depend on the vendor of your storage systems. Some vendors include a robust set of tools, while others have nothing at all, in which case you’re expected to print out raw data and enter it into a spreadsheet or graph.
Having no tools makes your storage utilization invisible to you until there’s a problem, even as your usage eats through general storage resources. Storage training typically ignores day-to-day usage management, so understanding the management tools becomes a self-taught effort.
Take the time to explore the commands and capabilities of the tools that came with your storage system. Understand your I/O patterns and contending loads over time. Don’t put the VM server and SQL server on the same spindle; you don’t want to put general busy loads on the same set of disks.
You should develop a dashboard with three basic measurements: response time in milliseconds, disk utilization, and spare capacity. To administer the SAN, you need to measure on a weekly, monthly, bi-monthly, and annual basis in order to manage loads.
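A dashboard built on those three measurements can start as something very small. The sketch below is illustrative only: the class and function names are hypothetical, the 70% utilization limit comes from the rule of thumb discussed earlier, and the 20 ms response-time and 20% spare-capacity limits are placeholder assumptions to replace with values that suit your own environment.

```python
from dataclasses import dataclass


@dataclass
class DashboardSample:
    """One reading of the three dashboard measurements for an array."""
    array_name: str
    response_time_ms: float   # average I/O response time in milliseconds
    disk_utilization: float   # fraction of time the disks are busy, 0.0 to 1.0
    spare_capacity: float     # fraction of capacity still free, 0.0 to 1.0


def warnings_for(sample, max_response_ms=20.0, max_utilization=0.70, min_spare=0.20):
    """Return a list of warning strings; an empty list means nothing crossed a limit."""
    warnings = []
    if sample.response_time_ms > max_response_ms:
        warnings.append(
            f"{sample.array_name}: response time {sample.response_time_ms:.1f} ms "
            f"exceeds {max_response_ms:.1f} ms"
        )
    if sample.disk_utilization >= max_utilization:
        warnings.append(
            f"{sample.array_name}: utilization {sample.disk_utilization:.0%} "
            f"is at or above {max_utilization:.0%}"
        )
    if sample.spare_capacity < min_spare:
        warnings.append(
            f"{sample.array_name}: spare capacity {sample.spare_capacity:.0%} "
            f"is below {min_spare:.0%}"
        )
    return warnings


if __name__ == "__main__":
    sample = DashboardSample("array-b", response_time_ms=35.0,
                             disk_utilization=0.72, spare_capacity=0.10)
    for line in warnings_for(sample):
        print(line)
```

Feeding the same check with weekly and monthly samples, rather than a single reading, is what turns it from an alarm into a management tool.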
No tool is useful if you only look at real-time data. You need to keep track of historical data. It is also critical that the data be measured over your firm’s normal business cycle. For most firms, that means a minimum of 30 days’ worth of data, which will include the normal end-of-month reporting. Some companies have detailed quarterly reporting. If you are in a business such as retail, you have a much busier fourth quarter due to the traditional November shopping period. Whatever cycle applies to your industry, make sure your data covers it in full.
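Checking that the history really spans a full cycle is easy to automate. This is a rough sketch, assuming one utilization sample per day stored in a dictionary keyed by date; the function names and the sample figures are made up for illustration, and the 30-day default follows the minimum suggested above.

```python
from datetime import date, timedelta


def covers_cycle(history, cycle_days=30):
    """True if the samples span at least cycle_days calendar days.

    history maps a date to that day's utilization (fraction, 0.0 to 1.0).
    Note: this checks only the first and last dates, not gaps in between.
    """
    if not history:
        return False
    days = sorted(history)
    span = (days[-1] - days[0]).days + 1
    return span >= cycle_days


def peak_utilization(history):
    """Highest daily utilization recorded in the history."""
    return max(history.values())


if __name__ == "__main__":
    start = date(2010, 11, 1)
    # Illustrative data: a steady month with a spike for end-of-month reporting.
    history = {start + timedelta(days=i): 0.50 for i in range(30)}
    history[date(2010, 11, 30)] = 0.68

    if covers_cycle(history):
        print(f"Peak over the cycle: {peak_utilization(history):.0%}")
    else:
        print("Not enough history to cover a full business cycle yet")
```

Looking at the peak rather than the average matters here: a month that averages 50% can still brush against the limit on its busiest day.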
Nobody buys SANs for non-critical data. A SAN is installed because the business needs it to be working all the time and accessible across the organization. If it is not, then your job turns into crisis management. It is imperative to do what you can to make SAN management easier. The first step is to get organized for proper SAN management.
The good news is that the industry is heading toward improved management, with functionality such as auto-metering and auto-tiering. However, it will still take some time for all storage systems to be upgraded to include these capabilities. Until then, developing proactive processes and procedures to balance storage resources is the most important goal.
Tony Hobart is a systems architect specializing in storage at Laurus Technologies, an IT systems integration and professional services firm based in suburban Chicago. Tony can be reached at firstname.lastname@example.org.