In a recent article, I discussed internal array tiering and server-side tiering. In this article I’ll concentrate on an external tiered infrastructure. Although some vendors push a two-tier architecture, they are only referring to a two-tier production storage system, not tiering across data lifecycles. Tiering across the full data lifecycle looks more like this:
- Tier 0/1: Flash tiers in a hybrid or all-flash array. Use cases: OLTP databases, online customer orders, incoming machine data. Caching may exist on Tier 0, or the device may enable automated flash tiering based on access patterns.
- Tier 2: SATA disk in a hybrid array, an HDD storage system, active cloud storage tiers, or tape in a high-performance tape library. Use cases: Office application data, CRM, big data for analysis, marketing media.
- Tier 3: On-premise tape or warm cloud storage. Use cases: Active archives, backup less than a month old, broadcast files, surveillance files.
- Tier 4: Off-site tape or cloud-based cold storage. Use cases: Archives, backup, aging files subject to regulations and governance.
Storage Tiering Automation Solutions
Moving the data to less expensive systems is the obvious solution, and a good one. What’s not so obvious is how IT can protect its admins’ time given never-ending data creation and fast-growing storage requirements. Automation is helpful when you can get it, but there is no single solution that automatically tiers aging data from creation to cold storage.
The lesson is to automate as much as possible. Some storage tiering products can tier across homogeneous arrays, external storage systems and cloud storage:
- Hitachi Dynamic Tiering creates virtual volumes for automated storage tiering. The really interesting part is that the tiers can be external storage—on-premise or the cloud—as well as internal. Qualifying storage includes systems running Hitachi Content Platform, or cloud platforms Amazon S3, Microsoft Azure or Hitachi Cloud Services.
- Microsoft Azure StorSimple enables automated storage tiering. StorSimple tiering can be a mite confusing because Microsoft calls primary volumes “tiered volumes.” However, Microsoft automates data tiering between internal tiers on the StorSimple storage device (PBBA or VA) and Azure.
- Avere has the ability to tier to any NAS or JBOD array. IT sets policies for movement and can scale the storage tiering target as needed.
- EMC added Federated Tiered Storage (FTS) to Enginuity. This enables Symmetrix VMAX 20K and VMAX 40K arrays to move and copy data to virtualized storage pools created from LUNs on the external storage system.
Note that automated storage tiering is not the same thing as policy-driven replication or automated backup. Both of these processes copy data; automated storage tiering moves qualifying data to less expensive storage tiers as opposed to copying it.
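The copy-versus-move distinction can be shown with ordinary files. This is only a minimal sketch: the function names `replicate` and `tier` are hypothetical, directories stand in for storage tiers, and real tiering products work on blocks, volumes or objects rather than calling file utilities.

```python
import shutil
from pathlib import Path


def replicate(src: Path, target_dir: Path) -> Path:
    """Copy: the source stays on primary storage, so no capacity is freed."""
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, target_dir / src.name))


def tier(src: Path, target_dir: Path) -> Path:
    """Move: afterwards the data lives only on the cheaper tier."""
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(target_dir / src.name)))
```

After `replicate` there are two copies and primary capacity is unchanged; after `tier` there is one copy and the primary space is released.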
Automated storage tiering takes a financial investment because you usually must buy a new storage system to get it. However, if you need a storage refresh anyway, it can be worth the price. Performance on production systems may improve as your system consistently moves data off the expensive storage. And you save money on OPEX by storing data on lower-cost media and extending the life of the expensive production storage.
Another benefit is test/dev. When you have at least a two-tier architecture on-premise, you can use Tier 2 HDD systems for test/dev without impacting production performance. The same principle applies to big data analysis. A clearly constructed tiered infrastructure will also be helpful to data governance and security.
Defining Data for Storage Tiering
Let’s look at the data characteristics that drive tiering decisions:
- Access activity. Array-based dynamic storage tiering functions read metadata and access patterns, then move data accordingly between Tier 0 and Tier 1 and possibly Tier 2 as well, depending on the array’s architecture. External tiering works the same way, and tiers may be physical or virtual depending on the specific automated tiering product.
- Workload size. The size of the tiered data will affect storage tiering decisions. Moving data in smaller units allows finer-grained placement and has less impact on the storage media, which otherwise must have the capacity to store a large single dataset. So while a NAS tiering operation can tier on a file basis once a day or more, an OLTP database may tier 4KB blocks several times a day.
- Data priority. Prioritizing data is fundamental to intelligent storage tiering. Tiering solutions’ most basic function is to identify data based on age, but this is certainly not the only characteristic to use. Business priority and governance also impact data movement policies. For example, seconds-old data in a customer transaction system clearly has high business value. Once the order is fulfilled, that immediate value diminishes, and within a few days or a week, the data can move to a nearline HDD array and from there to tape or the cloud. In contrast, big data for business analysis will not move to tape or cold storage for a long time. Automatic tiering will move it from production storage to an array that provides sufficient capacity and performance for ongoing analytics. A third major determining factor is risk: how fast can IT locate and recover relevant data during a litigation or audit, and is that same data subject to multiple matters? In this case, IT may choose to keep potentially relevant data on-premise, even though the data may be months or years old.
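The characteristics above can be combined into a simple placement policy. The sketch below is illustrative only: the `DataSet` fields, the `choose_tier` function and every threshold (7, 30 and 365 days, 100 accesses per week) are assumptions made up for the example, not values from any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    age_days: int               # days since the data was created
    accesses_last_week: int     # access count over the past 7 days
    analytics: bool = False     # still needed for ongoing analysis
    legal_hold: bool = False    # potentially relevant to litigation or audit


def choose_tier(d: DataSet) -> int:
    """Return a target tier from 1 (flash) to 4 (off-site cold storage)."""
    # Access activity and age: fresh or busy data stays on flash.
    if d.age_days <= 7 or d.accesses_last_week >= 100:
        return 1
    # Age alone gives the default placement for everything else.
    if d.age_days <= 30:
        tier = 2
    elif d.age_days <= 365:
        tier = 3
    else:
        tier = 4
    # Business priority: analytics data stays on a capacity HDD array.
    if d.analytics:
        tier = min(tier, 2)
    # Risk: data under legal hold never leaves on-premise storage.
    if d.legal_hold:
        tier = min(tier, 3)
    return tier
```

For instance, `choose_tier(DataSet("clickstream", 400, 2, analytics=True))` keeps year-old analytics data on Tier 2, while the same dataset without the `analytics` flag would age out to Tier 4.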
In the face of complexity and added expense, why would IT bother with storage tiering?
They bother because stored data follows the old 80/20 rule: give or take a few percentage points, 20 percent of data is accessed 80 percent of the time. The other 80 percent ages out within a few weeks. IT can’t simply get rid of that data, not for a long time (if ever). But this data eats up capacity and energy on those expensive production systems. Ultimately, storage tiering frees up resources, avoiding extra CAPEX and lowering OPEX over time.
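A back-of-the-envelope calculation shows why the 80/20 split matters. This is a rough sketch: the `storage_cost` function is hypothetical, and the per-terabyte prices in the usage line are invented placeholders, so substitute your own figures.

```python
def storage_cost(total_tb, hot_percent, hot_cost_per_tb, cold_cost_per_tb):
    """Compare keeping everything on the expensive tier with a two-tier split.

    Returns (single_tier_cost, tiered_cost) in whatever currency the
    per-TB prices are expressed in.
    """
    hot_tb = total_tb * hot_percent / 100
    cold_tb = total_tb - hot_tb
    single_tier = total_tb * hot_cost_per_tb
    tiered = hot_tb * hot_cost_per_tb + cold_tb * cold_cost_per_tb
    return single_tier, tiered


# Placeholder prices: 1000 per TB on production storage, 100 per TB on a lower tier.
single, tiered = storage_cost(100, 20, 1000, 100)
print(single, tiered)
```

With those placeholder prices, 100 TB costs 100,000 on a single tier versus 28,000 when only the active 20 percent stays on production storage. The calculation ignores migration, licensing and retrieval costs, which a real comparison would need to include.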