Guidelines for implementing cloud storage

Whether you're considering an internal or external cloud, beware of the potential pitfalls and plan accordingly.

By Steven Pon

In today's challenging business environment, it's critical for IT professionals to take a larger view of their data centers than in the past. IT functions such as network and storage, which were traditionally siloed within the organization, must now be more closely aligned because of convergence. This encourages collaboration between functions, which benefits IT organizations when performance enhancement and cost cutting are the goals. Many companies are now finding that the solution is in reaching for the cloud.

SANs dedicated only to storage are expensive and single-purposed. With today's faster networks, greater opportunities for convergence exist because storage can be accommodated by the data center backbone network to create a single conduit for data transmission. Internal storage clouds over IP networks via iSCSI, or over Ethernet via Fibre Channel over Ethernet (FCoE), are becoming more attractive from a flexibility perspective. The same network connectivity will continue to support existing IP traffic (e.g., iSCSI, NFS, or CIFS) as well as FCoE.

FCoE is designed to enable SAN expansion to the enterprise data center. Many data centers use Ethernet for TCP/IP networks and Fibre Channel for storage area networks (SANs). With FCoE, Fibre Channel becomes another network protocol running on Ethernet, alongside traditional IP traffic. FCoE operates directly above Ethernet in the network protocol stack, in contrast to iSCSI which runs on top of TCP and IP. As a consequence, FCoE is not routable at the IP layer, and will not work across routed IP networks.
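The layering difference can be sketched in a few lines of Python. This is only an illustration of the point made above: the stack contents are simplified, and the names PROTOCOL_STACKS and is_ip_routable are hypothetical, not part of any product or standard.

```python
# Simplified protocol stacks, listed from the lowest layer to the highest.
# FCoE sits directly on Ethernet; iSCSI rides on top of TCP and IP.
PROTOCOL_STACKS = {
    "FCoE": ["Ethernet", "FCoE", "Fibre Channel"],
    "iSCSI": ["Ethernet", "IP", "TCP", "iSCSI"],
}


def is_ip_routable(protocol: str) -> bool:
    """Traffic can cross IP routers only if IP appears in its stack."""
    return "IP" in PROTOCOL_STACKS[protocol]


if __name__ == "__main__":
    for name in PROTOCOL_STACKS:
        print(name, "routable at the IP layer:", is_ip_routable(name))
```

Because FCoE has no IP layer in its stack, the check fails for it, which is why FCoE traffic stays within the Ethernet domain while iSCSI can traverse routed networks.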

The new data center network traffic will consist of multiple protocols, including the traditional TCP/IP and now FCoE, while continuing to support the conventional outward-facing network traffic for messaging and data communications.

Organizations concerned about storage budgets need to be prepared to discuss the cloud. 

Cloud storage

So, what is cloud storage? Best described as a utility, the cloud combines a variety of behind-the-scenes technologies that are already in use (scalable and redundant servers, controllers, storage, and software) to create a "cloud" of storage.

With this model, it is possible to take various disk pools from multiple geographically dispersed sites, carve them up into logical partitions of storage, replicate these partitions (locally or geographically) to one another and present the combined storage securely to each group via a global namespace accessible over the network.

These disk pools should have features such as data deduplication, thin provisioning, high scalability, data protection, and simple management. These aggregated features, combined with high-bandwidth networks, are making cloud storage a reality today.
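As a rough illustration of one of these features, the sketch below shows the basic idea behind data deduplication: identical chunks are stored once and referenced by a content hash. It is a toy model under stated assumptions (fixed-size chunks, an in-memory dictionary, a hypothetical DedupStore class), not how any particular storage product implements it.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 digest -> chunk bytes (stored once)
        self.files = {}   # file name -> ordered list of chunk digests

    def write(self, name: str, data: bytes) -> None:
        """Split data into fixed-size chunks and store only new chunks."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name: str) -> bytes:
        """Reassemble a file from its chunk references."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def physical_bytes(self) -> int:
        """Bytes actually held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Writing two files that share content consumes physical space only for the chunks that differ, which is where the capacity savings come from.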

As data continues to grow exponentially, IT budgets remain flat or are decreasing, posing a tremendous challenge for companies as they struggle to manage data growth and keep costs down. This is where cloud storage can help.

Approximately 60% to 80% of data within a company's storage environment is classified as old, stale data that should be residing on Tier 3-6 storage.

Tier 1 – Enterprise, high speed
Tier 2 – Enterprise, moderate speed
Tier 3 – SAN/NAS (modular, high speed)
Tier 4 – SAN/NAS (modular, low speed)
Tier 5 – CAS (archive compliance)
Tier 6 – VTL

Within most data centers, there is a considerable amount of unstructured and archive data that is ideally suited for the cloud. 

Internal vs. external clouds

There are two general varieties of cloud storage: internal and external. An external cloud is hosted by a third-party provider. An internal cloud has many of the same characteristics as an external cloud but is owned and operated within an individual organization.

When choosing between external and internal cloud storage, network latency must be considered. An external cloud relies on the Internet and is therefore only recommended for Tier 4-6 data. Internal clouds, however, could be used for Tier 3 or below data, depending on the bandwidth of the local intranet.
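The tier guidance can be captured as a small lookup. The sketch below simply encodes the rule of thumb stated here (external for Tiers 4-6, internal for Tier 3 and below); the function name eligible_clouds is hypothetical, and the intranet-bandwidth caveat for internal clouds is not modeled.

```python
from typing import List

# Tier definitions as listed in this article.
TIERS = {
    1: "Enterprise, high speed",
    2: "Enterprise, moderate speed",
    3: "SAN/NAS (modular, high speed)",
    4: "SAN/NAS (modular, low speed)",
    5: "CAS (archive compliance)",
    6: "VTL",
}


def eligible_clouds(tier: int) -> List[str]:
    """Return which cloud types the rule of thumb allows for a tier."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    clouds = []
    if tier >= 3:  # Tier 3 or below: internal cloud is an option
        clouds.append("internal")
    if tier >= 4:  # Tiers 4-6: external cloud is also an option
        clouds.append("external")
    return clouds
```

For example, eligible_clouds(3) returns only the internal option, while Tier 1 and Tier 2 data return an empty list and stay on enterprise storage.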

For most IT organizations, scalability is a significant factor in determining the capabilities of internal cloud storage. Proponents tout easy expansion, near-limitless growth potential, fault tolerance, and dynamic expansion, but plans for capacity growth still need to be well thought out. The cloud is not a panacea for past years of unchecked growth; it is a commodity that needs to be managed effectively.

In addition, key performance metrics need to be defined and understood prior to implementation. Often the assumption is that inactive data requires only a fraction of the performance that would be afforded to higher-tier storage. However, parallel initiatives such as e-discovery may dictate otherwise.

Response time and the end-user experience also have to be aligned. A service level agreement (SLA) or operating level agreement (OLA) should be established and the cloud solution should be capable of meeting these requirements. Since the data is being accessed via the intranet, a baseline of the current network capacities and capabilities should be undertaken prior to any implementation in order to understand any latency ramifications. Best practice guides for the particular internal cloud solution regarding issues such as network isolation, data replication, and geographical proximity all need to be factored into the overall solution.
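One way to turn a network baseline into an SLA check is to compare a latency percentile against the agreed target. The sketch below assumes latency samples in milliseconds have already been collected by some other tool; the function names, the nearest-rank percentile method, and the 95th-percentile default are illustrative choices, not requirements from any SLA standard.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(samples_ms: Sequence[float], target_ms: float, pct: int = 95) -> bool:
    """True if the chosen percentile of measured latency is within target."""
    return percentile(samples_ms, pct) <= target_ms
```

Using a high percentile rather than the average keeps occasional slow responses, which end users notice most, from being hidden by many fast ones.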

As with any new technology, the simplicity of management and the relative ease of implementation need to be evaluated. The onus of bringing this new technology into an environment usually falls upon existing staff who are already taxed.

Identifying what roles and responsibilities will be needed, how they will be shared, and how much time each will require is not a purely technological concern but, rather, a thought process that contributes to the overall health of an existing storage staff.

Determining whether cloud storage will be a viable alternative for an organization requires careful consideration. First, the data center's data has to be analyzed to determine how much of it could be moved to the cloud. Then, IT has to determine how the data would be moved to the cloud. The initial population could be as simple as a copy; other data might need some sort of data mover that moves the data based on a predetermined policy.
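A policy-based data mover can be thought of as a filter over a file inventory. The sketch below is a minimal illustration that selects files by idle time; the FileRecord type, the select_for_cloud function, and the 180-day default are all hypothetical, and a real mover would also handle the copy, verification, and stubbing steps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_access: datetime


def select_for_cloud(files: List[FileRecord], now: datetime,
                     max_idle_days: int = 180) -> List[FileRecord]:
    """Return files not accessed within the policy's idle window."""
    cutoff = now - timedelta(days=max_idle_days)
    return [f for f in files if f.last_access < cutoff]


def total_bytes(files: List[FileRecord]) -> int:
    """Capacity that would be freed on the source tier."""
    return sum(f.size_bytes for f in files)
```

Running the selection against an inventory before moving anything also answers the first question above: how much of the data could go to the cloud.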

Another important consideration is economic viability. An external cloud may be a good fit for companies looking at the cost of existing storage and aversion to risk, while other organizations will look at the costs and risks of external clouds and choose to implement an internal cloud.

Since an external cloud is essentially storage as a service, costs fluctuate depending on the volume of data stored and the length of the contract term. Internal clouds, on the other hand, have all of the facets of total cost of ownership that most in-house storage will incur.

Features and functions

Cloud storage can consist of just storage or of computing and storage. In the world of cloud computing, there are many more features and functions, as both the servers and the storage can be virtualized and connected using many different protocols. In terms of pure cloud storage, the storage is generally connected to Ethernet and accessed via HTTP(S), CIFS/NFS, FCIP/iSCSI, or FCoE protocols. It can also be presented as block-based storage, with another tool layered on top to present file systems from the block-based storage.

Storage features such as thin provisioning, automated data re-tiering, data deduplication, automated replication and a locally or geographically dispersed solution are all becoming more popular.

Potential pitfalls of the cloud include:

• Very large files
• Large numbers of very small files
• Files that are accessed at a frequency that exceeds what their connectivity protocol can support
• External cloud providers can sometimes turn off read-ahead cache engines on their storage, making end-user data retrieval slow
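The first two pitfalls can be screened for with a simple pass over file sizes before data is moved. The sketch below is illustrative only: the function name flag_pitfalls is hypothetical, and the thresholds are placeholder assumptions that should be replaced with limits from the chosen cloud solution. Access frequency and provider cache settings are not covered.

```python
from typing import List, Sequence


def flag_pitfalls(sizes_bytes: Sequence[int],
                  large_file_bytes: int = 5 * 1024 ** 3,
                  small_file_bytes: int = 64 * 1024,
                  small_file_limit: int = 1_000_000) -> List[str]:
    """Return warnings for very large files and for many very small files.

    All thresholds are assumed placeholder values, not vendor limits.
    """
    warnings = []
    large = sum(1 for size in sizes_bytes if size >= large_file_bytes)
    small = sum(1 for size in sizes_bytes if size <= small_file_bytes)
    if large:
        warnings.append(f"{large} very large files")
    if small > small_file_limit:
        warnings.append(f"{small} very small files")
    return warnings
```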

Benefits and concerns

Benefits of cloud storage include increased collaboration between IT function areas such as storage and network and the self-service network access to a pool of storage. This storage should have policy-driven automation that includes quick and easy provisioning and de-allocation as well as automated replication. External cloud concerns include data availability, subscription terms, lack of location control, no physical access to data, additional on-site components required to support the solution, additional external bandwidth, the need to compress and/or encrypt the data, and possible compliance risks.

Internal clouds generally have fewer concerns, but users should still be aware of security, management, alerting, updates and refreshes to hardware and software, and up-front capital expenses. Most of these concerns already exist in day-to-day management of in-house storage and are not new concerns.

Cloud storage lets users quickly add storage to their environments with less interaction from storage administrators. Moving data within a platform, from platform to platform, or from site to site can be automated, further lessening the strain on the storage administrator.

Users implementing cloud storage for a fixed tier, such as for backup or archival purposes, will likely just set policies within their existing tools to move data to the cloud storage. For users that are using the cloud as a tool to quickly add capacity due to a seasonal need, or those who need to create a copy of their data for a dev/test environment, the policies will involve freeing the space that was used as a temporary placeholder and backing up or archiving this data if necessary.

Finally, for users who manage their own cloud storage, there are some solutions that will allow for the creation of automated policies. Examples might include provisioning or de-allocating storage.
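As a sketch of what such policies might look like, the example below models a pool with provisioning, de-allocation, and an automated rule that releases temporary allocations (such as seasonal or dev/test capacity) once they expire. The StoragePool and Allocation classes are hypothetical and do not correspond to any vendor's API; backing up or archiving data before release is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Allocation:
    owner: str
    size_gb: int
    expires: Optional[date] = None  # None means a permanent allocation


class StoragePool:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocations: Dict[str, Allocation] = {}

    def free_gb(self) -> int:
        used = sum(a.size_gb for a in self.allocations.values())
        return self.capacity_gb - used

    def provision(self, name: str, owner: str, size_gb: int,
                  expires: Optional[date] = None) -> None:
        """Self-service request; rejected if the pool cannot satisfy it."""
        if name in self.allocations:
            raise ValueError(f"allocation already exists: {name}")
        if size_gb > self.free_gb():
            raise ValueError("insufficient free capacity")
        self.allocations[name] = Allocation(owner, size_gb, expires)

    def deallocate(self, name: str) -> int:
        """Return the allocation's capacity to the pool."""
        return self.allocations.pop(name).size_gb

    def release_expired(self, today: date) -> List[str]:
        """Automated policy: free temporary allocations past their expiry."""
        expired = [name for name, a in self.allocations.items()
                   if a.expires is not None and a.expires <= today]
        for name in expired:
            self.deallocate(name)
        return expired
```

Scheduling release_expired to run daily would return temporary capacity to the pool without an administrator having to track each request.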

The five-phase adoption of cloud services involves:

1. Centralize management of IT in order to gain economies of scale, and to gain visibility into costs and take control of the IT offering.

2. Standardize the cloud offering based on the key business requirements. Attempting to support custom solutions for each application puts a burden on resources. Consistency is the key to improving quality and provisioning times and to reducing support costs and risk. Standardization is a prerequisite for successful consolidation and automation.

3. Virtualize and consolidate the physical infrastructure. Virtualization and consolidation increase asset utilization and storage efficiency. Virtualization can be implemented at each level of the infrastructure stack: unified storage, unified fabric, and virtual servers. Doing so also simplifies asset lifecycle management through mobility of applications and data. Pooled resources enable much faster time to market and considerably lower overall costs.

4. Automate the environment. Once the cloud services and processes are standardized and the infrastructure is virtualized, the incorporation of automation is possible. Automation tools increase abstraction, providing simplified and efficient controls for overall workflow management.

5. Delegate control for self-service and APIs. Transference of control to the organization/user is evidence of a successfully deployed cloud service model. Allowing application administrators and owners the flexibility of scaling on demand, choosing different levels of performance and data protection as mandated by their organization, and automating recovery from application errors are all possible through application integration and self-service capabilities. This significantly lessens the administrator's responsibilities. Maximizing cloud services requires partnering across the organization to help move through each of these phases.

STEVEN PON is a storage solution architect at Forsythe. His areas of expertise include large-scale data protection and recovery management solutions with various types of SAN and NAS systems.


This article was originally published on February 26, 2010