Docker has largely outgrown its pure test and development roots, and the container technology it popularized is now entering the mainstream. But many storage managers are struggling with the challenges posed by Docker’s unusual nature.

Here are ten things storage managers need to know about Docker and other container technologies.

1. Why has Docker/containerization garnered so many advocates of late?
The simple fact is that Docker makes life a lot easier for IT. Developers can bypass much of the labor involved in building and deploying applications by starting from pre-packaged elements within containers and adding only the code specific to their needs. As a result, containers are being used by more and more coders, and there is no avoiding them in the increasingly agile world of software development.

“Containers make software easier to develop, test and deploy,” said Jonathan Nicklin, founder and CEO, Blockbridge Networks.

2. Why is VM storage so different from container storage?
Virtual machines (VMs) have never had a problem running persistent applications, because local disk space is encapsulated within the VM construct. VMs are intended to be long-lived – a big plus for storage management, though also part of the reason for phenomena such as VM sprawl. Containers, on the other hand, are designed to be ephemeral; they aren’t burdened by a heavyweight abstraction like a VM.

“This makes them more desirable in many ways, but one tradeoff is that a container cannot take its data with it when it needs to be restarted, or moved to another node,” said Josh Bernstein, vice president of technology in the emerging technologies division at EMC.
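To make the point concrete, here is a minimal sketch using the Docker SDK for Python (the image name and file path are purely illustrative): data written to a container’s writable layer vanishes the moment the container is removed, which is exactly what happens on a restart or a move to another node.

```python
# Minimal sketch, assuming the Docker SDK for Python (pip install docker) and a
# local Docker daemon; the alpine image and /data.txt path are illustrative.
import docker

client = docker.from_env()

# Write a file into the container's writable layer -- no volume attached.
client.containers.run("alpine", ["sh", "-c", "echo state > /data.txt"], name="demo")

# Removing the container discards its writable layer, and the file with it.
client.containers.get("demo").remove()

# A "restarted" container is really a brand-new one; the file is gone.
out = client.containers.run("alpine", ["sh", "-c", "cat /data.txt || echo missing"],
                            remove=True)
print(out.decode())  # -> missing
```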

3. What are some of the issues inherent in using storage with containers?
While containers have advanced IT in many respects, storage is one area that has suffered somewhat from their introduction. How? Because containers bypass the normal relationship between storage and applications, some of the storage management advances of the past decade or so have in effect been lost.

“Containers have reintroduced traditional storage management problems including provisioning, mobility, redundancy, security and backup,” said Nicklin.

4. If containers are deployed, how does this impact storage scalability?
Storage infrastructure scalability has been one of the challenges standing in the way of deploying containers in production. The good news is that this is becoming less of an issue these days.

“There are Docker volume drivers from EMC and other enterprise-scale storage vendors that are finally jumping on the Docker bandwagon,” said Eric Sites, CTO at Meros, a company that provides management, monitoring, logs and alerts for Docker.

5. Is it true that Docker and other container technologies require only direct attached storage (DAS)?
This was true of earlier versions of Docker, but it is no longer the case. However, some teething troubles in implementing storage are to be expected.

“The volume driver helps but I still think it will take some more tweaking of the new API to make it really usable for every situation,” said Sites.
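As an illustration of what those volume drivers buy you, the sketch below (Docker SDK for Python) provisions a named volume through a third-party driver rather than local disk. The driver name and its options are assumptions – stand-ins for whatever plugin your storage vendor ships.

```python
# Hedged sketch: the "rexray/ebs" driver name and the "size" option are
# assumptions -- substitute the plugin and options your storage vendor documents.
import docker

client = docker.from_env()

vol = client.volumes.create(
    name="orders-data",
    driver="rexray/ebs",         # assumed plugin; any installed volume driver works
    driver_opts={"size": "20"},  # assumed option; option names are driver-specific
)

# The externally backed volume mounts exactly like a local one.
client.containers.run(
    "alpine", ["sh", "-c", "echo persisted > /data/file.txt"],
    volumes={"orders-data": {"bind": "/data", "mode": "rw"}},
    remove=True,
)
```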

6. How do these limitations impact cost once you move containerization beyond app development into the broader storage and data center arena?
For the moment, the maintenance and configuration costs for containerization are relatively high. Progress has been made on container storage, but there is still plenty of work to be done to bring containers fully into the storage fold in areas such as volume orchestration and provisioning.

“We need enterprise production level tools, GUIs, self-provisioning with limits, and controls to bring down the maintenance and operational costs,” said Sites.

7. What can storage managers do to better understand the problems inherent with container storage?
Nicklin said you need to break down storage with containers into two categories: layer storage and data volumes. Layer storage is used during creation and operation of a container. Any modifications are lost when a container is destroyed. In effect, layer storage is transient. The primary challenge in this space is to provide fast, efficient thin-provisioning functionality.

“In many instances, Docker is configured to use Logical Volume Manager (LVM) in loopback mode,” said Nicklin. “The pain points with this approach include the inherent inefficiencies of layering blocks on top of files and the inability of LVM to efficiently reclaim resources as layers are deleted.”
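A quick way to check whether a host is running in that configuration is to ask the daemon which storage driver it is using. The sketch below (Docker SDK for Python) flags the loopback files that devicemapper typically reports; the exact key names can vary by install.

```python
# Sketch: inspect the daemon's layer-storage driver and warn about loopback
# devicemapper; the "loop file" key names are what typical installs report.
import docker

client = docker.from_env()
info = client.info()

print("Storage driver:", info["Driver"])
for key, value in info.get("DriverStatus") or []:
    print(f"  {key}: {value}")
    if "loop file" in key.lower():
        print("  -> layer storage sits on a loopback file; consider direct-lvm "
              "on a raw device or a purpose-built filesystem instead")
```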

8. What problem does the transient nature of layer storage introduce for containerized applications?
The most obvious challenge is that it is difficult to containerize applications that maintain state. For example, consider a database. If you want to upgrade the software in a container, you are required to deploy a new container. If the database is kept on layer storage, the new container will not have access to it.

“When you delete the old container, the database is deleted as well,” said Nicklin.

To address this, Docker introduced the data volume concept, along with volume drivers. Volume drivers allow third parties to provide and manage data volumes. Nicklin said that Blockbridge, Flocker and Rex-Ray are some of the first volume plugins to come onto the market for containers.
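The upgrade scenario above maps naturally onto a named data volume. The sketch below (Docker SDK for Python) keeps the database files in a volume so that replacing the container – simulated here with two postgres minor-version tags chosen purely for illustration – leaves the data intact.

```python
# Minimal sketch of the data-volume pattern: the database files live in a named
# volume, so the "upgraded" container re-attaches to the same data. Image tags
# and the placeholder password are illustrative only.
import docker

client = docker.from_env()
client.volumes.create(name="appdb-data")
mounts = {"appdb-data": {"bind": "/var/lib/postgresql/data", "mode": "rw"}}

# Old container, old software version.
old = client.containers.run("postgres:16.1", detach=True, name="appdb",
                            environment={"POSTGRES_PASSWORD": "placeholder"},
                            volumes=mounts)

# "Upgrade": delete the old container; the volume -- and the database -- survive.
old.remove(force=True)
client.containers.run("postgres:16.2", detach=True, name="appdb",
                      environment={"POSTGRES_PASSWORD": "placeholder"},
                      volumes=mounts)
```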

9. Once you have successfully containerized an application or two, how should you start thinking about production deployment?
At that point, it is probably wise to consider how to schedule resources and handle failures. From an operational perspective, containers must be able to run without hardware dependencies in order to provide high availability.

“For stateful applications, this requires persistent volumes to be accessible from any host that may run the container,” said Nicklin. “The Docker APIs have evolved to push more responsibility to the driver ecosystem. But even with a better API contract, it can be difficult due to a lack of control plane features required to provide a resilient solution.”

The storage stack, therefore, has to become multi-host aware. Unless you are using a network-attached file system, the infrastructure must coordinate and provide mutually exclusive access to shared resources.
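In practice, “multi-host aware” looks something like the following sketch (Docker SDK for Python). The host URLs and the driver name are assumptions, and the volume driver itself is responsible for exposing the same data on the new host and fencing off the old one.

```python
# Hedged sketch: two Docker hosts, one volume backed by a shared-storage
# driver. Host URLs and the "blockbridge" driver name are assumptions.
import docker

host_a = docker.DockerClient(base_url="tcp://host-a.example.com:2376")
host_b = docker.DockerClient(base_url="tcp://host-b.example.com:2376")

# Provision the volume once through the shared-storage driver.
host_a.volumes.create(name="orders-data", driver="blockbridge")  # assumed name

# After a failure on host A, a scheduler (or an operator) restarts the
# container on host B; the driver must make the same data available there.
host_b.containers.run("postgres:16", detach=True, name="orders",
                      environment={"POSTGRES_PASSWORD": "placeholder"},
                      volumes={"orders-data": {"bind": "/var/lib/postgresql/data",
                                               "mode": "rw"}})
```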

10. What should users know in terms of best practices or lessons learned?
Nicklin summarized some of the best practices as follows:

  • For layer storage, avoid “block on file” inefficiencies by either using LVM on a raw device or moving to a purpose-built file system.
  • Leverage data volumes to isolate application state from the application container, and at the same time, avoid host-based local storage in order to eliminate machine dependencies.
  • Maintain platform independence by keeping the base OS installation clean – this includes keeping drivers and host software to an absolute minimum. (If it’s not in a container, ask yourself why.)
  • Don’t leave security to the last minute, i.e., verify there are no embedded infrastructure passwords in containers or configuration files (see the sketch after this list).
  • If you are deploying an enterprise application, have a sound strategy for encryption, secure deletion, backups and snapshots.
  • Consider how the container storage workflow fits with the existing bare metal and virtual machine infrastructure.
  • Avoid unnecessary clustering technologies or else add full-time staff to support them.
  • Introduce random failure testing – the only way to be prepared for failures is to regularly test.
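On the security point, even a rough automated check beats a manual one. The sketch below (Docker SDK for Python) walks the running containers on a host and flags environment variables that look like embedded credentials; the keyword list is an assumption to be adapted to local naming conventions, and the same idea extends to image configs and mounted configuration files.

```python
# Rough sketch: flag environment variables that look like credentials in
# running containers. The SUSPECT keyword list is an assumption.
import docker

SUSPECT = ("password", "passwd", "secret", "token", "api_key")

client = docker.from_env()
for container in client.containers.list():
    for entry in container.attrs["Config"].get("Env") or []:
        name, _, _ = entry.partition("=")
        if any(word in name.lower() for word in SUSPECT):
            print(f"{container.name}: suspicious variable {name}")
```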