Docker often seems like an impenetrable product. Is it a VM system? A suite of development tools? A clustering product? A software distribution facility? When the answer is "yes" to each of these, it only becomes more confusing. For the Drupal developer, Docker is a way to provide a local development environment to run web server software.
In this tutorial, we'll:
- Define the terms hypervisor, virtual machine (VM), and containers
- List the advantages of containers over VMs
- List the advantages of Docker for Drupal developers
By the end of this tutorial, you should understand what Docker is, the advantages of containers, and how Docker can be a useful tool for Drupal developers.
At its core, Docker is a containerization system including a lot of useful tools. It allows you to run applications anywhere, regardless of how those applications were created or the host system on which they're running. This includes the software that makes up a typical web stack such as Apache, MySQL, and PHP.
"That's great," you might say, "but why does that matter?" As long as Drupal is running on a supported web stack, what difference does it make?
The "It works on my system" problem
When working within a team, a problem will often appear consistently for one developer but cannot be replicated on any other system. The cause is usually a small difference in configuration or server software version, which makes the problem difficult to find and debug.
In the worst case, this happens when deploying the site to live infrastructure. Differences in server software versions, configuration, resource availability, or the underlying operating system (OS) that runs the server can create unexpected and costly downtime.
The solution is to try to mimic the production environment as closely as possible. But how?
If you only need to support one project on your workstation, this isn't too difficult a problem. Once you get the production configuration files, you need to download and install the specific version of each piece of server software. This is a laborious process, but it can be effective...
...for a while. As soon as you need to support multiple projects with different software requirements it becomes difficult to manage. It could be as simple as a difference in configuration, but it could be as complex as needing to uninstall and reinstall different versions of server software.
All-in-one web development environments: not ideal
To solve this problem, all-in-one web development environments have been created. Products like MAMP, WAMP, and Acquia Dev Desktop provide a user-friendly interface to create multiple site configurations on one system. Each may support different configurations and even different versions of server software. While this sounds like the perfect solution, they come with several problems.
Apache, PHP, and MySQL were never quite intended to run alongside several versions of the same software on the same system. In a data center, if you needed PHP 5.6 for one site and PHP 7 for another site, you would have separate servers for each. That's not an option when running on your laptop or local workstation. All-in-one development environments often resort to trickery in order to run multiple versions and configurations side-by-side. This can make the configuration prone to breakage over time.
Furthermore, there is limited isolation between different sites on the same system. If you try to change a PHP setting for one site, you may affect multiple sites later on, requiring hours of debugging. This problem is magnified when running multiple versions of the same software on the same system.
Cleanup is yet another pain point. Many of these all-in-one products do not track every file associated with a site. When you delete one in the UI, dangling configuration files, global server changes, and databases may not be completely removed.
Virtual machines: advantages
All of these problems led many developers to an alternative to all-in-one development environments: virtual machines (VMs). You can think of a VM as a flight simulator, only for computer hardware. In a flight simulator, you are given a simulation of all the indicators, dials, and controls necessary to pilot an aircraft. Likewise, a virtual machine provides a virtual CPU, an allotment of memory, and a virtual set of hardware devices. Instead of training a pilot, a VM provides a foundation on which to run a whole other operating system -- typically Linux -- on top of your laptop's OS such as macOS or Windows.
VMs aren't new. They've been around for decades and are a proven technology that brings several advantages to the Drupal developer:
- Standardization. A VM can be created with the exact server software and versions necessary to replicate the production environment, right down to the operating system required.
- Sharing. A VM snapshot can be taken when a desired configuration is created, and then shared with everyone on your team. This reduces the likelihood of unrepeatable problems.
- Sandboxing. You can have a different VM for each project you develop, each with a different set of server software and configurations. All files in the VM are tracked in a virtual disk, and can be easily deleted when no longer needed.
Virtual machines: problems
Virtual machines sound great, but they too have a big drawback: resource consumption. Running another operating system requires a non-trivial amount of CPU, memory, and disk overhead. Even when not running, the virtual disk remains on your hard drive, consuming gigabytes of space.
Sandboxing is a double-edged sword. While it provides isolation between your projects, it creates a duplication problem. If you create a VM for each project you work on, some projects will rely on the same OS, even the same server software. Most free and open source hypervisors -- the software that runs VMs -- do not de-duplicate this data, resulting in more consumed disk space.
For many, VMs are simply overkill. Indeed, they virtualize and isolate everything, but that is often more than what we need as Drupal developers.
Most hypervisors virtualize the hardware, but the vast majority of Drupal web sites run on some version of Linux. The only differences are in the versions and configuration of web server software. What if instead of virtualizing the hardware to run a particular OS, we could virtualize the OS and bring all of a VM's advantages to running applications?
This is what containers do. Containers are a kind of operating system virtualization. They allow you to run applications such as Apache, PHP, and MySQL in a standardized, shareable, and sandboxed environment. Instead of needing to create an entirely new VM for each site, you only allocate the containers you need to support your project.
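As a rough illustration of allocating only the containers a project needs, the sketch below writes a Compose file that describes one web container and one database container. The directory name `drupal-sandbox`, the image tags `drupal:10-apache` and `mysql:8.0`, the host port 8080, and the database credentials are all assumptions chosen for the example, not values from this tutorial.

```shell
#!/bin/sh
# Create a scratch project directory (hypothetical name).
mkdir -p drupal-sandbox

# Describe the two containers this project needs: a web server with PHP,
# and a database. Image tags and credentials are illustrative assumptions.
cat > drupal-sandbox/docker-compose.yml <<'EOF'
services:
  web:
    image: drupal:10-apache
    ports:
      - "8080:80"
    depends_on:
      - db
  db:
    image: mysql:8.0
    environment:
      MYSQL_DATABASE: drupal
      MYSQL_USER: drupal
      MYSQL_PASSWORD: drupal
      MYSQL_ROOT_PASSWORD: root
    volumes:
      - db_data:/var/lib/mysql
volumes:
  db_data:
EOF

echo "Wrote drupal-sandbox/docker-compose.yml"
```

With Docker installed, running `docker compose up -d` from inside `drupal-sandbox` would start both containers in the background, and `docker compose down` would stop and remove them. Another project could keep its own file with different versions without affecting this one.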
Containers have several big advantages over VMs:
- Speed. A set of containers can often be started and run in a couple of seconds, rather than the minute or more a VM can take to boot. This makes it fast to switch between projects with different sets of containers.
- Resource use. Since a container only needs to run a single application, it is smaller and requires less disk space and memory than a heavy VM.
- De-duplication. Many container systems also de-duplicate files, saving you disk space.
Containers aren't new either. They've been a feature of many UNIX operating systems for years as a VM alternative. It wasn't until Docker that containers became popular with developers.
So what is Docker, anyways?
As mentioned at the beginning of the tutorial, Docker is a product that allows you to run containers, but it does more than that:
- Distribution. Docker provides an easily accessible and free-to-use method to share your containers not only with your team, but the entire world.
- Development tools. Docker includes all the tools necessary to build your own containers out of the box. Once installed, all you need is a text editor and the command line.
- Simplified commands. Many container systems are tricky to set up and use, but Docker provides a simplified model that can be used with only a few commands.
- Clustering. Docker includes workload balancing even in the free and open source version of the product. There's nothing that prevents you from using the same Docker runtime on your laptop as on your production web server.
- Standardization throughout the development life-cycle. With Docker on your web server, you can use the same containers -- with the same software and configuration -- as you did in development. This reduces launch surprises.
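To give a sense of what building your own container with a text editor and the command line can look like, here is a minimal sketch that writes a Dockerfile and one PHP override file. The directory name `custom-php`, the base image tag `php:8.2-apache`, the `memory_limit` value, and the image name used afterwards are assumptions for illustration. It is not a complete Drupal-ready image.

```shell
#!/bin/sh
# Create a scratch build directory (hypothetical name).
mkdir -p custom-php

# A project-specific PHP setting; the value is an illustrative assumption.
cat > custom-php/drupal.ini <<'EOF'
memory_limit = 256M
EOF

# The Dockerfile is a plain text recipe for the container image.
cat > custom-php/Dockerfile <<'EOF'
# Start from the official PHP image that bundles Apache.
FROM php:8.2-apache
# Add the MySQL driver using the helper script the official image ships.
RUN docker-php-ext-install pdo_mysql
# Apply the PHP setting inside this image only, not on the host.
COPY drupal.ini /usr/local/etc/php/conf.d/drupal.ini
EOF

echo "Wrote custom-php/Dockerfile"
```

With Docker installed, `docker build -t example/drupal-php custom-php` would build an image from this recipe, and `docker push example/drupal-php` would share it through a registry, assuming an account that owns the hypothetical `example` namespace. Because the PHP setting lives in the image, changing it for one project does not touch any other.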
We've covered a lot in this tutorial. We've taken a brief walk through the history of virtualization and how it affects us as Drupal developers. We've learned about both hardware and operating system virtualization. Finally, we learned about Docker. We now know that Docker is a way to run containers -- sandboxed and portable applications -- combined with distribution, tools, and production use built right in.
Further your understanding
- How do competing products like rkt (formerly Rocket) and LXC differ from Docker?
- What use cases would be more appropriate for a VM than a container?
- How do BSD Jails, Solaris Zones, and Linux chroot differ from containers?
- Docker overview (docs.docker.com)