If you're following along at home, you may have seen that we recently made the move to Pantheon hosting. Last week during our maintenance window, Joe and I worked through our migration checklist and officially moved the site over to our new host. The process had a few hiccups, but we thought it would be interesting to take a look at what went into our migration. Hopefully, sharing our planning process, as well as what's in our pipeline for improvements now that we're on Pantheon, will help you if you ever find yourself facing a similar project.
Figuring out where to start
One of the most difficult parts of this project was figuring out where to start. Pantheon has several helpful guides to get you started on their platform. In fact, in my initial "proof of concept" migration it only took me about 90 minutes to get a version of our site running on Pantheon. And, truth be told, a lot of that time was spent reorganizing our git repository, importing a database backup file, and running rsync to copy our files over. Doing an initial proof of concept like this increased our confidence in the actual migration process, but it also brought up a checklist of things we would need to figure out to make the transition as smooth as possible.
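Concretely, the mechanics of that first pass looked something like the sketch below. This is a hedged illustration: the environment name, site UUID, and paths are placeholders, and since the real commands need a Pantheon account, they're assembled into strings and printed rather than executed. Check Pantheon's documentation for the exact rsync flags they recommend.

```shell
# Placeholder values -- substitute your own from the Pantheon dashboard.
ENV=dev
UUID=00000000-0000-0000-0000-000000000000   # your site's UUID

# Copy the public files directory to the environment's appserver over
# SSH (Pantheon listens on port 2222).
files_cmd="rsync -rvlz --ipv4 -e 'ssh -p 2222' ./sites/default/files/. $ENV.$UUID@appserver.$ENV.$UUID.drush.in:files/"

# Import a database backup; the port (PORT here) is per-environment and
# comes from the dashboard's connection info.
db_cmd="mysql -h dbserver.$ENV.$UUID.drush.in -P PORT -u pantheon pantheon < backup.sql"

printf '%s\n' "$files_cmd" "$db_cmd"
```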
Our old Linode infrastructure consisted of several servers running Varnish, Solr, Memcache, Apache, and Jenkins. We needed to figure out where each of those fit in the new setup:
- Varnish: Moving to Pantheon, we'd lose the ability to customize our Varnish configuration file. We ultimately found that our Varnish configuration customizations were either no longer needed or could be replicated using the Context HTTP Headers module.
- Solr: Adding Solr to our Pantheon site was, for the most part, a simple click of a button. We've since discovered a few differences between the version of Solr we were running and what Pantheon provides, but it looks like a few configuration tweaks on the Drupal side will account for those changes.
- Memcache: Instead of Memcache, Pantheon supports Redis as a faster drop-in replacement for Drupal's database caching layer. Beyond tweaking a few lines of our settings.php file, this was an easy win. In fact, some of our early performance tests suggest that the site's performance improved in part due to this particular caching change. (Stay tuned for more details in a future blog post.)
- Apache: Pantheon also uses the Nginx web server instead of Apache. This meant that any additional customizations we had made in our .htaccess file had to be accounted for using a different approach.

While quite a few of the pieces that make up our overall technology stack changed during the migration, there was very little change required on our part to get things working.
The part of the project that was the most time consuming actually had very little to do with Pantheon and more to do with our internal tools and processes. Our old infrastructure contained setup instructions and a Vagrantfile that anyone on the team could use to get a development environment that mirrored our infrastructure up and running relatively quickly. We've also been enthusiastic users of Tugboat.QA to automatically build a full site for each pull request pushed to GitHub. A Jenkins server is responsible for periodically creating database backups, copying and sanitizing the database from production to our QA and test sites, and doing the actual deployments of new code. Figuring out how to adapt our workflow, and where Pantheon's differences (and preferences) came into play, took a bit of time. All in all, we made small modifications to our existing workflow to adopt the very similar Pantheon workflow. This allowed us to decommission several Jenkins jobs, reducing the amount of critical infrastructure code we need to maintain.
Making use of our new toolkit
Pantheon provides several really useful tools for working with sites on their platform. The site dashboard itself allowed me to enable our Solr and Redis servers, schedule regular backups for each environment (Dev, Test, Live), download drush aliases for our Pantheon sites, and merge code with just a couple of clicks.
Things really started to take off after installing Terminus. Terminus is a command line tool, like Drush, that enables interaction with sites hosted on Pantheon. We're only scratching the surface of what's possible with Terminus, but we've used it to do things like return connection information in order to copy our database from Linode to Pantheon (during testing and the actual final migration), and to clear caches and deploy code on various environments while trying to debug unrelated bugs that popped up last week.
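To give a flavor of that, here are a few of the Terminus commands we found ourselves reaching for. The site name is a placeholder, the command names follow current Terminus syntax (which may differ from older releases), and the commands are collected into variables and printed rather than run, since they require a real Pantheon account.

```shell
SITE=example-site   # assumption: your site's machine name on Pantheon

# Connection details, e.g. a ready-made mysql command for piping a
# database between hosts:
info_cmd="terminus connection:info $SITE.dev --fields=mysql_command"

# Clear caches on an environment while debugging:
cache_cmd="terminus env:clear-cache $SITE.test"

# Deploy code from Test to Live with a note attached:
deploy_cmd="terminus env:deploy $SITE.live --note='Post-migration fixes'"

printf '%s\n' "$info_cmd" "$cache_cmd" "$deploy_cmd"
```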
Another tool Pantheon provides that we're just getting started with is Quicksilver. Quicksilver hooks allow us to configure our site to react to particular workflows (in Pantheon terminology). This enables us to do things like revert Features every time code is deployed or pushed to an environment. We'll also use these triggers to run our test suite to ensure that new functionality doesn't break the site in unexpected ways. So far we've been able to replicate much of our old infrastructure with a much smaller number of Jenkins jobs, and fewer moving pieces ultimately means less maintenance work for our small team.
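For the curious, Quicksilver hooks are registered in pantheon.yml. The script path and descriptions below are hypothetical stand-ins for our actual scripts, but the structure follows Pantheon's documented format:

```yaml
api_version: 1
workflows:
  # Runs after code is deployed to the Test or Live environment.
  deploy:
    after:
      - type: webphp
        description: Revert Features after a deploy
        script: private/scripts/revert_features.php
  # Runs after code is pushed to the Dev environment.
  sync_code:
    after:
      - type: webphp
        description: Revert Features after a code push
        script: private/scripts/revert_features.php
```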
While there are still a few rough edges we're working on smoothing over, we haven't even really had time to take advantage of everything Pantheon provides. Just via the dashboard alone we have optimizations to make based on suggestions from the integrated Site Audit tool, watchdog errors to clean up to reduce the size of our error logs, caching optimizations that will speed things up for users all across the site, as well as New Relic metrics which will allow us to profile our site in ways that weren't previously possible. We're also excited to give Kalabox a try, as a replacement for our Vagrant setup for simple localhost development environments.
Bugs and Gotchas
Like any migration project, things didn't go perfectly smoothly for us. Here's a brief look at some of the hang-ups we encountered along the way, and the solutions we found.
When we originally considered the possibility of moving to Pantheon, they required a very particular configuration for the site's git repository. Specifically, the root of the repository had to coincide with Drupal's root. The repository for our site contains our test suite, some helper scripts, patch files, and other miscellaneous documents. Thankfully, we didn't have to figure out how to split off all of the non-Drupal material in our repository: Pantheon recently rolled out support for using a nested docroot. Importing our code into our new Pantheon account was simply a matter of adding a pantheon.yml file and a symlink from a directory called web to the docroot directory containing Drupal.
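That layout can be sketched in a scratch directory like so; here "docroot" stands in for wherever Drupal core lives in your repository, and the two-line pantheon.yml is the minimal form of the nested-docroot setting:

```shell
# Build a throwaway directory mimicking the repository layout.
demo=$(mktemp -d)
cd "$demo"
mkdir docroot                          # pretend Drupal core lives here

# Tell Pantheon the docroot is nested one level down.
printf 'api_version: 1\nweb_docroot: true\n' > pantheon.yml

# Pantheon serves from web/, which we point at docroot/.
ln -s docroot web

ls -ld web
cat pantheon.yml
```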
We were definitely thankful for the addition of this nested docroot feature, but it's likely to cause some additional work and testing for us for a while. Pantheon provides their own version of Drupal core, which we've added to our repository as an upstream remote. Any time we pull in changes from this upstream, we will have to manually move the files into our nested docroot. While it's not a difficult task, it means that we're unable to use the dashboard button to automatically upgrade. We also ran into a small hiccup where the location of a certificate file used to communicate with the Solr server was hard-coded into a module. This caused problems when we tried to launch the Solr service. Their support team quickly diagnosed the issue and pointed us towards a fix. Within about a day, their platform team even had a pull request ready to incorporate into the upstream repository. Not only was the response time great, but seeing progress towards a solution happening in the open on GitHub gave us a lot of confidence.
During testing and QA of the new Pantheon environment we also found a small bug. Pantheon as a platform limits how large the files stored in the Drupal file system can be. Even with our previous host, we had been storing our video files on Amazon's S3 service, using the Filefield Sources module to reference files in S3 from the video nodes on our site. On our previous host we had the max upload size set quite high to account for a bug in Filefield Sources: during upload validation, the max file size is checked even when the file is being stored remotely. In our case, especially since we transfer videos to S3 manually and don't rely on uploading through Drupal forms, this caused issues with our video asset production process. Fortunately, there was a very simple workaround: unsetting the validation function that checks file size when the upload location is remote.
You may have also experienced some frustrating behavior from the site last week, where periodically most of the links on the page were unexpectedly redirecting to one of our tutorial listing pages. Changes in the configuration of our caching infrastructure allowed a race condition bug to slip through our testing. The bug already existed on our site but hadn't reared its ugly head, and was unrelated to Pantheon, but this illustrates the importance and difficulty of fully testing a migration like this.
Now that we're getting settled into our new home, we're looking forward to continuing to optimize, tweak, and improve our site. With the help of the tools Pantheon provides, we will keep working towards our goal of continuous deployment.