Our Approach to Data Migration for the Upgrade

In this post I'm going to give a little insight into how we are planning (and currently attempting) to migrate all the data from our current Drupal 6 site to our new Drupal 7 site using a few handy tools: Jenkins CI, bash scripts, drush, and the migrate module.

The Plan

The end goal for our D6 to D7 migration is to build an entirely new D7 site, re-architecting and re-thinking everything so that we're happy with it. We're doing this in such a way that the D7 site can be installed from scratch at any given time, and after running the data migrations you'll have the current working version of the new site with all of the latest D6 data. In the meantime the D6 site will continue to live on and get new videos added, collect comments from users, and track information about who is watching what videos. Then, on some mythical day in the future we'll install the D7 site from scratch on the new server and point it at the production D6 server as a source for the data migration. We'll turn the D6 site off, run a final migration, and turn the D7 site on.

In order for this to work without incurring a lot of downtime while the migrations run we're taking some steps now to run the migrations against the data we've got currently and to keep running them over, and over, and over, on a regular basis pulling in any new data that's been added to the production site over time. That way, when we're ready to make the switch all we have to migrate is any data that's been collected on the production site since our last migration. Much faster than doing the whole thing. Or at least that's the hope.

Tools and Workflow

At the outset we decided to use the popular migrate module to perform our migration. We are fortunate to have in-house expertise that we could tap into (Lullabot has used it to migrate sites like Martha Stewart and others). It allows us to perform continuous migration, so rather than having to wipe and rebuild the whole site from scratch we can just update the things that have changed since last night, including things like edits made to a video description that had been migrated previously. It's pretty sweet, though it can also be a bit confusing at times.

Jenkins is a continuous integration tool that provides a UI for creating and scheduling various jobs. A job is a set of tasks that you've asked the server to perform. For example, a task might be to check out the latest code in X git branch to Y location. Jenkins can also monitor running jobs, report on failures, and do lots of other handy things that make it easier to automate parts of a workflow on a server.

Here's what the basic workflow for running our migrations currently looks like.

  1. Log in to Jenkins.
  2. Click the build button.
  3. Get a cup of coffee and wait for a while and if it doesn't report any errors you're done.

Okay, it's a bit more complicated than that, but once we got everything in place that was the desired end result. I wanted to make it so other people on the team could easily run the migration without me being around because I like to be able to take vacation.

Jenkins commands

Here's what goes on behind the scenes when you click that build button in Jenkins. We're using a combination of drush commands and some bash scripts to walk through all of the steps we need, in the correct order.

# Put site in maintenance mode and log everyone out.
drush @drupalize.d7dev vset maintenance_mode 1 --yes;
drush @drupalize.d7dev vset maintenance_mode_message "Migration in progress. Hold on to your britches." --yes;
drush @drupalize.d7dev sqlq "TRUNCATE sessions;"
# Run our migration stuff.
ssh [email protected] "cd /var/www/d7.drupalize.me/scripts; ./prep-migration.sh -d 'd6_drupalize_me' -f /var/www/d7.drupalize.me/d6_files; ./run-migration.sh @drupalize.d7dev;"
# And take the site out of maintenance mode.
drush @drupalize.d7dev vset maintenance_mode 0 --yes;
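
One weakness of running those steps in a straight line is that a failed migration could leave the site stuck in maintenance mode. Here's a minimal sketch of one way to guard against that by capturing the migration's exit status. The function name run_with_maintenance is hypothetical and not part of our actual Jenkins job; the drush commands are the same ones used above.

```shell
#!/usr/bin/env bash
# Sketch only: run a command with the site in maintenance mode, and always
# switch maintenance mode back off afterwards, even if the command fails.
# run_with_maintenance is a hypothetical helper name.

run_with_maintenance() {
  local alias="$1"; shift

  # Put the site in maintenance mode before doing anything else.
  drush "$alias" vset maintenance_mode 1 --yes || return 1

  # Run the command given as the remaining arguments and remember its status.
  local status=0
  "$@" || status=$?

  # Take the site out of maintenance mode whether or not the command worked.
  drush "$alias" vset maintenance_mode 0 --yes || return 1

  # Report the command's own exit status to the caller (and to Jenkins).
  return "$status"
}
```

It could be called as `run_with_maintenance @drupalize.d7dev ./run-migration.sh @drupalize.d7dev`, so a failed run still shows up as a failed build while the site comes back up.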

bash scripts

One of the things I did when setting this up was to create a set of bash scripts that perform most of the heavy lifting, rather than writing the commands right into Jenkins itself. Why? There are a lot of good reasons to do this. The scripts are in version control. You can change a script and deploy it along with the relevant migration code at the same time, without also having to make changes in Jenkins. And, probably most useful of all, the scripts can be run locally without Jenkins, either for testing or for developers who are new to the project and need to get up and running fast.
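
To make a script usable both from Jenkins and from a developer's laptop, it helps to pass in anything environment-specific as options. The Jenkins job above calls prep-migration.sh with -d and -f; here's a hypothetical sketch of how a script like that might parse them. I'm assuming -d names the D6 source database and -f the D6 files directory. The function and variable names are made up for illustration and are not taken from the real script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option parsing for a script like prep-migration.sh.
# Assumption: -d is the D6 source database name, -f is the D6 files directory.

parse_prep_args() {
  local opt OPTIND=1   # reset getopts state so the function can be reused
  DB_NAME=""
  FILES_DIR=""

  while getopts "d:f:" opt; do
    case "$opt" in
      d) DB_NAME="$OPTARG" ;;
      f) FILES_DIR="$OPTARG" ;;
      *) echo "Usage: prep-migration.sh -d <database> -f <files dir>" >&2
         return 1 ;;
    esac
  done

  # Refuse to continue unless both values were supplied.
  if [ -z "$DB_NAME" ] || [ -z "$FILES_DIR" ]; then
    echo "Both -d and -f are required." >&2
    return 1
  fi
}
```

A script would call `parse_prep_args "$@" || exit 1` near the top and then use $DB_NAME and $FILES_DIR for everything that follows.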

Since we're migrating data, the first thing we need to do is make sure that our source data is up to date. Running a migration against our production server isn't something we need to be doing at this early stage, so we decided that it was okay to use the most recent backup of the site when running the migration. So the first thing Jenkins does, following our migration scripts, is grab the latest snapshot of the D6 MySQL database and Drupal's files directory from S3, where we store our backups, and set up a clone of the D6 site on our development server.

This is all done via a bash script, the meat of which is two commands: one for the database, and one for the files directory.

prep-migration.sh excerpt:

FILES_ZIP=$($S3CMD ls s3://lb_backups/drupalize.me/ | sort -n | tail -n 1 | awk '{print $4}')

This command uses the s3cmd CLI utility to list the contents of a directory in our S3 bucket, sorts the listing chronologically (each line starts with the file's timestamp), grabs the last line, and then strips everything but the fourth column, which holds the file's location in the bucket, and assigns it to a variable.
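
The same pipeline can be wrapped in small functions, which makes it easy to try out against a canned listing without touching S3. This is a sketch, not the code from our script: the function names are hypothetical, and it assumes each listing line has the form "date time size s3-uri".

```shell
#!/usr/bin/env bash
# Sketch: pick the newest entry from `s3cmd ls`-style output read on stdin.
# Assumption: every line looks like "DATE TIME SIZE S3-URI".
# latest_s3_uri and s3_filename are hypothetical helper names.

latest_s3_uri() {
  # Lines start with a timestamp, so a plain byte-wise sort is chronological.
  # The last line is the newest entry; column 4 is its S3 URI.
  LC_ALL=C sort | tail -n 1 | awk '{print $4}'
}

s3_filename() {
  # Strip everything up to and including the last slash of the URI.
  printf '%s\n' "${1##*/}"
}
```

In use that would look something like `uri=$(s3cmd ls s3://lb_backups/drupalize.me/ | latest_s3_uri)`, followed by `s3_filename "$uri"` whenever only the bare filename is needed.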

Once it's completed that task, Jenkins checks to see that the D7 site is up to date with the latest code from our git repository, that all features have been reverted, and that all of Drupal's database updates have been run. (We clear the cache too for good measure. Twice.) Once the D6 site is set up as a source, and the D7 site is set up as a destination, we run the migrations with the run-migration.sh script. And wait.
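
Those prep steps boil down to a handful of git and drush calls. Here's a rough sketch of what they might look like as a single function. The function name, its arguments, and the exact order of the steps are my assumptions for illustration; the drush commands themselves (fra, updb, cc all) are the standard ones.

```shell
#!/usr/bin/env bash
# Sketch of the D7 prep steps described above. prep_d7_site is a hypothetical
# name; it takes a drush alias and the path to the site's git checkout.

prep_d7_site() {
  local alias="$1" repo_dir="$2"

  # Bring the code up to date with the git repository.
  git -C "$repo_dir" pull --ff-only || return 1

  # Revert all features, then run any pending database updates.
  drush "$alias" fra --yes || return 1
  drush "$alias" updb --yes || return 1

  # Clear the cache. Twice, for good measure.
  drush "$alias" cc all || return 1
  drush "$alias" cc all || return 1
}
```

Each step returns early on failure, so a broken update never gets as far as the migration itself.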

Although we don't use it every time, the script that preps the D7 site is also capable of completely re-installing the site using our custom install profile in case we ever want to start the data migration from scratch again. This has been helpful at times, as we've made some large schema changes throughout the project.

Here's a snippet of that second script, run-migration.sh. It really is just a bunch of drush commands run in a specific order.

run-migration.sh excerpt:

...
# Optionally run the install profile for a clean install.
if [ -n "$DOINSTALL" ]; then
  drush "$1" si drupalize --yes;
fi
fi
# For good measure.
drush $1 fra --yes;
drush $1 cc all;
# Migrate users first so we don't have to bother with Stub accounts.
drush $1 mi DrupalizeUser --yes;
# Migrate taxonomy.
drush $1 mi DrupalizeTermBlogTags --yes;
drush $1 mi DrupalizeTermCategories --yes;
...
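
Because the order matters, one alternative to listing every command by hand is to loop over the migration names and stop at the first failure, so Jenkins marks the build as failed instead of ploughing on. This is a sketch rather than what run-migration.sh actually does: run_migrations is a hypothetical name, and it assumes drush exits with a non-zero status when a migration fails.

```shell
#!/usr/bin/env bash
# Sketch: run migrations in a fixed order and stop at the first failure.
# Assumption: `drush mi` exits non-zero when a migration fails.
# run_migrations is a hypothetical helper name.

run_migrations() {
  local alias="$1"; shift
  local migration

  # Remaining arguments are migration names, in the order they must run.
  for migration in "$@"; do
    echo "Importing $migration"
    if ! drush "$alias" mi "$migration" --yes; then
      echo "Migration $migration failed; stopping." >&2
      return 1
    fi
  done
}
```

For the excerpt above that would be `run_migrations "$1" DrupalizeUser DrupalizeTermBlogTags DrupalizeTermCategories`, with users still going first.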

And that's pretty much it. Right now this is working pretty well for us. It takes a little bit longer than I had hoped to run a full migration (about 45 minutes), but we've tracked that down to the import of the old D6 database into the dev server. It seems to be choking on the MyISAM tables. But that's a blog post for another day.
