Check your version

This video covers a topic in Drupal 7 which may or may not be the version you're using. We're keeping this tutorial online as a courtesy to users of Drupal 7, but we consider it archived.

Migrate Module Overview


  • 0:00
    Migrate Module Basics for Drupal 7 Series
  • 0:02
    Migrate Module Overview with Joe Shindelar
  • 0:08
    JOE SHINDELAR: In this lesson, I'd like to give a short presentation
  • 0:11
    that covers some of the terminology and concepts of the Migrate module,
  • 0:15
    making it so that we can understand these things a little bit better
  • 0:17
    before we start trying to write our own migration code.
  • 0:21
    We're going to take a look at the architecture of a migration
  • 0:24
    and the way that these pieces of code are structured.
  • 0:27
    We're going to talk about the extract, transform, load process
  • 0:31
    that the Migrate module uses in order to import data into Drupal.
  • 0:36
    And finally, we'll take a look at the four key components that make
  • 0:40
    up each of our own custom migrations.
  • 0:44
    So let's go ahead and get started.
  • 0:47
    Let's start out by just going through some of the components that
  • 0:49
    make up the Migrate module and a migration,
  • 0:52
    get some terminology out of the way and make sure that we're all
  • 0:54
    on the same page when we talk about writing a migration of our own.
  • 0:58
    So some basics about the Migrate module.
  • 1:01
    First off, the Migrate module is architected
  • 1:03
    using mostly object-oriented code.
  • 1:07
    Most modules for Drupal 7 just use procedural code.
  • 1:11
    But if you've written plug-ins for Views or worked with CTools,
  • 1:14
    you've probably encountered object-oriented code before.
  • 1:17
    If not, I would take a couple of minutes
  • 1:19
    to brush up on some of the basics of object-oriented code in PHP.
  • 1:24
    What this means for us is that the Migrate module provides
  • 1:27
    a bunch of base classes for doing things like dealing with source
  • 1:30
    data or destinations and so forth. These are the plug-ins for our migration:
  • 1:35
    classes that we then extend in order to write our own custom
  • 1:39
    migration.
  • 1:40
    The Migrate module provides a framework that allows us to write
  • 1:45
    code that makes up our migration, but then
  • 1:47
    also to run those migrations.
  • 1:50
    It handles keeping track of where our source data lives
  • 1:53
    and where the destination data should go, running the migration,
  • 1:56
    rolling it back if necessary, and testing it.
  • 1:59
    It provides tools for instrumenting a migration,
  • 2:01
    so you can monitor performance and check
  • 2:03
    and see why things are running slow and improve it
  • 2:06
    for future runs of that migration and so forth.
  • 2:09
    In addition to that, it also provides a fairly simple user
  • 2:13
    interface for monitoring the status of a migration.
  • 2:16
    You can run migrations using the UI, though I recommend
  • 2:20
    that you use Drush for running migrations.
  • 2:22
    We will talk about both options and why I prefer one
  • 2:25
    over the other, but the UI does provide a really nice set
  • 2:29
    of displays where you can see things like, for this migration
  • 2:33
    there are a hundred rows of data that need to be imported
  • 2:36
    and so far 75 of them have been imported.
  • 2:39
    It also shows you things like, these two rows failed and here's
  • 2:43
    some additional information about why they failed.
  • 2:46
    Finally, the UI is really great if you're
  • 2:48
    working with a team of people in order to write a migration.
  • 2:52
    You might end up in a scenario where you have a couple of people that
  • 2:55
    know the code and are working on the actual code that makes up
  • 2:58
    the migration, but there are also stakeholders on the team who
  • 3:01
    just need to be able to review the state of things and who
  • 3:04
    need to answer questions like, how does this old data
  • 3:07
    map to what it should be in the new system?
  • 3:10
    Does this make sense?
  • 3:12
    The UI provides some really nice collaboration tools for that.
  • 3:18
    The Migrate module itself and the migrations
  • 3:21
    that we're going to write operate using
  • 3:23
    this paradigm called extract, transform, load.
  • 3:28
    The idea is that we should have separate pieces of code
  • 3:31
    to perform each of these operations.
  • 3:33
    And then the Migrate module basically
  • 3:35
    serves as the glue to tie them all together.
  • 3:37
    This is a pretty common process for pieces of code that get data
  • 3:42
    from somewhere and put it somewhere else.
  • 3:44
    The idea is you have some code that is
  • 3:47
    responsible for getting the data out of your source.
  • 3:50
    So wherever you are migrating things from,
  • 3:53
    the extract method is the code that extracts the data.
  • 3:57
    Then there's a step where that data can be transformed.
  • 4:00
    Things like changing measurements: maybe in the old system everything
  • 4:06
    was measured in centimeters and now it needs to be stored as inches.
  • 4:10
    So transform might be performing the calculation to transform that.
  • 4:14
    Transform might just be simply concatenating a couple of fields
  • 4:17
    together or cleaning up the source data
  • 4:19
    so that it meets the standards of your destination.
  • 4:22
    And then, finally, there's the load operation or really just
  • 4:26
    the code that is responsible for taking the transformed source data
  • 4:31
    and saving it as a new record in the destination system.
  • 4:37
    Source and destination are two terms that you'll
  • 4:39
    hear a lot when talking about migrations.
  • 4:41
    And you'll also see them a lot in the Migrate code.
  • 4:45
    The source is the place
  • 4:49
    where the data currently lives.
  • 4:51
    This might be an existing database, this might be an existing website.
  • 4:54
    It might even be an already existing Drupal database.
  • 4:57
    This is where the data that I'm going to extract currently
  • 5:01
    lives.
  • 5:01
    And then you've got destinations and, in this case,
  • 5:04
    with the Migrate module, our destination
  • 5:06
    is almost always Drupal.
  • 5:08
    It's really geared towards that, taking data out
  • 5:11
    of some different source and importing it into Drupal.
  • 5:16
    So your source is, again, your access to your existing data.
  • 5:21
    The Migrate module implements a base class named MigrateSource, which
  • 5:26
    we then extend (or we extend one of the already implemented versions
  • 5:31
    of it) to extract our data from different sources like an SQL
  • 5:35
    database or maybe a JSON file or a CSV file.
  • 5:39
    There's lots of different places that this data could live.
  • 5:42
    In addition to being able to extract the data from its current home,
  • 5:47
    a source plug-in is also responsible for describing
  • 5:51
    that data to the Migrate module.
  • 5:53
    It has to be able to describe the different fields or chunks of data
  • 5:56
    that make up the source.
  • 5:58
    So if you have a CSV file, the source
  • 6:00
    needs to be able to describe what each of the columns in that CSV
  • 6:04
    file represents, so that we know what we're dealing with when it
  • 6:06
    comes time to try to map that data from the source to our destination.
  • 6:11
    Finally, source plug-ins in the Migrate module
  • 6:14
    are responsible for looping over rows of source data.
  • 6:18
    So, extracting it from the database, maybe,
  • 6:22
    and then iterating over each individual row that was extracted
  • 6:26
    and handing them to the Migrate module one at a time,
  • 6:29
    so that it can perform transformations and then
  • 6:32
    load each of those rows individually.
  • 6:36
    And then there are destinations, which are also plug-ins
  • 6:39
    that are built by extending the MigrateDestination class.
  • 6:44
    The Migrate module provides a handful
  • 6:46
    of these for us already, which we'll make use of.
  • 6:49
    In this case, our destination is Drupal.
  • 6:52
    And so one of these migrate destinations
  • 6:55
    is responsible for really understanding
  • 6:57
    the underlying aspects of Drupal.
  • 6:59
    It needs to know that you are trying to save a node or a user
  • 7:02
    and what that entails.
  • 7:03
    And when I'm saving a user in Drupal,
  • 7:05
    I want to call the user_save() function and make sure
  • 7:07
    all those hooks trigger and so forth.
  • 7:10
    The Migrate module can take care of all of that for us,
  • 7:12
    because it understands each individual destination.
  • 7:15
    We just have to tell it which one we want to use.
  • 7:17
    And then the destination plug-in is responsible for saving
  • 7:21
    one new record of data for each source row.
  • 7:25
    So the source iterates over the individual rows extracted
  • 7:29
    from whatever that source is, and then the destination is
  • 7:31
    responsible for saving each one of those rows individually.
  • 7:36
    In addition to this concept of a source and a destination
  • 7:39
    and the plug-ins that make up those, there's
  • 7:41
    also a concept of field maps in the Migrate module.
  • 7:45
    Field maps are used as part of your migration,
  • 7:49
    the custom code that you're writing, to tell the Migrate module
  • 7:53
    that this specific field in our source maps to that specific field
  • 7:58
    in our destination.
  • 8:00
    An example of that might be something like,
  • 8:02
    the email field in our CSV file maps to the email field
  • 8:08
    in the user table within Drupal.
  • 8:11
    We'll talk about these field mappings quite a bit more.
  • 8:14
    This is kind of the meat of what makes up most migrations.
  • 8:20
    So our field maps provide a link between source
  • 8:24
    fields and destination fields.
  • 8:25
    In addition to that, they also provide some basic functions
  • 8:28
    for transforming those values.
  • 8:30
    These include really simple things, like concatenating fields together.
  • 8:34
    They also allow us to write any of our own custom code
  • 8:38
    to perform that transformation.
  • 8:40
    Another thing that you end up doing a lot when writing custom
  • 8:42
    migrations is you have to perform some transformation of the source
  • 8:47
    data.
  • 8:48
    You have to change it up a little bit,
  • 8:50
    so that it fits the requirements of your new Drupal node or your user
  • 8:55
    or whatever the case may be.
  • 8:56
    We do that using field maps as well.
  • 9:01
    And then there are migration maps, which
  • 9:04
    are different from field maps, though the concept is similar.
  • 9:07
    It's all about mapping a record of source data
  • 9:10
    to a record of destination data.
  • 9:13
    In this case, we're mapping each individual row.
  • 9:16
    A map in a migration keeps track of the unique ID of each source row
  • 9:21
    and the unique ID of the destination record created from it.
  • 9:24
    And it allows us to, in the future, do things
  • 9:26
    like look up the ID of the user that was created when
  • 9:31
    Row 22 from our CSV file was imported.
  • 9:37
    This is important because we're going
  • 9:39
    to need to be able to do things like that,
  • 9:40
    look up the ID of a user that was created,
  • 9:42
    so we can use that ID in other parts of our migration.
  • 9:45
    But it also allows us to do things like roll back a migration.
  • 9:50
    Because we have this map that knows both
  • 9:52
    the destination ID and the source ID,
  • 9:55
    we can call a rollback function that will delete just
  • 9:58
    the appropriate destination records, but leave intact any that
  • 10:02
    were not created as part of the migration.
  • 10:05
    And finally, it also means that we can update destination records.
  • 10:09
    If the source data changes, then rather than importing a whole bunch
  • 10:13
    of new rows, with this map in place we can actually
  • 10:17
    update the already existing records, which is a really powerful concept.
  • 10:23
    And if you combine all of those things
  • 10:25
    together, what you end up with is an individual migration.
  • 10:29
    So what we're going to be doing is writing a couple
  • 10:32
    of different migrations, telling the Migrate module what our source data
  • 10:36
    is, telling the Migrate module what the destination is,
  • 10:40
    a user or a node or so forth, telling Migrate how
  • 10:43
    to map between the source and destination,
  • 10:47
    so we can keep a map of individual rows,
  • 10:49
    and also how each of those different fields, source fields
  • 10:53
    and destination fields, relate to one another and any transformation
  • 10:56
    of data that needs to happen during the process.
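The pieces described in this lesson can be illustrated with a small, language-neutral sketch. The following Python is not the Migrate module's API. Drupal 7 migrations are PHP classes that extend the module's base classes, such as MigrateSource and MigrateDestination. Every class and method name below (`CSVSource`, `UserDestination`, `Migration`, `add_field_mapping`, `run`, `rollback`) is hypothetical and chosen only to mirror the concepts:

- a source that describes its fields and iterates over rows,
- a destination that saves one record per row,
- field mappings with optional transforms,
- a migration map that makes rollback and updates possible.

The map here is an in-memory dictionary. A real migration would need to persist it between runs.

```python
import csv
import io


class CSVSource:
    """Hypothetical source: describes its fields and yields rows one at a time."""

    def __init__(self, text, key, fields):
        self.text = text
        self.key = key        # column that uniquely identifies each source row
        self.fields = fields  # column name -> description of what it holds

    def __iter__(self):
        yield from csv.DictReader(io.StringIO(self.text))


class UserDestination:
    """Hypothetical destination: saves one record per source row, in memory."""

    def __init__(self):
        self.records = {}
        self.next_id = 1

    def save(self, values, existing_id=None):
        if existing_id is None:
            # New record: allocate the next destination ID.
            dest_id = self.next_id
            self.next_id += 1
        else:
            # Previously imported row: overwrite the existing record.
            dest_id = existing_id
        self.records[dest_id] = values
        return dest_id

    def delete(self, dest_id):
        self.records.pop(dest_id, None)


class Migration:
    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        self.field_maps = []  # (destination field, source field, optional transform)
        self.id_map = {}      # migration map: source ID -> destination ID

    def add_field_mapping(self, dest_field, source_field, transform=None):
        # The source must describe a field before it can be mapped.
        if source_field not in self.source.fields:
            raise ValueError(f"source does not describe field {source_field!r}")
        self.field_maps.append((dest_field, source_field, transform))

    def run(self):
        for row in self.source:
            values = {}
            for dest_field, source_field, transform in self.field_maps:
                value = row[source_field]
                # A transform receives the value and the whole row, so it can
                # combine several source fields.
                values[dest_field] = transform(value, row) if transform else value
            source_id = row[self.source.key]
            existing_id = self.id_map.get(source_id)
            self.id_map[source_id] = self.destination.save(values, existing_id)

    def rollback(self):
        # Delete only the records this migration created, using the map.
        for dest_id in self.id_map.values():
            self.destination.delete(dest_id)
        self.id_map.clear()


CSV_DATA = """id,first,last,email
22,Ada,Lovelace,ada@example.com
23,Alan,Turing,alan@example.com
"""

source = CSVSource(CSV_DATA, key="id", fields={
    "id": "Unique row ID", "first": "First name",
    "last": "Last name", "email": "Email address",
})
dest = UserDestination()
dest.save({"name": "admin"})  # a record that existed before the migration

migration = Migration(source, dest)
migration.add_field_mapping("mail", "email")
migration.add_field_mapping("name", "first", lambda v, row: f"{v} {row['last']}")
migration.run()
user_id_for_row_22 = migration.id_map["22"]  # look up via the migration map
```

In this sketch, running `migration.run()` a second time updates the two imported records in place instead of creating duplicates. Calling `migration.rollback()` removes only those two records and leaves the pre-existing `admin` record intact.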

This lesson includes a short presentation that explains the basic terminology and architecture of the Migrate module and the components that make up a custom data migration. We'll talk about the Extract / Transform / Load process and how it relates to data migrations, the types of data sources that the Migrate module can read from, and a little bit about how the code in both the Migrate module and our own custom migrations will be organized.
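The Extract / Transform / Load idea can be sketched as three separate functions plus a small piece of glue code. That glue is roughly the role the Migrate module plays. This is plain Python rather than Drupal code. The function names, the CSV columns, and the in-memory `store` list are all hypothetical. The transformations mirror the examples from the video: converting centimeters to inches, concatenating fields, and cleaning up source values.

```python
import csv
import io


def extract(csv_text):
    """Extract: pull raw rows out of the source, one at a time."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(row):
    """Transform: reshape one source row to meet the destination's needs."""
    return {
        # Concatenate two source fields into one destination field.
        "name": f"{row['first']} {row['last']}",
        # The old system stored centimeters; the new one expects inches.
        "height_in": round(float(row["height_cm"]) / 2.54, 1),
        # Clean up the source value: trim whitespace and lowercase it.
        "mail": row["email"].strip().lower(),
    }


def load(record, store):
    """Load: save one transformed record in the destination."""
    store.append(record)


def run_etl(csv_text, store):
    # Glue code: pass each extracted row through transform, then load it.
    for row in extract(csv_text):
        load(transform(row), store)


SOURCE_CSV = """first,last,email,height_cm
Ada,Lovelace, ADA@Example.com ,170
"""

store = []
run_etl(SOURCE_CSV, store)
```

Keeping the three steps separate means each one can be changed or tested on its own. For example, the same `transform` works whether `extract` reads a CSV file, a JSON file, or an SQL query.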