The migration system makes it possible to pull content into Drupal from just about anywhere. The core Migrate APIs supports extraction from any SQL data source, including previous versions of Drupal. Contributed modules extend this system to support other data types like CSV or JSON, as well as platforms like WordPress. This system works by using an Extract, Transform, Load process with configurable plugins for each step, in order to allow you to author your own unique data imports.
The course, Explore Source, Process, and Destination Migrate API Plugins, covers the fundamentals of the Migrate API. You’ll learn how to author custom source, process, and destination plugins. We’ll walk through importing data from an example external-to-Drupal datastore into Drupal entities and fields.
If you’re working on a Drupal-to-Drupal upgrade migration, the information in this course is essential. Why? All upgrades are built using these same APIs. The difference between a Drupal-to-Drupal migration and any other data import is that for a Drupal-to-Drupal migration, the core Migrate API contain plugins that deal with common Drupal data structures and content/configuration scenarios. You will learn more about Drupal-to-Drupal upgrade migrations in Drupal-to-Drupal Migrations.
And if you're just getting started learning about migrating to Drupal, make sure you've walked through our Migrating to Drupal Essentials course first. This will give you a foundation in the concepts, terminology, and modules used in the migration ecosystem.
Before we can learn to write a custom migration, we need some sample data and a destination site for that data.
In this tutorial we'll obtain some source data to work with and configure our Drupal destination site by creating the necessary content types and fields to accommodate the source data. Then we'll look at the data that we'll be importing and start to formulate a migration plan.
By the end of this tutorial you'll have some source data and an empty but configured destination Drupal site ready for data import.
Source plugins extract data from a source and return it as a set of rows representing an individual item to import and some additional information about the properties that make up that row.
Anyone writing a custom migration, or module developers who want to provide a migration path from their Drupal 6 or 7 module, will need to work with source plugins.
In this tutorial we'll talk about the role that source plugins fulfill and how they work. By the end of this tutorial you should be able to determine whether or not you need to write a source plugin for your migration.
This tutorial covers writing a custom source plugin that imports data from a MySQL database into Drupal nodes. After completing this tutorial you should understand how to write your own custom source plugin that can:
- Extract data from an SQL source
- Describe the various fields in the source data to the Migrate API for mapping
- Provide unique IDs for each row of imported data
By the end of this tutorial you should be able write a custom source plugin that uses an SQL data store as well as have a foundation for writing source plugins that extract data from any source that PHP can read.
Process plugins manipulate data during the transform phase of the ETL process while the data is being moved from the source to the destination. Drupal core provides a handful of common process plugins that can be used to perform the majority of data transformation tasks. If you need some functionality beyond what is already provided you can write your own custom process plugins.
In this tutorial we'll:
- Examine the role that process plugins fullfil
- Understand the processing pipeline
- List the existing process plugins in Drupal core and what each one does
- Better understand when you might need to write your own process plugin
By the end of this tutorial you should be able to explain what process plugins do, and understand how you'll make use of them in your own migration.
This tutorial covers writing a custom process plugin that you can use to manipulate the value of any field during the process (or transform) phase of a migration. Process plugins take an individual field value provided by a source plugin, and perform transformations on that data before passing it along to the load phase.
In this tutorial we'll write a process plugin that can either uppercase an entire string or the first letter of each word in the string depending on configuration.
By the end of this tutorial you should know how to start writing your own process plugins.
Destination plugins handle the load phase of the ETL (Extract, Transform, Load) process and are responsible for saving new data as Drupal content or configuration.
In this tutorial, we'll:
- Examine the role that destination plugins fulfill
- Learn about existing destination plugins
- Better understand when you might need to write your own destination plugin
By the end of this tutorial, you should be able to explain what destination plugins does and understand how you'll make use of them in your own migration.
Migration plugins are the glue that binds a source, a destination, and multiple process plugins together to move data from one place to another. Often referred to as migration templates, or simply migrations, migration plugins are YAML files that explain to Drupal where to get the data, how to process it, and where to save the resulting object.
Source, process, and destination plugins do the heavy lifting in each phase of the ETL process in a custom migration. We need to choose which plugins we want to use for each phase, as well as map fields from our source data to fields at our destination. A migration YAML file glues it all together and gives it a unique name that we can use to run it.
In this tutorial we'll:
- Determine what information we're going to move, and where we're going to move it to
- Install Migrate Plus and Migrate Tools which we'll use to run our custom migration
- Write a custom migration plugin (configuration) YAML file that will work with Migrate Tools
By the end of this tutorial you should be able to write a custom migration YAML file and understand how to choose the source, destination, and process plugins that will do the work.
As of right now, the most reliable way to run custom migrations is using Drush. Depending on the version of Drush you're using you may also need the Migrate Tools module. In this tutorial we'll walk through using Drush to run a custom migration, as well as the other commands that can be used to manage the execution of migrations.
By the end of this tutorial you should know how to run your custom migrations.
When running migrations you can use the
highwater_mark source plugin configuration option to influence which rows are considered for import on subsequent migration runs. This allows you to do things like only look at new rows added to a large dataset. Or to reimport records that have changed since the last time the migration was run. The term, highwater mark, comes from water line marks found on structures in areas where water level changes are common. In running migrations, you can think of a highwater mark as a line that denotes how far the migration has progressed, and saying, "from now one, we only care about data created after this line".
Another common use case for highwater marks is when you're importing a large dataset and the system runs out of resources. Usually this will look like a migration failing because it timed out, or the process ran out of memory. A highwater mark should allow you to pickup from where you left off.
In this tutorial we'll:
- Define what a highwater mark is, and how you can use them to limit the rows considered for importing each time a migration is executed.
- Demonstrate how highwater marks can be used to reimport source records that have been modified since the previous time the migration was executed.
- Introduce the
By the end of this tutorial you should be able to define what a
highwater_mark is and how to use them to speed up the import of large datasets or force the migration to reimport records when the source data is changed.
You should also be aware of the
track_changes feature which is a slower, but more dynamic, method of checking for changes in source data and reimporting records when a change is found.
If you need to write a migration that is capable of being executed multiple times and picking up changes to previously imported data, you can use the
track_changes configuration option of most source plugins. This will tell the migration to keep a hash of each source row, and on subsequent runs of the same migration it will compare the previous hash to the current one. If they don't match this means the source data has changed, and Drupal will reimport that record and update the existing destination record with new data.
track_changes differs from calling
drush migrate:import --update in that using
--update will force every record to be re-imported regardless of whether the source data has changed or not.
In this tutorial we'll:
- Learn how change tracking works to detect changes in your source data
- Use the
track_changesoption in a migration
Note that progress of a migration can also be tracked using highwater marks if the source data has something like a
last_updated timestamp column. Using highwater marks in this case is likely more efficient than using
track_changes. It's a good idea to understand how both features work, and then choose the appropriate one for each migration.