Whether you’re importing data from an older version of Drupal or any other non-Drupal source, sometimes you’ll need to implement custom logic.
This course covers creating and using custom source and process plugins, and defining custom migrations. You will learn how to control what records are imported, author custom data processing logic, run custom migrations, use highwater marks to limit data imports, and track changes to source data to ensure data integrity.
Key topics
- Understanding and using source plugins
- Writing custom source plugins to extract data
- Understanding and using process plugins
- Writing custom process plugins to transform data
- Understanding and using destination plugins
- Writing custom migration YAML files to define migration process pipelines
- Running custom migrations using Drush and other tools
- Using highwater marks to speed up migrations of large datasets
- Tracking changes to source data to ensure data integrity during migrations
Before we can learn to write a custom migration, we need some sample data and a destination site for that data.
In this tutorial we'll obtain some source data to work with and configure our Drupal destination site by creating the necessary content types and fields to accommodate the source data. Then we'll look at the data that we'll be importing and start to formulate a migration plan.
By the end of this tutorial you'll have some source data and an empty but configured destination Drupal site ready for data import.
Source plugins extract data from a source and return it as a set of rows, each representing an individual item to import, along with some additional information about the properties that make up that row.
Anyone writing a custom migration, or module developers who want to provide a migration path from their Drupal 6 or 7 module, will need to work with source plugins.
In this tutorial we'll talk about the role that source plugins fulfill and how they work. By the end of this tutorial you should be able to determine whether or not you need to write a source plugin for your migration.
This tutorial covers writing a custom source plugin that imports data from a MySQL database into Drupal nodes. After completing this tutorial you should understand how to write your own custom source plugin that can:
- Extract data from an SQL source
- Describe the various fields in the source data to the Migrate API for mapping
- Provide unique IDs for each row of imported data
By the end of this tutorial you should be able to write a custom source plugin that uses an SQL data store, as well as have a foundation for writing source plugins that extract data from any source that PHP can read.
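To make that concrete, here's a minimal sketch of such a plugin. It is illustrative rather than the tutorial's exact code: the `example_migrate` module name, the legacy `movies` table, and its columns are all assumptions. The three methods are the ones the Migrate API requires of an `SqlBase` source:

```php
<?php

namespace Drupal\example_migrate\Plugin\migrate\source;

use Drupal\migrate\Plugin\migrate\source\SqlBase;

/**
 * Source plugin that reads movie records from a hypothetical legacy table.
 *
 * @MigrateSource(
 *   id = "example_movie",
 *   source_module = "example_migrate"
 * )
 */
class Movie extends SqlBase {

  /**
   * {@inheritdoc}
   */
  public function query() {
    // Extract: select the columns we want to import from the legacy table.
    return $this->select('movies', 'm')
      ->fields('m', ['movie_id', 'title', 'release_date']);
  }

  /**
   * {@inheritdoc}
   */
  public function fields() {
    // Describe each source field so it can be mapped in a migration's
    // process section.
    return [
      'movie_id' => $this->t('Unique movie ID'),
      'title' => $this->t('Movie title'),
      'release_date' => $this->t('Date the movie was released'),
    ];
  }

  /**
   * {@inheritdoc}
   */
  public function getIds() {
    // Tell the Migrate API which source column uniquely identifies a row.
    return [
      'movie_id' => [
        'type' => 'integer',
      ],
    ];
  }

}
```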
Process plugins manipulate data during the transform phase of the ETL process while the data is being moved from the source to the destination. Drupal core provides a handful of common process plugins that can be used to perform the majority of data transformation tasks. If you need some functionality beyond what is already provided you can write your own custom process plugins.
In this tutorial we'll:
- Examine the role that process plugins fulfill
- Understand the processing pipeline
- List the existing process plugins in Drupal core and what each one does
- Better understand when you might need to write your own process plugin
By the end of this tutorial you should be able to explain what process plugins do, and understand how you'll make use of them in your own migration.
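To make the pipeline idea concrete, here's a small sketch of a `process` section built only from core plugins; the destination fields and source properties are invented for the example:

```yaml
process:
  # Shorthand for the 'get' plugin: copy the source value unchanged.
  title: title
  # A pipeline: each plugin receives the output of the previous one.
  field_category:
    - plugin: callback
      callable: trim
      source: category
    - plugin: default_value
      default_value: Uncategorized
```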
This tutorial covers writing a custom process plugin that you can use to manipulate the value of any field during the process (or transform) phase of a migration. Process plugins take an individual field value provided by a source plugin, and perform transformations on that data before passing it along to the load phase.
In this tutorial we'll write a process plugin that can uppercase either an entire string or just the first letter of each word in the string, depending on its configuration.
By the end of this tutorial you should know how to start writing your own process plugins.
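As a preview, a minimal version of that plugin might look like the sketch below. The plugin ID, class name, and the `all` configuration key are illustrative choices, not necessarily the tutorial's exact code:

```php
<?php

namespace Drupal\example_migrate\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Uppercases an entire string, or the first letter of each word.
 *
 * @MigrateProcessPlugin(
 *   id = "example_uppercase"
 * )
 */
class Uppercase extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // The 'all' configuration key chooses between the two behaviors.
    if (!empty($this->configuration['all'])) {
      return mb_strtoupper($value);
    }
    return ucwords($value);
  }

}
```

A migration would then reference it in a field's process pipeline with `plugin: example_uppercase`.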
Destination plugins handle the load phase of the ETL (Extract, Transform, Load) process and are responsible for saving new data as Drupal content or configuration.
In this tutorial, we'll:
- Examine the role that destination plugins fulfill
- Learn about existing destination plugins
- Better understand when you might need to write your own destination plugin
By the end of this tutorial, you should be able to explain what destination plugins do and understand how you'll make use of them in your own migration.
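For example, the workhorse destination for content migrations is core's `entity:node` plugin; a migration's `destination` section can be as small as this (the `article` bundle is just an example):

```yaml
destination:
  # Load: save each processed row as a node. Rows that don't specify a
  # bundle fall back to default_bundle.
  plugin: entity:node
  default_bundle: article
```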
Migration plugins are the glue that binds a source, a destination, and multiple process plugins together to move data from one place to another. Often referred to as migration templates, or simply migrations, migration plugins are YAML files that explain to Drupal where to get the data, how to process it, and where to save the resulting object.
Source, process, and destination plugins do the heavy lifting in each phase of the ETL process in a custom migration. We need to choose which plugins we want to use for each phase, as well as map fields from our source data to fields at our destination. A migration YAML file glues it all together and gives it a unique name that we can use to run it.
In this tutorial we'll:
- Determine what information we're going to move, and where we're going to move it to
- Install the Migrate Plus and Migrate Tools modules, which we'll use to run our custom migration
- Write a custom migration plugin (configuration) YAML file that will work with Migrate Tools
By the end of this tutorial you should be able to write a custom migration YAML file and understand how to choose the source, destination, and process plugins that will do the work.
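Assembled, a Migrate Plus migration lives in a configuration file such as `migrate_plus.migration.example_movie.yml`. Here's a hedged sketch that reuses the hypothetical source plugin and field names from earlier in this summary:

```yaml
id: example_movie
label: Import movies from a legacy database
source:
  # Extract: the custom source plugin defined earlier.
  plugin: example_movie
process:
  # Transform: destination field on the left, source property (or
  # pipeline) on the right.
  title: title
  field_release_date: release_date
destination:
  # Load: save each row as a 'movie' node.
  plugin: entity:node
  default_bundle: movie
```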
As of right now, the most reliable way to run custom migrations is using Drush. Depending on the version of Drush you're using, you may also need the Migrate Tools module. In this tutorial we'll walk through using Drush to run a custom migration, as well as the other commands that can be used to manage the execution of migrations.
By the end of this tutorial you should know how to run your custom migrations.
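For reference, the core commands look like this (`example_movie` is the hypothetical migration ID from above; exact command availability depends on your Drush and Migrate Tools versions):

```shell
# Show all migrations, how many items each contains, and their status.
drush migrate:status

# Run one migration by ID.
drush migrate:import example_movie

# Undo a migration, deleting the content it created.
drush migrate:rollback example_movie

# Clear the status of a migration stuck in the "Importing" state.
drush migrate:reset-status example_movie
```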
When running migrations you can use the `high_water_property` source plugin configuration option to influence which rows are considered for import on subsequent migration runs. This allows you to do things like only look at new rows added to a large dataset, or reimport records that have changed since the last time the migration was run. The term highwater mark comes from the water line marks found on structures in areas where water level changes are common. In a migration, you can think of a highwater mark as a line that denotes how far the migration has progressed, saying, "from now on, we only care about data created after this line."
Another common use case for highwater marks is when you're importing a large dataset and the system runs out of resources. Usually this looks like a migration failing because it timed out, or because the process ran out of memory. A highwater mark should allow you to pick up from where you left off.
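The option is configured under the `source` section of a migration. A minimal sketch, assuming the source rows have a `changed` timestamp column and reusing the hypothetical source plugin from earlier:

```yaml
source:
  plugin: example_movie
  # Remember the highest 'changed' value seen so far; on the next run,
  # only rows with a greater value are considered for import.
  high_water_property:
    name: changed
```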
In this tutorial we'll:
- Define what a highwater mark is, and how you can use one to limit the rows considered for importing each time a migration is executed.
- Demonstrate how highwater marks can be used to reimport source records that have been modified since the previous time the migration was executed.
- Introduce the `track_changes` option.
By the end of this tutorial you should be able to define what a highwater mark is and know how to use one to speed up the import of large datasets, or to force the migration to reimport records when the source data has changed.
You should also be aware of the `track_changes` feature, which is a slower, but more dynamic, method of checking for changes in source data and reimporting records when a change is found.
If you need to write a migration that is capable of being executed multiple times and picking up changes to previously imported data, you can use the `track_changes` configuration option of most source plugins. This tells the migration to keep a hash of each source row, and on subsequent runs of the same migration it will compare the previous hash to the current one. If they don't match, the source data has changed, and Drupal will reimport that record and update the existing destination record with the new data.
Using `track_changes` differs from calling `drush migrate:import --update` in that using `--update` will force every record to be re-imported regardless of whether the source data has changed.
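Configuration-wise this is a single flag on the source plugin. A minimal sketch, again using the hypothetical source from earlier:

```yaml
source:
  plugin: example_movie
  # Store a hash of each source row; reimport a row when its hash changes.
  track_changes: true
```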
In this tutorial we'll:
- Learn how change tracking works to detect changes in your source data
- Use the `track_changes` option in a migration
Note that the progress of a migration can also be tracked using highwater marks if the source data has something like a `last_updated` timestamp column. Using highwater marks in this case is likely more efficient than using `track_changes`. It's a good idea to understand how both features work, and then choose the appropriate one for each migration.