Use Highwater Marks to Limit What Gets Imported for Drupal 9, 10

When running migrations you can use the highwater_mark source plugin configuration option to influence which rows are considered for import on subsequent migration runs. This lets you look only at new rows added to a large dataset, or reimport records that have changed since the last time the migration was run. The term "highwater mark" comes from the water line marks found on structures in areas where the water level changes often. When running migrations, you can think of a highwater mark as a line that denotes how far the migration has progressed, as if to say, "from now on, we only care about data created or changed after this line".
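The idea can be illustrated outside of Drupal with a minimal Python sketch. It assumes each source row carries a numeric `changed` timestamp that serves as the highwater field; the function name `rows_to_import` and the sample data are hypothetical and are not part of the Migrate API.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, int]


def rows_to_import(rows: Iterable[Row], highwater: int) -> Tuple[List[Row], int]:
    """Return the rows whose 'changed' value is above the mark, plus the new mark."""
    selected = [row for row in rows if row["changed"] > highwater]
    # The mark never moves backwards, even when nothing was selected.
    new_mark = max([highwater] + [row["changed"] for row in selected])
    return selected, new_mark


source = [
    {"id": 1, "changed": 100},
    {"id": 2, "changed": 200},
]

# First run: no mark is stored yet, so every row is considered.
first_batch, mark = rows_to_import(source, highwater=0)

# Between runs, row 1 is edited and row 3 is added.
source[0]["changed"] = 300
source.append({"id": 3, "changed": 400})

# Second run: only rows past the stored mark are considered.
second_batch, mark = rows_to_import(source, highwater=mark)
print([row["id"] for row in second_batch])
```

Row 2 is skipped on the second run because its `changed` value is not above the stored mark, while the edited row and the new row are both picked up.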

Another common use case for highwater marks is importing a large dataset when the system runs out of resources partway through. Usually this looks like a migration failing because it timed out or because the process ran out of memory. A highwater mark should allow you to pick up where you left off.
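The resume behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not Drupal's implementation: it assumes rows are processed in ascending order of the highwater field and that the mark is saved after every imported row. The `run_migration` function, the `state` dictionary, and the `fail_at` parameter (which simulates a timeout) are all hypothetical.

```python
from typing import Dict, List


def run_migration(
    rows: List[Dict[str, int]],
    state: Dict[str, int],
    destination: List[int],
    fail_at: int = -1,
) -> None:
    """Import rows above the stored mark, in ascending 'changed' order."""
    pending = sorted(
        (row for row in rows if row["changed"] > state["highwater"]),
        key=lambda row: row["changed"],
    )
    for row in pending:
        if row["id"] == fail_at:
            # Hypothetical failure injection standing in for a timeout.
            raise TimeoutError(f"simulated timeout at row {row['id']}")
        destination.append(row["id"])
        # Save the mark after each row so progress survives a crash.
        state["highwater"] = row["changed"]


rows = [{"id": i, "changed": i * 10} for i in range(1, 6)]
state = {"highwater": 0}
destination: List[int] = []

try:
    run_migration(rows, state, destination, fail_at=4)
except TimeoutError:
    pass  # Rows 1 to 3 were imported before the simulated failure.

mark_after_failure = state["highwater"]

# The second run starts from the saved mark instead of from the beginning.
run_migration(rows, state, destination)
print(destination)
```

Processing in ascending order matters here: if rows arrived unordered, a saved mark could skip rows that were never imported.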

In this tutorial we'll:

  • Define what a highwater mark is, and how you can use one to limit the rows considered for importing each time a migration is executed.
  • Demonstrate how highwater marks can be used to reimport source records that have been modified since the previous time the migration was executed.
  • Introduce the track_changes option.

By the end of this tutorial you should be able to define what a highwater_mark is and explain how to use one to speed up the import of large datasets or force the migration to reimport records when the source data has changed.

You should also be aware of the track_changes feature, which is a slower but more dynamic method of checking for changes in source data and reimporting records when a change is found.
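To show why change tracking is slower but more flexible, here is a minimal Python sketch. It assumes changes are detected by hashing each full source row and comparing the result with a hash stored from the previous run; the helper names `row_hash` and `changed_rows` are hypothetical, and the sketch is not a copy of how the Migrate API does it.

```python
import hashlib
import json
from typing import Any, Dict, List


def row_hash(row: Dict[str, Any]) -> str:
    """Hash the whole row; sort_keys keeps the digest independent of key order."""
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_rows(rows: List[Dict[str, Any]], stored: Dict[int, str]) -> List[int]:
    """Return ids of rows that are new or whose hash differs, updating `stored`."""
    changed = []
    for row in rows:
        digest = row_hash(row)
        if stored.get(row["id"]) != digest:
            changed.append(row["id"])
            stored[row["id"]] = digest
    return changed


hashes: Dict[int, str] = {}
source = [{"id": 1, "title": "First"}, {"id": 2, "title": "Second"}]

first_run = changed_rows(source, hashes)   # Every row is new on the first run.

source[1]["title"] = "Second (edited)"
second_run = changed_rows(source, hashes)  # Only the edited row differs now.
print(second_run)
```

Every row must be read and hashed on every run, which is where the extra cost comes from. In exchange, no timestamp or other monotonically increasing field is needed in the source data.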