Debugging inconsistent return values from the Drupal migration_lookup plugin

I recently ran into an issue while working on a Drupal 7 to Drupal 9 migration where the migration_lookup process plugin -- when configured to use multiple source migrations -- would sometimes return an array (e.g. array [0 => "123", 1 => "und"]), and sometimes return a string (e.g. string(3) "123"). This resulted in PHP errors being thrown while executing the migration, and all the rows failing to import, not just the individual field.

I pretty quickly figured out that the problem was the inconsistent return value of the migration_lookup plugin, but it took me a while to figure out why that was happening and what to do about it. So here’s what I learned.

While there’s some work in the Drupal core issue queue to normalize the return values, for now I still had to make sure the lookups returned either all strings or all arrays. Check out this change record and the linked issues there: MigrateLookup Plugin requires returning an associative array of destination ids.

For reference, our Drupal 9 site has a node type called Guide, with an entity reference field named field_guide_references that can be populated with a bunch of other node types, and is used to keep track of what other content on the site is contained in the Guide. It’s a straightforward field configuration. And the Drupal 7 site has essentially the same field. We’re not preserving Node IDs in our migration, so in order to populate the field with the correct values in Drupal 9, I need to use a combination of the sub_process and migration_lookup process plugins.

Here’s an example of populating an entity reference field using migration_lookup with multiple migrations:

field_guide_references:
  -
    plugin: sub_process
    source: field_guide_references
    process:
      target_id:
        -
          plugin: migration_lookup
          source: target_id
          migration:
            - dme_d7_node_collection
            - dme_d7_node_guide
            - dme_d7_node_topic
            - dme_d7_node_tutorial
            - dme_d7_node_video

The problems started when I added dme_d7_node_guide and dme_d7_node_topic to the list of possible source migrations. Previously the migration ran fine, but not all the data was being migrated because of the missing source migrations.

After making those changes I started getting errors like the following:

[error]  Value is not a valid entity. (/var/www/html/web/core/lib/Drupal/Core/Entity/Plugin/DataType/EntityReference.php:106)

This is an example of the error you might see when running a migration with a migration_lookup that returns an array instead of a string. It’s thrown in \Drupal\Core\Entity\Plugin\DataType\EntityReference::setValue when trying to validate the values in the entity reference field. Because some of those values are arrays like [0 => "123", 1 => "und"] instead of entity IDs, the validation logic can’t locate the corresponding entity and throws an error.

I debugged this a couple of ways. First I used the callback plugin in my migration to see what the field values looked like as they passed through the process pipeline:

field_guide_references:
  -
    plugin: sub_process
    source: field_guide_references
    process:
      target_id:
        -
          plugin: migration_lookup
          source: target_id
          migration:
            - dme_d7_node_collection
            - dme_d7_node_guide
            - dme_d7_node_topic
            - dme_d7_node_tutorial
            - dme_d7_node_video
        -
          plugin: callback
          callable: var_dump

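I also set an Xdebug breakpoint at line 106 of EntityReference.php. Note that var_dump returns nothing, so the callback step replaces the looked-up value with NULL; it’s only useful for debugging and needs to be removed afterward.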

So why is the migration_lookup plugin sometimes returning an array, and sometimes returning a string? Because sometimes the source content entity migration has a single destination ID key (and returns a string) and sometimes it has a composite key (and returns an array).

When migrations are run, the Migrate API keeps a map that tracks which source records were used to create which destination records. Then the migration_lookup process plugin uses that data to figure out what value to return. The data looks like this if the destination record has a single key:

MariaDB [db]> select * from migrate_map_dme_d7_node_topic limit 2;
+------------------------------------------------------------------+-----------+---------+-------------------+-----------------+---------------+------+
| source_ids_hash                                                  | sourceid1 | destid1 | source_row_status | rollback_action | last_imported | hash |
+------------------------------------------------------------------+-----------+---------+-------------------+-----------------+---------------+------+
| 03fe38e1b58fc80e4701a764114f1802434d2c0f876fd812039f055e116fd43d |      2947 |    2789 |                 0 |               0 |             0 |      |
| 04281ee586125db4fef91e4cea7d6dd9adde2344a7e04a3bdb0c2c938e1df463 |      2946 |    2788 |                 0 |               0 |             0 |      |
+------------------------------------------------------------------+-----------+---------+-------------------+-----------------+---------------+------+

And like this if it has a composite key (notice the additional destination ID column, destid2):

MariaDB [db]> select * from migrate_map_dme_d7_node_page limit 2;
+------------------------------------------------------------------+-----------+---------+---------+-------------------+-----------------+---------------+------+
| source_ids_hash                                                  | sourceid1 | destid1 | destid2 | source_row_status | rollback_action | last_imported | hash |
+------------------------------------------------------------------+-----------+---------+---------+-------------------+-----------------+---------------+------+
| 0262336851a26539490397ef908a0e5131aa829668dd49041f52a650bf81c725 |      2771 |    3015 | und     |                 0 |               0 |             0 |      |
| 0329ab49f6959aeba333bbcfc1ca1582dc428e925432a3339e383246ca495b6f |      1440 |    2994 | und     |                 0 |               0 |             0 |      |
+------------------------------------------------------------------+-----------+---------+---------+-------------------+-----------------+---------------+------+

This is why in some cases the migration_lookup would return "2789" and in other cases would return [0 => "2789", 1 => "und"]: some of the node migrations had composite keys, and some had single keys.

If it’s always returning an array, then you can add something like the following to extract the key that contains the referenced entity ID:

field_guide_references:
  -
    plugin: sub_process
    source: field_guide_references
    process:
      target_id:
        -
          plugin: migration_lookup
          source: target_id
          migration:
            - dme_d7_node_collection
            - dme_d7_node_guide
            - dme_d7_node_topic
            - dme_d7_node_tutorial
            - dme_d7_node_video
        -
          plugin: extract
          index:
            - 0

I tried that initially, but ended up with an error like the following because sometimes the value was an array and sometimes it was a string. This is what led me down the trail of figuring out that migration_lookup was returning inconsistent values.

dme_d7_node_guide:field_guide_references:sub_process: extract: Input should be an array, instead it was of type 'string'

The reason, in my case, that some of the migrations had a multipart key is due to the destination plugin configuration:

destination:
  plugin: 'entity:node'
  # This line is the culprit!
  translations: true
  default_bundle: guide

Some of them contained the line translations: true and some did not. This led me to \Drupal\migrate_drupal\Plugin\migrate\source\ContentEntity::getIds, in which the entity’s langcode is added as a second key when translations is true. Hence, the “und” value. This is done because if you have translations enabled -- or revisions -- you could end up with multiple variations of the node with ID “123”, and the ID alone is no longer enough data to uniquely identify the imported content.

The fix is to make sure that all of your content migrations that are being used as source migrations for the migration_lookup process plugin use either a single destination ID (always return a string), or a multipart destination ID (always return an array).
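As a rough sketch of the single-ID option, based on the destination configuration shown above, the destination section simply leaves out the translations line. This assumes the content doesn’t need its translations migrated; if it does, the other option (a multipart ID everywhere, plus an extract step) is probably the safer route.

```yaml
destination:
  plugin: 'entity:node'
  # No "translations: true" here, so the migrate map table only has a
  # destid1 column and migration_lookup returns a plain ID.
  default_bundle: guide
```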

In my case I removed translations: true from 2 of the migrations. And 4 hours and 2 lines of YAML later, my migrations are finally running again. Hopefully this is helpful for anyone else troubleshooting similar problems.

Comments

Thanks for this post!

In my case I couldn't remove `translations: true`, so I preferred to write a simple process plugin that extends extract and simply returns the value if it is not an array:

<?php

namespace Drupal\your_module\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\Plugin\migrate\process\Extract;
use Drupal\migrate\Row;

/**
 * Extract value from array, or return it if it is a scalar.
 *
 * @MigrateProcessPlugin(
 *   id = "extract_if_array",
 * )
 */
class ExtractIfArray extends Extract {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    if (is_array($value)) {
      return parent::transform($value, $migrate_executable, $row, $destination_property);
    }

    return $value;
  }

}

Then you can chain extract_if_array after migration_lookup:

your_field:
  -
    plugin: migration_lookup
    migration:
      - migration1
      - migration2
  -
    plugin: extract_if_array
    index:
      - 0
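
Applied to the field_guide_references example from the post, that chaining might look something like the sketch below. It hasn’t been run against a real site, and it assumes the extract_if_array plugin from this comment lives in a custom module that is enabled.

```yaml
field_guide_references:
  -
    plugin: sub_process
    source: field_guide_references
    process:
      target_id:
        -
          plugin: migration_lookup
          source: target_id
          migration:
            - dme_d7_node_collection
            - dme_d7_node_guide
            - dme_d7_node_topic
            - dme_d7_node_tutorial
            - dme_d7_node_video
        -
          # Assumption: extract_if_array is the custom plugin shown above.
          # Arrays such as ["123", "und"] are reduced to "123"; plain
          # string IDs pass through unchanged.
          plugin: extract_if_array
          index:
            - 0
```

With this in place the source migrations can keep a mix of single and composite destination IDs, at the cost of maintaining one small custom plugin.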
