I recently ran into an issue while working on a Drupal 7 to Drupal 9 migration where the `migration_lookup` process plugin, when configured to use multiple source migrations, would sometimes return an array (e.g. `array [0 => "123", 1 => "und"]`) and sometimes return a string (e.g. `string(3) "123"`). This caused PHP errors while executing the migration, and every row failed to import, not just the affected field.
I quickly figured out that the problem was the inconsistent return value of the `migration_lookup` plugin, but it took me a while to figure out why that was happening and what to do about it. So here’s what I learned.
While there’s some work in the Drupal core issue queue to normalize the return values, as of this writing I still had to make sure the values were either all strings or all arrays. Check out this change record and the issues linked from it: MigrateLookup Plugin requires returning an associative array of destination ids.
For reference, our Drupal 9 site has a node type called Guide, with an entity reference field named `field_guide_references` that can reference several other node types and is used to keep track of what other content on the site is contained in the Guide. It’s a straightforward field configuration, and the Drupal 7 site has essentially the same field. We’re not preserving node IDs in our migration, so to populate the field with the correct values in Drupal 9, I need to use a combination of the `sub_process` and `migration_lookup` process plugins.
Here’s an example of populating an entity reference field using `migration_lookup` with multiple migrations:
```yaml
field_guide_references:
  -
    plugin: sub_process
    source: field_guide_references
    process:
      target_id:
        -
          plugin: migration_lookup
          source: target_id
          migration:
            - dme_d7_node_collection
            - dme_d7_node_guide
            - dme_d7_node_topic
            - dme_d7_node_tutorial
            - dme_d7_node_video
```
The problems started when I added `dme_d7_node_guide` and `dme_d7_node_topic` to the list of possible source migrations. Before that, the migration ran without errors, but not all of the data was migrated because those source migrations were missing.
After making those changes, I started getting errors like the following:

```
[error] Value is not a valid entity. (/var/www/html/web/core/lib/Drupal/Core/Entity/Plugin/DataType/EntityReference.php:106)
```
This is the kind of error you might see when running a migration in which `migration_lookup` returns an array instead of a string. It’s thrown in `\Drupal\Core\Entity\Plugin\DataType\EntityReference::setValue` while setting the values of the entity reference field. Because some of those values are arrays like `[0 => "123", 1 => "und"]` rather than scalar entity IDs, `setValue()` can’t treat them as a reference to an entity and throws an exception.
I debugged this a couple of ways. First, I used the `callback` plugin in my migration to see what the field values looked like as they passed through the process pipeline:
```yaml
field_guide_references:
  -
    plugin: sub_process
    source: field_guide_references
    process:
      target_id:
        -
          plugin: migration_lookup
          source: target_id
          migration:
            - dme_d7_node_collection
            - dme_d7_node_guide
            - dme_d7_node_topic
            - dme_d7_node_tutorial
            - dme_d7_node_video
        -
          plugin: callback
          callable: var_dump
```
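I also set an Xdebug breakpoint at line 106 of EntityReference.php. (Remove the `callback` step when you’re done: `var_dump` returns nothing, so that step replaces the value with `NULL`.)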
So why is the `migration_lookup` plugin sometimes returning an array and sometimes a string? Because some of the source content entity migrations have a single destination ID key (and return a string), while others have a composite key (and return an array).
When migrations are run, the Migrate API keeps a map that tracks which source records were used to create which destination records. The `migration_lookup` process plugin then uses that data to figure out what value to return. The data looks like this if the destination record has a single key:
```
MariaDB [db]> select * from migrate_map_dme_d7_node_topic limit 2;
+------------------------------------------------------------------+-----------+---------+-------------------+-----------------+---------------+------+
| source_ids_hash                                                  | sourceid1 | destid1 | source_row_status | rollback_action | last_imported | hash |
+------------------------------------------------------------------+-----------+---------+-------------------+-----------------+---------------+------+
| 03fe38e1b58fc80e4701a764114f1802434d2c0f876fd812039f055e116fd43d | 2947      | 2789    | 0                 | 0               | 0             |      |
| 04281ee586125db4fef91e4cea7d6dd9adde2344a7e04a3bdb0c2c938e1df463 | 2946      | 2788    | 0                 | 0               | 0             |      |
+------------------------------------------------------------------+-----------+---------+-------------------+-----------------+---------------+------+
```
And like this if it has a composite key (notice the additional destination ID column, `destid2`):
```
MariaDB [db]> select * from migrate_map_dme_d7_node_page limit 2;
+------------------------------------------------------------------+-----------+---------+---------+-------------------+-----------------+---------------+------+
| source_ids_hash                                                  | sourceid1 | destid1 | destid2 | source_row_status | rollback_action | last_imported | hash |
+------------------------------------------------------------------+-----------+---------+---------+-------------------+-----------------+---------------+------+
| 0262336851a26539490397ef908a0e5131aa829668dd49041f52a650bf81c725 | 2771      | 3015    | und     | 0                 | 0               | 0             |      |
| 0329ab49f6959aeba333bbcfc1ca1582dc428e925432a3339e383246ca495b6f | 1440      | 2994    | und     | 0                 | 0               | 0             |      |
+------------------------------------------------------------------+-----------+---------+---------+-------------------+-----------------+---------------+------+
```
This is why in some cases `migration_lookup` would return `"2789"` and in other cases an array like `[0 => "3015", 1 => "und"]`: some of the node migrations had composite keys, and some had single keys.
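The following is a small, standalone PHP model of that behaviour, a sketch rather than Drupal core's actual implementation. `model_migration_lookup()` is a hypothetical name; each map is a plain array keyed by source ID that holds the `destid*` values from rows like the ones above, and the sketch assumes that the first listed migration whose map contains the source ID is the one that answers.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical, simplified model of a multi-migration lookup (not core code).
 *
 * @param array  $maps      Migration ID => [source ID => list of destid values].
 * @param string $source_id The source ID to look up.
 *
 * @return string|array|null A string for a single destination key, an array
 *   for a composite key, or NULL if no map contains the source ID.
 */
function model_migration_lookup(array $maps, string $source_id)
{
    foreach ($maps as $map) {
        if (!isset($map[$source_id])) {
            // This migration never imported the source ID; try the next one.
            continue;
        }
        $dest_ids = array_values($map[$source_id]);
        // One destid column (e.g. nid): return it as a plain string.
        // Several destid columns (e.g. nid + langcode): return them all.
        return count($dest_ids) === 1 ? $dest_ids[0] : $dest_ids;
    }
    return null;
}

// One row from each of the two map tables shown above.
$maps = [
    'dme_d7_node_topic' => ['2947' => ['2789']],
    'dme_d7_node_page' => ['2771' => ['3015', 'und']],
];

var_dump(model_migration_lookup($maps, '2947')); // string(4) "2789"
var_dump(model_migration_lookup($maps, '2771')); // array holding "3015" and "und"
```

In this model, whichever map answers decides the shape of the result, so a field whose references come from both kinds of migration ends up with a mix of strings and arrays.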
If it always returns an array, then you can add an `extract` step like the following to pull out the element that contains the referenced entity ID:
```yaml
field_guide_references:
  -
    plugin: sub_process
    source: field_guide_references
    process:
      target_id:
        -
          plugin: migration_lookup
          source: target_id
          migration:
            - dme_d7_node_collection
            - dme_d7_node_guide
            - dme_d7_node_topic
            - dme_d7_node_tutorial
            - dme_d7_node_video
        -
          plugin: extract
          index:
            - 0
```
I tried that initially, but ended up with an error like the following because sometimes the value was an array and sometimes a string. This is what led me down the trail of figuring out that `migration_lookup` was returning inconsistent values.

```
dme_d7_node_guide:field_guide_references:sub_process: extract: Input should be an array, instead it was of type 'string'
```
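The error comes from `extract` insisting on an array. Here is a simplified, hypothetical stand-in for that step (`model_extract()` is not a core function), run against a mix of lookup results like the ones `migration_lookup` was returning; its type check is written to produce the same message as the log line above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical, simplified stand-in for the `extract` step (not core code).
 *
 * @param mixed $value The incoming pipeline value.
 * @param array $index Keys to walk, outermost first (e.g. [0]).
 *
 * @return mixed The nested value found at $index.
 */
function model_extract($value, array $index)
{
    // Strict type check, mirroring the error message in the log above.
    if (!is_array($value)) {
        throw new InvalidArgumentException(sprintf(
            "Input should be an array, instead it was of type '%s'",
            gettype($value)
        ));
    }
    foreach ($index as $key) {
        if (!is_array($value) || !array_key_exists($key, $value)) {
            throw new OutOfBoundsException('Index not found in input.');
        }
        $value = $value[$key];
    }
    return $value;
}

// Mixed results: one from a composite-key migration, one from a single-key one.
$lookup_results = [['3015', 'und'], '2789'];

foreach ($lookup_results as $result) {
    try {
        echo model_extract($result, [0]), PHP_EOL;
    } catch (InvalidArgumentException $e) {
        echo 'Error: ', $e->getMessage(), PHP_EOL;
    }
}
```

Run as written, it prints `3015` for the array and then the "Input should be an array" error for the plain string, which is why a single `extract` step can't cope with mixed results.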
In my case, some of the migrations had a multipart key because of their destination plugin configuration:
```yaml
destination:
  plugin: 'entity:node'
  # This line is the culprit!
  translations: true
  default_bundle: guide
```
Some of them contained the line `translations: true` and some did not. This led me to `\Drupal\migrate\Plugin\migrate\destination\EntityContentBase::getIds()`, which adds the entity's `langcode` as a second destination ID key when `translations` is true. Hence the `und` value, which is Drupal 7's language code for language-neutral content. This is done because if you have translations (or revisions) enabled, you could end up with multiple variations of the node with ID “123”, and the ID alone is no longer enough data to uniquely identify the imported content.
The fix is to make sure that all of the content migrations used as source migrations for the `migration_lookup` process plugin use either a single destination ID (so it always returns a string) or a multipart destination ID (so it always returns an array).
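Before running the migration, it may help to check the migrations listed under a lookup for mixed key shapes. Below is a minimal hypothetical sketch in plain PHP: `group_by_key_shape()` is not a Drupal API, it takes destination configurations as arrays you would load yourself (for example, parsed from the migration YAML), and it assumes, as in this post, that `translations: true` is the only setting that adds a second destination ID. The configurations in it are illustrative, not the actual files.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical audit helper (not a Drupal API): group the migrations used by
 * a migration_lookup step by the shape of their destination key.
 *
 * @param array $destinations Migration ID => `destination` configuration.
 *
 * @return array ['single' => [migration IDs], 'composite' => [migration IDs]]
 */
function group_by_key_shape(array $destinations): array
{
    $groups = ['single' => [], 'composite' => []];
    foreach ($destinations as $migration_id => $destination) {
        // Assumption: only `translations: true` adds a second destination ID.
        $shape = empty($destination['translations']) ? 'single' : 'composite';
        $groups[$shape][] = $migration_id;
    }
    return $groups;
}

// Illustrative configurations.
$destinations = [
    'dme_d7_node_collection' => ['plugin' => 'entity:node'],
    'dme_d7_node_guide' => ['plugin' => 'entity:node', 'translations' => true],
    'dme_d7_node_topic' => ['plugin' => 'entity:node'],
];

$groups = group_by_key_shape($destinations);
if ($groups['single'] !== [] && $groups['composite'] !== []) {
    // Mixed shapes: the lookup would return strings for some rows, arrays for others.
    echo 'Mixed destination keys; composite: ', implode(', ', $groups['composite']), PHP_EOL;
}
```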
In my case, I removed `translations: true` from two of the migrations. Four hours and two lines of YAML later, my migrations are finally running again. Hopefully this is helpful for anyone else troubleshooting similar problems.
Comments
Thanks for this post!
In my case I couldn't remove `translations: true`, so I wrote a simple process plugin that extends `extract` and returns the value unchanged if it is not an array:
```php
<?php

namespace Drupal\your_module\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\Plugin\migrate\process\Extract;
use Drupal\migrate\Row;

/**
 * Extract value from array, or return it if it is a scalar.
 *
 * @MigrateProcessPlugin(
 *   id = "extract_if_array",
 * )
 */
class ExtractIfArray extends Extract {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // Composite destination IDs arrive as arrays: let Extract pick the index.
    if (is_array($value)) {
      return parent::transform($value, $migrate_executable, $row, $destination_property);
    }
    // Single destination IDs are already scalars: pass them through unchanged.
    return $value;
  }

}
```
Then you can chain `extract_if_array` after `migration_lookup`:
```yaml
your_field:
  -
    plugin: migration_lookup
    migration:
      - migration1
      - migration2
  -
    plugin: extract_if_array
    index:
      - 0
```