Earlier this month I hosted a Drupal-to-Drupal Migration Workshop, and one of the attendees asked about merging two entity reference fields into a single field during the migration. The scenario: you’ve got a Drupal 7 event content type with one entity reference field that relates sessions, and another that relates sponsors. On the Drupal 10 site you want to consolidate these into a single entity reference field that relates both sessions and sponsors.
I wasn't quite sure how to approach this, but we brainstormed some pseudocode and came up with the following two solutions.
The problem
The value of each of the two reference fields from the D7 source is an array of arrays. The outer array has one row for each value in the D7 field, and each inner array is an associative array whose target_id key holds the ID of the node being referenced.
Example:
$field_related_sessions = [
  0 => ['target_id' => 42],
  1 => ['target_id' => 1337],
];
$field_related_sponsors = [
  0 => ['target_id' => 86],
];
Use the merge plugin from Migrate Plus
You can use the merge plugin provided by the Migrate Plus module to merge these two arrays into a single field.
field_merged:
  -
    plugin: merge
    source:
      - field_related_sessions
      - field_related_sponsors
You’ll end up with something like this:
$field_merged = [
  0 => ['target_id' => 42],
  1 => ['target_id' => 1337],
  2 => ['target_id' => 86],
];
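For intuition, here is a minimal plain-PHP sketch of what that step amounts to. It assumes the plugin behaves like PHP's array_merge() for this input; it is an illustration outside the migration system, not the plugin's actual code.

```php
<?php
// Illustration only: in a real migration the merge plugin does this work.
$field_related_sessions = [
  ['target_id' => 42],
  ['target_id' => 1337],
];
$field_related_sponsors = [
  ['target_id' => 86],
];

// array_merge() renumbers numeric keys, so the sponsor row lands at index 2.
$field_merged = array_merge($field_related_sessions, $field_related_sponsors);

print_r($field_merged);
```

Because the numeric keys are renumbered rather than overwritten, no rows are lost when both source fields start at index 0.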
If the node IDs are the same between Drupal 7 and Drupal 10, this should work fine. However, if you need to use the migration_lookup plugin to ensure the correct Drupal 10 node IDs are used, you’ll need to add a sub_process step like this:
field_merged:
  -
    plugin: merge
    source:
      - field_related_sessions
      - field_related_sponsors
  -
    plugin: sub_process
    process:
      target_id:
        -
          plugin: migration_lookup
          source: target_id
          migration:
            - upgrade_d7_node_complete_session
            - upgrade_d7_node_complete_sponsor
        -
          # Leave this off if you're not migrating translations.
          plugin: extract
          index:
            - 0
The result will have the same shape as before adding the sub_process step, except that each Drupal 7 source ID is replaced with the corresponding Drupal 10 destination ID.
Note the use of the extract plugin at the end there. This may or may not be necessary depending on your specific migration. If you’re migrating translations, the output from migration_lookup will be a composite key. If you’re not, it’ll be a single value. Learn more about this in Debugging inconsistent return values from the Drupal migration_lookup plugin.
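To make the lookup-then-extract idea concrete, here is a rough plain-PHP sketch. The $id_map array and its destination IDs are made up for illustration, and the exact members of a composite key depend on your destination plugin; the sketch only assumes that the node ID comes first.

```php
<?php
// Hypothetical ID map: Drupal 7 nid => composite destination key.
// Assumption: the destination node ID is the first element of each key.
$id_map = [
  42 => [101, 'en'],
  1337 => [102, 'en'],
  86 => [205, 'en'],
];

$field_merged = [
  ['target_id' => 42],
  ['target_id' => 1337],
  ['target_id' => 86],
];

$result = [];
foreach ($field_merged as $item) {
  // Stand-in for migration_lookup: source ID in, composite key out.
  $destination = $id_map[$item['target_id']];
  // Stand-in for extract with index 0: keep only the first element.
  $result[] = ['target_id' => $destination[0]];
}

print_r($result);
```

If the lookup already returned a single value, the extract step would have nothing to unwrap, which is why it should be left off in that case.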
What about duplicate values?
If you know that the fields on the Drupal 7 site are not going to contain duplicate values, this should be all you need to do. However, if it’s possible that both fields could reference the same node, you might want to de-duplicate the data. For example, what if your data looked like this?
$field_related_sessions = [
  0 => ['target_id' => 42],
  1 => ['target_id' => 42],
  2 => ['target_id' => 1337],
];
$field_related_sponsors = [
  // Duplicate!
  0 => ['target_id' => 42],
  1 => ['target_id' => 86],
];
Initially I thought Drupal 10 might just handle that automatically and perform some de-duplication logic before it saved the imported field values. But it doesn’t. It just saves whatever you give it. So we need to update the migration to handle this scenario.
Here’s an example that will de-duplicate the array. It works by first merging the two arrays like before, and then using the array_build plugin to create a new array out of the merged values that’ll look like this:
$_field_related_content_deduped = [
  42 => "42",
  1337 => "1337",
  86 => "86",
];
Since an array can’t have duplicate keys, you’ve done the de-duplication work. But now you need to get that back into the format required for the entity reference field. So we use the callback plugin to call PHP’s array_chunk() function with a chunk size of 1, and then iterate over the result with sub_process to create a new array in the required format.
source:
  constants:
    ONE: 1
process:
  _field_related_content_deduped:
    -
      plugin: merge
      source:
        - field_related_sessions
        - field_related_sponsors
    -
      plugin: array_build
      key: target_id
      value: target_id
  field_merged:
    -
      plugin: callback
      callable: array_chunk
      unpack_source: true
      source:
        - '@_field_related_content_deduped'
        - constants/ONE
    -
      plugin: sub_process
      process:
        target_id:
          -
            plugin: array_pop
            source:
              - '0'
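If the pipeline is hard to follow, it may help to trace it in plain PHP. The sketch below is an approximation: array_column() stands in for array_build, and a foreach loop stands in for sub_process with array_pop. It mirrors the data flow rather than the plugins' actual implementations.

```php
<?php
// The merged values, including duplicates.
$merged = [
  ['target_id' => 42],
  ['target_id' => 42],
  ['target_id' => 1337],
  ['target_id' => 42],
  ['target_id' => 86],
];

// Like array_build: key and value both come from target_id, so a repeated
// ID overwrites the entry that is already there.
$deduped = array_column($merged, 'target_id', 'target_id');

// Like callback + array_chunk with a size of 1: one single-item array per ID.
$chunks = array_chunk($deduped, 1);

// Like sub_process + array_pop: wrap each ID back into a target_id row.
$field_merged = [];
foreach ($chunks as $chunk) {
  $field_merged[] = ['target_id' => array_pop($chunk)];
}

print_r($field_merged);
```

The first occurrence of each ID decides its position, so the original order of the references is preserved.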
Or, write a custom process plugin
Alternatively you could write a custom process plugin. Something like the following should do the trick:
<?php
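
namespace Drupal\my_custom_module\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * A process plugin to de-dupe an array.
 *
 * handle_multiples lets the plugin receive the whole array at once instead
 * of being called once per item.
 *
 * @MigrateProcessPlugin(
 *   id = "my_custom_deduplication_filter",
 *   handle_multiples = TRUE
 * )
 */
class MyCustomDeduplicationFilter extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // Assumption: items are keyed by target_id, as in the examples above.
    $keyField = 'target_id';
    $seen = [];
    $dedupedArray = [];
    foreach ((array) $value as $item) {
      $key = $item[$keyField];
      // Keep only the first item seen for each key.
      if (!isset($seen[$key])) {
        $seen[$key] = TRUE;
        $dedupedArray[] = $item;
      }
    }
    return $dedupedArray;
  }

}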
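The de-duplication loop itself doesn't depend on Drupal, so it can be checked in isolation before wiring it into a migration. Here's a minimal sketch as a plain function. Note that dedupe_by_key() and its $key_field parameter are hypothetical names used only for this example, with target_id assumed as the key.

```php
<?php
/**
 * Hypothetical standalone version of the de-duplication loop.
 */
function dedupe_by_key(array $items, string $key_field = 'target_id'): array {
  $seen = [];
  $deduped = [];
  foreach ($items as $item) {
    $key = $item[$key_field];
    // Keep only the first item seen for each key.
    if (!isset($seen[$key])) {
      $seen[$key] = TRUE;
      $deduped[] = $item;
    }
  }
  return $deduped;
}

// The merged session and sponsor references, duplicates included.
$merged = [
  ['target_id' => 42],
  ['target_id' => 42],
  ['target_id' => 1337],
  ['target_id' => 42],
  ['target_id' => 86],
];

$result = dedupe_by_key($merged);
print_r($result);
```

Unlike the array_build approach, this keeps each item intact, so any extra keys on the inner arrays would survive the de-duplication.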
Conclusion
It might not be the most elegant solution, but we got something that works and learned a bunch along the way. Getting here required a lot of use of the --migrate-debug flag from the Migrate Devel module, as well as exploring the available process plugins that we might be able to use.
How would you go about solving this? Got any suggestions for how to improve it?
Keep going!
You can learn more in our Process Plugins tutorial and our Learn to Migrate to Drupal guide.