Tuning Drupalize.Me Search Results with Solr Query Re-Ranking and Search API

During the Drupal 7 era, we created tutorials on a variety of topics, such as Views, Drush, the Form API, and theming. Since Drupal 8 was released, we have updated this content for Drupal 8, 9, and 10. Because of the significant changes between Drupal 7 and modern versions, we maintain two versions of each tutorial on our site: one for legacy Drupal and another for modern Drupal.

The path to better search results

Today, we still maintain both versions. The use of our legacy Drupal content has steadily decreased, yet it still has a substantial presence in search results. This often leads to confusion, especially when members trying to learn about features in modern Drupal find themselves on a legacy Drupal tutorial.

We have always offered faceted search, which lets members narrow the results to a specific version of Drupal after performing the initial search. Recently, based on member feedback, we decided to explore additional ways to surface content relevant to modern Drupal, which most of our members now use.

Initially, I thought setting the "Drupal 10" facet as the default could address this issue. However, after spending an entire day exploring this, I realized this approach was impractical. Facets filter based on values present in the result set, so if a search only returns "Drupal 7" content, the "Drupal 10" facet won't appear, and you cannot select an option that does not exist.
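To see why a default facet can't work, here is a minimal sketch of how facet options are derived from the current result set. It is plain PHP rather than the Facets module's API; the `facetCounts()` helper and the sample results are made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Counts facet values across a result set.
 *
 * Hypothetical helper: each result is an array with a list of values
 * under $field. Only values that occur in at least one result get an option.
 */
function facetCounts(array $results, string $field): array {
  $counts = [];
  foreach ($results as $result) {
    foreach ($result[$field] ?? [] as $value) {
      $counts[$value] = ($counts[$value] ?? 0) + 1;
    }
  }
  ksort($counts);
  return $counts;
}

// Made-up results for a search that only matches legacy tutorials.
$results = [
  ['title' => 'Legacy tutorial A', 'versions' => ['Drupal 7']],
  ['title' => 'Legacy tutorial B', 'versions' => ['Drupal 7']],
];

$options = facetCounts($results, 'versions');
// No result carries "Drupal 10", so there is no option to pre-select.
echo array_key_exists('Drupal 10', $options) ? 'yes' : 'no', "\n"; // prints "no"
```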

First attempt: boosting

I then explored boosting. Boosting adjusts the relevance score of an item based on query-time criteria. For example, you might give a result higher relevance if the search keywords appear in its title field rather than in its body. The Search API Solr module already supports this, and we use it to rank courses and guides higher than tutorials.

I considered creating a Search API processor plugin, similar to the existing solr_boost_more_recent processor, that adds a Solr function query to the document boost factors. Solr evaluates this function as part of the query and uses the result as a multiplier to boost the document's score. For instance, you could boost the relevance of any document tagged with "Drupal 10" in the versions field by a factor of 5, regardless of the search keywords.

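use Drupal\search_api_solr\SolrBackendInterface;

// Inside the processor's preprocessSearchQuery() method. $term (e.g. "Drupal 10")
// and $boost_factor (e.g. 5) are assumed to come from the processor configuration.
$boosts = $query->getOption('solr_document_boost_factors', []);
// The backend replaces FIELD_PLACEHOLDER with the field's Solr name.
// %.1F formats the factor with one decimal place, e.g. 5.0.
$boosts['taxonomy_versions'] = sprintf('if(termfreq(%s,"%s"),%.1F,0.0)', SolrBackendInterface::FIELD_PLACEHOLDER, $term, $boost_factor);
$query->setOption('solr_document_boost_factors', $boosts);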

This results in a Solr query like:

{!boost b=sum(if(termfreq(tm_X3b_und_version,"Drupal 10"),5.0,0.0))} (tm_X3b_und_body:+"testing" ...)
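Judging by the query it produces, the backend swaps the placeholder for the real Solr field name and wraps the boost factors in a `{!boost b=sum(...)}` prefix. The standalone sketch below approximates that assembly so the string can be inspected without a Solr server. The `FIELD_PLACEHOLDER` value and the `termBoost()` and `buildBoostPrefix()` helpers are assumptions modelled on the output above, not the module's actual code.

```php
<?php

declare(strict_types=1);

// Stand-in for SolrBackendInterface::FIELD_PLACEHOLDER (value assumed).
const FIELD_PLACEHOLDER = 'FIELD_PLACEHOLDER';

/**
 * Builds a termfreq()-based boost function for one term.
 */
function termBoost(string $term, float $factor): string {
  // %.1F prints the factor with one decimal place, e.g. 5.0.
  return sprintf('if(termfreq(%s,"%s"),%.1F,0.0)', FIELD_PLACEHOLDER, $term, $factor);
}

/**
 * Hypothetical: swaps in real Solr field names and wraps the factors
 * in a {!boost} local-params prefix.
 *
 * @param array<string, string> $boosts Boost functions keyed by Search API field ID.
 * @param array<string, string> $fieldNames Search API field ID => Solr field name.
 */
function buildBoostPrefix(array $boosts, array $fieldNames): string {
  $functions = [];
  foreach ($boosts as $fieldId => $function) {
    $functions[] = str_replace(FIELD_PLACEHOLDER, $fieldNames[$fieldId], $function);
  }
  return '{!boost b=sum(' . implode(',', $functions) . ')}';
}

$prefix = buildBoostPrefix(
  ['taxonomy_versions' => termBoost('Drupal 10', 5)],
  ['taxonomy_versions' => 'tm_X3b_und_version'],
);
echo $prefix, "\n";
// {!boost b=sum(if(termfreq(tm_X3b_und_version,"Drupal 10"),5.0,0.0))}
```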

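However, this method does not work with multi-word values like "Drupal 10": termfreq() looks up a single indexed term, and the fulltext versions field (the tm_ prefix) stores "Drupal 10" as separate tokens. I could get it working with "10" alone, but that solution was not ideal.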
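A toy model makes the mismatch visible. The sketch below uses a deliberately simplified analyzer (lowercase, then split on whitespace) as an assumed stand-in for the field type's real analysis chain, plus a hypothetical `termFreq()` that, like Solr's `termfreq()`, counts exact matches of one raw term.

```php
<?php

declare(strict_types=1);

/**
 * Simplified stand-in for a fulltext analysis chain:
 * lowercase, then split on whitespace.
 *
 * @return string[]
 */
function analyze(string $text): array {
  return preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * Mimics termfreq(): counts exact matches of one raw, unanalyzed term.
 *
 * @param string[] $tokens
 */
function termFreq(array $tokens, string $term): int {
  return count(array_keys($tokens, $term, true));
}

$tokens = analyze('Drupal 10');            // ['drupal', '10']
echo termFreq($tokens, 'Drupal 10'), "\n"; // 0: no single token is "Drupal 10"
echo termFreq($tokens, '10'), "\n";        // 1
```

In this model, "10" matches only because digits survive analysis unchanged, which lines up with the single-token workaround described above.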

I also tried other variations, such as indexing the term ID field as an integer and checking whether the term ID is in the multi-valued integer field:

if(itm_taxonomy_versions:1461,5.0,0.0)

or

if(exists(query({v=itm_taxonomy_versions:"1045"})),10,0)

After much experimentation and consultation in Slack, I was directed towards query re-ranking as a possible alternative.

Using query re-ranking to influence sort order in Solr search results

Query re-ranking in Solr adjusts the order of search results after the initial query has run. A re-ranking query applies a secondary scoring phase to the top N results (set by reRankDocs). Unlike boosting, which changes scores during the main query, re-ranking rescores only those top results, using additional criteria after the main query.

For example, Solr runs the query for the keyword "views", takes the top results, and runs a second query against them, adjusting the score of any document that matches the second query by a ranking weight.
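To make the two phases concrete, here is a minimal in-memory sketch of re-ranking that loosely follows the `add` operator: only the top `reRankDocs` results are touched, and each one gets `reRankWeight` times its re-rank score added to its original score. The `rerank()` function, the document shape, and the scores are hypothetical, and Solr's real scoring is more involved. The sample uses a negative weight to show how matching documents get pushed down.

```php
<?php

declare(strict_types=1);

/**
 * Re-ranks the top $reRankDocs entries of an already sorted result list.
 *
 * @param array<int, array<string, mixed>> $results
 *   Main-query results sorted by descending 'score'.
 * @param callable $reRankScore
 *   Returns a document's score for the re-rank query (0.0 if no match).
 */
function rerank(array $results, callable $reRankScore, int $reRankDocs, float $reRankWeight): array {
  $top = array_slice($results, 0, $reRankDocs);
  $rest = array_slice($results, $reRankDocs);

  foreach ($top as &$doc) {
    // "add" operator: original score + weight * re-rank score.
    $doc['score'] += $reRankWeight * $reRankScore($doc);
  }
  unset($doc);

  // In this sketch only the top documents are re-sorted; the rest keep their order.
  usort($top, fn(array $a, array $b): int => $b['score'] <=> $a['score']);
  return array_merge($top, $rest);
}

// Hypothetical results for "views", already ranked by keyword relevance.
$results = [
  ['id' => 'views-d7', 'score' => 9.0, 'legacy' => true],
  ['id' => 'views-d10', 'score' => 8.0, 'legacy' => false],
  ['id' => 'blocks-d10', 'score' => 3.0, 'legacy' => false],
];

// The re-rank query matches legacy content; a negative weight demotes it.
$isLegacy = fn(array $doc): float => $doc['legacy'] ? 1.0 : 0.0;
$ranked = rerank($results, $isLegacy, 1000, -7.0);
echo implode(', ', array_column($ranked, 'id')), "\n";
// views-d10, blocks-d10, views-d7
```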

I wrote an event subscriber that adds a hard-coded re-ranking query to every search on Drupalize.Me:

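<?php

declare(strict_types=1);

namespace Drupal\dme\EventSubscriber;

use Drupal\search_api_solr\Event\PreQueryEvent;
use Drupal\search_api_solr\Event\SearchApiSolrEvents;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Demotes legacy Drupal content in all Solr searches via query re-ranking.
 */
class DmeSearchApiSolrSubscriber implements EventSubscriberInterface {

  /**
   * {@inheritdoc}
   */
  public static function getSubscribedEvents(): array {
    return [
      SearchApiSolrEvents::PRE_QUERY => 'preQuery',
    ];
  }

  /**
   * Adds a re-rank query before the Solarium query is sent to Solr.
   */
  public function preQuery(PreQueryEvent $event): void {
    $solarium_query = $event->getSolariumQuery();
    // Rescore the top 1000 hits: documents matching $rqq get -7 times
    // their re-rank score added to their original score.
    $rq = '{!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=-7 reRankOperator=add}';
    $solarium_query->addParam('rq', $rq);
    // Match documents tagged with the legacy version terms (IDs 1045, 1044).
    $solarium_query->addParam('rqq', '(itm_taxonomy_versions:1045 OR itm_taxonomy_versions:1044)');
  }

}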

Because re-ranking accepts a negative reRankWeight, the re-ranking query can target legacy Drupal 7 content and push it down instead of boosting modern versions. As a result, nothing needs to change when new Drupal versions are added.

While I chose a hard-coded approach to minimize custom code, I also built a proof of concept showing that re-ranking can be implemented in a Search API processor plugin with a configuration form.

In the preprocessSearchQuery() method of the plugin, you can set Solr query string parameters using:

$query->setOption('solr_param_rq', $rq);
$query->setOption('solr_param_rqq', $rqq);

This uses the solr_param_* prefix, which the Search API Solr module recognizes and applies to the Solarium query.
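For the configurable variant, the processor mostly has to turn form values into the two parameter strings. Here is a minimal, framework-free sketch of that step. The `buildRerankOptions()` helper and its inputs (the Solr field name, a list of legacy term IDs, reRankDocs, and reRankWeight) are hypothetical stand-ins for the proof of concept's configuration. With the values from the event subscriber, it reproduces the same `rq` and `rqq` strings.

```php
<?php

declare(strict_types=1);

/**
 * Builds solr_param_* query options for a re-rank query.
 *
 * Hypothetical helper: $termIds would come from the processor's
 * configuration form; $field is the Solr field holding version term IDs.
 *
 * @param int[] $termIds
 * @return array<string, string>
 */
function buildRerankOptions(string $field, array $termIds, int $docs, float $weight): array {
  if ($termIds === []) {
    // Nothing configured: skip re-ranking rather than send an empty query.
    return [];
  }
  // One clause per term ID, e.g. itm_taxonomy_versions:1045.
  $clauses = array_map(fn(int $id): string => sprintf('%s:%d', $field, $id), $termIds);

  return [
    // Single quotes keep $rqq literal; Solr resolves it from the rqq parameter.
    'solr_param_rq' => sprintf(
      '{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%s reRankOperator=add}',
      $docs,
      $weight,
    ),
    'solr_param_rqq' => '(' . implode(' OR ', $clauses) . ')',
  ];
}

$options = buildRerankOptions('itm_taxonomy_versions', [1045, 1044], 1000, -7);
// In preprocessSearchQuery(): foreach ($options as $k => $v) { $query->setOption($k, $v); }
print_r($options);
```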

Now, when you search on our site, legacy Drupal 7 tutorials rank lower, so results should be more relevant for members working with modern Drupal.

Next, I plan to explore Solr 9's dense vector search (the DenseVectorField type) to perform semantic searches. This should help return relevant tutorials for queries such as "How do I put fields on the page?" even when those exact words don't appear in the text.

Learn more about Solr and Drupal

If you're interested in learning more about integrating Apache Solr and Drupal, check out our Search API and Solr in Drupal course.

