Influence Search Result Ranking with Function Scores in Atlas Search
When it comes to natural language searching, it's useful to know how the order of the results for a query was determined. Exact matches might be obvious, but what about situations where not all the results were exact matches because of a fuzzy parameter, the near operator, or something else? This is where the document score becomes relevant.
Every document returned by a $search query in MongoDB Atlas Search is assigned a score based on relevance, and the documents in a result set are returned in order from highest score to lowest. You can rely on the scoring that Atlas Search determines from the query operators, or you can customize it with function scoring to fit your needs. In this tutorial, we're going to see how the function option in Atlas Search can be used to rank results, using a worked example.

Per the documentation, the function option allows the value of a numeric field to alter the final score of the document. You specify the numeric field used to compute the final score through an expression. With this in mind, let's look at a few scenarios where this could be useful.

Let's say that you have a review system like Yelp, where the user provides some search criteria, such as the type of food they want to eat. By default, you're probably going to get results based on relevance to your search term as well as the location that you defined. In the examples below, I'm using the sample restaurants data available in MongoDB Atlas.
The $search query (expressed as an aggregation pipeline) to make this search happen in MongoDB might look like the following:

[
  {
    "$search": {
      "text": {
        "query": "korean",
        "path": [ "cuisine" ],
        "fuzzy": {
          "maxEdits": 2
        }
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "name": 1,
      "cuisine": 1,
      "location": 1,
      "rating": 1,
      "score": {
        "$meta": "searchScore"
      }
    }
  }
]
The above query is a two-stage aggregation pipeline in MongoDB. The first stage searches for "korean" in the "cuisine" field. The fuzzy option with "maxEdits": 2 lets terms that are up to two single-character edits away from "korean" still match, so spelling mistakes are tolerated. The documents returned by the first stage might be quite large, so the second stage specifies which fields to return for every document. This includes the search score, which is not part of the original document but is exposed through the searchScore metadata of the search results.
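To get a feel for what "maxEdits": 2 allows, the following JavaScript sketch counts single-character edits with the classic Levenshtein distance and treats a term as a match when it is within two edits of the query. It is a simplified local model, not Atlas Search's implementation: the analyzer's lowercasing is approximated with toLowerCase(), the helper names are hypothetical, and the exact edit-distance rules Atlas Search applies (for example, how it counts transposed letters) may differ.

```javascript
// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions, or substitutions needed to turn `a` into `b`.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 chars of `a` and the first j chars of `b`
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i-1]
        curr[j - 1] + 1,    // insert b[j-1]
        prev[j - 1] + cost  // substitute (or keep a matching character)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Rough model of "fuzzy": { "maxEdits": 2 }: an indexed term matches when it
// is within maxEdits edits of the query term (case-insensitive here).
function fuzzyMatches(queryTerm, indexedTerm, maxEdits = 2) {
  return levenshtein(queryTerm.toLowerCase(), indexedTerm.toLowerCase()) <= maxEdits;
}

console.log(fuzzyMatches("korean", "Korean")); // true  (0 edits)
console.log(fuzzyMatches("korian", "Korean")); // true  (1 substitution)
console.log(fuzzyMatches("kroean", "Korean")); // true  (2 substitutions)
console.log(fuzzyMatches("thai", "Korean"));   // false (more than 2 edits)
```

Raising or lowering maxEdits in this model simply widens or narrows the set of misspellings that still count as a match.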
Running this pipeline, you might end up with the following results:

[
  {
    "location": "Jfk International Airport",
    "cuisine": "Korean",
    "name": "Korean Lounge",
    "rating": 2,
    "score": 3.5087265968322754
  },
  {
    "location": "Broadway",
    "cuisine": "Korean",
    "name": "Mill Korean Restaurant",
    "rating": 4,
    "score": 2.995847225189209
  },
  {
    "location": "Northern Boulevard",
    "cuisine": "Korean",
    "name": "Korean Bbq Restaurant",
    "rating": 5,
    "score": 2.995847225189209
  }
]
By default, the documents are returned in descending order of their score value. The higher the score, the closer the match.

You're unlikely to want to eat at a restaurant whose rating is below your threshold, even if it matches your search term and is within the search location. With the function option, we can use the rating as a multiplier on the relevance score, giving better-rated restaurants a boost in the results. Let's modify the search query to look like the following:
[
  {
    "$search": {
      "text": {
        "query": "korean",
        "path": [ "cuisine" ],
        "fuzzy": {
          "maxEdits": 2
        },
        "score": {
          "function": {
            "multiply": [
              {
                "score": "relevance"
              },
              {
                "path": {
                  "value": "rating",
                  "undefined": 1
                }
              }
            ]
          }
        }
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "name": 1,
      "cuisine": 1,
      "location": 1,
      "rating": 1,
      "score": {
        "$meta": "searchScore"
      }
    }
  }
]
In the above two-stage aggregation pipeline, the part to pay attention to is the following:
"score": {
  "function": {
    "multiply": [
      {
        "score": "relevance"
      },
      {
        "path": {
          "value": "rating",
          "undefined": 1
        }
      }
    ]
  }
}
What we're saying in this part of the $search query is that we want to take the relevance score we saw in the previous example and multiply it by the value in the document's rating field. This means that restaurants with higher ratings get higher scores. If a restaurant does not have a rating, the "undefined": 1 setting supplies a default multiplier of 1, which leaves its relevance score unchanged. If we run this query on the same data as before, we might now get results that look like this:
[
  {
    "location": "Northern Boulevard",
    "cuisine": "Korean",
    "name": "Korean Bbq Restaurant",
    "rating": 5,
    "score": 14.979236125946045
  },
  {
    "location": "Broadway",
    "cuisine": "Korean",
    "name": "Mill Korean Restaurant",
    "rating": 4,
    "score": 11.983388900756836
  },
  {
    "location": "Jfk International Airport",
    "cuisine": "Korean",
    "name": "Korean Lounge",
    "rating": 2,
    "score": 7.017453193664551
  }
]
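Each new score is the earlier relevance score multiplied by the rating: for example, Korean Bbq Restaurant's 2.995847225189209 multiplied by 5 gives 14.979236125946045. So now, although "Korean Bbq Restaurant" was ranked last in the earlier results, it appears first because its rating of 5 is the highest.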
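You can reproduce the new ordering by hand. The following JavaScript sketch copies the relevance scores and ratings from the two result sets above, applies the same arithmetic as the multiply expression (treating a missing rating as 1, like "undefined": 1), and sorts from highest to lowest score. It only models the calculation locally and does not call Atlas Search; ratingBoostedScore is a hypothetical helper name.

```javascript
// Relevance scores from the first (unboosted) result set, plus each
// restaurant's rating, copied from the example output above.
const results = [
  { name: "Korean Lounge", rating: 2, relevance: 3.5087265968322754 },
  { name: "Mill Korean Restaurant", rating: 4, relevance: 2.995847225189209 },
  { name: "Korean Bbq Restaurant", rating: 5, relevance: 2.995847225189209 },
];

// Models { multiply: [ { score: "relevance" }, { path: { value: "rating", undefined: 1 } } ] }:
// a document without a rating field is multiplied by 1 instead.
function ratingBoostedScore(doc) {
  const rating = doc.rating === undefined ? 1 : doc.rating;
  return doc.relevance * rating;
}

// Recompute every score and sort descending, as $search orders its results.
const reranked = results
  .map((doc) => ({ ...doc, score: ratingBoostedScore(doc) }))
  .sort((a, b) => b.score - a.score);

for (const doc of reranked) {
  console.log(`${doc.name}: ${doc.score.toFixed(4)}`);
}
```

This prints Korean Bbq Restaurant first, then Mill Korean Restaurant, then Korean Lounge, matching the function-scored results. Note that Korean Bbq Restaurant and Mill Korean Restaurant were tied on relevance, so the rating alone decides their order.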
Increasing the score based on rating is just one example. Another scenario could be to give search result priority to restaurants that are sponsors, using a function multiplier based on the sponsorship level.

Let's look at a different use case. Say you have an e-commerce website that is running a sale. To push products that are on sale higher in the search results than items that are not, you might use a constant score in combination with a relevance score. An aggregation that supports this example might look like the following:
db.products.aggregate([
  {
    "$search": {
      "compound": {
        "should": [
          {
            "text": {
              "path": "promotions",
              "query": "July4Sale",
              "score": {
                "constant": {
                  "value": 1
                }
              }
            }
          }
        ],
        "must": [
          {
            "text": {
              "path": "name",
              "query": "bose headphones"
            }
          }
        ]
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "name": 1,
      "promotions": 1,
      "score": { "$meta": "searchScore" }
    }
  }
]);
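In a compound query, Atlas Search scores each document by summing the scores of the clauses it matches, so the constant in the should clause of the products pipeline acts as a flat bonus on top of the must clause's relevance score. The JavaScript sketch below models only that arithmetic. The product names and relevance scores are hypothetical, compoundScore is a hypothetical helper, and the includes() check stands in for the text match on promotions.

```javascript
// Hypothetical products that both match the "must" clause ("bose headphones").
// The relevance numbers are invented purely for illustration.
const candidates = [
  { name: "Headphones A (no promotion)", mustRelevance: 2.4, promotions: [] },
  { name: "Headphones B (on sale)", mustRelevance: 1.9, promotions: ["July4Sale"] },
];

// Sum of matching clause scores: the must clause always contributes its
// relevance score; the should clause contributes the constant 1 only when
// the product is part of the "July4Sale" promotion.
function compoundScore(product) {
  const shouldScore = product.promotions.includes("July4Sale") ? 1 : 0;
  return product.mustRelevance + shouldScore;
}

const ranked = candidates
  .map((p) => ({ name: p.name, score: compoundScore(p) }))
  .sort((a, b) => b.score - a.score); // highest score first

// The on-sale product is listed first.
console.log(ranked.map((p) => p.name));
```

Without the constant, Headphones A would rank first (2.4 versus 1.9); with it, the sale item's 2.9 moves it ahead. A non-sale product whose relevance is more than one point higher than a sale item would still outrank it, so the constant value effectively controls how strong the promotion is.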
To get into the nitty-gritty of the products pipeline, the first stage uses the compound operator for searching. We're saying that the results must match "bose headphones" in the name path, and that if a result also matches "July4Sale" in the promotions path, the should clause contributes a constant score of 1, which is added to that result's score to boost its ranking.

The should clause doesn't require its contents to be satisfied, so you could still get headphone results that are not part of the "July4Sale." Those results just won't receive the extra point, so, all else being equal, they will show up lower in the list. The second stage of the pipeline defines which fields should exist in the response.

Being able to customize how search results are scored can help you deliver more relevant content to your users. We looked at two examples: the function option with a multiply expression, and a constant score. There are other ways to customize scoring as well, like replacing the value of a missing field with a constant value or boosting the results of documents with search terms found in a specific path. You can find more information in the Atlas Search documentation.

Don't forget to check out the MongoDB Community Forums to learn about what other developers are doing with Atlas Search.