How to Use Custom Aggregation Expressions in MongoDB 4.4
Rate this tutorial
The upcoming release of MongoDB 4.4 makes it easier than ever to work with, transform, access, and make sense of your data. This release, the beta of which you can try right now, comes with a couple of new operators that make it possible to write custom functions to extend the MongoDB Query Language. This feature, called Custom Aggregation Expressions, allows you to write JavaScript functions that execute as part of an aggregation pipeline stage. These come in handy when you need to implement behavior that is not supported by the MongoDB Query Language by default.
The MongoDB Query Language has many operators, or functions, that allow you to manipulate and transform your data to fit your application's use case. Operators such as $avg , $concat , and $filter make it easy for developers to query, manipulate, and transform their dataset directly at the database level versus having to write additional code and transforming the data elsewhere. While there are operators for almost anything you can think of, there are a few edge cases where a provided operator or series of operators won't be sufficient, and that's where custom aggregation expressions come in.
In this blog post, we'll learn how we can extend the MongoDB Query Language to suit our needs by writing our own custom aggregation expressions using the new $function and $accumulator operators. Let's dive in!
For this tutorial you'll need:
MongoDB 4.4 comes with two new operators: $function and $accumulator . These two operators allow us to write custom JavaScript functions that can be used in a MongoDB aggregation pipeline. We are going to look at examples of how to use both by implementing our own custom aggregation expressions.
To get the most value out of this blog post, I will assume that you are already familiar with the MongoDB aggregation framework. If not, I suggest checking out the docs and following a tutorial or two and becoming familiar with how this feature works before diving into this more advanced topic.
Before we get into the code, I want to briefly talk about why you would care about this feature in the first place. The first reason is delivering higher performance to your users. If you can get the exact data you need directly out of the database in one trip, without having to do additional processing and manipulating, you will be able to serve and fulfill requests quicker. Second, custom aggregation expressions allow you to take care of edge cases directly in your aggregation pipeline stage. If you've worked with the aggregation pipeline in the past, you'll feel right at home and be productive in no time. If you're new to the aggregation pipeline, you'll only have to learn it once. By the time you find yourself with a use case for the
$function
or $accumulator
operators, all of your previous knowledge will transfer over. I think those are two solid reasons to care about custom aggregation expressions: better performance for your users and increased developer productivity.The one caveat to the liberal use of the
$function
and $accumulator
operators is performance. Executing JavaScript inside of an aggregation expression is resource intensive and may reduce performance. You should always opt to use existing, highly optimized operators first, especially if they can get the job done for your use case. Only consider using $function
and $accumulator
if an existing operator cannot fulfill your application's needs.The first operator we'll take a look at is called
$function
. As the name implies, this operator allows you to implement a custom JavaScript function to implement any sort of behavior. The syntax for this operator is:1 { 2 $function: { 3 body: <code>, 4 args: <array expression>, 5 lang: "js" 6 } 7 }
The
$function
operator has three properties. The body
, which is going to be our JavaScript function, an args
array containing the arguments we want to pass into our function, and a lang
property specifying the language of our $function
, which as of MongoDB 4.4 only supports JavaScript.The
body
property holds our JavaScript function as either a type of BSON Code or String. In our examples in this blog post, we'll write our code as a String. Our JavaScript function will have a signature that looks like this:1 function(arg){ 2 return arg 3 }
From a cursory glance, it looks like a standard JavaScript function. You can pass in
n
number of arguments, and the function returns a result. The arguments within the body
property will be mapped to the arguments provided in the args
array property, so you'll need to make sure you pass in and capture all of the provided arguments.Now that we know the properties of the
$function
operator, let's use it in an aggregation pipeline. To get started, let's choose a data set to work from. We'll use one of the provided MongoDB sample datasets that you can find on MongoDB Atlas. If you don't already have a cluster set up, you can do so by creating a free MongoDB Atlas account. Loading the sample datasets is as simple as clicking the "..." button on your cluster and selecting the "Load Sample Dataset" option.Once you have the sample dataset loaded, let's go ahead and connect to our MongoDB cluster. Whenever learning something new, I prefer to use a visual approach, so for this tutorial, I'll rely on MongoDB Compass. If you already have MongoDB Compass installed, connect to your cluster that has the sample dataset loaded, otherwise download the latest version here, and then connect.
Whether you are using MongoDB Compass or connecting via the mongo shell, you can find your MongoDB Atlas connection string by clicking the "Connect" button on your cluster, choosing the type of app you'll be using to connect with, and copying the string which will look like this:
mongodb+srv://mongodb:<password>@cluster0-tdm0q.mongodb.net/test
.Once you are connected, the dataset that we will work with is called
sample_mflix
and the collection movies
. Go ahead and connect to that collection and then navigate to the "Aggregations" tab. To ensure that everything works fine, let's write a very simple aggregation pipeline using the new $function
operator. From the dropdown, select the $addFields
operator and add the following code as its implementation:1 { 2 fromFunction: {$function: {body: "function(){return 'hello'}", args: [], lang: 'js'}} 3 }
If you are using the mongo shell to execute these queries the code will look like this:
1 db.movies.aggregate([ 2 { 3 $addFields: { 4 fromFunction: { 5 $function: { 6 body: "function(){return 'hello'}", 7 args: [], 8 lang: 'js' 9 } 10 } 11 } 12 } 13 ])
If you look at the output in MongoDB Compass and scroll to the bottom of each returned document, you'll see that each document now has a field called
fromFunction
with the text hello
as its value. We could have simply passed the string "hello" instead of using the $function
operator, but the reason I wanted to do this was to ensure that your version of MongoDB Compass supports the $function
operator and this is a minimal way to test it.Next, let's implement a custom function that actually does some work. Let's add a new field to every movie that has Ado's review score, or perhaps your own?
I'll name my field
adoScore
. Now, my rating system is unique. Depending on the day and my mood, I may like a movie more or less, so we'll start figuring out Ado's score of a movie by randomly assigning it a value between 0 and 5. So we'll have a base that looks like this: let base = Math.floor(Math.random() * 6);
.Next, if critics like the movie, then I do too, so let's say that if a movie has an IMDB score of over 8, we'll give it +1 to Ado's score. Otherwise, we'll leave it as is. For this, we'll pass in the
imdb.rating
field into our function.Finally, movies that have won awards also get a boost in Ado's scoring system. So for every award nomination a movie receives, the total Ado score will increase by 0.25, and for every award won, the score will increase by 0.5. To calculate this, we'll have to provide the
awards
field into our function as well.Since nothing is perfect, we'll add a custom rule to our function: if the total score exceeds 10, we'll just output the final score to be 9.9. Let's see what this entire function looks like:
1 { 2 adoScore: {$function: { 3 body: "function(imdb, awards){let base = Math.floor(Math.random() * 6) \n let imdbBonus = 0 \n if(imdb > 8){ imdbBonus = 1} \n let nominations = (awards.nominations * 0.25) \n let wins = (awards.wins * 0.5) \n let final = base + imdbBonus + nominations + wins \n if(final > 10){final = 9.9} \n return final}", 4 args: ["$imdb.rating", "$awards"], 5 lang: 'js'}} 6 }
To make the JavaScript function easier to read, here it is in non-string form:
1 function(imdb, awards){ 2 let base = Math.floor(Math.random() * 6) 3 let imdbBonus = 0 4 5 if(imdb > 8){ imdbBonus = 1} 6 7 let nominations = awards.nominations * 0.25 8 let wins = awards.wins * 0.5 9 10 let final = base + imdbBonus + nominations + wins 11 if(final > 10){final = 9.9} 12 13 return final 14 }
And again, if you are using the mongo shell, the code will look like:
1 db.movies.aggregate([ 2 { 3 $addFields: { 4 adoScore: { 5 $function: { 6 body: "function(imdb, awards){let base = Math.floor(Math.random() * 6) \n let imdbBonus = 0 \n if(imdb > 8){ imdbBonus = 1} \n let nominations = (awards.nominations * 0.25) \n let wins = (awards.wins * 0.5) \n let final = base + imdbBonus + nominations + wins \n if(final > 10){final = 9.9} \n return final}", 7 args: ["$imdb.rating", "$awards"], 8 lang: 'js' 9 } 10 } 11 } 12 } 13 ])
Running the above
$addFields
aggregation , which uses the $function
operator, will produce a result that adds a new adoScore
field to the end of each document. This field will contain a numeric value ranging from 0 to 9.9. In the $function
operator, we passed our custom JavaScript function into the body
property. As we iterated through our documents, the $imdb.rating
and $awards
fields from each document were passed into our custom function.Using dot notation, we've seen how to specify any sub-document you may want to use in an aggregation. We also learned how to use an entire field and it's subfields in an aggregation, as we've seen with the
$awards
parameter in our earlier example. Our final result looks like this:This is just scratching the surface of what we can do with the
$function
operator. In our above example, we paired it with the $addFields
operator, but we can also use $function
as an alternative to the $where
operator, or with other operators as well. Check out the $function
docs for more information.The next operator that we'll look at, which also allows us to write custom JavaScript functions, is called the $accumulator operator and is a bit more complex. This operator allows us to define a custom accumulator function with JavaScript. Accumulators are operators that maintain their state as documents progress through the pipeline. Much of the same rules apply to the
$accumulator
operator as they do to $function
. We'll start by taking a look at the syntax for the $accumulator
operator:1 { 2 $accumulator: { 3 init: <code>, 4 initArgs: <array expression>, // Optional 5 accumulate: <code>, 6 accumulateArgs: <array expression>, 7 merge: <code>, 8 finalize: <code>, // Optional 9 lang: <string> 10 } 11 }
We have a couple of additional fields to discuss. Rather than just one
body
field that holds a JavaScript function, the $accumulator
operator gives us four additional places to write JavaScript:- The
init
field that initializes the state of the accumulator. - The
accumulate
field that accumulates documents coming through the pipeline. - The
merge
field that is used to merge multiple states. - The
finalize
field that is used to update the result of the accumulation.
For arguments, we have two places to provide them: the
initArgs
that get passed into our init
function, and the accumulateArgs
that get passed into our accumulate
function. The process for defining and passing the arguments is the same here as it is for the $function
operator. It's important to note that for the accumulate
function the first argument is the state
rather than the first item in the accumulateArgs
array.Finally, we have to specify the
lang
field. As before, it will be js
as that's the only supported language as of the MongoDB 4.4 release.To see a concrete example of the
$accumulator
operator in action, we'll continue to use our sample_mflix
dataset. We'll also build on top of the adoScore
we added with the $function
operator. We'll pair our $accumulator
with a $group
operator and return the number of movies released each year from our dataset, as well as how many movies are deemed watchable by Ado's scoring system (meaning they have a score greater than 8). Our $accumulator
function will look like this:1 { 2 _id: "$year", 3 consensus: { 4 $accumulator: { 5 init: "function(){return {total:0, worthWatching: 0}}", 6 accumulate: "function(state, adoScore){let worthIt = 0; if(adoScore > 8){worthIt = 1}; return {total:state.total + 1, worthWatching: state.worthWatching + worthIt }}", 7 accumulateArgs:["$adoScore"], 8 merge: "function(state1, state2){return {total: state1.total + state2.total, worthWatching: state1.worthWatching + state2.worthWatching}}", 9 } 10 } 11 }
And just to display the JavaScript functions in non-string form for readability:
1 // Init 2 function(){ 3 return { total:0, worthWatching: 0 } 4 } 5 6 // Accumulate 7 function(state, adoScore){ 8 let worthIt = 0; 9 if(adoScore > 8){ worthIt = 1}; 10 return { 11 total: state.total + 1, 12 worthWatching: state.worthWatching + worthIt } 13 } 14 15 // Merge 16 function(state1, state2){ 17 return { 18 total: state1.total + state2.total, 19 worthWatching: state1.worthWatching + state2.worthWatching 20 } 21 }
If you are running the above aggregation using the mongo shell, the query will look like this:
1 db.movies.aggregate([ 2 { 3 $group: { 4 _id: "$year", 5 consensus: { 6 $accumulator: { 7 init: "function(){return {total:0, worthWatching: 0}}", 8 accumulate: "function(state, adoScore){let worthIt = 0; if(adoScore > 8){worthIt = 1}; return {total:state.total + 1, worthWatching: state.worthWatching + worthIt }}", 9 accumulateArgs:["$adoScore"], 10 merge: "function(state1, state2){return {total: state1.total + state2.total, worthWatching: state1.worthWatching + state2.worthWatching}}", 11 } 12 } 13 } 14 } 15 ])
The result of running this query on the
sample_mflix
database will look like this:Note: Since the
adoScore
function does rely on Math.random()
for part of its calculation, you may get varying results each time you run the aggregation.Just like the
$function
operator, writing a custom accumulator and using the $accumulator
operator should only be done when existing operators cannot fulfill your application's use case. Similarly, we are also just scratching the surface of what is achievable by writing your own accumulator. Check out the docs for more.Before we close out this blog post, let's take a look at what our completed aggregation pipeline will look like combining both our
$function
and $accumulator
operators. If you are using the sample_mflix
dataset, you should be able to run both examples with the following aggregation pipeline code:1 db.movies.aggregate([ 2 { 3 '$addFields': { 4 'adoScore': { 5 '$function': { 6 'body': 'function(imdb, awards){let base = Math.floor(Math.random() * 6) \n let imdbBonus = 0 \n if(imdb > 8){ imdbBonus = 1} \n let nominations = (awards.nominations * 0.25) \n let wins = (awards.wins * 0.5) \n let final = base + imdbBonus + nominations + wins \n if(final > 10){final = 9.9} \n return final}', 7 'args': [ 8 '$imdb.rating', '$awards' 9 ], 10 'lang': 'js' 11 } 12 } 13 } 14 }, { 15 '$group': { 16 '_id': '$year', 17 'consensus': { 18 '$accumulator': { 19 'init': 'function(){return {total:0, worthWatching: 0}}', 20 'accumulate': 'function(state, adoScore){let worthIt = 0; if(adoScore > 8){worthIt = 1}; return {total:state.total + 1, worthWatching: state.worthWatching + worthIt }}', 21 'accumulateArgs': [ 22 '$adoScore' 23 ], 24 'merge': 'function(state1, state2){return {total: state1.total + state2.total, worthWatching: state1.worthWatching + state2.worthWatching}}' 25 } 26 } 27 } 28 } 29 ])
The new
$function
and $accumulator
operators released in MongoDB 4.4 improve developer productivity and allow MongoDB to handle many more edge cases out of the box. Just remember that these new operators, while powerful, should only be used if existing operators cannot get the job done as they may degrade performance!Whether you are trying to use new functionality with these operators, fine-tuning your MongoDB cluster to get better performance, or are just trying to get more done with less, MongoDB 4.4 is sure to provide a few new and useful things for you. You can try all of these features out today by deploying a MongoDB 4.4 beta cluster on MongoDB Atlas for free.
If you have any questions about these new operators or this blog post, head over to the MongoDB Community forums and I'll see you there.
Happy experimenting!
Safe Harbor Statement
The development, release, and timing of any features or functionality
described for MongoDB products remains at MongoDB's sole discretion.
This information is merely intended to outline our general product
direction and it should not be relied on in making a purchasing decision
nor is this a commitment, promise or legal obligation to deliver any
material, code, or functionality. Except as required by law, we
undertake no obligation to update any forward-looking statements to
reflect events or circumstances after the date of such statements.