MongoDB C# Aggregation Pipeline Basics
While basic CRUD statements like find, insert, update, and delete can take you a long way when building your application, sooner or later, users will want to look at data in different forms. As an example, if you have lots of data in a time series collection, calculating key performance indicators helps users derive meaningful conclusions from the data, e.g.:
- What was the average value for the sensor yesterday?
- What was the range of values in the last month?
- How many readings were received in the last hour?
This is where aggregation operations come in: they transform and summarize data so that the information is easy to grasp and answers questions without requiring a look at the individual values. For instance, aggregations can be used to calculate KPIs, to group data, to implement paging, and in many other scenarios.
This tutorial puts its focus on how to use aggregations with the MongoDB C# driver; if you want to learn about aggregation operations in depth, please have a look at the excellent e-book Practical MongoDB Aggregations.
You can run aggregations on the fly when querying your data. In fact, when using the MongoDB C# driver, you might run aggregations without even noticing as the driver often uses aggregations under the hood instead of plain find commands.
Especially for complex operations, it is common practice to run aggregations in the background and store the results in a collection beforehand. This can be achieved by terminating the pipeline with a $merge or $out stage that writes the aggregation result into a collection. This pattern is called an on-demand materialized view. The complex pipeline runs only occasionally, whereas the more frequent read requests query the precomputed collection and do not put too much load on the server. The data in the on-demand materialized view can be updated periodically or in reaction to data changes.
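As a rough sketch of the on-demand materialized view pattern, the following C# code averages sensor readings per sensor and writes the result into a separate collection with $merge. The field names sensorId and value, the target collection name sensor_averages, and the class name are assumptions made up for this illustration. The sketch also assumes a driver version that provides AggregateToCollectionAsync.

```csharp
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class SensorAverages
{
    // Hypothetical name of the collection that holds the materialized view.
    public const string ViewCollection = "sensor_averages";

    // Builds the raw stages: group readings by sensor, then merge into the view.
    // "sensorId" and "value" are assumed field names of the readings documents.
    public static BsonDocument[] BuildStages() => new[]
    {
        BsonDocument.Parse("{ $group: { _id: '$sensorId', avgValue: { $avg: '$value' } } }"),
        // The $group output uses the sensor id as _id, so $merge can match on _id:
        // existing view documents are replaced, new ones are inserted.
        BsonDocument.Parse(
            "{ $merge: { into: '" + ViewCollection + "', whenMatched: 'replace', whenNotMatched: 'insert' } }")
    };

    // Runs the pipeline; the results end up in the view collection, not in a cursor.
    public static Task RefreshAsync(IMongoCollection<BsonDocument> readings)
    {
        var pipeline = PipelineDefinition<BsonDocument, BsonDocument>.Create(BuildStages());
        return readings.AggregateToCollectionAsync(pipeline);
    }
}
```

RefreshAsync could be called from a scheduled job or after a batch of writes; regular read requests would then query sensor_averages directly.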
When setting up an aggregation pipeline, MongoDB Compass is a good starting point. You can open the collection in your development cluster and put together an aggregation pipeline in the graphical user interface. While the graphical user interface offers more guidance, there also is a text-based aggregation pipeline editor that can be used by more experienced developers.
This tutorial shows several ways you can run aggregations from C# code. The samples use the sample_mflix database so that you can easily try out the code on your own MongoDB Atlas cluster. Please see the Get Started section in the MongoDB Atlas documentation on how to deploy a free cluster and load the sample dataset.
The sample database contains the movies collection with a document structure similar to this:

    {
      "_id": ObjectId("573a139af29313caabcef0ad"),
      "imdb": {
        "rating": 8.2,
        "votes": 297933,
        "id": 112641
      },
      "year": 1995,
      "title": "Casino",
      "cast": [
        "Robert De Niro",
        "Sharon Stone",
        "Joe Pesci",
        "James Woods"
      ]
    }

In our example, we want to filter by an actor, group by the year, and order the documents by the average rating of the movies of the year. This can be achieved by using the following aggregation pipeline:

    [
      {
        $match: {
          cast: "Robert De Niro"
        }
      },
      {
        $group: {
          _id: "$year",
          rating: { $avg: "$imdb.rating" }
        }
      },
      {
        $sort: {
          rating: -1
        }
      }
    ]

When running the pipeline in MongoDB Compass, we receive the following result:

    [
      {
        "_id": 1974,
        "rating": 9.1
      },
      {
        "_id": 1980,
        "rating": 8.3
      },
      {
        "_id": 1995,
        "rating": 8.25
      },
      {
        "_id": 1990,
        "rating": 8.25
      },
      // ...
    ]

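To make the semantics of the three stages concrete, here is a small sketch that applies the same steps (filter, group, average, sort) to an in-memory list with plain LINQ to Objects. No MongoDB is involved; the SampleMovie record, the PipelineSemantics class, and all sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for a movie document.
public record SampleMovie(string Title, int Year, double Rating, string[] Cast);

public static class PipelineSemantics
{
    // Made-up sample data; titles, actors, and ratings are not real values.
    public static readonly SampleMovie[] Sample =
    {
        new("Movie A", 1995, 8.0, new[] { "Actor X", "Actor Y" }),
        new("Movie B", 1995, 9.0, new[] { "Actor X" }),
        new("Movie C", 1990, 7.5, new[] { "Actor X" }),
        new("Movie D", 1983, 9.5, new[] { "Actor Y" })
    };

    // Mirrors $match -> $group -> $sort on an in-memory sequence.
    public static List<(int Year, double AvgRating)> AverageRatingByYear(
        IEnumerable<SampleMovie> movies, string actor)
    {
        return movies
            .Where(m => m.Cast.Contains(actor))        // $match: keep movies with the actor
            .GroupBy(m => m.Year)                       // $group: one group per year
            .Select(g => (Year: g.Key, AvgRating: g.Average(m => m.Rating)))
            .OrderByDescending(r => r.AvgRating)        // $sort: highest average first
            .ToList();
    }
}
```

Calling PipelineSemantics.AverageRatingByYear(PipelineSemantics.Sample, "Actor X") yields 1995 with an average of 8.5, followed by 1990 with 7.5. Movie D is filtered out because Actor X is not in its cast.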
To support the aggregation, we create the following POCOs in C#:
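
    using MongoDB.Bson.Serialization.Attributes;

    // Document fields without a matching property (e.g. _id) are ignored.
    [BsonIgnoreExtraElements]
    [BsonNoId]
    public class Movie
    {
        // BsonElement maps the lowercase field names to the C# properties.
        [BsonElement("title")]
        public required string Title { get; set; }

        [BsonElement("year")]
        public required int Year { get; set; }

        [BsonElement("cast")]
        public List<string> Cast { get; set; } = new();

        [BsonElement("imdb")]
        public Imdb Imdb { get; set; } = new();
    }

    [BsonIgnoreExtraElements]
    public class Imdb
    {
        [BsonElement("rating")]
        public double Rating { get; set; }
    }

    // Result type of the aggregation: one entry per year.
    public class RatingByYear
    {
        public int Year { get; set; }

        public double AvgRating { get; set; }
    }
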
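The samples in this tutorial use a movies variable without showing where it comes from. A minimal sketch of how such a collection handle could be obtained is shown below. The helper class and method names are hypothetical, and the connection string is expected to be supplied by you, for instance from an environment variable.

```csharp
using MongoDB.Driver;

public static class MflixConnection
{
    // Returns a typed handle to the movies collection of the sample_mflix database.
    // Creating the client and the handle does not run any database command yet.
    public static IMongoCollection<TDocument> GetMovies<TDocument>(string connectionString)
    {
        var client = new MongoClient(connectionString);
        return client
            .GetDatabase("sample_mflix")
            .GetCollection<TDocument>("movies");
    }
}
```

With the POCOs from this tutorial, a call such as var movies = MflixConnection.GetMovies<Movie>(connectionString); would provide the collection used in the samples. In an application, the MongoClient is typically created once and reused.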
The basic and most powerful way to run an aggregation pipeline in C# is to use the AggregateAsync method of IMongoCollection<T>. This method takes a pipeline definition as its most important input parameter and returns a cursor. For our sample, we could express the aggregation pipeline like this:

    var pipeline = new EmptyPipelineDefinition<Movie>()
        .Match(x => x.Cast.Contains("Robert De Niro"))
        .Group(
            x => x.Year,
            x => new RatingByYear()
            {
                Year = x.Key,
                AvgRating = x.Average(y => y.Imdb.Rating)
            })
        .Sort(Builders<RatingByYear>.Sort.Descending(x => x.AvgRating));
    var result = await (await movies.AggregateAsync(pipeline)).ToListAsync();

First, we define the pipeline and add the necessary stages to it. As you can see in the code above, you can use lambda expressions or the Builders<T> class that you might know from putting together CRUD statements.
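Because AggregateAsync returns a cursor, the results do not have to be collected into a list; they can also be processed batch by batch as they arrive. The following helper is a sketch of that approach. The class name, method name, and the handle callback are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;
using MongoDB.Driver;

public static class CursorExample
{
    // Runs the pipeline and passes every result to the callback without
    // materializing the whole result set. Returns the number of results seen.
    public static async Task<int> ForEachResultAsync<TDocument, TResult>(
        IMongoCollection<TDocument> collection,
        PipelineDefinition<TDocument, TResult> pipeline,
        Action<TResult> handle)
    {
        var count = 0;

        // The cursor holds server resources, so it is disposed when done.
        using var cursor = await collection.AggregateAsync(pipeline);

        // Each MoveNextAsync call fetches the next batch from the server.
        while (await cursor.MoveNextAsync())
        {
            foreach (var item in cursor.Current)
            {
                handle(item);
                count++;
            }
        }

        return count;
    }
}
```

With the pipeline defined above, this could be called as await CursorExample.ForEachResultAsync(movies, pipeline, r => Console.WriteLine($"{r.Year}: {r.AvgRating}")); which may be preferable for large result sets.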
After running the aggregation, we use ToListAsync to store the aggregation result in a list.
For our sample, the basic stages $match, $group, and $sort are sufficient, but there is a wide variety of methods that you can use to set up complex aggregation pipelines. If there is no explicit method for a specific pipeline stage, you can use the AppendStage method to append a stage that is defined as a BsonDocument. We will have a closer look at this method in a follow-up to this tutorial.
The previous sample made use of a fluent interface when defining the aggregation pipeline. In addition to AggregateAsync, IMongoCollection<T> also offers an Aggregate method that is the starting point for the fluent aggregation interface:

    var result = await movies
        .Aggregate()
        .Match(x => x.Cast.Contains("Robert De Niro"))
        .Group(
            x => x.Year,
            x => new RatingByYear()
            {
                Year = x.Key,
                AvgRating = x.Average(y => y.Imdb.Rating)
            })
        .Sort(Builders<RatingByYear>.Sort.Descending(x => x.AvgRating))
        .ToListAsync();

This shortens the code above by some lines but basically offers the same functionality as defining the pipeline manually.
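The AppendStage method mentioned earlier is also available on the fluent interface. As a sketch only, the following code appends a stage written as a raw BsonDocument. It works on an untyped IMongoCollection<BsonDocument> to stay self-contained, and it uses $sample merely as an example of a raw stage; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class AppendStageExample
{
    // Builds the raw stage document: { $sample: { size: <size> } }.
    public static BsonDocument SampleStage(int size) =>
        new BsonDocument("$sample", new BsonDocument("size", size));

    // Returns up to <size> randomly chosen movies that feature the given actor.
    public static Task<List<BsonDocument>> RandomMoviesAsync(
        IMongoCollection<BsonDocument> movies, string actor, int size)
    {
        return movies
            .Aggregate()
            // Equality on an array field matches documents containing the value.
            .Match(Builders<BsonDocument>.Filter.Eq("cast", actor))
            // The BsonDocument is converted implicitly into a stage definition;
            // the type argument states the result type after this stage.
            .AppendStage<BsonDocument>(SampleStage(size))
            .ToListAsync();
    }
}
```

Since the driver cannot infer the output shape of a raw stage, the result type has to be given explicitly, here BsonDocument.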
The MongoDB C# driver offers a powerful LINQ provider that is able to transform LINQ statements into a MongoDB aggregation pipeline. This way, developers can use LINQ statements in their code as they are used to; an aggregation pipeline is created under the hood and executed against the database when the results are enumerated:

    var result = movies
        .AsQueryable()
        .Where(x => x.Cast.Contains("Robert De Niro"))
        .GroupBy(x => x.Year)
        .Select(x => new RatingByYear()
        {
            Year = x.Key,
            AvgRating = x.Average(y => y.Imdb.Rating)
        })
        .OrderByDescending(x => x.AvgRating)
        .ToList();

Or alternatively in query syntax:

    var result = (from m in movies.AsQueryable()
                  where m.Cast.Contains("Robert De Niro")
                  group m by m.Year into grp
                  select new RatingByYear()
                  {
                      Year = grp.Key,
                      AvgRating = grp.Average(y => y.Imdb.Rating)
                  }
                  into x
                  orderby x.AvgRating descending
                  select x)
        .ToList();

While there is still a chance that a specific LINQ query cannot be transformed into an aggregation pipeline, these cases have become very rare with LINQ provider v3 and can often be solved by restructuring the LINQ statement. For the remaining cases, the provider is maintained by MongoDB and is extended regularly when support for a LINQ construct is missing.
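The LINQ samples above call ToList, which blocks the calling thread while the query runs. In asynchronous code, the driver's ToListAsync extension can be used instead. The following sketch assumes that both the MongoDB.Driver and MongoDB.Driver.Linq namespaces are imported, as the namespace that provides this extension differs between driver versions. MovieDoc, ImdbDoc, and YearRating are minimal stand-ins for the tutorial's POCOs so that the sketch is self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;
using MongoDB.Driver.Linq;

// Hypothetical minimal copies of the tutorial's classes.
[BsonIgnoreExtraElements]
public class MovieDoc
{
    [BsonElement("year")] public int Year { get; set; }
    [BsonElement("cast")] public List<string> Cast { get; set; } = new();
    [BsonElement("imdb")] public ImdbDoc Imdb { get; set; } = new();
}

[BsonIgnoreExtraElements]
public class ImdbDoc
{
    [BsonElement("rating")] public double Rating { get; set; }
}

public class YearRating
{
    public int Year { get; set; }
    public double AvgRating { get; set; }
}

public static class AsyncLinqExample
{
    // Same query shape as the synchronous LINQ sample, but awaited.
    public static Task<List<YearRating>> RatingsByYearAsync(
        IMongoCollection<MovieDoc> movies, string actor)
    {
        return movies
            .AsQueryable()
            .Where(x => x.Cast.Contains(actor))
            .GroupBy(x => x.Year)
            .Select(g => new YearRating
            {
                Year = g.Key,
                AvgRating = g.Average(y => y.Imdb.Rating)
            })
            .OrderByDescending(x => x.AvgRating)
            .ToListAsync(); // executes the generated pipeline asynchronously
    }
}
```

A caller would then write var result = await AsyncLinqExample.RatingsByYearAsync(movies, "Robert De Niro"); to get the same kind of result as the synchronous sample.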
As you can see, it is very easy to set up and run aggregation pipelines with the MongoDB C# driver. In fact, there are a variety of ways to achieve this goal. Which method is the best to use in your project?
LINQ is widely used in C# projects and is a query technique that developers learn very early on their journey with C#. Using the LINQ-based approach enables developers to benefit from the power of aggregation pipelines without deep MongoDB knowledge.
Of course, this abstraction also means less control over the aggregation pipelines; especially in complex scenarios, using the methods of IMongoCollection<T> offers the flexibility to put the pipeline together so that it fits the purpose perfectly.