Aggregation with User Preference Data
On this page
Data Model
Consider a sports club database with a members
collection that
tracks members' names, join dates, and sport preferences:
db.members.insertMany( [ { _id: "jane", joined: ISODate("2011-03-02"), likes: ["golf", "racquetball"] }, { _id: "joe", joined: ISODate("2012-07-02"), likes: ["tennis", "golf", "swimming"] }, { _id: "ruth", joined: ISODate("2012-01-14"), likes: ["golf", "racquetball"] }, { _id: "harold", joined: ISODate("2012-01-21"), likes: ["handball", "golf", "racquetball"] }, { _id: "kate", joined: ISODate("2012-01-14"), likes: ["swimming", "tennis"] } ] )
Return a Single Field
The following operation uses $project
to return only the
_id
field for all documents in the members
collection:
db.members.aggregate( [ { $project: { _id: 1 } } ] )
The operation returns the following documents:
[ { _id: 'jane' }, { _id: 'joe' }, { _id: 'ruth' }, { _id: 'harold' }, { _id: 'kate' } ]
For basic query and projection operations, standard queries with the
find()
method have the best performance.
Normalize and Sort Documents
The following operation returns member names in upper case and in alphabetical order. You might do this to normalize member names for processing.
db.members.aggregate( [ { $project: { name: { $toUpper: "$_id" }, _id: 0 } }, { $sort: { name: 1 } } ] )
All documents from the members
collection pass through the
pipeline, which consists of the following operations:
The operation returns the following result:
[ { name: 'HAROLD' }, { name: 'JANE' }, { name: 'JOE' }, { name: 'KATE' }, { name: 'RUTH' } ]
Return Usernames Ordered by Join Month
The following aggregation operation returns member names sorted by the month they joined. You might use this aggregation to help generate membership renewal notices.
db.members.aggregate( [ { $project: { month_joined: { $month: "$joined" }, name: "$_id", _id: 0 } }, { $sort: { month_joined: 1 } } ] )
The pipeline passes all documents in the members
collection through
the following operations:
The
$project
operator:Creates two new fields:
month_joined
andname
.Suppresses the
id
from the results. Theaggregate()
method includes the_id
, unless explicitly suppressed.
The
$month
operator converts the values of thejoined
field to integer representations of the month. Then the$project
operator assigns those values to themonth_joined
field.The
$sort
operator sorts the results by themonth_joined
field.
The operation returns the following result:
[ { month_joined: 1, name: 'ruth' }, { month_joined: 1, name: 'harold' }, { month_joined: 1, name: 'kate' }, { month_joined: 3, name: 'jane' }, { month_joined: 7, name: 'joe' } ]
Return Total Number of Joins per Month
The following operation shows how many people joined each month of the year. You might use this aggregated data for recruiting and marketing strategies.
db.members.aggregate( [ { $project: { month_joined: { $month: "$joined" } } } , { $group: { _id: { month_joined: "$month_joined" } , number: { $sum: 1 } } }, { $sort: { "_id.month_joined": 1 } } ] )
The pipeline passes all documents in the members
collection through
the following operations:
The
$project
operator creates a new field calledmonth_joined
.The
$month
operator converts the values of thejoined
field to integer representations of the month. Then the$project
operator assigns the values to themonth_joined
field.The
$group
operator collects all documents with a givenmonth_joined
value and counts how many documents there are for that value. Specifically, for each unique value,$group
creates a new "per-month" document with two fields:_id
, which contains a nested document with themonth_joined
field and its value.number
, which is a generated field. The$sum
operator increments this field by 1 for every document containing the givenmonth_joined
value.
The
$sort
operator sorts the documents created by$group
according to the contents of themonth_joined
field.
The aggregation operation returns the following documents:
[ { _id: { month_joined: 1 }, number: 3 }, { _id: { month_joined: 3 }, number: 1 }, { _id: { month_joined: 7 }, number: 1 } ]
Return the Five Most Common "Likes"
The following aggregation collects the top five most "liked" activities in the data set. This type of analysis could help inform planning and future development.
db.members.aggregate( [ { $unwind: "$likes" }, { $group: { _id: "$likes" , number: { $sum: 1 } } }, { $sort: { number: -1 } }, { $limit: 5 } ] )
The pipeline begins with all documents in the members
collection,
and passes these documents through the following operations:
The
$unwind
operator separates each value in thelikes
array, and creates a new version of the source document for every element in the array.Example
Given the following document from the
members
collection:{ _id: "jane", joined: ISODate("2011-03-02"), likes: ["golf", "racquetball"] } The
$unwind
operator outputs the following documents:{ _id: "jane", joined: ISODate("2011-03-02"), likes: "golf" } { _id: "jane", joined: ISODate("2011-03-02"), likes: "racquetball" } The
$group
operator collects all documents with the same value for thelikes
field and counts each grouping. With this information,$group
creates a new document with two fields:_id
, which contains thelikes
value.number
, which is a generated field. The$sum
operator increments this field by 1 for every document containing the givenlikes
value.
The
$sort
operator sorts these documents by thenumber
field in reverse order.The
$limit
operator only includes the first 5 result documents.
The aggregation operation returns the following documents:
[ { _id: 'golf', number: 4 }, { _id: 'racquetball', number: 3 }, { _id: 'tennis', number: 2 }, { _id: 'swimming', number: 2 }, { _id: 'handball', number: 1 } ]