I have documents, each with an array field “tags”, as illustrated below. Each tag is a pair of (ns, name).
I want to efficiently find all the tags belonging to a certain userId, that is, all the unique (ns, name) pairs. Therefore I created an index on (userId, tags.ns, tags.name). It is a multikey index because it indexes an array field.
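For reference, this is roughly how I created the index (the collection name coll is a placeholder):

```
// Compound index on userId plus the two tag fields.
// Because tags.ns and tags.name live inside the "tags" array,
// MongoDB marks the whole index as multikey.
db.coll.createIndex({ userId: 1, "tags.ns": 1, "tags.name": 1 })
```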
Then I ran the following aggregation pipeline using $unwind and $group, hoping that the index would cover the query. However, according to explain(), it doesn't: MongoDB does an IXSCAN on the index to find all the documents belonging to the userId, then fetches those documents, and only then runs the rest of the grouping. This is very inefficient, since all the information is already in the index. How can I force MongoDB to answer the query from the index alone?
As an illustration, my collection contains these documents:
[{
"_id": {
"$oid": "6611a306cd377c1f9ef54538"
},
"userId": "1",
"tags": [
{
"ns": "UserTags",
"name": "tag1"
},
{
"ns": "SystemTags",
"name": "tag1"
}
]
},
{
"_id": {
"$oid": "661232f86e2b72aae0b39cad"
},
"userId": "1",
"tags": [
{
"ns": "UserTags",
"name": "tag2"
}
]
}]
The aggregation pipeline I used is:
[
  { $sort: { "userId": 1, "tags.ns": 1, "tags.name": 1 } },
  { $match: { userId: "690a5c13-0b4b-4464-8716-361c04b81923" } },
  { $unwind: "$tags" },
  { $group: { _id: { ns: "$tags.ns", name: "$tags.name" } } },
  { $project: { _id: 0, ns: "$_id.ns", name: "$_id.name" } }
]
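To make the intent concrete: the $match/$unwind/$group/$project steps just deduplicate (ns, name) pairs for one user. A plain-JavaScript sketch of that semantics over the sample documents above (using userId "1" from the samples rather than the UUID in the pipeline, which is from a different dataset):

```javascript
const docs = [
  { userId: "1", tags: [ { ns: "UserTags", name: "tag1" },
                         { ns: "SystemTags", name: "tag1" } ] },
  { userId: "1", tags: [ { ns: "UserTags", name: "tag2" } ] }
];

// $match + $unwind: one (ns, name) entry per array element of matching docs
const unwound = docs
  .filter(d => d.userId === "1")
  .flatMap(d => d.tags);

// $group on { ns, name } keeps unique pairs; $project drops the _id wrapper
const seen = new Set();
const result = unwound.filter(t => {
  const key = `${t.ns}\u0000${t.name}`;
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
});

console.log(result);
// [ { ns: 'UserTags', name: 'tag1' },
//   { ns: 'SystemTags', name: 'tag1' },
//   { ns: 'UserTags', name: 'tag2' } ]
```

The point is that every value needed to produce this result already exists in the (userId, tags.ns, tags.name) index keys, which is why I expected a covered plan.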
The execution plan is: