Even if your aggregation may provide the appropriate answer with the sample data provided, your aggregation does not work properly. The way you $project items in the 2nd stage does not take into consideration that items is a top level array for order_type:bulk. You set it to an empty array which completely wipes out the items array that already exists. This was mentioned by

Your pipeline works with the provided dataset because all the items of the only bulk order are already present in invidual orders.

Your $addToSet seems to add the whole items array rather than individual item names. So it looks like the unique_items count it provides is the number of orders rather than the number of items. This will be confirmed if the count for user3 is 2 rather than 3.

I created a playground with the original data set and your pipeline but with an extra item in the bulk invoice that is not present in any of the individual orders. You will see that this item (which I named only_in_bulk) is not in your result set. I also use a different field of the unique item counts and kept the result of $addToSet to show that it may count the number of items list rather than the number of items.

I also update my previous playground with only_in_bulk to show that the updated result set is different when some items only appers in bulk orders.

1 Like