Unique Indexes Quirks and Unique Documents in an Array of Documents

Artur Costa • 7 min read • Published Oct 04, 2023 • Updated Oct 04, 2023
We are developing an application to summarize a user's financial situation. The main page of this application shows us the user's identification and the balances on all banking accounts synced with our application.
As we've seen in blog posts and recommendations on how to get the most out of MongoDB, "Data that is accessed together should be stored together." With that in mind, we came up with the following document structure to store the data used on the main page of the application:
const user = {
  _id: 1,
  name: { first: "john", last: "smith" },
  accounts: [
    { balance: 500, bank: "abc", number: "123" },
    { balance: 2500, bank: "universal bank", number: "9029481" },
  ],
};
Based on the functionality of our application, we determined the following rules:
  • A user can register in the application and not sync a bank account.
  • An account is identified by its bank and number fields.
  • The same account shouldn't be registered for two different users.
  • The same account shouldn't be registered multiple times for the same user.
To enforce what was presented above, we decided to create an index with the following characteristics:
  • Given that the fields bank and number must not repeat, this index must be set as Unique.
  • Since we are indexing more than one field, it'll be of type Compound.
  • Since we are indexing documents inside of an array, it'll also be of type Multikey.
As a result of that, we have a Compound Multikey Unique Index with the following specification and options:
const specification = { "accounts.bank": 1, "accounts.number": 1 };
const options = { name: "Unique Account", unique: true };
To validate that our index works as we intended, we'll use the following data in our tests:
const user1 = { _id: 1, name: { first: "john", last: "smith" } };
const user2 = { _id: 2, name: { first: "john", last: "appleseed" } };
const account1 = { balance: 500, bank: "abc", number: "123" };
First, let's add the users to the collection:
db.users.createIndex(specification, options); // Unique Account

db.users.insertOne(user1); // { acknowledged: true, insertedId: 1 }
db.users.insertOne(user2); // MongoServerError: E11000 duplicate key error collection: test.users index: Unique Account dup key: { accounts.bank: null, accounts.number: null }
Pretty good. We haven't even started working with the accounts, and we already have an error. Let's see what is going on.
Analyzing the error message, it says we have a duplicate key for the index Unique Account with the value of null for the fields accounts.bank and accounts.number. This is due to how indexing works in MongoDB. When we insert a document in an indexed collection, and this document doesn't have one or more of the fields specified in the index, the value of the missing fields will be considered null, and an entry will be added to the index.
Using this logic to analyze our previous test, when we inserted user1, it didn't have the fields accounts.bank and accounts.number and generated an entry in the index Unique Account with the value of null for both. When we tried to insert the user2 in the collection, we had the same behavior, and another entry in the index Unique Account would have been created if we hadn't specified this index as unique. More info about missing fields and unique indexes can be found in our docs.
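We can verify this behavior directly. In MongoDB, an equality filter on null also matches documents where the field is missing entirely, which is exactly how user1 and user2 collided in the index. A quick check, assuming the state right after the failed insert above:

// A filter on null also matches documents missing the field,
// confirming the index entry created for user1.
db.users.findOne({ "accounts.bank": null }); // { _id: 1, name: { first: 'john', last: 'smith' } }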
The solution to this issue is to only index documents that have the fields accounts.bank and accounts.number. To accomplish that, we can specify a partial filter expression in our index options. Now we have a Compound Multikey Unique Partial Index (fancy name, hum, who are we trying to impress here?) with the following specification and options:
const specification = { "accounts.bank": 1, "accounts.number": 1 };
const optionsV2 = {
  name: "Unique Account V2",
  partialFilterExpression: {
    "accounts.bank": { $exists: true },
    "accounts.number": { $exists: true },
  },
  unique: true,
};
Back to our tests:
// Cleaning our environment
db.users.drop(); // Delete documents and index definitions

/* Tests */
db.users.createIndex(specification, optionsV2); // Unique Account V2
db.users.insertOne(user1); // { acknowledged: true, insertedId: 1 }
db.users.insertOne(user2); // { acknowledged: true, insertedId: 2 }
Our new index implementation worked, and now we can insert those two users without accounts. Let's test account duplication, starting with the same account for two different users:
// Cleaning the collection
db.users.deleteMany({}); // Delete only documents, keep index definitions
db.users.insertMany([user1, user2]);

/* Test */
db.users.updateOne({ _id: user1._id }, { $push: { accounts: account1 } }); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.updateOne({ _id: user2._id }, { $push: { accounts: account1 } }); // MongoServerError: E11000 duplicate key error collection: test.users index: Unique Account V2 dup key: { accounts.bank: "abc", accounts.number: "123" }
As expected, we couldn't insert the same account for two different users. Now, we'll try the same account for the same user.
// Cleaning the collection
db.users.deleteMany({}); // Delete only documents, keep index definitions
db.users.insertMany([user1, user2]);

/* Test */
db.users.updateOne({ _id: user1._id }, { $push: { accounts: account1 } }); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.updateOne({ _id: user1._id }, { $push: { accounts: account1 } }); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.findOne({ _id: user1._id }); /* {
  _id: 1,
  name: { first: 'john', last: 'smith' },
  accounts: [
    { balance: 500, bank: 'abc', number: '123' },
    { balance: 500, bank: 'abc', number: '123' }
  ]
} */
When we don't expect things to work, they do. Again, the mistake came from not knowing or considering how indexes work in MongoDB. Reading about unique constraints in the MongoDB documentation, we learn that the unique constraint applies across separate documents: a unique index prevents different documents from having the same key values, but it doesn't create duplicate index entries for equal values within a single document. Considering this, when we inserted account1 for the second time on our user, no new index entry was created, so no duplicate key error could be raised.
Some of you more knowledgeable in MongoDB may think that using $addToSet instead of $push would solve our problem. Not this time, young padawan. The $addToSet operator considers all the fields of the account document when comparing elements, but as we specified at the beginning of our journey, an account must be uniquely identified by the fields bank and number alone.
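Here's a sketch of $addToSet falling short, assuming the clean two-user collection from the previous tests. The second update carries the same bank and number but a different balance, so $addToSet sees a brand-new element:

// $addToSet compares whole documents, so a different balance makes
// the "same" account look like a new array element.
db.users.updateOne(
  { _id: user1._id },
  { $addToSet: { accounts: { balance: 500, bank: "abc", number: "123" } } }
); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.updateOne(
  { _id: user1._id },
  { $addToSet: { accounts: { balance: 700, bank: "abc", number: "123" } } }
); // { ... matchedCount: 1, modifiedCount: 1 ... } — same account, new balance, still added

Even with identical values, $addToSet compares fields in order, so two documents with the same fields in a different order would also count as different elements.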
Okay, what can we do now? Our index has a ton of options and compound names, and our application doesn't behave as we hoped.
A simple way out of this situation is to change how our update function is structured, changing its filter parameter to match only the user's documents where the account we want to insert isn't in the accounts array.
// Cleaning the collection
db.users.deleteMany({}); // Delete only documents, keep index definitions
db.users.insertMany([user1, user2]);

/* Test */
const bankFilter = {
  $not: { $elemMatch: { bank: account1.bank, number: account1.number } },
};

db.users.updateOne(
  { _id: user1._id, accounts: bankFilter },
  { $push: { accounts: account1 } }
); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.updateOne(
  { _id: user1._id, accounts: bankFilter },
  { $push: { accounts: account1 } }
); // { ... matchedCount: 0, modifiedCount: 0 ... }

db.users.findOne({ _id: user1._id }); /* {
  _id: 1,
  name: { first: 'john', last: 'smith' },
  accounts: [ { balance: 500, bank: 'abc', number: '123' } ]
} */
Problem solved. We tried to insert the same account for the same user, and it didn't insert, but it also didn't error out.
This behavior doesn't meet our expectations because it doesn't make it clear to the user that the operation is prohibited. Another point of concern is that this solution assumes that every new account insert will use the correct update filter parameters.
We've worked in some companies and know that as people come and go, some knowledge about the implementation is lost, interns will try to reinvent the wheel, and some nasty shortcuts will be taken. We want a solution that will error out in any case and stop even the most unscrupulous developer/administrator who dares to change data directly on the production database 😱.
A quick note before we go down this rabbit hole. MongoDB best practices recommend implementing schema validation at the application level and using MongoDB schema validation as a backstop.
In MongoDB schema validation, it's possible to use the operator $expr to write an aggregation expression that validates a document's data when it is inserted or updated. With that, we can write an expression to verify that the items inside an array are unique.
After some consideration, we get the following expression:
const accountsSet = {
  $setIntersection: {
    $map: {
      input: "$accounts",
      in: { bank: "$$this.bank", number: "$$this.number" },
    },
  },
};

const uniqueAccounts = {
  $eq: [{ $size: "$accounts" }, { $size: accountsSet }],
};

const accountsValidator = {
  $expr: {
    $cond: {
      if: { $isArray: "$accounts" },
      then: uniqueAccounts,
      else: true,
    },
  },
};
It can look a little scary at first, but we can go through it.
The first operation we have inside of $expr is a $cond. When the expression in the if field evaluates to true, the expression in the then field is evaluated. When it evaluates to false, the expression in the else field is evaluated.
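In isolation, $cond behaves like a ternary operator. A quick way to see it, using the $documents stage (available in db.aggregate() since MongoDB 5.1) so we don't need a collection:

// accounts is null, not an array, so the else branch produces true.
db.aggregate([
  { $documents: [{ accounts: null }] },
  {
    $project: {
      result: { $cond: { if: { $isArray: "$accounts" }, then: "would validate", else: true } },
    },
  },
]); // [ { result: true } ]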
Using this knowledge to interpret our code: when the accounts array exists in the document, { $isArray: "$accounts" }, we execute the logic within uniqueAccounts. When the array doesn't exist, we return true, signaling that the document passed the schema validation.
Inside the uniqueAccounts variable, we verify whether the $size of two arrays is $eq. The first is the array field $accounts, and the second is accountsSet, generated by the $setIntersection function. If the two arrays have the same size, the logic returns true, and the document passes the validation. Otherwise, the logic returns false, the document fails validation, and the operation errors out.
The $setIntersection function performs a set operation on the array passed to it, removing duplicate entries. The array passed to $setIntersection is generated by a $map function, which maps each account in $accounts to a document containing only the fields bank and number.
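To see the deduplication in isolation, here is a small sketch, again using $documents, with two sample accounts of ours that collide on bank and number:

// $map strips the balances; $setIntersection then collapses the
// two identical { bank, number } documents into one.
db.aggregate([
  {
    $documents: [
      {
        accounts: [
          { balance: 500, bank: "abc", number: "123" },
          { balance: 700, bank: "abc", number: "123" },
        ],
      },
    ],
  },
  { $project: { accountsSet: accountsSet } },
]); // [ { accountsSet: [ { bank: 'abc', number: '123' } ] } ]

Two elements in $accounts but only one in accountsSet: the sizes differ, so uniqueAccounts would evaluate to false and the document would fail validation.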
Let's see if this is witchcraft or science:
// Cleaning the collection
db.users.drop(); // Delete documents and index definitions
db.createCollection("users", { validator: accountsValidator });
db.users.createIndex(specification, optionsV2);
db.users.insertMany([user1, user2]);

/* Test */
db.users.updateOne({ _id: user1._id }, { $push: { accounts: account1 } }); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.updateOne(
  { _id: user1._id },
  { $push: { accounts: account1 } }
); /* MongoServerError: Document failed validation
Additional information: {
  failingDocumentId: 1,
  details: {
    operatorName: '$expr',
    specifiedAs: {
      '$expr': {
        '$cond': {
          if: { '$isArray': '$accounts' },
          then: { '$eq': [ [Object], [Object] ] },
          else: true
        }
      }
    },
    reason: 'expression did not match',
    expressionResult: false
  }
}*/
Mission accomplished! Now, our data is protected against those who dare to make changes directly in the database.
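One last note: if the users collection already exists and dropping it isn't an option, the same validator can be attached in place with the collMod command. A sketch, with the default validation settings spelled out for clarity:

// Attach the validator to an existing collection instead of recreating it.
db.runCommand({
  collMod: "users",
  validator: accountsValidator,
  validationLevel: "strict", // validate all inserts and updates
  validationAction: "error", // reject documents that fail validation
});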
To get to our desired behavior, we reviewed MongoDB indexes with the unique option, how to add safety guards to our collection with a combination of parameters in the filter part of an update function, and how to use MongoDB schema validation to add an extra layer of security to our data.
