Unique Indexes Quirks and Unique Documents in an Array of Documents
We are developing an application to summarize a user's financial situation. The main page of this application shows us the user's identification and the balances on all banking accounts synced with our application.
As we've seen in blog posts and recommendations of how to get the most out of MongoDB, "Data that is accessed together should be stored together." We thought of the following document/structure to store the data used on the main page of the application:
```javascript
const user = {
  _id: 1,
  name: { first: "john", last: "smith" },
  accounts: [
    { balance: 500, bank: "abc", number: "123" },
    { balance: 2500, bank: "universal bank", number: "9029481" },
  ],
};
```
Based on the functionality of our application, we determined the following rules:
- A user can register in the application and not sync a bank account.
- An account is identified by its `bank` and `number` fields.
- The same account shouldn't be registered for two different users.
- The same account shouldn't be registered multiple times for the same user.
To enforce what was presented above, we decided to create an index. Since the fields `accounts.bank` and `accounts.number` live inside an array of subdocuments and must be unique as a pair, we ended up with a Compound Multikey Unique Index with the following specification and options:

```javascript
const specification = { "accounts.bank": 1, "accounts.number": 1 };
const options = { name: "Unique Account", unique: true };
```
To validate that our index works as we intended, we'll use the following data on our tests:
```javascript
const user1 = { _id: 1, name: { first: "john", last: "smith" } };
const user2 = { _id: 2, name: { first: "john", last: "appleseed" } };
const account1 = { balance: 500, bank: "abc", number: "123" };
```
First, let's add the users to the collection:
```javascript
db.users.createIndex(specification, options); // Unique Account

db.users.insertOne(user1); // { acknowledged: true, insertedId: 1 }
db.users.insertOne(user2); // MongoServerError: E11000 duplicate key error collection: test.users index: Unique Account dup key: { accounts.bank: null, accounts.number: null }
```
Pretty good. We haven't even started working with the accounts, and we already have an error. Let's see what is going on.
Analyzing the error message, it says we have a duplicate key for the index `Unique Account` with the value of `null` for the fields `accounts.bank` and `accounts.number`. This is due to how indexing works in MongoDB: when we insert a document into an indexed collection and the document doesn't have one or more of the fields specified in the index, the missing fields are treated as `null`, and an entry is added to the index.

Using this logic to analyze our previous test: when we inserted `user1`, it didn't have the fields `accounts.bank` and `accounts.number`, so it generated an entry in the index `Unique Account` with the value `null` for both. When we tried to insert `user2`, we got the same behavior, and another entry in the index `Unique Account` would have been created if we hadn't specified this index as `unique`. More info about missing fields and unique indexes can be found in our docs.

The solution for this issue is to only index documents that contain the fields
`accounts.bank` and `accounts.number`. To accomplish that, we can specify a `partialFilterExpression` in our index options. Now we have a Compound Multikey Unique Partial Index (fancy name, hum, who are we trying to impress here?) with the following specification and options:

```javascript
const specification = { "accounts.bank": 1, "accounts.number": 1 };
const optionsV2 = {
  name: "Unique Account V2",
  partialFilterExpression: {
    "accounts.bank": { $exists: true },
    "accounts.number": { $exists: true },
  },
  unique: true,
};
```
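To build intuition for why the partial filter fixes the null-key collision, here is a plain JavaScript sketch. This is not MongoDB's real indexing code, just a model of the behavior described above, with hypothetical helper names (`indexKey`, `matchesPartialFilter`) of our own:

```javascript
// Model of the indexed key for a user document; missing fields become null.
// (Real multikey indexes create one entry per array element; we simplify.)
function indexKey(user) {
  const account = (user.accounts && user.accounts[0]) || {};
  return JSON.stringify({
    "accounts.bank": account.bank ?? null,
    "accounts.number": account.number ?? null,
  });
}

// The partial filter only indexes documents where both fields exist.
function matchesPartialFilter(user) {
  return (user.accounts || []).some(
    (a) => a.bank !== undefined && a.number !== undefined
  );
}

const user1 = { _id: 1, name: { first: "john", last: "smith" } };
const user2 = { _id: 2, name: { first: "john", last: "appleseed" } };

// V1 (no partial filter): both users produce the same { null, null } key,
// so the unique constraint rejects the second insert.
console.log(indexKey(user1) === indexKey(user2)); // true → duplicate key error

// V2 (partial filter): neither user is indexed at all, so there is no collision.
console.log(matchesPartialFilter(user1), matchesPartialFilter(user2)); // false false
```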
Back to our tests:
```javascript
// Cleaning our environment
db.users.drop(); // Delete documents and index definitions

/* Tests */
db.users.createIndex(specification, optionsV2); // Unique Account V2
db.users.insertOne(user1); // { acknowledged: true, insertedId: 1 }
db.users.insertOne(user2); // { acknowledged: true, insertedId: 2 }
```
Our new index implementation worked, and now we can insert those two users without accounts. Let's test account duplication, starting with the same account for two different users:
```javascript
// Cleaning the collection
db.users.deleteMany({}); // Delete only documents, keep index definitions
db.users.insertMany([user1, user2]);

/* Test */
db.users.updateOne({ _id: user1._id }, { $push: { accounts: account1 } }); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.updateOne({ _id: user2._id }, { $push: { accounts: account1 } }); // MongoServerError: E11000 duplicate key error collection: test.users index: Unique Account V2 dup key: { accounts.bank: "abc", accounts.number: "123" }
```
We couldn't insert the same account into different users as we expected. Now, we'll try the same account for the same user.
```javascript
// Cleaning the collection
db.users.deleteMany({}); // Delete only documents, keep index definitions
db.users.insertMany([user1, user2]);

/* Test */
db.users.updateOne({ _id: user1._id }, { $push: { accounts: account1 } }); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.updateOne({ _id: user1._id }, { $push: { accounts: account1 } }); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.findOne({ _id: user1._id });
/* {
  _id: 1,
  name: { first: 'john', last: 'smith' },
  accounts: [
    { balance: 500, bank: 'abc', number: '123' },
    { balance: 500, bank: 'abc', number: '123' }
  ]
} */
```
When we don't expect things to work, they do. Again, the error was caused by not knowing or considering how indexes work in MongoDB. Reading about unique constraints in the MongoDB documentation, we learn that MongoDB doesn't create duplicate index entries with the same key values when they point to the same document. Considering this, when we inserted `account1` for the second time on our user, no new index entry was created, so the unique constraint had nothing to reject.

Some of you more knowledgeable in MongoDB may think that using `$addToSet` instead of `$push` would resolve our problem. Not this time, young padawan. The `$addToSet` operator considers all the fields of the account document, but as we specified at the beginning of our journey, an account must be unique and identifiable by the fields `bank` and `number`.

Okay, what can we do now? Our index has a ton of options and a compound name, and our application still doesn't behave as we hoped.
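Before moving on, a quick plain-JavaScript sketch makes the `$addToSet` limitation concrete. These are our own hypothetical helpers, not MongoDB code: `$addToSet` deduplicates by comparing the whole document, while our business rule only cares about `bank` and `number`:

```javascript
// Whole-document equality, roughly what $addToSet checks before adding.
// (Keys are in the same order here, so JSON.stringify is a fair stand-in.)
function sameDocument(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Our business rule: it's the same account if bank and number match.
function sameAccount(a, b) {
  return a.bank === b.bank && a.number === b.number;
}

const stored = { balance: 500, bank: "abc", number: "123" };
const incoming = { balance: 700, bank: "abc", number: "123" }; // only the balance changed

console.log(sameDocument(stored, incoming)); // false → $addToSet would add it anyway
console.log(sameAccount(stored, incoming)); // true → but by our rule it's a duplicate
```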
A simple way out of this situation is to change how our update is structured: change its filter parameter to match only user documents where the account we want to insert isn't already in the `accounts` array.

```javascript
// Cleaning the collection
db.users.deleteMany({}); // Delete only documents, keep index definitions
db.users.insertMany([user1, user2]);

/* Test */
const bankFilter = {
  $not: { $elemMatch: { bank: account1.bank, number: account1.number } },
};

db.users.updateOne(
  { _id: user1._id, accounts: bankFilter },
  { $push: { accounts: account1 } }
); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.updateOne(
  { _id: user1._id, accounts: bankFilter },
  { $push: { accounts: account1 } }
); // { ... matchedCount: 0, modifiedCount: 0 ... }

db.users.findOne({ _id: user1._id });
/* {
  _id: 1,
  name: { first: 'john', last: 'smith' },
  accounts: [ { balance: 500, bank: 'abc', number: '123' } ]
} */
```
Problem solved. We tried to insert the same account for the same user, and it didn't insert, but it also didn't error out.
This behavior doesn't meet our expectations because it doesn't make it clear to the user that this operation is prohibited. Another point of concern is that this solution considers that every time a new account is inserted in the database, it'll use the correct update filter parameters.
We've worked in some companies and know that as people come and go, some knowledge about the implementation is lost, interns will try to reinvent the wheel, and some nasty shortcuts will be taken. We want a solution that will error out in any case and stop even the most unscrupulous developer/administrator who dares to change data directly on the production database 😱.
MongoDB schema validation for the win.
A quick note before we go down this rabbit hole: MongoDB best practices recommend implementing schema validation at the application level and using MongoDB schema validation as a backstop.
In MongoDB schema validation, it's possible to use the `$expr` operator to write an aggregation expression that validates a document's data when it is inserted or updated. With that, we can write an expression to verify that the items inside an array are unique.

After some consideration, we get the following expression:
```javascript
const accountsSet = {
  $setIntersection: {
    $map: {
      input: "$accounts",
      in: { bank: "$$this.bank", number: "$$this.number" },
    },
  },
};

const uniqueAccounts = {
  $eq: [{ $size: "$accounts" }, { $size: accountsSet }],
};

const accountsValidator = {
  $expr: {
    $cond: {
      if: { $isArray: "$accounts" },
      then: uniqueAccounts,
      else: true,
    },
  },
};
```
It can look a little scary at first, but we can go through it.
Using this knowledge to interpret our code: when the `accounts` array exists in the document (`{ $isArray: "$accounts" }`), we execute the logic within `uniqueAccounts`. When the array doesn't exist, we return `true`, signaling that the document passed the schema validation.

Inside the `uniqueAccounts` variable, we verify that the `$size` of two things is `$eq`: the first thing is the array field `$accounts`, and the second is the `accountsSet` array generated by the `$setIntersection` operator. If the two arrays have the same size, the logic returns `true` and the document passes validation. Otherwise, the logic returns `false`, the document fails validation, and the operation errors out.

The `$setIntersection` operator performs a set operation on the array passed to it, removing duplicate entries. The array passed to `$setIntersection` is generated by a `$map` expression, which maps each account in `$accounts` to a document with only the fields `bank` and `number`.

Let's see if this is witchcraft or science:
```javascript
// Cleaning our environment
db.users.drop(); // Delete documents and index definitions
db.createCollection("users", { validator: accountsValidator });
db.users.createIndex(specification, optionsV2);
db.users.insertMany([user1, user2]);

/* Test */
db.users.updateOne({ _id: user1._id }, { $push: { accounts: account1 } }); // { ... matchedCount: 1, modifiedCount: 1 ... }

db.users.updateOne(
  { _id: user1._id },
  { $push: { accounts: account1 } }
);
/* MongoServerError: Document failed validation
Additional information: {
  failingDocumentId: 1,
  details: {
    operatorName: '$expr',
    specifiedAs: {
      '$expr': {
        '$cond': {
          if: { '$isArray': '$accounts' },
          then: { '$eq': [ [Object], [Object] ] },
          else: true
        }
      }
    },
    reason: 'expression did not match',
    expressionResult: false
  }
} */
```
Mission accomplished! Now, our data is protected against those who dare to make changes directly in the database.
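If you want to reason about the validator without a database, the same size comparison can be sketched in plain JavaScript. The helper name `accountsAreUnique` is our own invention, not the aggregation runtime, but the steps mirror the `$cond`, `$map`, `$setIntersection`, and `$size` logic above:

```javascript
// Plain JavaScript model of the validator: project each account down to
// { bank, number }, deduplicate, and compare the sizes of the two arrays.
function accountsAreUnique(accounts) {
  if (!Array.isArray(accounts)) return true; // mirrors the $cond else branch

  // $map: keep only the identifying fields (serialized so a Set can compare them).
  const projected = accounts.map((a) =>
    JSON.stringify({ bank: a.bank, number: a.number })
  );

  // $setIntersection on a single array: remove duplicate entries.
  const deduplicated = new Set(projected);

  // $eq of the two $size values.
  return accounts.length === deduplicated.size;
}

const accounts = [
  { balance: 500, bank: "abc", number: "123" },
  { balance: 700, bank: "abc", number: "123" }, // same account, different balance
];

console.log(accountsAreUnique(accounts)); // false → validation would fail
console.log(accountsAreUnique(undefined)); // true → no accounts array, document passes
```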
To get to our desired behavior, we reviewed MongoDB indexes with the `unique` option, how to add safety guards to our collection with a combination of parameters in the filter part of an update function, and how to use MongoDB schema validation to add an extra layer of security to our data.