Document Validation for Polymorphic Collections
In data modeling design reviews with customers, I often propose a schema where different documents in the same collection contain different types of data. This makes it efficient to fetch related documents in a single, indexed query. MongoDB's flexible schema is great for optimizing workloads in this way, but people can be concerned about losing control of what applications write to these collections.
Customers are often concerned with ensuring that only correctly formatted documents make it into a collection, so I explain MongoDB's schema validation feature. The next question is usually: "How does that work with a polymorphic/single-collection schema?" This post answers that question — and it's simpler than you might think.
The application I'm working on manages customer and account details. There's a many-to-many relationship between customers and accounts. The app needs to be able to efficiently query customer data based on the customer id, and account data based on either the id of its customer or the account id.
Here's an example of customer and account documents where my wife and I share a checking account but each have our own savings account:
```javascript
{
  "_id": "kjfgjebgjfbkjb",
  "customerId": "CUST-123456789",
  "docType": "customer",
  "name": {
    "title": "Mr",
    "first": "Andrew",
    "middle": "James",
    "last": "Morgan"
  },
  "address": {
    "street1": "240 Blackfriars Rd",
    "city": "London",
    "postCode": "SE1 8NW",
    "country": "UK"
  },
  "customerSince": ISODate("2005-05-20")
}

{
  "_id": "jnafjkkbEFejfleLJ",
  "customerId": "CUST-987654321",
  "docType": "customer",
  "name": {
    "title": "Mrs",
    "first": "Anne",
    "last": "Morgan"
  },
  "address": {
    "street1": "240 Blackfriars Rd",
    "city": "London",
    "postCode": "SE1 8NW",
    "country": "UK"
  },
  "customerSince": ISODate("2003-12-01")
}

{
  "_id": "dksfmkpGJPowefjdfhs",
  "accountNumber": "ACC1000000654",
  "docType": "account",
  "accountType": "checking",
  "customerId": [
    "CUST-123456789",
    "CUST-987654321"
  ],
  "dateOpened": ISODate("2003-12-01"),
  "balance": NumberDecimal("5067.65")
}

{
  "_id": "kliwiiejeqydioepwj",
  "accountNumber": "ACC1000000432",
  "docType": "account",
  "accountType": "savings",
  "customerId": [
    "CUST-123456789"
  ],
  "dateOpened": ISODate("2005-10-28"),
  "balance": NumberDecimal("10341.21")
}

{
  "_id": "djahspihhfheiphfipewe",
  "accountNumber": "ACC1000000890",
  "docType": "account",
  "accountType": "savings",
  "customerId": [
    "CUST-987654321"
  ],
  "dateOpened": ISODate("2003-12-15"),
  "balance": NumberDecimal("10341.89")
}
```
As an aside, these are the indexes I added to make those frequent queries more efficient:
```javascript
const indexKeys1 = { accountNumber: 1 };
const indexKeys2 = { customerId: 1, accountType: 1 };
const indexOptions1 = { partialFilterExpression: { docType: 'account' }};
const indexOptions2 = { partialFilterExpression: { docType: 'customer' }};

db.getCollection(collection).createIndex(indexKeys1, indexOptions1);
db.getCollection(collection).createIndex(indexKeys2, indexOptions2);
```
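To sketch how those indexes get used, here are the filters for the common lookups. The ids are taken from the example documents above, and the `db.getCollection` calls assume the `collection` variable from the playground script; including `docType` in each filter is what makes the matching partial index eligible for the query.

```javascript
// Hypothetical lookup filters; ids come from the example documents.
// Each filter includes docType so the matching partial index can be used.
const customerByIdFilter = { docType: "customer", customerId: "CUST-123456789" };
const accountByNumberFilter = { docType: "account", accountNumber: "ACC1000000654" };

// In mongosh:
// db.getCollection(collection).find(customerByIdFilter);
// db.getCollection(collection).find(accountByNumberFilter);
```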
Schema validation lets you create validation rules for your fields, such as allowed data types and value ranges.
MongoDB uses a flexible schema model, which means that documents in a collection do not need to have the same fields or data types by default. Once you've established an application schema, you can use schema validation to ensure there are no unintended schema changes or improper data types.
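As a minimal sketch of that idea — the collection and field names here are invented for illustration, not part of the banking example — a validator can constrain both data types and value ranges:

```javascript
// Hypothetical validator: name must be a string, year an int in [1900, 2100].
const simpleValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name", "year"],
    properties: {
      name: { bsonType: "string" },
      year: { bsonType: "int", minimum: 1900, maximum: 2100 }
    }
  }
};

// In mongosh:
// db.createCollection("movies", { validator: simpleValidator });
```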
The validation rules are pretty simple to set up, and tools like Hackolade can make it simpler still — even reverse-engineering your existing documents.
It's simple to imagine setting up a JSON schema validation rule for a collection where all documents share the same attributes and types. But what about polymorphic collections? Even in polymorphic collections, there is structure to the documents. Fortunately, the syntax for setting up the validation rules allows for the required optionality.
I have two different types of documents that I want to store in my `Accounts` collection — `customer` and `account`. I included a `docType` attribute in each document to identify which type of entity it represents.

I start by creating a JSON schema definition for each type of document:
```javascript
const customerSchema = {
  required: ["docType", "customerId", "name", "customerSince"],
  properties: {
    docType: { enum: ["customer"] },
    customerId: { bsonType: "string" },
    name: {
      bsonType: "object",
      required: ["first", "last"],
      properties: {
        title: { enum: ["Mr", "Mrs", "Ms", "Dr"] },
        first: { bsonType: "string" },
        middle: { bsonType: "string" },
        last: { bsonType: "string" }
      }
    },
    address: {
      bsonType: "object",
      required: ["street1", "city", "postCode", "country"],
      properties: {
        street1: { bsonType: "string" },
        street2: { bsonType: "string" },
        city: { bsonType: "string" },
        postCode: { bsonType: "string" },
        country: { bsonType: "string" }
      }
    },
    customerSince: {
      bsonType: "date"
    }
  }
};

const accountSchema = {
  required: ["docType", "accountNumber", "accountType", "customerId", "dateOpened", "balance"],
  properties: {
    docType: { enum: ["account"] },
    accountNumber: { bsonType: "string" },
    accountType: { enum: ["checking", "savings", "mortgage", "loan"] },
    customerId: { bsonType: "array" },
    dateOpened: { bsonType: "date" },
    balance: { bsonType: "decimal" }
  }
};
```
Those definitions specify which attributes should be in each document and what types they should take. Note that fields can be optional — such as `name.middle` in the `customer` schema.

It's then a simple matter of using the `oneOf` JSON schema operator to allow documents that match either of the two schemas:

```javascript
const schemaValidation = {
  $jsonSchema: { oneOf: [ customerSchema, accountSchema ] }
};

db.createCollection(collection, { validator: schemaValidation });
```
I wanted to go a stage further and add some extra, semantic validations:
- For `customer` documents, the `customerSince` value can't be later than the current time.
- For `account` documents, the `dateOpened` value can't be later than the current time.
- For savings accounts, the `balance` can't fall below zero.
These query predicates implement those checks — a document is invalid if it matches either one:

```javascript
const badCustomer = {
  "$expr": { "$gt": ["$customerSince", "$$NOW"] }
};

const badAccount = {
  $or: [
    {
      accountType: "savings",
      balance: { $lt: 0 }
    },
    {
      "$expr": { "$gt": ["$dateOpened", "$$NOW"] }
    }
  ]
};
```
I updated the collection validation rules to include these new checks:
```javascript
const schemaValidation = {
  "$and": [
    { $jsonSchema: { oneOf: [ customerSchema, accountSchema ] }},
    { $nor: [
        badCustomer,
        badAccount
      ]
    }
  ]
};

db.createCollection(collection, { validator: schemaValidation });
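If the collection already exists, you don't have to drop and recreate it to change the rules — the `collMod` command can update the validator in place. A sketch, using the `Accounts` collection name from this article and a trivial stand-in validator (in practice you'd pass the full `schemaValidation` object):

```javascript
// Sketch: update the validator on an existing collection via collMod.
// The validator below is a stand-in; substitute the combined rules above.
const updateValidatorCommand = {
  collMod: "Accounts",
  validator: { $jsonSchema: { required: ["docType"] } },
  validationLevel: "strict",   // apply the rules to all inserts and updates
  validationAction: "error"    // reject (rather than just log) invalid writes
};

// In mongosh:
// db.runCommand(updateValidatorCommand);
```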
If you want to recreate this in your own MongoDB database, then just paste this into your MongoDB playground in VS Code:
```javascript
const cust1 = {
  "_id": "kjfgjebgjfbkjb",
  "customerId": "CUST-123456789",
  "docType": "customer",
  "name": {
    "title": "Mr",
    "first": "Andrew",
    "middle": "James",
    "last": "Morgan"
  },
  "address": {
    "street1": "240 Blackfriars Rd",
    "city": "London",
    "postCode": "SE1 8NW",
    "country": "UK"
  },
  "customerSince": ISODate("2005-05-20")
}

const cust2 = {
  "_id": "jnafjkkbEFejfleLJ",
  "customerId": "CUST-987654321",
  "docType": "customer",
  "name": {
    "title": "Mrs",
    "first": "Anne",
    "last": "Morgan"
  },
  "address": {
    "street1": "240 Blackfriars Rd",
    "city": "London",
    "postCode": "SE1 8NW",
    "country": "UK"
  },
  "customerSince": ISODate("2003-12-01")
}

const futureCustomer = {
  "_id": "nansfanjnDjknje",
  "customerId": "CUST-666666666",
  "docType": "customer",
  "name": {
    "title": "Mr",
    "first": "Wrong",
    "last": "Un"
  },
  "address": {
    "street1": "240 Blackfriars Rd",
    "city": "London",
    "postCode": "SE1 8NW",
    "country": "UK"
  },
  "customerSince": ISODate("2025-05-20")
}

const acc1 = {
  "_id": "dksfmkpGJPowefjdfhs",
  "accountNumber": "ACC1000000654",
  "docType": "account",
  "accountType": "checking",
  "customerId": [
    "CUST-123456789",
    "CUST-987654321"
  ],
  "dateOpened": ISODate("2003-12-01"),
  "balance": NumberDecimal("5067.65")
}

const acc2 = {
  "_id": "kliwiiejeqydioepwj",
  "accountNumber": "ACC1000000432",
  "docType": "account",
  "accountType": "savings",
  "customerId": [
    "CUST-123456789"
  ],
  "dateOpened": ISODate("2005-10-28"),
  "balance": NumberDecimal("10341.21")
}

const acc3 = {
  "_id": "djahspihhfheiphfipewe",
  "accountNumber": "ACC1000000890",
  "docType": "account",
  "accountType": "savings",
  "customerId": [
    "CUST-987654321"
  ],
  "dateOpened": ISODate("2003-12-15"),
  "balance": NumberDecimal("10341.89")
}

const futureAccount = {
  "_id": "kljkdfgjkdsgjklgjdfgkl",
  "accountNumber": "ACC1000000999",
  "docType": "account",
  "accountType": "savings",
  "customerId": [
    "CUST-987654333"
  ],
  "dateOpened": ISODate("2030-12-15"),
  "balance": NumberDecimal("10341.89")
}

const negativeSavings = {
  "_id": "shkjahsjdkhHK",
  "accountNumber": "ACC1000000666",
  "docType": "account",
  "accountType": "savings",
  "customerId": [
    "CUST-9837462376"
  ],
  "dateOpened": ISODate("2005-10-28"),
  "balance": NumberDecimal("-10341.21")
}

const indexKeys1 = { accountNumber: 1 }
const indexKeys2 = { customerId: 1, accountType: 1 }
const indexOptions1 = { partialFilterExpression: { docType: 'account' }}
const indexOptions2 = { partialFilterExpression: { docType: 'customer' }}

const customerSchema = {
  required: ["docType", "customerId", "name", "customerSince"],
  properties: {
    docType: { enum: ["customer"] },
    customerId: { bsonType: "string" },
    name: {
      bsonType: "object",
      required: ["first", "last"],
      properties: {
        title: { enum: ["Mr", "Mrs", "Ms", "Dr"] },
        first: { bsonType: "string" },
        middle: { bsonType: "string" },
        last: { bsonType: "string" }
      }
    },
    address: {
      bsonType: "object",
      required: ["street1", "city", "postCode", "country"],
      properties: {
        street1: { bsonType: "string" },
        street2: { bsonType: "string" },
        city: { bsonType: "string" },
        postCode: { bsonType: "string" },
        country: { bsonType: "string" }
      }
    },
    customerSince: {
      bsonType: "date"
    }
  }
}

const accountSchema = {
  required: ["docType", "accountNumber", "accountType", "customerId", "dateOpened", "balance"],
  properties: {
    docType: { enum: ["account"] },
    accountNumber: { bsonType: "string" },
    accountType: { enum: ["checking", "savings", "mortgage", "loan"] },
    customerId: { bsonType: "array" },
    dateOpened: { bsonType: "date" },
    balance: { bsonType: "decimal" }
  }
}

const badCustomer = {
  "$expr": { "$gt": ["$customerSince", "$$NOW"] }
}

const badAccount = {
  $or: [
    {
      accountType: "savings",
      balance: { $lt: 0 }
    },
    {
      "$expr": { "$gt": ["$dateOpened", "$$NOW"] }
    }
  ]
}

const schemaValidation = {
  "$and": [
    { $jsonSchema: { oneOf: [ customerSchema, accountSchema ] }},
    { $nor: [
        badCustomer,
        badAccount
      ]
    }
  ]
}

const database = 'MongoBank';
const collection = 'Accounts';

use(database);
db.getCollection(collection).drop();
db.createCollection(collection, {validator: schemaValidation});
db.getCollection(collection).replaceOne({"_id": cust1._id}, cust1, {upsert: true});
db.getCollection(collection).replaceOne({"_id": cust2._id}, cust2, {upsert: true});
db.getCollection(collection).replaceOne({"_id": acc1._id}, acc1, {upsert: true});
db.getCollection(collection).replaceOne({"_id": acc2._id}, acc2, {upsert: true});
db.getCollection(collection).replaceOne({"_id": acc3._id}, acc3, {upsert: true});

// The following 3 operations should fail

db.getCollection(collection).replaceOne({"_id": negativeSavings._id}, negativeSavings, {upsert: true});
db.getCollection(collection).replaceOne({"_id": futureCustomer._id}, futureCustomer, {upsert: true});
db.getCollection(collection).replaceOne({"_id": futureAccount._id}, futureAccount, {upsert: true});

db.getCollection(collection).dropIndexes();
db.getCollection(collection).createIndex(indexKeys1, indexOptions1);
db.getCollection(collection).createIndex(indexKeys2, indexOptions2);
```
I hope that this short article has shown how easy it is to use schema validations with MongoDB's polymorphic collections and single-collection design pattern.
I didn't go into much detail about why I chose the data model used in this example. If you want to know more (and you should!), then here are some great resources on data modeling with MongoDB:
- Daniel Coupal and Ken Alger’s excellent series of blog posts on MongoDB schema patterns
- Daniel Coupal and Lauren Schaefer’s equally excellent series of blog posts on MongoDB anti-patterns
- MongoDB University Course, MongoDB Data Modeling Path