JSON Schema Validation - Checking Your Arrays

Ken W. Alger

In a previous post, I discussed some of the methods that can be used to “lock down” the schema in your MongoDB documents. In this second part of the series, I’ll continue with techniques beyond simple required field and value validation. We’ll explore how to further benefit from MongoDB’s document model and see how to apply another validation technique to arrays.

Checking Your Arrays

We dealt with checking an array's structural properties in the last post, but this time we want to deal with an aspect of the data it contains. An array holds a collection of items. Usually these are all different, but sometimes the same item appears more than once. If we want to ensure that they are all different, we need to require that the contents of the array be unique.

Imagine that your culinary endeavors take you to a food coloring company that sells boxed sets of food coloring in a variety of colors. It would probably be a great idea to make sure that each box includes different colors. How exciting would it be for a customer to get a box containing only blue food coloring? We can use the uniqueItems keyword in our schema validator to ensure uniqueness.

db.foodColor.drop()
db.createCollection("foodColor", {
    validator: {
        $jsonSchema: {
            bsonType: "object",
            required: ["name", "box_size", "dyes"],
            properties: {
                _id: {},
                name: {
                    bsonType: ["string"],
                    description: "'name' is a required string"
                },
                box_size: {
                    enum: [3, 4, 6],
                    description: "'box_size' must be one of the values listed and is required"
                },
                dyes: {
                    bsonType: ["array"],
                    minItems: 1, // each box of food color must have at least one color
                    uniqueItems: true, // no two items in the array may be identical
                    items: {
                        bsonType: ["object"],
                        required: ["size", "color"],
                        additionalProperties: false, // reject fields other than size and color
                        description: "'items' must contain the stated fields.",
                        properties: {
                            size: {
                                enum: ["small", "medium", "large"],
                                description: "'size' is required and can only be one of the given enum values"
                            },
                            color: {
                                bsonType: "string",
                                description: "'color' is a required field of type string"
                            }
                        }
                    }
                }
            }
        }
    }
})

Our packages of food coloring, as defined above, must contain unique containers of dyes. That uniqueness is judged by the combination of size and color.
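
As a rough illustration of what that rule means for our dyes, the check can be mimicked in plain JavaScript. This is only a sketch of the rule's effect, not how the server implements uniqueItems. The name findDuplicateDyes is a hypothetical helper, and the sketch assumes each dye carries only the size and color fields.

```javascript
// Illustrative only: reports dyes that repeat an earlier size/color pair.
// Assumes every dye object has just the two fields the schema allows.
function findDuplicateDyes(dyes) {
  const seen = new Set();
  const duplicates = [];
  for (const dye of dyes) {
    // Combine both fields into one key so that "small red" and
    // "medium red" count as different items.
    const key = JSON.stringify([dye.size, dye.color]);
    if (seen.has(key)) {
      duplicates.push(dye);
    } else {
      seen.add(key);
    }
  }
  return duplicates;
}

// Made-up sample data: the third dye repeats the first one.
const sampleDyes = [
  { size: "small", color: "red" },
  { size: "medium", color: "red" },
  { size: "small", color: "red" },
];
console.log(findDuplicateDyes(sampleDyes)); // one duplicate: the second small red dye
```

An empty result corresponds to an array that would satisfy uniqueItems; anything else is a box the validator would reject.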

Document 1

We can of course have a box with three different sizes of different colors.

db.foodColor.insertOne({name: "Rainbow RGB", box_size: 3,
dyes: [
        {size: "small", color: "red"},
        {size: "medium", color: "green"},
        {size: "large", color: "blue"}]}) // works

Each item is unique. So this works.

Document 2

We could have a package with three different sizes of the same color, but the combination of color and size must be unique in the dyes array. That means that when a document is inserted like this:

db.foodColor.insertOne({name: "Singin' the Blues", box_size: 3,
dyes: [
        {size: "small", color: "blue"},
        {size: "medium", color: "blue"},
        {size: "large", color: "blue"}]}) // works

it is valid because each item in the array is a unique color and size combination. Let’s try another one:

Document 3

Let's fill a box with red coloring in various sizes:

db.foodColor.insertOne({name: "Reds", box_size: 6,
dyes: [
        {size: "small", color: "red"},
        {size: "medium", color: "red"},
        {size: "large", color: "red"},
        {size: "small", color: "scarlet"},
        {size: "small", color: "brick red"},
        {size: "small", color: "red"}
]}) // doesn't work, there are two small red dyes in this box

Because there are two small red containers in this package, the insert fails.
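
In a script, a rejected insert like this one surfaces as a thrown error. A minimal sketch for handling it in mongosh follows, assuming the server reports validation failures with error code 121 (DocumentValidationFailure) and, on MongoDB 5.0 and later, attaches details in an errInfo field. The name tryInsert is a hypothetical helper, not part of the shell.

```javascript
// Hypothetical helper for mongosh: attempt an insert and report whether
// schema validation rejected it. Assumes the server signals a validation
// failure with error code 121 (DocumentValidationFailure).
function tryInsert(collection, doc) {
  try {
    collection.insertOne(doc);
    return { ok: true };
  } catch (err) {
    if (err.code === 121) {
      // errInfo carries the failure details on MongoDB 5.0 and later;
      // on older servers it may be undefined.
      return { ok: false, reason: "validation", details: err.errInfo };
    }
    throw err; // anything else is not a validation problem
  }
}

// Usage in mongosh (not run here):
//   const result = tryInsert(db.foodColor, { name: "Reds", box_size: 6, dyes: [ /* ... */ ] });
//   if (!result.ok) printjson(result.details);
```

Returning a small result object rather than letting the error propagate keeps a bulk-loading script running while still recording which boxes were turned away.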

Document 4

What if someone tries to create a "special edition" of coloring with an aroma or taste to make the boxes more interesting? The validation schema helps there too.

db.foodColor.insertOne({name: "Specials", box_size: 3,
dyes: [
        {size: "small", color: "red", aroma: "malty"},
        {size: "medium", color: "red", aroma: "fruity"},
        {size: "large", color: "red", taste: "salty"}
]}) // doesn't work, there are extra properties

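The insert is rejected because additionalProperties: false in the items definition allows only the size and color fields on each dye. Being able to validate both the shape and the values of a document is a valuable and powerful tool, and restricting items to specific properties extends what the validator can enforce.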
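
To see which fields would trip the validator in the special-edition box, the effect of additionalProperties: false on each dye can be approximated in plain JavaScript. This is a sketch under the assumption that size and color are the only permitted fields, as in the schema above. ALLOWED_DYE_FIELDS and extraDyeFields are hypothetical names introduced for illustration.

```javascript
// Field names the schema's properties block permits on each dye.
const ALLOWED_DYE_FIELDS = ["size", "color"];

// Hypothetical helper: for each dye, list the field names that fall
// outside the allowed set. An empty list means the dye has no extras.
function extraDyeFields(dyes) {
  return dyes.map((dye) =>
    Object.keys(dye).filter((field) => !ALLOWED_DYE_FIELDS.includes(field))
  );
}

const specials = [
  { size: "small", color: "red", aroma: "malty" },
  { size: "medium", color: "red", aroma: "fruity" },
  { size: "large", color: "red", taste: "salty" },
];
console.log(extraDyeFields(specials)); // aroma, aroma, taste: one unexpected field per dye
```

A pre-check like this is optional, since the database enforces the rule either way, but it can produce a friendlier message before the insert is attempted.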

Conclusion

JSON Schema validation can greatly enhance your application and add a layer of protection for your data. In this particular case, we've used validation to ensure that there are no duplicates in embedded arrays in a document, entirely through defining the schema and with no additional code. We've also protected against unauthorized extensions to the specification of array objects. For our food coloring example, the validation schema can't guarantee variety (a box of nothing but blue is still valid), but it can prevent some of the worst failures at the database level.

The techniques provided both here and in the previous post are wonderful tools to have in your toolbox when working with the MongoDB document model. In part three of this series, I’ll explore schema dependencies and show how to make fields dependent on the existence of others.
