I was testing MongoDB sharding on my local system and configured a bunch of MongoDB instances thanks to this primitive bash script. As far as I can tell, all of the instances are up and running.
You can run this NestJS app by following these steps.
I completely forgot to mention where I configured which database and which collection should be sharded; I did just that in this file.
As for where I am inserting data, it is here. I think it is clear that I have names starting with the letter “a” all the way down to the letter “z”. That is why I suspect that either my configuration is wrong or I am missing some crucial step.
I have a sharded MongoDB cluster on my local system, up and running inside Docker.
Then I tried to see how it distributes documents across shards.
So first I created about 1,000 documents, but they all ended up in one shard replica set.
I got confused, since I thought range sharding would just split the data alphabetically (remember that my shard key is handle), something like a to n and n to z.
The sharding operation creates a single empty chunk to cover the entire range of the shard key values.
After the initial chunk creation, the balancer migrates the initial chunk across the shards as appropriate as well as manages the chunk distribution going forward.
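To make the quoted behaviour concrete, here is a minimal toy model of range-based routing. It is only a sketch and not MongoDB's implementation: the `Chunk` type, the `findShard` helper, the shard names, and the split point "n" are all hypothetical, and `null` stands in for MinKey/MaxKey. It shows why every handle lands on the same shard while only the single initial chunk exists, whatever letter it starts with.

```typescript
// Toy model of range sharding on the "handle" shard key.
// All names here are hypothetical; this is not the MongoDB API.
interface Chunk {
  min: string | null; // null stands in for MinKey (no lower bound)
  max: string | null; // null stands in for MaxKey (no upper bound)
  shard: string;
}

// Return the shard owning the chunk whose range [min, max) contains the handle.
function findShard(chunks: Chunk[], handle: string): string {
  for (const chunk of chunks) {
    const aboveMin = chunk.min === null || handle >= chunk.min;
    const belowMax = chunk.max === null || handle < chunk.max;
    if (aboveMin && belowMax) {
      return chunk.shard;
    }
  }
  throw new Error(`no chunk covers handle "${handle}"`);
}

// Right after sharding the collection: one chunk covers the whole key range.
const initialChunks: Chunk[] = [{ min: null, max: null, shard: "shard1" }];

// A hypothetical later state, after a split at "n" and one migration.
const splitChunks: Chunk[] = [
  { min: null, max: "n", shard: "shard1" },
  { min: "n", max: null, shard: "shard2" },
];

console.log(findShard(initialChunks, "alice")); // shard1
console.log(findShard(initialChunks, "zoe")); // shard1
console.log(findShard(splitChunks, "alice")); // shard1
console.log(findShard(splitChunks, "zoe")); // shard2
```

In this model the split point is not fixed in advance by the alphabet; it only appears once the cluster decides to split and move a chunk.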
But to my surprise, when I decided to go all out and create about 40,000 documents, it started to split them.
Now the weird thing is that it failed to distribute them evenly. I have about 11,000 documents in shard replica set number 1, and the rest are in shard replica set number 2.
Has anyone had a similar experience? Am I missing something?
NOTE: I did not change anything, except that I increased how much data I was saving in my NestJS code.
So I solved the case. It is as simple as I described it here. Consider giving my repo a star if this Q&A helped you in any way. I am also open to feedback, so please do not hesitate to open an issue and share your thoughts on the matter.
Please note that this repo has several branches, and the branch in charge of MongoDB sharding is mongodb-sharding.
To minimize the impact of balancing on the cluster, the balancer only begins balancing after the distribution of data for a sharded collection has reached certain thresholds.
A collection is considered balanced if the difference in data between shards (for that collection) is less than three times the configured range size for the collection. For the default range size of 128MB, two shards must have a data size difference for a given collection of at least 384MB for a migration to occur.
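The arithmetic in that quote can be sketched in a few lines. This is only an illustration of the stated rule (three times the range size), not the balancer's real code: `migrationThresholdBytes` and `needsBalancing` are hypothetical helpers, and the 200 bytes per document figure is an assumption made up for the example, not a measurement from my collection.

```typescript
// Sketch of the balancing threshold described in the quote above.
// Helper names and the per-document size are assumptions for illustration.
const MB = 1024 * 1024;

// Threshold = 3 x configured range size (128 MB by default, so 384 MB).
function migrationThresholdBytes(rangeSizeMB: number = 128): number {
  return 3 * rangeSizeMB * MB;
}

// True when the gap between the largest and smallest shard reaches the threshold.
function needsBalancing(shardSizesBytes: number[], rangeSizeMB: number = 128): boolean {
  if (shardSizesBytes.length < 2) {
    return false;
  }
  const largest = Math.max(...shardSizesBytes);
  const smallest = Math.min(...shardSizesBytes);
  return largest - smallest >= migrationThresholdBytes(rangeSizeMB);
}

// Assumed document size of 200 bytes: 11,000 vs 29,000 documents.
const assumedDocBytes = 200;
const shard1Bytes = 11000 * assumedDocBytes; // about 2.1 MB
const shard2Bytes = 29000 * assumedDocBytes; // about 5.5 MB

console.log(migrationThresholdBytes() / MB); // 384
console.log(needsBalancing([shard1Bytes, shard2Bytes])); // false
console.log(needsBalancing([500 * MB, 100 * MB])); // true
```

Under that assumed document size, an 11,000 to 29,000 split differs by only a few megabytes, far below 384MB, so by this rule the collection already counts as balanced.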