00:00:00 Introduction to Sharding
00:00:24 Prerequisites and Setup
00:01:26 Creating and Configuring a Sharded Cluster
00:02:27 Loading Sample Data and Connecting to MongoDB
00:03:39 Sharding the Collection Manually
00:04:56 Best Practices for Choosing a Shard Key
00:07:30 Implementing Sharding in Spring Data MongoDB
The primary focus of the video is on setting up and configuring sharding in MongoDB and integrating it with a Spring application using Spring Data MongoDB.
🔑 Key Points
- Sharding is used to distribute data across multiple servers.
- In MongoDB Atlas, sharding requires a cluster tier of at least M30.
- The shard key should have high cardinality and support common query patterns.
- Spring Data MongoDB does not automatically set up sharding; it must be done manually.
- The `@ShardKey` annotation in Spring Data MongoDB helps with code clarity and query optimization.
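To make the cardinality point concrete, here is a minimal Java sketch. It is a toy model, not MongoDB's actual chunk placement: it assumes each document lands on the shard chosen by hashing its key value modulo the shard count. The class name `ShardKeyDistribution`, the `plan` field, and the generated email addresses are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

/** Toy model (an assumption, not MongoDB internals): hash the shard key value to pick a shard. */
final class ShardKeyDistribution {

    /** Maps a key value to a shard index in the range [0, shardCount). */
    static int shardFor(String keyValue, int shardCount) {
        return Math.floorMod(keyValue.hashCode(), shardCount);
    }

    /** Counts how many of the given key values land on each shard. */
    static int[] distribute(String[] keyValues, int shardCount) {
        int[] counts = new int[shardCount];
        for (String value : keyValues) {
            counts[shardFor(value, shardCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int docs = 9_000;
        String[] emails = new String[docs]; // high cardinality: one distinct value per document
        String[] plans = new String[docs];  // low cardinality: only two distinct values
        for (int i = 0; i < docs; i++) {
            emails[i] = "user" + i + "@example.com";
            plans[i] = (i % 10 == 0) ? "premium" : "free";
        }
        // Per-shard document counts for each candidate key.
        System.out.println("email key: " + Arrays.toString(distribute(emails, 3)));
        System.out.println("plan key:  " + Arrays.toString(distribute(plans, 3)));
    }
}
```

With only two distinct `plan` values, at most two of the three shards can ever receive documents, and every "free" document lands on the same shard. A key with many distinct values, such as an email address, at least gives the data a chance to spread out.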
Full Video Transcript
Databases with a large enough data set or read/write throughput can challenge the capacity of a single server. Sharding is a method for distributing your data set across multiple servers, and in this tutorial we're going to look at how Spring Data MongoDB can bring sharding to your Spring application. So before we get started, there are a few things we'll need. First is Java installed on your machine, version 17 or higher. We'll also need Maven, version 3.9.3 or higher. You'll also need a Spring Boot project with Spring Data MongoDB and Spring Web installed as dependencies. You can set this up yourself using Spring Initializr, or you can clone the project that we have linked to this video in the description. You'll also need the MongoDB Shell for interacting with your MongoDB database, and you'll need a MongoDB account. Once you have all this, we're going to get started with setting up our cluster.

When creating our new cluster, there are a few things we need to configure so we can shard our application. The first thing we'll need to do is make sure we have an M30 cluster. So to do this, we're on our dedicated cluster, we're going to scroll down, and you'll see for our cluster tier we have M30. You can do it with a higher cluster tier, but it needs to be a minimum of M30 to enable sharding. The next thing we need to do is go to the additional settings, so keep scrolling down further and you'll see here we have "Shard your cluster". If we toggle this, we can decide how many shards we want. For your production applications you need a minimum of two shards to actually reap the benefits of sharding, but you can choose anywhere between one and 100 shards. For this we'll just go with three. So if we create our cluster now, we should be set to go. Perfect. Now, this will take some time while it's provisioning, so you can step away and come back when all this is ready.

So now that our cluster is set up, next we're going to load in some sample data. So I'm just going to
click this prompt here on screen, and it's going to load in the MongoDB sample data set. We're just going to use this to test out sharding our cluster later; it saves us creating our own data. While this is loading, we're also going to connect to our database. Spring Data MongoDB does not automatically set up sharding for collections. We need to do this manually, and we're going to use mongosh for this. So I have a terminal open over here, and in my terminal I'm going to connect to my database. A quick way of doing this is to just go to Connect, go to Shell, and you get your connection string. Now, if you don't have mongosh installed, there are instructions here on how to install it, depending on your operating system. I already have it set up here, so I'm just going to hit copy, and then in my terminal I can paste this. Perfect. Next I just need to enter my password. If everything's okay there, it should just log in, and perfect, we're logged into our cluster here. So we'll wait a moment for our sample data to load in, and then we'll shard that collection.

Okay, so now that our data is loaded into our database, we're going to use the sample_mflix database and our users collection. In this, we're going to choose our shard key as the email. Now, there are a couple of best practices to keep in mind when you're choosing your shard key to help maintain an even distribution of data, and this is critical for maintaining the high performance and scalability of your sharded cluster. So ideally your shard key should have a high cardinality. This means that the key should have many unique values, to ensure the data is evenly distributed across the shards. It's not necessary that the shard key be entirely unique, but it is important that it has this high cardinality. Next, we're going to think about providing an even distribution of our data. So to evenly distribute our documents across all of our shards, we want to avoid having hotspots where one shard handles more data than
the others, or more of the requests. So if you have particular keys that are going to be picked up a lot, you want to avoid using these. The reason is, if you have one shard that's handling 90% of your requests and the other two are only getting 10% of the requests, then you're really not getting as much out of your shards as you could be if your database is dealing with a lot of throughput. Next, you also want to support common queries. So you want to choose a key that aligns with your common query patterns, to minimize query scatter and to optimize the performance of your application. So for the users collection in our sample_mflix database, using the email field as the shard key is a good choice if emails are unique, or relatively unique, and are well distributed, and also if we're using queries that frequently filter or sort by email. So let's say the name doesn't have to be unique in ours, and of course we're not going to be searching by password, so it makes sense to use the email: if we're frequently searching users, we will be using that email field to search our users. Now, there's a lot to consider when you're choosing your shard keys. You're not limited to having just one field for a shard key; you're able to have it compounded. So I will link some documentation down below to help you better decide what your shard key should be for your application.

Now that we have that out of the way, what we're going to do is actually set up our collection for this, and we're going to shard it using mongosh. So if we go back to our terminal, I have a command here that I'm going to put in, and it's just sh.shardCollection, and then we're going to go sample_mflix.users, and we're going to pass in that second parameter, which is just the email. So this will shard our collection by the email. If we run this, it will take a little bit of time, not too long at all, and now our collection will be set up to be sharded by that email field. Now, if we do want to verify that it's been sharded, we can just type in sh.status, and in our output here we'll be able to look through and check manually whether our database has been sharded. So we can scroll up here, and you can see in the configuration that we are in fact sharded. Perfect.

So now that we have our MongoDB database set up and our collection set up to be sharded, what we're going to do is go through our Spring Data application and look at what we need in Spring Data to shard our application. Now, I won't be taking you through how to set up the entire application. We will have a GitHub repo that you can clone yourself, and we'll just focus on the points needed for actually enabling sharding in the application. It's really easy to get set up with Spring Data, so this part shouldn't take too long at all. So what I have here is just a very simple user API. You'll see I just have a couple of endpoints, get all users and create user. A lot of this will look very familiar to you if you're used to using [ __ ] repository. You see here, this means we get access to all of our CRUD operations on our MongoDB database, and we're using our service to implement that. But the only difference we need to focus on for having a sharded database is here in our model: we have our shard key. So you'll see our shard key is email, and this is just to indicate to our application that we are in fact working on a sharded collection. Now, for using this, all the implementation is done on the MongoDB database side, but what this will do is help with code clarity. As well as that, it will help with the integration of other Spring Data features, as well
as aiding schema evolution. So if you're working with something like schema validation, it helps to ensure that any changes to the shard key fields are deliberate and reviewed. As well as that, it will help with query optimization: when writing custom queries for repository methods, developers can easily identify which fields are the shard keys and optimize accordingly. It also helps with your automated testing, and it will add metadata to the fields that are sharded. Now, here we only have the shard key for the email, but let's say it was a compound shard key: we would have something like email and password, and it's just as simple as that for annotating which fields are shard keys. So there you have it. Again, everything is really done on the MongoDB database side; all the configuration is done there, and this is just for interacting with it from our Spring Data application. Once you have all that set up, you're ready to get going with querying your sharded database.

So there you have it: you now know how to get started with sharding in your application, to implement that feature for your horizontal scaling. If you found this tutorial useful, you can like and subscribe, and head over to the channel to find more tutorials. As well as that, head over to our Developer Center, where we have the written version of this tutorial as well as many more. And if you're working with MongoDB, check out our MongoDB Community Forums, where you can ask questions and see what other people are working on. Thank you, and bye.
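
As a footnote to the transcript's advice about choosing a shard key that supports common queries, the idea of "query scatter" can be sketched in a few lines of Java. This is a simplified model under stated assumptions, not MongoDB's real routing logic: it assumes equality filters only and hash-based placement, and the class name `QueryRouterSketch` and the sample field values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy router (an assumption, not MongoDB internals): decides which shards a query must visit. */
final class QueryRouterSketch {
    private final String shardKeyField;
    private final int shardCount;

    QueryRouterSketch(String shardKeyField, int shardCount) {
        this.shardKeyField = shardKeyField;
        this.shardCount = shardCount;
    }

    /** Returns the shard indexes that a query with the given equality filter would visit. */
    List<Integer> targetShards(Map<String, String> filter) {
        List<Integer> targets = new ArrayList<>();
        String keyValue = filter.get(shardKeyField);
        if (keyValue != null) {
            // The filter includes the shard key, so only one shard can hold matching documents.
            targets.add(Math.floorMod(keyValue.hashCode(), shardCount));
        } else {
            // No shard key in the filter: every shard has to be asked (scatter-gather).
            for (int shard = 0; shard < shardCount; shard++) {
                targets.add(shard);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        QueryRouterSketch router = new QueryRouterSketch("email", 3);
        // Filtering on the shard key targets a single shard.
        System.out.println(router.targetShards(Map.of("email", "someone@example.com")).size()); // prints 1
        // Filtering on another field has to visit all three shards.
        System.out.println(router.targetShards(Map.of("name", "Some One")).size()); // prints 3
    }
}
```

In this model, a lookup by email touches one shard while a lookup by name touches all three, which is why the transcript recommends a shard key that matches the queries the application runs most often.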