Schema Suggestions with Julia Oppenheim - Podcast Episode 59
Rate this podcast
Today, we are joined by Julia Oppenheim, Associate Product Manager at MongoDB. Julia chats with us and shares details of a set of features within MongoDB Atlas designed to help developers improve the design of their schemas to avoid common anti-patterns.
The notion that MongoDB is schema-less is a bit of a misnomer. Traditional relational databases use a separate entity in the database that defines the schema - the structure of the tables/rows/columns and acceptable values that get stored in the database. MongoDB takes a slightly different approach. The schema does exist in MongoDB, but to see what that schema is - you typically look at the documents previously written to the database. With this in mind, you, as a developer have the power to make decisions about the structure of the documents you store in your database... and as they say with great power, comes great responsibility.
MongoDB has created a set of features built into Atlas that enable you to see when your assumptions about the structure of your documents turn out to be less than optimal. These features come under the umbrella of Schema Suggestions and on today's podcast episode, Julia Oppenheim joins Nic Raboy and I to talk about how Schema Suggestions can help you maintain and improve the performance of your applications by exposing anti-patterns in your schema.
Julia: [00:00:00] My name is Julia Oppenheim and welcome to the Mongo DB podcast. Stay tuned to learn more about how to improve your schema and alleviate schema anti-patterns with schema suggestions and Mongo DB Atlas.
Michael: [00:00:12] And today we're talking with Julia Oppenheim. Welcome to the show, Julia, it's great to have you on the podcast. Thanks. It's great to be here. So why don't you introduce yourself to the audience? Let folks know who you are and what you do at Mongo DB.
Julia: [00:00:26] Yeah. Sure. So hi, I'm Julia. I actually joined Mongo DB about nine months ago as a product manager on Rez's team.
So yeah, I actually did know that you had spoken to him before. And if you listened to those episodes Rez probably touched on what our team does, which is. Ensure that the customer's journey or the user's journey with Mongo DB runs smoothly and that their deployments are performance. Making sure that, you know, developers can focus on what's truly exciting and interesting to them like pushing out new features and they don't have the stress of is my deployment is my database.
You know, going to have any problems. We try to make that process as smooth as possible.
Michael: [00:01:10] Fantastic. And today we're going to be focusing on schemas, right. Schema suggestions, and eliminating schema. Anti-patterns so hold the phone, Mike. Yeah, yeah, go ahead, Nick.
Nic: [00:01:22] I thought I thought I'm going to be people call this the schema-less database.
Michael: [00:01:28] Yeah, I guess that is, that is true. With the document database, it's not necessary to plan your schema ahead of time. So maybe Julia, do you want to shed some light on why we need schema suggestions in the Mongo DB
Julia: [00:01:41] Yeah, no, I think that's a really good point and definitely a common misconception.
So I think one of the draws of Mongo DB is that schema can be pretty flexible. And it's not rigid in the sense that other more relational databases you know, they have a strict set of rules and how you can access the data. I'm going to be as definitely more lenient in that regard, but at the end of the day, you still.
Need certain fields value types and things like that dependent on the needs of your application. So one of the first things that any developer will do is kind of map out what their use cases for their applications are and figure out how they should store the data to make sure that those use cases can be carried out.
I think that you can kind of get a little stuck with schema in MongoDB, is that. The needs of your application changed throughout the development cycle. So a schema that may work on day one when you're you know, user base is relatively small, your feature set is pretty limited. May not work. As your app, you get Cisco, you may need to refactor a little bit, and it may not always be immediately obvious how to do that.
And, you know, we don't expect users to be experts in MongoDB and schema design with Mongo DB which is why I think. Highlighting schema anti-patterns is very useful.
Michael: [00:03:03] Fantastic. So do you want to talk a little bit about how the product works? How schema suggestions work in Mongo DB. Atlas?
Julia: [00:03:12] Yeah. So there are two places where you as a user can see schema anti-patterns they're in.
The performance advisor tab a, which Rez definitely touched on if he talked about autopilot and index suggestions, and you can also see schema anti-patterns in the in our data Explorer. So the collections tab, and we can talk about you know, in a little bit why we have them in two separate places, but in general what you, as the user will see is the same.
So we. Flag schema anti-patterns we give kind of like a brief explanation as to why we flagged them. We'll show, which collections are impacted by you know, this anti-pattern that we've identified and we'll also kind of give a call to action on how to address them. So we actually have custom docs on the six schema anti-patterns that we.
Look for at this stage of the products, you know, life cycle, and we give kind of steps on how to solve it, what our recommendation would be, and also kind of explain, you know, why it's a problem and how it can really you know, come back to hurt you later on.
Nic: [00:04:29] So you've thrown out the keyword schema.
Anti-patterns a few times now, do you want to go over what you said? There are six of them, right? We want to go what each of those six are.
Julia: [00:04:39] Yeah, sure. So there are, like you said, there are six. So I think that we look for use of Our dollar lookup operations. So this means that where it's very, very similar to joining in the relational world where you would be accessing data across different collections.
And this is not always ideal because you're reading and performing, you know, different logic on more than one collection. So in general, it just takes a lot of time a little more resource intensive and. You know, when we see this, we're kind of thinking, oh, this person might come from a more relational background.
That's not to say that this is always a problem. It could make sense to do this in certain cases. Which is where things get a little dicier, but that's the first one that we look for. The, another one is looking for unbounded arrays. So if you just keep. Embedding information and have no limit on that.
The size of your documents can get really, really big. This, we actually have a limit in place and this is one of our third anti-patterns where if you keep you'll hit our 16 megabyte per document limit which kind of means that. Your hottest documents are the working set, takes up too much space on RAM.
So now we're going to disk to fulfill your request, which is, you know, generally again, we'll take some time it's more resource you know, consumptive, things like that.
Nic: [00:06:15] This might be out of scope, but how do you prevent an unbounded array in Mongo DB? Like. I get the concept, but I've never, I've never heard of it done in a database before, so this would be new to me.
Julia: [00:06:27] So this is going to be a little contradictory to the lookup anti-pattern that I just mentioned, and I think that we can talk about this more. Cause I know that when I was first learning about anti-patterns and they did seem very contradictory to me and I got of stressed. So we'll talk about that in a little bit, but the way you would avoid.
The unbounded array would probably be to reference other documents. So that's essentially doing the look of that. I just said was an anti-pattern, but one way to think of it is say you have, okay, so you have a developer collection and you have different information about the developer, like their team at Mongo DB.
You know how long they've been here and maybe you have all of their get commits and like they get commit. It could be an embedded document. It could have like the date of the commit and what project it was on and things like that. A developer can have, you know, infinitely many commits, like maybe they just commit a lot and there was no bound on that.
So you know, it's a one to many relationship and. If that were in an array, I think we all see that that would grow probably would hit that 16 megabyte limit. What we would instead maybe want to consider doing is creating like a commit collection where we would then tie it back to the developer who made the commit and reference it from the original developer document.
I don't know if that analogy was helpful, but that's, that's kind of how you would handle that.
Michael: [00:08:04] And I think the the key thing here is, you know, you get to make these decisions about how you design your schema. You're not forced to normalize data in one way across the entire database, as you are in the relational world.
And so you're going to make a decision about the number of elements in a potential array versus the cost of storing that data in separate collections and doing a lookup. And. Obviously, you know, you may start, you may embark on your journey to develop an application, thinking that your arrays are going to be within scope within a relative, relatively low number.
And maybe the use pattern changes or the number of users changes the number of developers using your application changes. And at some point you may need to change that. So let me ask the question about the. The user case when I'm interacting with Mongo DB Atlas, and my use case does change. My user pattern does change.
How will that appear? How will it surface in the product that now I've breached the limits of what is an acceptable pattern. And now it's, I'm in the scope of an anti-pattern.
Julia: [00:09:16] Right. So when that happens, the best place for it to be flagged is our performance advisor tab. So we'll have, we have a little card that says improve your schema.
And if we have anti-patterns that we flagged we'll show the number of suggestions there. You can click it to learn more about them. And what we do there is it's based on. A sample of your data. So we kind of try to catch these in a reactive sense. We'll see that something is going on and we'll give you a suggestion to improve it.
So to do that, we like analyze your data. We try to determine which collections matter, which collections you're really using. So based on the number of reads and writes to the collections, we'll kind of identify your top 20 collections and then. We'll see what's going on. We'll look for, you know, the edgy pattern, some of which I've mentioned and kind of just collect, this is all going on behind the scenes, by the way, we'll kind of collect you know, distributions of, you know, average data size, our look ups happening you know, just looking for some of those anti-patterns that I've mentioned, and then we'll determine which ones.
You can actually fix and which ones are most impactful, which ones are actually a problem. And then we surface that to the user.
Nic: [00:10:35] So is it monitoring what type of queries you're doing or is it just looking at, based on how your documents are structured when it's suggesting a schema?
Julia: [00:10:46] Yeah. It's mainly looking for how your documents are structured.
The dollar lookup is a little tricky because it is, you know, an operation that's kind of happening under the hood, but it's based on the fact that you're referencing things within the document.
Michael: [00:11:00] Okay. So we talked about the unbounded arrays. We talked about three anti-patterns so far. Do you want to continue on the journey of anti-patterns?
Julia: [00:11:10] Okay. Yeah. Yeah, no, definitely. So one that we also flag is at the index level, and this is something that is also available in porphyry performance advisor in general.
So if you have unnecessary indexes on the collection, that's something that is problematic because an index just existing is you know, it consumes resources, it takes up space and. It can slow down, writes, even though it does slow down speed up reads. So that's like for indexes in general, but then there's the case where the index isn't actually doing anything and it may be kind of stale.
Maybe your query patterns have changed and things like that. So if you have excessive indexes on your collection, we'll flag that, but I will say in performance advisor we do now have index removal recommendations that. We'll say this is the actual index that you should remove. So a little more granular which is nice.
Then another one we have is reducing the number of collections you have in general. So at a certain point, collections again, consume a lot of resources. You have indexes on the collections. You have a lot of documents. Maybe you're referencing things that could be embedded. So that's just kind of another sign that you might want to refactor your data landscape within Mongo DB.
Michael: [00:12:36] Okay. So we've talked about a number of, into patterns so far, we've talked about a use of dollar lookup, storing unbounded arrays in your documents. We've talked about having too many indexes. We've talked about having a large document sizes in your collections. We've talked about too many collections.
And then I guess the last one we need to cover off is around case insensitive rejects squares. You want to talk a little bit about that?
Julia: [00:13:03] Yeah. So. Like with the other anti-patterns we'll kind of look to see when you have queries that are using case insensitive red jacks and recommend that you have the appropriate index.
So it could be case insensitive. Index, it could be a search index, things like that. That is, you know, the last anti-pattern we flag.
Michael: [00:13:25] Okay. Okay, great. And obviously, you know, any kind of operation against the database is going to require resource. And the whole idea here is there's a balancing act between leveraging the resource and and operating efficiently.
So, so these are, this is a product feature that's available in Mongo, DB, Atlas. All of these things are available today. Correct? Yeah. And you would get to, to see these suggestions in the performance advisor tab, right?
Julia: [00:13:55] Yes. Performance advisor. And also as I mentioned, our data Explorer, which is our collections.
Yeah. Right.
Michael: [00:14:02] Yeah. Fantastic. The whole entire goal of. Automating database management is to make it easier for the developer to interact with the database. What else do we want to tell the audience about a schema suggestions or anything in this product space? So
Julia: [00:14:19] I think definitely want to highlight what you just mentioned, that, you know, your schema changes the anti-patterns that could be, you know, more damaging to your performance.
Change over time and it really does depend on your workload and how you're accessing the data. I know that, you know, some of this FEMA anti-patterns do conflict with each other. We do say that some cases you S you should reduce references and some cases you shouldn't, it really depends on, you know, is the data that you want to access together, actually being stored together.
And does that. You know, it makes sense. So they won't all always apply. It will be kind of situational and that's, you know why we're here to help.
Nic: [00:15:01] So when people are using Mongo DB to create documents in their collections, I imagine that they have some pretty intense looking document schemas, like I'm talking objects that are nested eight levels deep.
Will the schema suggestions help in those scenarios to try to improve how people have created their data?
Julia: [00:15:23] Schema suggestions are still definitely in their early days. I think we released this product almost a year ago. We'll definitely capture any of the six anti-patterns that we just mentioned if they're happening on a high level.
So if you're nesting a lot of stuff within the document, that would probably increase. You know, document size and we would flag it. We might not be able to get that targeted to say, this is why your document sizes this large. But I think that that's a really good call-out and it's safe to say, we know that we are not capturing every scenario that a user could encounter with their schema.
You can truly do whatever you want you know, designing your Mongo DB documents. Were actively researching, which schema suggestions it makes sense to look for in our next iteration of this product. So if you have feedback, you know, always don't hesitate to reach out. We'd love to hear your thoughts.
So yeah, there are definitely some limitations we're working on it. We're looking into it.
Michael: [00:16:27] Okay. Let's say I'm a developer and I have a number of collections that maybe they're not accessed as frequently, but I am concerned about the patterns in them. How can I force the performance advisor to look at a specific collection?
Julia: [00:16:43] Yeah, that's a really good question. So as I mentioned before, we do surface the anti-patterns in two places. One is performance advisor and that's for the more reactive use case where doing a sweep, seeing what's going on and those 20 most active collections and kind of. Doing some logic to determine where the most impactful changes could be made.
And then there's also the collections tab in Atlas. And this is where you can go say you're actively developing or adding documents to collection. They aren't heavily used yet, but you want to make sure you're on the right track. If you view the schema, anti-patterns there, it basically runs our algorithm for you.
And we'll. Search a sample of collections for that, or sorry, a sample of documents for that collection and surface the suggestions there. So it's a little more targeted. And I would say very useful for when you're actively developing something or have a small workload.
Michael: [00:17:39] We've got a huge conference coming up in July.
It's Mongo, db.live. My first question is, are you going to be there? Are you perhaps presenting a talk on on this subject at.live?
Julia: [00:17:50] I am not presenting a talk on this subject at.live, but I will be there. I'm very, very excited for it.
Michael: [00:17:56] Fantastic. Well, maybe we can get you to come to community day, which is the week after where we've got talks and sessions and games and all sorts of fun stuff for the community.
Maybe we can get you to to talk a little bit about this at the at the event that would be. That would be fantastic. I'm going to be.live is our biggest user conference of the year. Joined us July 13th and 14th. It's free. It's all online. There's a huge lineup of cutting edge keynotes and breakout sessions.
All sorts of ask me anything, panels and brain breaking activities so much more. You can get more information@mongodb.com slash live. All right, Nick, anything else to add before we begin to wrap?
Nic: [00:18:36] Nothing for me. I mean, Julia, is there any other last minute words of wisdom or anything that you want to tell the audience about schemas suggestions with the Mongo DB or anything that'll help them?
Yeah,
Julia: [00:18:47] I don't think so. I think we covered a lot again. I would just emphasize you know, don't be overwhelmed. Scheme is very important for Mongo DB. And it is meant to be flexible. We're just here to help you.
Nic: [00:19:00] I think that's the key word there. It's not a, it's not schema less. It's just flexible schema, right?
Julia: [00:19:05] Yes, yes, yes.
Michael: [00:19:05] Yes. Well, Julia, thank you so much. This has been a great conversation.
Julia: [00:19:09] Awesome. Thanks for having me.