Exploring Search Capabilities With Atlas Search
Aasawari Sahasrabuddhe • 9 min read • Published Jul 30, 2024 • Updated Aug 20, 2024
Welcome to the second part of our series on leveraging MongoDB Atlas Search with Spring Boot. In Part 1, we learned how to create Atlas Search indexes using various methods: the MongoDB CLI, the Atlas UI, and programmatically within a Spring Boot application. We explored the fundamental steps required to set up these indexes, providing a solid foundation for implementing powerful search capabilities.
In this part, we will build upon that foundation by diving deeper into the practical application of these indexes. This article will focus on crafting and executing search queries using Spring Boot, illustrating how to harness the full potential of Atlas Search in your applications.
We will also explain key concepts and terminologies associated with Atlas Search, ensuring you have a comprehensive understanding of how it operates.
Whether you are a seasoned developer or new to MongoDB and Spring Boot, this series aims to equip you with the knowledge and tools needed to implement efficient and effective search functionality in your applications.
- Java Version 22
- Sample data loaded in your Atlas cluster
Before we get into writing search queries using the indexes we've created, we highly recommend familiarizing yourself with the foundational concepts of MongoDB Atlas Search. A great resource for this is Part 1 of the series, "Getting Started With MongoDB Atlas Search and Java."
This tutorial will provide you with an understanding of the key principles behind Atlas Search and demonstrate how to create different types of indexes tailored to various use cases. By gaining this background knowledge, you'll be better equipped to effectively implement and leverage search functionalities in your Spring Boot applications.
In each of the sections below, we will cover the terminology and understand real-life examples where the indexes would be helpful.
The search queries in this article run against the movies collection of the sample_mflix database.
If your application's schema changes frequently, setting the dynamic property to true automatically indexes every field that has a supported data type. For example, testIndex01 has dynamic mapping set to true, so it indexes all fields with supported data types.
The function below uses testIndex01 to query the indexed string fields.
```java
import org.bson.Document;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// `collection` is the MongoCollection<Document> for sample_mflix.movies
public ArrayList<Document> searchMovies(String query) {
    // Full-text search for the caller's keyword across title, plot, and fullplot
    List<Document> pipeline = Arrays.asList(new Document("$search",
                    new Document("index", "testIndex01")
                            .append("text",
                                    new Document("query", query)
                                            .append("path", Arrays.asList("title", "plot", "fullplot")))),
            new Document("$project",
                    new Document("_id", 0L)
                            .append("title", 1L)
                            .append("plot", 1L)
                            .append("fullplot", 1L)),
            new Document("$limit", 3L));

    ArrayList<Document> results = new ArrayList<>();
    collection.aggregate(pipeline).into(results);
    return results;
}
```
Using the above function, you can find movies that mention the keyword passed as the query in any of the title, plot, and fullplot fields listed in the path. For example…
```bash
curl -X GET "http://localhost:8080/search?query=cartoon" | jq
```
…will give all movies with the keyword “cartoon” as shown below:
```json
[
  {
    "plot": "Follow Garfield behind the scenes in Cartoon World where he grows bored with his life as a comic strip star.",
    "title": "Garfield Gets Real",
    "fullplot": "Follow Garfield behind the scenes in Cartoon World where he grows bored with his life as a comic strip star."
  },
  {
    "plot": "A high-profiled documentary about Hergè and his major cartoon success, Tintin, based in part on previously unreleased archive tape footage.",
    "title": "Tintin and I",
    "fullplot": "A high-profiled documentary about Hergè and his major cartoon success, Tintin, based in part on previously unreleased archive tape footage."
  },
  {
    "plot": "Bugs Bunny reflects on his past cartoon exploits.",
    "title": "The Bugs Bunny/Road-Runner Movie",
    "fullplot": "The Bugs Bunny/Road Runner Movie is basically a collection of Warner Bros. short cartoon features, \"starring\" the likes of Daffy Duck, Porky Pig and Wile.E.Coyote. These animations are interspersed by Bugs Bunny reminiscing on past events and providing links between the individual animations which are otherwise unconnected. The Road Runner feature at the end consists of sketches from various RR separates (well it does in the 78-minute version anyway)."
  }
]
```
In this case, we will use testIndex02, which was created with field mappings. Field mapping means static mapping is used: not all fields are indexed, only the fields named in the index definition.
In our case, the genres field has been indexed and can be queried with the method below:
```java
public ArrayList<Document> searchMoviesWithGenre(String keyword) {
    // Text search on the statically mapped genres field
    List<Document> pipeline = Arrays.asList(new Document("$search",
                    new Document("index", "testIndex02")
                            .append("text",
                                    new Document("query", keyword)
                                            .append("path", "genres"))),
            new Document("$project",
                    new Document("_id", 0L)
                            .append("title", 1L)
                            .append("genres", 1L)),
            new Document("$limit", 5L));
    ArrayList<Document> results = new ArrayList<>();
    collection.aggregate(pipeline).into(results);
    return results;
}
```
For example, if you wish to list all the movies whose genre is action, you can make the REST call as:
```bash
curl -X GET "localhost:8080/searchMoviesWithGenre?query=action" | jq
```
This will give the output as:
```json
[
  {
    "genres": [
      "Action"
    ],
    "title": "The Perils of Pauline"
  },
  {
    "genres": [
      "Action"
    ],
    "title": "Legacy of Rage"
  },
  {
    "genres": [
      "Action"
    ],
    "title": "Batman Returns"
  },
  {
    "genres": [
      "Action"
    ],
    "title": "Dragon Inn"
  },
  {
    "genres": [
      "Action"
    ],
    "title": "Dikiy vostok"
  }
]
```
Building on the example above, the searchMoviesAndCategorise method uses facets to search movie titles and categorise the matches by genre. testIndex02 also defines a number facet on the year field, which groups the matching movies into 10-year buckets and returns the count for each bucket.
```java
public ArrayList<Document> searchMoviesAndCategorise() {
    // $searchMeta returns only metadata (counts and facet buckets), not the matching documents
    List<Document> pipeline = Arrays.asList(new Document("$searchMeta",
            new Document("index", "testIndex02")
                    .append("facet",
                            new Document("operator", new Document("text",
                                    new Document("query", "movie").append("path", "title")))
                                    .append("facets", new Document("genresFacet",
                                            new Document("type", "string").append("path", "genres"))
                                            .append("yearFacet", new Document("type", "number")
                                                    .append("path", "year")
                                                    .append("boundaries", Arrays.asList(1990L, 2000L, 2010L, 2020L)))))));
    ArrayList<Document> results = new ArrayList<>();
    collection.aggregate(pipeline).into(results);
    return results;
}
```
For example, the API call below categorises the matching movies by genre and also counts how many of them were released in each 10-year bucket between 1990 and 2020.
```bash
curl -X GET "http://localhost:8080/searchMoviesAndCategorise" | jq
```
It will give the output as:
```json
[
  {
    "count": {
      "lowerBound": 125
    },
    "facet": {
      "genresFacet": {
        "buckets": [
          { "_id": "Comedy", "count": 71 },
          { "_id": "Animation", "count": 42 },
          { "_id": "Adventure", "count": 40 },
          { "_id": "Family", "count": 29 },
          { "_id": "Drama", "count": 19 },
          { "_id": "Action", "count": 18 },
          { "_id": "Documentary", "count": 16 },
          { "_id": "Horror", "count": 9 },
          { "_id": "Fantasy", "count": 6 },
          { "_id": "Crime", "count": 5 }
        ]
      },
      "yearFacet": {
        "buckets": [
          { "_id": 1990, "count": 19 },
          { "_id": 2000, "count": 62 },
          { "_id": 2010, "count": 27 }
        ]
      }
    }
  }
]
```
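The yearFacet buckets in the response are labelled by their lower boundary. As a rough illustration of how the boundaries list maps a year to a bucket, here is a minimal plain-Java sketch. It assumes the documented rule that each adjacent pair of boundaries forms an inclusive lower bound and an exclusive upper bound; the class and method names (`YearFacetSketch`, `bucketFor`) are hypothetical and not part of the tutorial's application or of any MongoDB API.

```java
import java.util.List;
import java.util.OptionalLong;

// Illustration only: mimics how a number facet assigns a value to a bucket.
final class YearFacetSketch {

    // Returns the lower boundary of the bucket containing `year`,
    // or empty when the year falls outside every bucket.
    static OptionalLong bucketFor(long year, List<Long> boundaries) {
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            long lower = boundaries.get(i);      // inclusive
            long upper = boundaries.get(i + 1);  // exclusive
            if (year >= lower && year < upper) {
                return OptionalLong.of(lower);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Same boundaries as the yearFacet in searchMoviesAndCategorise
        List<Long> boundaries = List.of(1990L, 2000L, 2010L, 2020L);
        System.out.println(bucketFor(1999, boundaries)); // OptionalLong[1990]
        System.out.println(bucketFor(2000, boundaries)); // OptionalLong[2000]
        System.out.println(bucketFor(1985, boundaries)); // OptionalLong.empty
    }
}
```

Under this rule, a movie released in 2020 or later, or before 1990, is not counted in any of the three buckets shown above.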
The autocomplete feature lets you search with keywords that are only partially typed. It is helpful, for example, when a user on an e-commerce website looks for a book but enters only part of its name. In this case, we will use testIndex03 to look for movies whose fullplot field matches incomplete keywords.
The search function can be written as:
```java
public ArrayList<Document> searchWithIncompleteKeyword(String keyword) {
    // Autocomplete on fullplot, with highlighting to show which fragments matched
    List<Document> pipeline = Arrays.asList(new Document("$search",
                    new Document("index", "testIndex03")
                            .append("autocomplete",
                                    new Document("query", keyword)
                                            .append("path", "fullplot"))
                            .append("highlight",
                                    new Document("path", "fullplot"))),
            new Document("$project",
                    new Document("_id", 0L)
                            .append("fullplot", 1L)
                            .append("highlights",
                                    new Document("$meta", "searchHighlights"))),
            new Document("$limit", 1L));
    ArrayList<Document> results = new ArrayList<>();
    collection.aggregate(pipeline).into(results);
    return results;
}
```
For example, to test with the incomplete keywords spa and cow, which are prefixes of words such as Spain, space, and cowboys in the fullplot field, you can call the following API.
```bash
curl -X GET "http://localhost:8080/searchMoviesWithAutocomplete?query=spa%20cow" | jq
```
It will give a response as:
```json
[
  {
    "fullplot": "Julièn Torralba is a former movie stuntman in Almeria, Spain. He and several of his colleagues, who once made a living in American Westerns shot in Spain, now are reduced to doing stunt shows for minuscule audiences on the decaying set built for those old Westerns. Julièn wrestles with dark memories of the death of his son, also a stuntman, and with estrangement from his daughter-in-law Laura and her son Carlos. Carlos, a young boy, becomes intrigued with his late father's life and runs away to join Julièn and his band of has-beens. There Carlos is initiated into the rambunctious life of these hard-drinking faux cowboys. But when Laura, a powerful executive looking for a new site for a tourist resort, learns that Carlos has joined the hated Julièn, she moves to destroy even this remnant of Julièn's once-proud career. Julièn and the cowboys decide to fight back the only way they know how.",
    "highlights": [
      {
        "score": 2.126451015472412,
        "path": "fullplot",
        "texts": [
          { "value": "Julièn Torralba is a former movie stuntman in Almeria, ", "type": "text" },
          { "value": "Spain. He and", "type": "hit" }
        ]
      },
      {
        "score": 1.518623948097229,
        "path": "fullplot",
        "texts": [
          { "value": "He and several of his colleagues, who once made a living in American Westerns shot in ", "type": "text" },
          { "value": "Spain, now are", "type": "hit" },
          { "value": " reduced to doing stunt shows for minuscule audiences on the decaying set built for those old Westerns. ", "type": "text" }
        ]
      },
      {
        "score": 1.869809865951538,
        "path": "fullplot",
        "texts": [
          { "value": "There Carlos is initiated into the rambunctious life of these hard-drinking faux ", "type": "text" },
          { "value": "cowboys. But when", "type": "hit" }
        ]
      },
      {
        "score": 1.9461908340454102,
        "path": "fullplot",
        "texts": [
          { "value": "Julièn and the ", "type": "text" },
          { "value": "cowboys decide to", "type": "hit" },
          { "value": " fight back the only way they know how.", "type": "text" }
        ]
      }
    ]
  }
]
```
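To see why the partial keywords spa and cow can match Spain and cowboys, it helps to picture the prefixes that an autocomplete index may store for each word. The following plain-Java sketch generates such "edge grams". It is only an illustration under assumptions: the class and method names are hypothetical, and the minGrams/maxGrams values of 2 and 15 are assumed here, since the actual tokenization settings of testIndex03 come from the index definition created in Part 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: builds the left-anchored prefixes ("edge grams") of a word.
final class EdgeGramSketch {

    // Returns every prefix of `word` whose length is between minGrams and maxGrams.
    static List<String> edgeGrams(String word, int minGrams, int maxGrams) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGrams, lower.length());
        for (int len = minGrams; len <= longest; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // minGrams = 2 and maxGrams = 15 are assumed values for this sketch
        System.out.println(edgeGrams("Spain", 2, 15));   // [sp, spa, spai, spain]
        System.out.println(edgeGrams("cowboys", 2, 15)); // [co, cow, cowb, cowbo, cowboy, cowboys]
    }
}
```

Because spa and cow appear among the stored prefixes, a query containing them can match documents whose fullplot contains Spain or cowboys, which is consistent with the highlighted hits in the response above.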
Sometimes, we search for items/products on websites using the wrong spelling due to autocorrect or unawareness of the right spelling, but the application still gives us the right results. This is possible because of the fuzzy search feature.
MongoDB’s Atlas Search also allows you to search through the text with the incorrect spelling.
```java
public ArrayList<Document> searchWithMisspelledTitle(String keyword) {
    // Fuzzy text search on title: tolerate up to two single-character edits per term
    List<Document> result = Arrays.asList(new Document("$search",
                    new Document("index", "testIndex03")
                            .append("text",
                                    new Document("query", keyword)
                                            .append("path", "title")
                                            .append("fuzzy",
                                                    new Document("maxEdits", 2L)
                                                            .append("maxExpansions", 100L)))),
            new Document("$project",
                    new Document("title", 1L)
                            .append("cast", 1L)),
            new Document("$limit", 3L));
    return collection.aggregate(result).into(new ArrayList<>());
}
```
In this case, you can search for the movie with a misspelled title, and the right movie name will appear in the results.
For example, if you search for the movie with the below REST call…
```bash
curl -X GET "localhost:8080/searchMoviesWithIncorrectSpelling?query=Gerti%20the%20Dinosor" | jq
```
…where Gertie is misspelled as Gerti and Dinosaur as Dinosor, it will give the results as:
```json
[
  {
    "_id": {
      "timestamp": 1463423888,
      "date": "2016-05-16T18:38:08.000+00:00"
    },
    "cast": [
      "Winsor McCay",
      "George McManus",
      "Roy L. McCardell"
    ],
    "title": "Gertie the Dinosaur"
  },
  {
    "_id": {
      "timestamp": 1463423899,
      "date": "2016-05-16T18:38:19.000+00:00"
    },
    "title": "Dinosaur",
    "cast": [
      "D.B. Sweeney",
      "Alfre Woodard",
      "Ossie Davis",
      "Max Casella"
    ]
  },
  {
    "_id": {
      "timestamp": 1463423974,
      "date": "2016-05-16T18:39:34.000+00:00"
    },
    "cast": [
      "Stan Adelstein",
      "Lanice Archer",
      "Robert Bakker",
      "Philip Currie"
    ],
    "title": "Dinosaur 13"
  }
]
```
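The maxEdits value of 2 in the fuzzy option limits how many single-character changes may separate a query term from an indexed term. As a rough way to check that the misspellings in this example fall within that limit, here is a small plain-Java sketch of the classic Levenshtein edit distance. The class and method names are hypothetical, and Atlas Search's own matching may count some edits differently (for example, transposed characters), so treat this as an approximation rather than the engine's exact algorithm.

```java
import java.util.Locale;

// Illustration only: classic Levenshtein distance between two terms.
final class EditDistanceSketch {

    // Minimum number of single-character insertions, deletions,
    // or substitutions needed to turn `a` into `b`.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Case-insensitive check that two terms are within `maxEdits` of each other.
    static boolean withinMaxEdits(String queryTerm, String indexedTerm, int maxEdits) {
        return levenshtein(queryTerm.toLowerCase(Locale.ROOT),
                indexedTerm.toLowerCase(Locale.ROOT)) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("gerti", "gertie"));     // 1
        System.out.println(levenshtein("dinosor", "dinosaur")); // 2
    }
}
```

Both misspelled terms are within two edits of the intended words, which is consistent with "Gertie the Dinosaur" appearing in the results.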
Similar to the case mentioned above, MongoDB Atlas Search also allows searching with synonym keywords — i.e., words that have the same or nearly the same meaning.
When we create the search index, we specify a synonyms source collection, which holds the documents that define the synonym mappings. testIndex04 names test_synonyms as its source collection.
The search query is written as:
```java
public ArrayList<Document> searchWithSynonyms(String keyword) {
    // Text search on fullplot, expanded with the synonym mapping named "synonymName"
    List<Document> result = Arrays.asList(new Document("$search",
                    new Document("index", "testIndex04")
                            .append("text",
                                    new Document("path", "fullplot")
                                            .append("query", keyword)
                                            .append("synonyms", "synonymName"))),
            new Document("$limit", 10L),
            new Document("$project",
                    new Document("_id", 0L)
                            .append("title", 1L)
                            .append("fullplot", 1L)
                            .append("score",
                                    new Document("$meta", "searchScore"))),
            new Document("$limit", 3L));
    return collection.aggregate(result).into(new ArrayList<>());
}
```
The $meta expression with searchScore returns each document's relevance score, which indicates how closely the document matches the searched keyword. Before we make the REST call to test the query, we need to create the synonym mappings in a separate collection.
Insert the data below into the test_synonyms collection.
```javascript
[
  {
    mappingType: 'explicit',
    input: [ 'love', 'romance' ],
    synonyms: [ 'love', 'romance' ]
  },
  {
    mappingType: 'equivalent',
    synonyms: [ 'car', 'vehicle', 'automobile' ]
  }
]
```
To test the above query, you can use the REST call as:
```bash
curl -X GET "localhost:8080/searchMoviesWithSynonyms?query=love" | jq
```
This will give results based on the mapping created, as shown below.
```json
[
  {
    "title": "Oh, Woe Is Me",
    "fullplot": "Romance about Simon Donnadieu and his decision to leave his ever-loving wife Rachel.",
    "score": 3.9741177558898926
  },
  {
    "title": "27 Missing Kisses",
    "fullplot": "The summerly adventures of teen Sybilla, who falls in love with a middle-aged father, while being romanced by the his teen son.",
    "score": 3.8209657669067383
  },
  {
    "title": "The End of the Affair",
    "fullplot": "In wartorn London Maurice Bendrix falls in love with neighbor Sarah Miles. They begin an illicit romance behind Sarah's husband's back. While war does not last forever, neither does love in this existentialist tale.",
    "score": 3.753319263458252
  }
]
```
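The two documents inserted into test_synonyms use different mapping types. The sketch below shows, in plain Java, one way to think about how a query term is expanded under each type: with an equivalent mapping, any term in the list expands to the whole list, while with an explicit mapping, only terms listed in input expand to the synonyms list. The class and method names are hypothetical and this is a simplified model of the behaviour, not Atlas Search's implementation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a simplified model of synonym expansion.
final class SynonymSketch {

    // equivalent: every term in `synonyms` is interchangeable with the others.
    static Set<String> expandEquivalent(String term, List<String> synonyms) {
        Set<String> expanded = new LinkedHashSet<>();
        expanded.add(term);
        if (synonyms.contains(term)) {
            expanded.addAll(synonyms);
        }
        return expanded;
    }

    // explicit: only terms listed in `input` are replaced by `synonyms` (one-way mapping).
    static Set<String> expandExplicit(String term, List<String> input, List<String> synonyms) {
        Set<String> expanded = new LinkedHashSet<>();
        if (input.contains(term)) {
            expanded.addAll(synonyms);
        } else {
            expanded.add(term);
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Same mappings as the documents inserted into test_synonyms
        System.out.println(expandExplicit("love",
                List.of("love", "romance"), List.of("love", "romance"))); // [love, romance]
        System.out.println(expandEquivalent("car",
                List.of("car", "vehicle", "automobile"))); // [car, vehicle, automobile]
    }
}
```

Under this model, the query love also searches for romance, which is consistent with the first result above matching on the word "Romance".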
The complete code for the application using all the cases mentioned above is available in the GitHub repository.
After you have utilised all the indexes created, you can use the below method to delete all the search indexes that have been created through the application.
```java
public void deleteSearchIndexes(String dbName, String collectionName) {
    MongoDatabase database = mongoClient.getDatabase(dbName);
    MongoCollection<Document> collection = database.getCollection(collectionName);

    // Collect the names of all search indexes on the collection
    List<String> indexNames = new ArrayList<>();
    for (Document indexInfo : collection.listSearchIndexes()) {
        String indexName = indexInfo.getString("name");
        if (!"_id_".equals(indexName)) {
            indexNames.add(indexName);
        }
    }
    // Drop each index using its name from the list
    for (String indexName : indexNames) {
        collection.dropSearchIndex(indexName);
    }
    System.out.println("Deleted all indexes created");
}
```
Then call the API with the command below:
```bash
curl -X DELETE "localhost:8080/deleteSearchIndexes?dbName=<dbName>&collectionName=<collectionName>"
```
In this second part of our series on exploring MongoDB Atlas Search with Spring Boot, we have delved into the practical applications of the search indexes we created in Part 1. By demonstrating various search scenarios, we highlighted the versatility and power of Atlas Search in handling different types of queries and data structures.
In this part, we have covered some use cases like autocomplete, fuzzy, field mapping, etc. If you wish to learn about more cases, we also have other articles — MongoDB Atlas Search Using the Java Driver and Spring Data and How to Build a Search Service in Java — where you can explore more use cases.
If you have any questions or suggestions, please feel free to reach out on the MongoDB community forum and also explore more fun tutorials through our MongoDB Developer Center.