Visually Showing Atlas Search Highlights with JavaScript and HTML
When it comes to finding specific words or phrases within text, you're probably going to want to use a natural language search option like full-text search (FTS). Sure, you could probably create a complicated and difficult-to-maintain set of regular expressions to search within text, but that is an option that most developers don't want. Not to mention it won't cover the full scope of what a natural language processor typically accomplishes.
In a previous tutorial titled Building an Autocomplete Form Element with Atlas Search and JavaScript, I wrote about searching for recipes, as they are being typed, in MongoDB Atlas using the autocomplete operator. While that tutorial accomplished the job quite well, it didn't elaborate on what exactly was being matched for any given term.
In this tutorial, we're going to see how to use Atlas Search and work with the highlight data to visually show any matches on the terms in a user-facing application. Highlighting is a powerful tool with Search to allow your users to find the exact text that they want in its proper context.
To get an idea of what we plan to accomplish, take a look at the following animated image:
In the above scenario, we are searching through messages in a chat room. When we enter a term to search, we get the chat messages in return, with any potential hits highlighted. The potential hits can match exactly, or they could have a certain level of fuzziness which we'll explore. In this particular example, the number of highlighted responses is limited to five.
Before we jump directly into the creation of the back end for searching and the front end for displaying, we need to have an idea of our data model. Let's assume we are working with user chat data and we want to search for certain words and phrases. With this in mind, our documents could potentially look like this:
    {
        "_id": "mongodb",
        "messages": [
            {
                "sender": "nraboy",
                "message": "Hello World"
            }
        ]
    }
The above document sample isn't the most realistic, but it gives us something. Every time a new message is added to the chat room, it is appended to the messages array with the associated sender information. We could make this significantly more complex, but we don't need to for this example.
The next step is to create a default search index on our data collection. For this example, we'll be using a gamedev database and a chats collection.
While we could create an index specific to the fields we're planning to use, for simplicity, creating a dynamic default index will be more than enough. To do this, simply click on the green Create Search Index button. Let's accept the default settings and click Create Index. This will give us the default index with the following configuration:
    {
        "mappings": {
            "dynamic": true
        }
    }
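For comparison, here is a rough sketch of what an index limited to the searched field might look like. It is written as a JavaScript object so it can be printed as JSON for the index editor. The mapping shape is an assumption based on this tutorial's data model: it treats messages as a document type with a string message field, which is how I'd expect the dynamic mapping to handle it. Check it against the Atlas Search index documentation before relying on it.

```javascript
// Hypothetical static index definition for the chat data model.
// Assumption: only `messages.message` needs to be searchable.
const staticIndexDefinition = {
    analyzer: "lucene.standard", // the default analyzer, stated explicitly
    mappings: {
        dynamic: false, // index only the fields listed below
        fields: {
            messages: {
                type: "document",
                fields: {
                    message: { type: "string" }
                }
            }
        }
    }
};

// Print JSON that could be pasted into the index editor.
console.log(JSON.stringify(staticIndexDefinition, null, 4));
```

The dynamic index remains the simpler choice for this example. A narrower definition like this mostly matters once the collection grows or contains fields that never need to be searched.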
The lucene.standard analyzer is the default analyzer for Atlas Search and more information on it can be found in the documentation.
In this example, we'll need a back end to handle the interaction with the database for searching. To keep the stack consistent for this example, we're going to use Node.js with some common dependencies.
Create a new directory on your computer and from the command line, execute the following:
    npm init -y
    npm install express mongodb cors --save
The above commands will create a new package.json file and download Express Framework, the MongoDB Node.js driver, and a cross-origin resource sharing middleware that will allow us to reach our back end from our front end operating on a different port.
Within the same project directory, create a main.js file and add the following boilerplate Express Framework with MongoDB code:
    const { MongoClient } = require("mongodb");
    const Express = require("express");
    const Cors = require("cors");

    // The connection string is read from the ATLAS_URI environment variable
    const client = new MongoClient(process.env["ATLAS_URI"]);
    const server = Express();

    // Allow requests from a front end served on a different origin
    server.use(Cors());

    // Assigned once the database connection is established
    var collection;

    server.get("/search", async (request, response) => { });

    server.listen("3000", async () => {
        try {
            await client.connect();
            collection = client.db("gamedev").collection("chats");
        } catch (e) {
            console.error(e);
        }
    });
In the above code, we are importing each of our dependencies and initializing Express Framework as well as MongoDB. The ATLAS_URI in the above example should be stored as an environment variable on your computer. You can obtain it from the MongoDB Atlas dashboard and it will look something like this:
    mongodb+srv://<username>:<password>@cluster0-yyarb.mongodb.net/<dbname>?retryWrites=true&w=majority
Of course, don't use my example above because you'll need to use your own cluster information. For help with getting started with MongoDB Atlas, check out my previous tutorial on the subject.
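If the environment variable is missing, the failure can be harder to trace than it needs to be. As a minimal sketch, a small helper can check for it up front. The requireEnv name is hypothetical and not part of the tutorial code.

```javascript
// Hypothetical helper: read a required environment variable or fail fast.
// `env` defaults to process.env but can be swapped out for testing.
const requireEnv = (name, env = process.env) => {
    const value = env[name];
    if (typeof value !== "string" || value.trim() === "") {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
};

// Usage (commented out so the sketch runs without a real cluster):
// const client = new MongoClient(requireEnv("ATLAS_URI"));
```

This keeps the connection string out of the source code while making a forgotten variable obvious at startup.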
Take note of the section of the code where we are listening for connections:
    server.listen("3000", async () => {
        try {
            await client.connect();
            collection = client.db("gamedev").collection("chats");
        } catch (e) {
            console.error(e);
        }
    });
In the above code, we are connecting to the specified MongoDB Atlas cluster and we are obtaining a handle to the chats collection within the gamedev database. Feel free to use your own collection and database naming, but note that this example will follow the previously defined data model when it comes to searching.
With the boilerplate in place, let's jump into the /search endpoint that is currently empty. We're going to want to change it to the following:
    server.get("/search", async (request, response) => {
        try {
            let result = await collection.aggregate([
                {
                    "$search": {
                        "text": {
                            "query": `${request.query.term}`,
                            "path": "messages.message",
                            "fuzzy": {
                                "maxEdits": 2
                            }
                        },
                        "highlight": {
                            "path": "messages.message"
                        }
                    }
                },
                {
                    "$addFields": {
                        "highlights": {
                            "$meta": "searchHighlights"
                        }
                    }
                }
            ]).toArray();
            response.send(result);
        } catch (e) {
            response.status(500).send({ message: e.message });
        }
    });
In the above endpoint code, we are creating an aggregation pipeline.
Because we plan to use Atlas Search, the $search stage needs to be the first stage in the pipeline. In this first stage, we are searching around a provided term. Rather than searching the entire document, we are searching within the message field of each object in the messages array. The fuzzy field with a maxEdits of 2 defines the maximum number of single-character edits allowed when matching the specified search term. For example, if we enter hlo, we might get a hit on hello, whereas if we hadn't defined the fuzzy information, a hit might not be found. More information can be found in the documentation.
The second stage of the pipeline will add the highlight data to the results before they are returned to the client. The highlight metadata isn't a part of the original document, hence the need to add it using the $meta operator prior to the response. You can read more about the $meta operator and the metadata it can surface in the documentation. You could also use the $meta operator in a $project stage instead of $addFields.
Since this is a MongoDB aggregation pipeline, you can combine any number of aggregation stages, as long as $search is the first in the pipeline.
If there's data in the collection, the application is ready to be used.
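As a rough sketch of the $project variant mentioned above, the pipeline can be built by a small function so that the fuzziness and the number of highlight passages are easy to adjust. The buildSearchPipeline name and its options object are hypothetical. The maxNumPassages option is included on the assumption that you want to make the passage limit explicit rather than rely on the default.

```javascript
// Hypothetical helper that builds the search pipeline for a given term.
// `maxEdits` and `maxNumPassages` are exposed as options for illustration.
const buildSearchPipeline = (term, { maxEdits = 2, maxNumPassages = 5 } = {}) => [
    {
        "$search": {
            "text": {
                "query": String(term),
                "path": "messages.message",
                "fuzzy": { "maxEdits": maxEdits }
            },
            "highlight": {
                "path": "messages.message",
                "maxNumPassages": maxNumPassages
            }
        }
    },
    {
        // $project variant: list the fields to keep and attach the highlights.
        "$project": {
            "messages": 1,
            "highlights": { "$meta": "searchHighlights" }
        }
    }
];

// Usage inside the endpoint (not executed here):
// let result = await collection.aggregate(buildSearchPipeline(request.query.term)).toArray();
console.log(JSON.stringify(buildSearchPipeline("hlo"), null, 4));
```

With $project, only the listed fields (plus _id) are returned, which can trim the response if the chat documents carry fields the front end never displays.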
The next step is to display the search data on the screen. Most of what comes next is about massaging the data into a format that we want to use, which includes visually highlighting the data with HTML markup.
We're going to need to create another project directory, this time representing the front end instead of the back end. Within this new directory, create an index.html file with the following markup:
    <!DOCTYPE html>
    <html>
        <head></head>
        <body>
            <div>
                <input id="term" type="text" />
                <button type="button" onclick="search()">Search</button>
            </div>
            <br />
            <div id="output"></div>
            <script>
                const search = async () => {
                    let term = document.getElementById("term");
                    let output = document.getElementById("output");
                };
            </script>
        </body>
    </html>
In the above code, we have a form that calls a search function when the button is clicked. As of right now, the search function only obtains the search term and references the area where search results should be output.
Let's further narrow down what the search function should do.
    const search = async () => {
        let term = document.getElementById("term");
        let output = document.getElementById("output");
        output.innerHTML = "";
        // Encode the term so characters such as & or # don't break the query string
        let result = await fetch("http://localhost:3000/search?term=" + encodeURIComponent(term.value))
            .then(response => response.json());
        result.forEach(chat => {
            let messageContainer = document.createElement("div");
            messageContainer.innerHTML = `<strong>Chat ${chat._id}</strong>`;
            chat.messages.forEach(msg => {
                let message = document.createElement("p");
                chat.highlights.forEach(highlight => {
                    let texts = highlight.texts;
                    // The passage with each hit wrapped in <mark>
                    let replacements = texts.map(text => {
                        if(text.type === "hit") {
                            return "<mark>" + text.value + "</mark>";
                        } else {
                            return text.value;
                        }
                    }).join("");
                    // The same passage exactly as it appears in the message
                    let originals = texts.map(text => {
                        return text.value;
                    }).join("");
                    msg.message = msg.message.replace(originals, replacements);
                });
                message.innerHTML = msg.sender + ": " + msg.message;
                messageContainer.appendChild(message);
            });
            output.appendChild(messageContainer);
        });
    };
The above modifications to the function might be a lot to take in. Let's break down what's happening.
After clearing the output space, we are making a request to the back end:
    let result = await fetch("http://localhost:3000/search?term=" + encodeURIComponent(term.value))
        .then(response => response.json());
The results of that request will have the documents found as well as the highlight data associated with the search.
The next step will be to loop through each of the results and then each of the messages for the results. This is where things can become a bit confusing. MongoDB will return data that looks like the following when it comes to highlighting:
    {
        "path": "messages.message",
        "texts": [
            {
                "value": "This is another ",
                "type": "text"
            },
            {
                "value": "Hello",
                "type": "hit"
            },
            {
                "value": " world example",
                "type": "text"
            }
        ],
        "score": 0.7454098463058472
    }
It doesn't exactly do the visual highlighting for us. Instead, it will tell us which term or phrase had a potential hit and the adjacent text. With this information, we need to highlight the hit in JavaScript.
    let texts = highlight.texts;
    let replacements = texts.map(text => {
        if(text.type === "hit") {
            return "<mark>" + text.value + "</mark>";
        } else {
            return text.value;
        }
    }).join("");
    let originals = texts.map(text => {
        return text.value;
    }).join("");
Here, we are constructing a string from the original highlight pieces as well as a string where the hit is wrapped in markup. The goal is to use the replace function in JavaScript, which requires a search term or phrase as well as the replacement. We can't just do a replace on the hit, because what if our hit was hello while helloworld existed in the chat with no spaces? The JavaScript replace doesn't look at words in a natural way, so blindly replacing on hello would result in helloworld being incorrectly highlighted. This is why we need to work with the adjacent data that MongoDB returns.
After doing the JavaScript replacement with the original string and the modified string, we can prepare it for output with the following:
    message.innerHTML = msg.sender + ": " + msg.message;
    messageContainer.appendChild(message);
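To see why the replacement works on the whole passage rather than on the hit alone, here is a small sketch with made-up data. The applyHighlight helper is a hypothetical name, and the message and highlight pieces are invented for illustration rather than taken from a real search response.

```javascript
// Rebuild the passage exactly as it appears in the message, plus a copy
// with hits wrapped in <mark>, then swap one for the other.
const applyHighlight = (message, texts) => {
    const original = texts.map(text => text.value).join("");
    const marked = texts
        .map(text => text.type === "hit" ? "<mark>" + text.value + "</mark>" : text.value)
        .join("");
    // A function replacement avoids special handling of "$" sequences.
    return message.replace(original, () => marked);
};

// Made-up message and highlight data for illustration.
const message = "helloworld is not hello";
const texts = [
    { value: "helloworld is not ", type: "text" },
    { value: "hello", type: "hit" }
];

// Naive approach: replaces the first "hello", which sits inside "helloworld".
console.log(message.replace("hello", "<mark>hello</mark>"));
// Context-aware approach: only the real hit is wrapped.
console.log(applyHighlight(message, texts));
```

The first line printed wraps the start of helloworld, which is the incorrect highlight described above. The second wraps only the standalone hello at the end.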
As previously mentioned, the front end is really just doing a lot of visual manipulations using the result and highlight data that the back end came up with.
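One thing to keep in mind is that the output is assigned through innerHTML, and chat messages are user-supplied, so any markup in a message would be interpreted by the browser. As a minimal sketch of one way to guard against that, the highlight pieces can be escaped before the hits are wrapped. The escapeHtml and renderHighlight names are hypothetical, and the sample data is made up.

```javascript
// Escape the characters that are significant in HTML.
const escapeHtml = (value) => String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");

// Build markup from highlight pieces, escaping each piece before wrapping hits.
const renderHighlight = (texts) => texts
    .map(text => text.type === "hit"
        ? "<mark>" + escapeHtml(text.value) + "</mark>"
        : escapeHtml(text.value))
    .join("");

// Made-up highlight data containing markup a user might have typed.
const sample = [
    { value: "<b>bold</b> ", type: "text" },
    { value: "Hello", type: "hit" }
];
console.log(renderHighlight(sample));
```

To combine this with the replace approach, the message and the original passage would both need to be escaped the same way before the replacement, and the sender name would need escaping as well.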
You just saw how to visually highlight search results on the screen using the highlight data returned with MongoDB Atlas Search. While highlighting the search hits with HTML markup and JavaScript isn't completely necessary, it is a great way to learn about your data and how your searches are operating.
To learn more about Atlas Search and building an autocomplete form, it's worth checking out my previous tutorial on the topic.