Adding MongoDB Atlas Vector Search to a .NET Blazor C# Application
When was the last time you could remember the rough details of something but couldn’t remember the name of it? That happens to quite a few people, so being able to search semantically instead of with exact text searches is really important.
This is where MongoDB Atlas Vector Search comes in useful. It allows you to perform semantic searches against vector embeddings in your documents stored inside MongoDB Atlas. Because embeddings are simply arrays of numbers stored alongside your documents, you can generate them from any type of data, both structured and unstructured.
In this tutorial, you will learn how to add vector search with MongoDB Atlas Vector Search, using the MongoDB C# driver, to a .NET Blazor application. The Blazor application uses the sample_mflix database, available in the sample dataset anyone can load into their Atlas cluster. You will add support for searching semantically against the plot field, to find any movies that might fit the plot entered into the search box.
In order to follow along with this tutorial, you will need a few things in place before you start:
- .NET 8 SDK installed on your computer
- An IDE or text editor that can support C# and Blazor for the most seamless development experience, such as Visual Studio, Visual Studio Code with the C# DevKit Extension installed, or JetBrains Rider
- A MongoDB Atlas cluster and its connection string
- A local copy of the Hugging Face Dataset Upload tool
- An OpenAI account and an API key. You will use the OpenAI API to create a vector embedding for the search term.
Once you have forked and cloned the repo locally, add your connection string to the placeholder section in appsettings.Development.json and appsettings.json so the project can connect to your cluster when it runs.
If you don’t want to follow along, the repo has a branch called “vector-search” with the final result implemented. However, you will still need to make sure the embedded data is loaded into your Atlas cluster.
The first thing you need is data in your cluster whose documents store vector embeddings in a field. MongoDB provides a version of the sample_mflix movies collection called embedded_movies: a subset of 1,500 documents from the main movies collection, published as a dataset on Hugging Face. This tutorial uses that dataset.
This is where the Hugging Face Dataset Uploader you downloaded as part of the prerequisites comes in. Run the tool with dotnet run at the root of the project and paste your connection string into the console when prompted. The tool downloads the dataset from Hugging Face and uploads it into an embedded_movies collection inside the sample_mflix database. If you haven’t loaded the sample dataset and the sample_mflix database doesn’t exist yet, the C# driver creates it for you on the first insert.
You can generate vector embeddings for your own data using tools such as Hugging Face, OpenAI, LlamaIndex, and others. You can read more about generating embeddings with open-source models in a tutorial from Prakul Agarwal on Generative AI, Vector Search, and open-source models here on Developer Center.
Now that you have a collection of movie documents with a plot_embedding field of vector embeddings for each document, it is time to create the Atlas Vector Search index. The index enables vector search capabilities on the cluster and tells MongoDB where to find the vector embeddings.
- Inside Atlas, click “Browse Collections” to open the data explorer and view your newly loaded sample_mflix database.
- Select the “Atlas Search” tab at the top.
- Click the green “Create Search Index” button to load the index creation wizard.
- Select JSON Editor under the Vector Search heading and then click “Next.”
- Select the embedded_movies collection under sample_mflix from the left.
- The index name doesn’t matter much, as long as you remember it for later. For now, leave it as the default value, ‘vector_index’.
- Copy and paste the following JSON in, replacing the current contents of the box in the wizard:
```json
{
  "fields": [
    {
      "type": "vector",
      "path": "plot_embedding",
      "numDimensions": 1536,
      "similarity": "dotProduct"
    }
  ]
}
```
This contains a few fields you might not have seen before.
- path is the name of the field that contains the embeddings. In the case of the dataset from Hugging Face, this is plot_embedding.
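- numDimensions is the number of dimensions in each embedding vector, which must match the model that produced the embeddings. The plot_embedding values were created with OpenAI’s text-embedding-ada-002 model, which outputs 1536-dimensional vectors.
- similarity is the function used to measure how similar two vectors are. Atlas Vector Search supports euclidean, cosine, and dotProduct; dotProduct is appropriate here because OpenAI embeddings are normalized to unit length.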
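The index above uses dotProduct. To see why that works for this dataset, here is a minimal C# sketch comparing dot product with cosine similarity. VectorSimilarity and its methods are hypothetical helpers for illustration only, not part of the MongoDB driver or the sample app; Atlas computes similarity on the server. The sketch shows that once vectors are scaled to unit length, their dot product equals their cosine similarity, so dotProduct ranks results the same way as cosine while skipping the division.

```csharp
using System;

// Hypothetical helpers illustrating the similarity functions an index can use.
public static class VectorSimilarity
{
    // Dot product: the sum of element-wise products
    public static double DotProduct(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        double sum = 0;
        for (int i = 0; i < a.Length; i++)
            sum += a[i] * b[i];
        return sum;
    }

    // Euclidean length (L2 norm) of a vector
    public static double Magnitude(double[] v) => Math.Sqrt(DotProduct(v, v));

    // Cosine similarity: dot product divided by the product of the vector lengths
    public static double Cosine(double[] a, double[] b) =>
        DotProduct(a, b) / (Magnitude(a) * Magnitude(b));

    // Scales a vector to unit length, which OpenAI embeddings already have
    public static double[] Normalize(double[] v)
    {
        double length = Magnitude(v);
        var result = new double[v.Length];
        for (int i = 0; i < v.Length; i++)
            result[i] = v[i] / length;
        return result;
    }
}
```

For (3, 4) and (4, 3), DotProduct returns 24 and Cosine returns 0.96; after Normalize, the dot product of the two vectors is also 0.96, up to floating-point rounding.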
Click “Next” and on the next page, click “Create Search Index.”
After a couple of minutes, the vector search index will be set up, you will be notified by email, and the application will be ready to have vector search added.
You have the data with plot embeddings and a vector search index created against that field, so it is time to start work on the application to add search, starting with the backend functionality.
The OpenAI API key will be used to request embeddings from the API for the search term entered since vector search understands numbers and not text. For this reason, the application needs your OpenAI API key to be stored for use later.
- Add the following to the root of both appsettings.Development.json and appsettings.json, after the MongoDB section, replacing the placeholder text with your own key:
```json
"OpenAPIKey": "<YOUR OPENAI API KEY>"
```
- Inside Program.cs, after the line that creates var builder, add the following line to read the value from app configuration:
```csharp
var openAPIKey = builder.Configuration.GetValue<string>("OpenAPIKey");
```
- Change the code that creates the MongoDBService instance so it also passes in the openAPIKey variable. You will update the class constructor later to make use of it.
```csharp
builder.Services.AddScoped(service => new MongoDBService(mongoDBSettings, openAPIKey));
```
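Because GetValue<string> returns null when OpenAPIKey is missing, a typo in the setting name would only surface later as a failed OpenAI request. Here is a minimal sketch of a fail-fast check. OpenAIKeyCheck is a hypothetical helper, and it parses an appsettings-style JSON string directly with System.Text.Json only so that it runs without the ASP.NET Core configuration packages; in the app itself you would apply the same checks to the openAPIKey variable.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: reads the root-level "OpenAPIKey" value from
// appsettings-style JSON and fails fast if it is missing or still the placeholder.
public static class OpenAIKeyCheck
{
    public const string Placeholder = "<YOUR OPENAI API KEY>";

    public static string ReadKey(string appSettingsJson)
    {
        using JsonDocument doc = JsonDocument.Parse(appSettingsJson);

        // The key sits at the root of the file, next to the MongoDB section
        if (!doc.RootElement.TryGetProperty("OpenAPIKey", out JsonElement keyElement)
            || keyElement.ValueKind != JsonValueKind.String)
        {
            throw new InvalidOperationException("OpenAPIKey is missing from app settings.");
        }

        string key = keyElement.GetString() ?? string.Empty;

        // Catch the case where the placeholder text was never replaced
        if (string.IsNullOrWhiteSpace(key) || key == Placeholder)
        {
            throw new InvalidOperationException("OpenAPIKey has not been set to a real key.");
        }

        return key;
    }
}
```

For example, settings JSON whose root contains a MongoDB section plus "OpenAPIKey": "sk-test" returns sk-test, while JSON that still holds the placeholder throws an InvalidOperationException.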
You will need to add a new method to the interface that supports search, taking in the term to be searched against and returning a list of movies that were found from the search.
Open IMongoDBService.cs and add the following code:

```csharp
public IEnumerable<Movie> MovieSearch(string textToSearch);
```
Next, update the implementation class to support search.
- Open MongoDBService.cs and add the following using statements to the top of the file:

```csharp
using System.Text;
using System.Text.Json;
```
- Add the following new private fields below the existing ones at the top of the class:

```csharp
private readonly string _openAPIKey;
private readonly HttpClient _httpClient = new HttpClient();
```
- Update the constructor to take the new openAPIKey string parameter, as well as the MongoDBSettings parameter. It should look like this:
```csharp
public MongoDBService(MongoDBSettings settings, string openAPIKey)
```
- Inside the constructor, add a new line that assigns the value of openAPIKey to _openAPIKey.
- Also inside the constructor, change the collection name from “movies” to “embedded_movies” in the call to .GetCollection.
The following is what the completed constructor should look like:
```csharp
public MongoDBService(MongoDBSettings settings, string openAPIKey)
{
    _client = new MongoClient(settings.AtlasURI);
    _mongoDatabase = _client.GetDatabase(settings.DatabaseName);
    _movies = _mongoDatabase.GetCollection<Movie>("embedded_movies");
    _openAPIKey = openAPIKey;
}
```
The C# driver acts as an object document mapper (ODM), taking care of mapping between a plain old C# object (POCO) that is used in C# and the documents in your collection.
However, the existing movie model fields need updating to match the documents inside your embedded_movies collection.
Replace the contents of Models/Movie.cs with the following code:

```csharp
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

namespace SeeSharpMovies.Models;

public class Movie
{
    [BsonId]
    [BsonElement("_id")]
    public ObjectId Id { get; set; }

    [BsonElement("plot")]
    public string Plot { get; set; }

    [BsonElement("genres")]
    public string[] Genres { get; set; }

    [BsonElement("runtime")]
    public int Runtime { get; set; }

    [BsonElement("cast")]
    public string[] Cast { get; set; }

    [BsonElement("num_mflix_comments")]
    public int NumMflixComments { get; set; }

    [BsonElement("poster")]
    public string Poster { get; set; }

    [BsonElement("title")]
    public string Title { get; set; }

    [BsonElement("fullplot")]
    public string FullPlot { get; set; }

    [BsonElement("languages")]
    public string[] Languages { get; set; }

    [BsonElement("directors")]
    public string[] Directors { get; set; }

    [BsonElement("writers")]
    public string[] Writers { get; set; }

    [BsonElement("awards")]
    public Awards Awards { get; set; }

    [BsonElement("year")]
    public string Year { get; set; }

    [BsonElement("imdb")]
    public Imdb Imdb { get; set; }

    [BsonElement("countries")]
    public string[] Countries { get; set; }

    [BsonElement("type")]
    public string Type { get; set; }

    [BsonElement("plot_embedding")]
    public float[] PlotEmbedding { get; set; }
}

public class Awards
{
    [BsonElement("wins")]
    public int Wins { get; set; }

    [BsonElement("nominations")]
    public int Nominations { get; set; }

    [BsonElement("text")]
    public string Text { get; set; }
}

public class Imdb
{
    [BsonElement("rating")]
    public float Rating { get; set; }

    [BsonElement("votes")]
    public int Votes { get; set; }

    [BsonElement("id")]
    public int Id { get; set; }
}
```
This contains properties for all the fields in the document, as well as classes and properties representing subdocuments found inside the movie document, such as awards and imdb. You will also note the use of the BsonElement attribute, which tells the driver how to map between the field names and the property names, because the two use different naming conventions.
It is almost time to start implementing the search on the back end. When calling the OpenAI API’s embedding endpoint, you will get back a lot of data, including the embeddings. The easiest way to handle this is to create an EmbeddingResponse.cs class that models this response for use later.
Add a new class called EmbeddingResponse inside the Models folder and replace the contents of the file with the following:
```csharp
namespace SeeSharpMovies.Models
{
    public class EmbeddingResponse
    {
        public string @object { get; set; }
        public List<Data> data { get; set; }
        public string model { get; set; }
        public Usage usage { get; set; }
    }

    public class Data
    {
        public string @object { get; set; }
        public int index { get; set; }
        public List<double> embedding { get; set; }
    }

    public class Usage
    {
        public int prompt_tokens { get; set; }
        public int total_tokens { get; set; }
    }
}
```
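The EmbeddingResponse classes use lowercase property names (with @object, because object is a C# keyword) so that System.Text.Json, which matches property names case-sensitively by default, can map them to OpenAI’s lowercase JSON keys. If you prefer PascalCase properties, the [JsonPropertyName] attribute can do the mapping instead. The following sketch assumes that approach; EmbeddingResult, EmbeddingData, and EmbeddingJson are hypothetical names, and only some fields are modeled because System.Text.Json ignores JSON properties that have no matching C# property.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical PascalCase variant of the Data class; each property is
// mapped explicitly to the lowercase key used in the API response.
public class EmbeddingData
{
    [JsonPropertyName("object")]
    public string Object { get; set; } = "";

    [JsonPropertyName("index")]
    public int Index { get; set; }

    [JsonPropertyName("embedding")]
    public List<double> Embedding { get; set; } = new();
}

// Hypothetical PascalCase variant of EmbeddingResponse ("usage" is left unmapped and ignored)
public class EmbeddingResult
{
    [JsonPropertyName("data")]
    public List<EmbeddingData> Data { get; set; } = new();

    [JsonPropertyName("model")]
    public string Model { get; set; } = "";
}

public static class EmbeddingJson
{
    // Returns the first embedding in a response body, or an empty list if there is none
    public static List<double> FirstEmbedding(string responseBody)
    {
        var result = JsonSerializer.Deserialize<EmbeddingResult>(responseBody);
        if (result == null || result.Data == null || result.Data.Count == 0)
            return new List<double>();

        return result.Data[0].Embedding;
    }
}
```

Passing in a response body whose data array holds a single embedding such as [0.1, -0.2, 0.3] returns that three-element list; a real text-embedding-ada-002 response contains 1536 values.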
It is time to make use of the API key for OpenAI and write functionality to create vector embeddings for the searched term by calling the OpenAI API Embeddings endpoint.
Inside MongoDBService.cs, add the following method:

```csharp
private async Task<List<double>> GetEmbeddingsFromText(string text)
{
    // Tell the Embeddings API which model to use and what text to embed
    Dictionary<string, object> body = new Dictionary<string, object>
    {
        { "model", "text-embedding-ada-002" },
        { "input", text }
    };

    // Build a new request for every call. Setting BaseAddress or adding to
    // DefaultRequestHeaders on the shared _httpClient would throw on the
    // second search made by the same service instance.
    using var request = new HttpRequestMessage(HttpMethod.Post, "https://api.openai.com/v1/embeddings");
    request.Headers.Authorization =
        new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", _openAPIKey);

    string requestBody = JsonSerializer.Serialize(body);
    request.Content = new StringContent(requestBody, Encoding.UTF8, "application/json");

    using var response = await _httpClient.SendAsync(request).ConfigureAwait(false);

    if (response.IsSuccessStatusCode)
    {
        string responseBody = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        EmbeddingResponse embeddingResponse = JsonSerializer.Deserialize<EmbeddingResponse>(responseBody);
        // One input string produces one embedding, so take the first entry
        return embeddingResponse.data[0].embedding;
    }

    // Return an empty list if the request fails
    return new List<double>();
}
```
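The body dictionary tells the API which model to use and what text to embed. The model must be text-embedding-ada-002, because that is the model used to generate the plot_embedding values in the dataset; the query vector and the stored vectors must come from the same model to be comparable.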
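Building a new HttpRequestMessage for each call, rather than changing BaseAddress or DefaultRequestHeaders on a shared HttpClient, keeps repeated searches safe: HttpClient refuses to change BaseAddress after its first request, and the Authorization header accepts only one value. This sketch exercises that pattern without an OpenAI key. RecordingHandler and EmbeddingRequests are hypothetical names, not part of the sample app; the stub handler records each outgoing request and returns a canned response instead of calling the real API.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Test double: records what would have been sent and returns a canned reply
public class RecordingHandler : HttpMessageHandler
{
    public List<string> Bodies { get; } = new();
    public List<string> AuthHeaders { get; } = new();

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        AuthHeaders.Add(request.Headers.Authorization?.ToString() ?? "");
        Bodies.Add(request.Content == null
            ? ""
            : await request.Content.ReadAsStringAsync(cancellationToken));

        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("{\"data\":[{\"embedding\":[0.5,0.25]}]}",
                Encoding.UTF8, "application/json")
        };
    }
}

public static class EmbeddingRequests
{
    // Builds a fresh request per call, so nothing on the shared HttpClient is mutated
    public static HttpRequestMessage Build(string apiKey, string text)
    {
        var body = new Dictionary<string, object>
        {
            { "model", "text-embedding-ada-002" },
            { "input", text }
        };

        var request = new HttpRequestMessage(HttpMethod.Post, "https://api.openai.com/v1/embeddings");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        request.Content = new StringContent(JsonSerializer.Serialize(body), Encoding.UTF8, "application/json");
        return request;
    }
}
```

Sending two requests built this way through one HttpClient succeeds both times, and the recorded body for the search term "a heist in space" is {"model":"text-embedding-ada-002","input":"a heist in space"}.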
The GetEmbeddingsFromText method returns the embeddings for the search term, so they can now be passed to Atlas Vector Search through the C# driver.
Paste the following code to implement the search:
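```csharp
public IEnumerable<Movie> MovieSearch(string textToSearch)
{
    // Embed the search term. MovieSearch is synchronous to match the interface,
    // so .Result blocks until the OpenAI request completes.
    var vector = GetEmbeddingsFromText(textToSearch).Result.ToArray();

    // An empty vector means the OpenAI request failed, so there is nothing to search with
    if (vector.Length == 0)
    {
        return Enumerable.Empty<Movie>();
    }

    var vectorOptions = new VectorSearchOptions<Movie>()
    {
        IndexName = "vector_index", // must match the name of your Atlas Vector Search index
        NumberOfCandidates = 150
    };

    // Search plot_embedding, return up to 150 movies, and keep only the fields the UI needs
    var movies = _movies.Aggregate()
        .VectorSearch(movie => movie.PlotEmbedding, vector, 150, vectorOptions)
        .Project<Movie>(Builders<Movie>.Projection
            .Include(m => m.Title)
            .Include(m => m.Plot)
            .Include(m => m.Poster))
        .ToList();

    return movies;
}
```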
If you chose a different name when creating the vector search index earlier, update the IndexName value inside vectorOptions to match.
Vector search is available in the C# driver as a stage in the aggregation pipeline. The VectorSearch method takes four arguments: an expression selecting the field that holds the embeddings, the vector embedding of the search term, the number of results to return, and the vector search options.
Further methods are then chained on to specify what fields to return from the resulting documents.
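Conceptually, the vector search stage ranks documents by how similar their stored vector is to the query vector and returns the top results, up to the limit. The following brute-force sketch shows that ranking in memory using dot product, matching the index’s similarity setting; MovieVector and ExactVectorSearch are hypothetical types, and the two-dimensional vectors are made up for illustration. Atlas does not score every document like this: it performs an approximate nearest neighbor (ANN) search, and NumberOfCandidates controls how many nearest-neighbor candidates it considers before returning the top results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory record: a title plus its embedding
public record MovieVector(string Title, double[] Embedding);

public static class ExactVectorSearch
{
    // Exact (brute-force) nearest-neighbor search by dot product:
    // scores every item and keeps the `limit` highest-scoring ones.
    public static List<(string Title, double Score)> TopK(
        IEnumerable<MovieVector> movies, double[] query, int limit)
    {
        return movies
            .Select(m => (m.Title, Score: Dot(m.Embedding, query)))
            .OrderByDescending(x => x.Score)
            .Take(limit)
            .ToList();
    }

    private static double Dot(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Query and stored vectors must have the same dimensions.");

        double sum = 0;
        for (int i = 0; i < a.Length; i++)
            sum += a[i] * b[i];
        return sum;
    }
}
```

With stored vectors (1, 0), (0, 1), and (0.6, 0.8) and the query (1, 0), a limit of 2 returns the first and third items, scored 1.0 and 0.6.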
Because the movie document has changed slightly, the current code inside the GetMovieById method is no longer correct. Replace the current line that calls .Find with the following:

```csharp
var movie = _movies.Find(movie => movie.Id.ToString() == id).FirstOrDefault();
```
The back end is now complete and it is time to move on to the front end, adding the ability to search on the UI and sending that search back to the code we just wrote.
The frontend functionality will be split into two parts: the code in the front end for talking to the back end, and the search bar in HTML for typing into.
As this is an existing application, there is already code for pulling down the movies and even pagination. This is where you will add the search functionality, and it can be found inside Home.razor in the Components/Pages folder.
- Inside the @code block, add a new string variable for searchTerm:

```csharp
string searchTerm;
```
- Paste the following new method into the code block:

```csharp
private void SearchMovies()
{
    if (string.IsNullOrWhiteSpace(searchTerm))
    {
        movies = MongoDBService.GetAllMovies();
    }
    else
    {
        movies = MongoDBService.MovieSearch(searchTerm);
    }
}
```
This is quite straightforward. If the searchTerm string is empty or contains only whitespace, show every movie. Otherwise, search on it.
Adding the search bar is really simple. It will be added to the header component already present on the home page.
Replace the existing header tag with the following HTML:
```html
<header class="top-bar">
    <a href="/">See Sharp Movies</a>
    <div class="form-inline search-bar">
        <input class="form-control mr-sm-2"
               type="search" placeholder="Search"
               aria-label="Search"
               @bind="searchTerm">
        <button class="btn btn-outline-success my-2 my-sm-0" @onclick="SearchMovies">Search</button>
    </div>
</header>
```
This creates a search input whose value is bound to the searchTerm string, and a button that, when clicked, calls the SearchMovies method you just added.
At this point, the functionality is implemented. But if you ran it now, the search bar would be in a strange place in the header, so let’s fix that, just for prettiness.
Inside wwwroot/app.css, add the following code:

```css
.search-bar {
    padding: 5%;
}

.search-bar button {
    padding: 4px;
}
```
This just gives the search bar and the button a bit of padding so they sit more nicely within the header. It’s not perfect, but CSS is definitely not my strong suit. C# is my favorite language!
Woohoo! We have the backend and frontend functionality implemented, so now it is time to run the application and see it in action!
Run the application, enter a search term in the box, click the “Search” button, and see what movies have plots semantically close to your search term.
Amazing! You now have a working Blazor application with the ability to search the plot by meaning instead of exact text. This is also a great starting point for implementing more vector search capabilities into your application.
If you want to learn more about Atlas Vector Search, you can read our documentation.
MongoDB also has a space on Hugging Face where you can see some further examples of what can be done and even play with it. Give it a go!
There is also an amazing article on using Vector Search for audio co-written by Lead Developer Advocate at MongoDB Pavel Duchovny.