Leveraging Atlas Vector Search With HashiCorp Terraform: Empowering Semantic Search in Modern Applications

Rutuja Rajwade, Zuhair Ahmed • 4 min read • Published May 02, 2024 • Updated May 02, 2024

Last year, MongoDB announced the general availability of Atlas Vector Search, a new capability in Atlas that allows developers to search across data stored in MongoDB based on its semantic meaning, using high-dimensional vectors (i.e., “embeddings”) created by machine learning models.

This allows developers to build intelligent applications that understand and process human language in ways that traditional text-based search cannot, because text-based search only returns results that match the query's exact keywords.

For example, searching for “warm winter jackets” on an e-commerce website that only supports text-based search might return only products containing the exact keywords “warm,” “winter,” and “jackets.” Vector search, on the other hand, understands the semantic meaning of “warm winter jackets” as apparel designed for cold temperatures. It retrieves items that are not only labeled as “winter jackets” but are specifically designed for warmth, including products described with related terms like “insulated,” giving users more helpful search results.

Integrating Atlas Vector Search with infrastructure-as-code (IaC) tools like HashiCorp Terraform can then streamline and optimize your development workflows, ensuring that sophisticated search capabilities are built directly into the infrastructure deployment process.

This guide will walk you through how to get started with Atlas Vector Search through our HashiCorp Terraform Atlas provider. Let’s get started!

Pre-requisites

Create a MongoDB Atlas account.
Install HashiCorp Terraform on your local machine or sign up for a free Terraform Cloud account.
Create MongoDB Atlas programmatic API keys and associate them with Terraform.
Select an IDE of your choice. For this tutorial, we will be using VS Code.

Step 1: Deploy Atlas dedicated cluster with Atlas Search Nodes

First, we need to deploy basic Atlas resources to get started. This includes an Atlas project, an M10 dedicated Atlas cluster (which is pay-as-you-go, great for development and low-traffic applications), a database user, and an IP Access List Entry.

Note: When configuring your MongoDB Atlas cluster with Terraform, it's important to restrict IP access to only the IP address from which the Terraform script will be deployed. This minimizes the risk of unauthorized access.

In addition, as part of this tutorial, we will be using Atlas Search Nodes (optional). These provide dedicated infrastructure for Atlas Search and Vector Search workloads, allowing you to fully scale search independent of database needs. Incorporating Search Nodes into your Atlas deployment allows for better performance at scale and delivers workload isolation, higher availability, and the ability to optimize resource usage.

Lastly, when using Terraform to manage infrastructure, it is recommended to keep your files organized. Terraform configurations are typically written in files with the .tf extension, such as main.tf. This file, which we use in this tutorial, contains the primary configuration for deploying resources and should ideally live in a dedicated project directory on your local machine or in Terraform Cloud.

The Terraform configuration below makes up our main.tf file:

terraform {
  required_providers {
    mongodbatlas = {
      source = "mongodb/mongodbatlas"
    }
  }
  required_version = ">= 0.13"
}

# Atlas project; replace org_id with your own Atlas organization ID.
resource "mongodbatlas_project" "exampleProject" {
  name   = "exampleProject"
  org_id = "63234d3234ec0946eedcd7da"
}

# M10 dedicated cluster: a three-node replica set on AWS.
resource "mongodbatlas_advanced_cluster" "exampleCluster" {
  project_id   = mongodbatlas_project.exampleProject.id
  name         = "ClusterExample"
  cluster_type = "REPLICASET"

  replication_specs {
    region_configs {
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      provider_name = "AWS"
      priority      = 7
      region_name   = "US_EAST_1"
    }
  }
}

# Optional: dedicated Search Nodes for Atlas Search and Vector Search workloads.
resource "mongodbatlas_search_deployment" "exampleSearchNode" {
  project_id   = mongodbatlas_project.exampleProject.id
  cluster_name = mongodbatlas_advanced_cluster.exampleCluster.name
  specs = [
    {
      instance_size = "S20_HIGHCPU_NVME"
      node_count    = 2
    }
  ]
}

# Database user; the username and password here are placeholders.
resource "mongodbatlas_database_user" "testUser" {
  username           = "username123"
  password           = "password-test123"
  project_id         = mongodbatlas_project.exampleProject.id
  auth_database_name = "admin"

  roles {
    role_name     = "readWrite"
    database_name = "dbforApp"
  }
}

# IP access list entry; replace with the IP address you deploy Terraform from.
resource "mongodbatlas_project_ip_access_list" "test" {
  project_id = mongodbatlas_project.exampleProject.id
  ip_address = "174.218.210.1"
}
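
The configuration above hard-codes the database password and the allowed IP address, which is fine for a quick test but easy to leak through version control. As a minimal sketch, assuming you are willing to introduce two input variables (the names db_password and deployer_ip are hypothetical and not part of the original tutorial), the last two resources in main.tf could be rewritten like this. These definitions would replace the ones above rather than sit alongside them, since Terraform does not allow two resources with the same type and name.

```hcl
# Hypothetical input variables; the names are illustrative only.
variable "db_password" {
  description = "Password for the Atlas database user"
  type        = string
  sensitive   = true # redacts the value in plan and apply output
}

variable "deployer_ip" {
  description = "Public IP address of the machine that runs Terraform"
  type        = string
}

# Same database user as in main.tf, with the password taken from a variable.
resource "mongodbatlas_database_user" "testUser" {
  username           = "username123"
  password           = var.db_password
  project_id         = mongodbatlas_project.exampleProject.id
  auth_database_name = "admin"

  roles {
    role_name     = "readWrite"
    database_name = "dbforApp"
  }
}

# Same access list entry as in main.tf, restricted to the deploying host.
resource "mongodbatlas_project_ip_access_list" "test" {
  project_id = mongodbatlas_project.exampleProject.id
  ip_address = var.deployer_ip
  comment    = "Terraform deployment host"
}
```

Terraform reads variable values from environment variables prefixed with TF_VAR_ (for example, TF_VAR_db_password), so neither value has to be written into a file.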

Note: Before deploying, be sure to store the MongoDB Atlas programmatic API keys you created as part of the prerequisites as environment variables (the provider reads MONGODB_ATLAS_PUBLIC_KEY and MONGODB_ATLAS_PRIVATE_KEY). To deploy, you can use the below commands from the terminal:

terraform init
terraform plan
terraform apply
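
For the note on API keys above, a minimal sketch of the matching provider configuration might look like the following. It assumes the keys are exported as MONGODB_ATLAS_PUBLIC_KEY and MONGODB_ATLAS_PRIVATE_KEY before running the commands; the original main.tf omits the provider block entirely, which should behave the same way, so this block is only there to make the dependency explicit.

```hcl
# With no arguments, the provider falls back to the
# MONGODB_ATLAS_PUBLIC_KEY and MONGODB_ATLAS_PRIVATE_KEY environment variables.
# The public_key and private_key arguments could be set here instead,
# but that would put credentials in the configuration itself.
provider "mongodbatlas" {}
```

Keeping the block empty means the same configuration can be applied from a laptop or from Terraform Cloud without editing any files; only the environment changes.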

Step 2: Create your collections with vector data

For this tutorial, you can create your own collection of vectorized data if you have data to use.

Alternatively, you can use our sample data, which is great for testing purposes. The collection to use is “sample_mflix.embedded_movies,” which already has embeddings generated by OpenAI.

To use sample data, from the Atlas UI, go into the Atlas cluster Overview page and select “Atlas Search” at the top of the menu presented.

Then, click “Load a Sample Dataset.”

Step 3: Add vector search index in Terraform configuration

Now, head back over to Terraform and create an Atlas Search index with type “vectorSearch.” If you are using the sample data, also include a reference to the database “sample_mflix” and the collection “embedded_movies.”

Lastly, you will need to set the “fields” parameter as per our example below. See our documentation to learn more about how to index fields for vector search and the associated required parameters.

resource "mongodbatlas_search_index" "test-basic-search-vector" {
  name            = "test-basic-search-index"
  project_id      = mongodbatlas_project.exampleProject.id
  cluster_name    = mongodbatlas_advanced_cluster.exampleCluster.name
  type            = "vectorSearch"
  database        = "sample_mflix"
  collection_name = "embedded_movies"
  fields          = <<-EOF
    [{
        "type": "vector",
        "path": "plot_embedding",
        "numDimensions": 1536,
        "similarity": "euclidean"
    }]
    EOF
}
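
The “fields” parameter takes a JSON string, which the example above supplies as a heredoc. As an alternative sketch (not the form used in the original tutorial), Terraform's built-in jsonencode function can build the same string from native HCL values, so a malformed field definition is caught when the configuration is parsed. This version would replace the resource above, not be added next to it.

```hcl
resource "mongodbatlas_search_index" "test-basic-search-vector" {
  name            = "test-basic-search-index"
  project_id      = mongodbatlas_project.exampleProject.id
  cluster_name    = mongodbatlas_advanced_cluster.exampleCluster.name
  type            = "vectorSearch"
  database        = "sample_mflix"
  collection_name = "embedded_movies"

  # jsonencode turns the HCL list below into the JSON string that "fields" expects.
  fields = jsonencode([
    {
      type          = "vector"
      path          = "plot_embedding" # field that stores the embedding
      numDimensions = 1536             # must match the embedding length
      similarity    = "euclidean"
    }
  ])
}
```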

To deploy again, you can use the below commands from the terminal:

terraform init
terraform plan
terraform apply

If your deployment was successful, you should be greeted with “Apply complete!”

To confirm, you should be able to see your newly created Atlas Search index resource in the Atlas UI with Index Type “vectorSearch” and Status as “ACTIVE.”
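
The same check can likely be surfaced from Terraform itself. The sketch below assumes the mongodbatlas_search_index resource exports a status attribute in your provider version; the output name vector_index_status is hypothetical.

```hcl
# Hypothetical output name; exposes the index status reported by Atlas.
output "vector_index_status" {
  value = mongodbatlas_search_index.test-basic-search-vector.status
}
```

After the next terraform apply, running terraform output vector_index_status prints the recorded value. Note that it reflects the status at the time Terraform last refreshed its state, so the Atlas UI remains the more current view while an index is still building.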

Step 4: Get connection string and connect to the MongoDB Shell to begin Atlas Vector Search queries

While still in the Atlas UI, go back to the homepage, click “Connect” on your Atlas cluster, and select “Shell.”

This will generate your connection string which you can use in the MongoDB Shell to connect to your Atlas cluster.
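
If you would rather not copy the connection string from the UI, it can also be exposed as a Terraform output. This is a sketch under one assumption: that you are on a provider version where connection_strings on mongodbatlas_advanced_cluster is a list, as with the block-style configuration used in this tutorial. Other provider versions may expose it differently, so check the documentation for the version you have installed. The output name is hypothetical.

```hcl
# Hypothetical output name; returns the mongodb+srv:// URI for the cluster.
output "connection_string_srv" {
  value = mongodbatlas_advanced_cluster.exampleCluster.connection_strings[0].standard_srv
}
```

After terraform apply, terraform output connection_string_srv prints the URI, which you can pass to mongosh together with the --username option for the database user created in Step 1.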

All done

Congratulations! You have everything that you need now to run your first Vector Search queries.

With the above steps, teams can use the Terraform MongoDB Atlas provider to deploy Atlas Vector Search indexes and dedicated Search Nodes, and then build retrieval-augmented generation, semantic search, or recommendation systems on top of them.

The HashiCorp Terraform Atlas provider is open-sourced under the Mozilla Public License v2.0 and we welcome community contributions. To learn more, see our contributing guidelines.

The fastest way to get started is to create a MongoDB Atlas account from the AWS Marketplace or Google Cloud Marketplace. To learn more about the Terraform provider, check out the documentation, solution brief, and tutorials, or get started today.

Go build with MongoDB Atlas and the HashiCorp Terraform Atlas provider today!
