How to Use Cohere Embeddings and Rerank Modules With MongoDB Atlas
A daunting task developers currently face while building solutions powered by the retrieval-augmented generation (RAG) framework is the choice of retrieval mechanism. Augmenting the large language model (LLM) prompt with relevant and exhaustive information produces better responses from such systems. For semantic similarity search, you are tasked with choosing the most appropriate embedding model. Alternatively, for a full-text search implementation, you have to be thorough about your implementation to achieve precise recall and high accuracy in your results. Sometimes, a solution requires a combined implementation that benefits from both retrieval mechanisms.
If your current full-text search scoring workflow leaves something to be desired, or if you find yourself spending too much time writing code to get semantic search working within your applications, Cohere and MongoDB can help. To prevent these issues from holding you back from leveraging powerful AI search functionality or machine learning within your application, Cohere and MongoDB offer easy-to-use, fully managed solutions.
- Cohere provides a powerful tool for embedding natural language in your projects, helping you represent more accurate, relevant, and engaging content as embeddings. The Cohere language model also offers a simple and intuitive API that allows you to easily integrate it with your existing workflows and platforms.
- The Cohere Rerank module is a component of the Cohere natural language processing system that helps to select the best output from a set of candidates. The module uses a neural network to score each candidate based on its relevance, semantic similarity, theme, and style. The module then ranks the candidates according to their scores and returns the top N as the final output.
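To build intuition for the score-then-sort-then-truncate flow described above, here is a toy sketch. Note the scoring function here is just token overlap, a stand-in illustration, not Cohere's neural relevance model:

```python
def toy_rerank(query, candidates, top_n=3):
    """Rank candidate texts against a query and return the top N.

    A real reranker such as Cohere's scores candidates with a neural
    network; this toy version uses shared-token count purely to
    illustrate the score -> rank -> truncate pipeline.
    """
    q_tokens = set(query.lower().split())
    scored = [
        (len(q_tokens & set(c.lower().split())), c)  # score = token overlap
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [c for _, c in scored[:top_n]]

docs = [
    "a romantic comedy about two rival chefs",
    "a documentary on deep sea creatures",
    "comedy movies set in Paris",
]
print(toy_rerank("romantic comedy movies", docs, top_n=2))
```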
MongoDB Atlas is a fully managed developer data platform service that provides scalable, secure, and reliable data storage and access for your applications. One of the key features of MongoDB Atlas is the ability to perform vector search and full-text search on your data, which can enhance the capabilities of your AI/ML-driven applications. MongoDB Atlas can help you build powerful and flexible AI/ML-powered applications that can leverage both structured and unstructured data. You can easily create and manage search indexes, perform queries, and analyze results using MongoDB Atlas's intuitive interface, APIs, and drivers. MongoDB Atlas Vector Search provides a unique feature — pre-filtering and post-filtering on vector search queries — that helps users control the behavior of their vector search results, thereby improving the accuracy and retrieval performance, and saving money at the same time.
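To make the pre-filter/post-filter distinction concrete: a pre-filter rides inside the `$vectorSearch` stage itself, restricting the candidate set before the search runs, while a post-filter is an ordinary `$match` stage applied to the scored results. A minimal sketch of assembling those stages as plain Python dicts (the field names `embedding`, `year`, and `score` are illustrative):

```python
def build_vector_pipeline(query_vector, prefilter=None, postfilter=None, limit=5):
    """Assemble a MongoDB aggregation pipeline for $vectorSearch.

    prefilter  -> embedded inside the $vectorSearch stage (narrows candidates first)
    postfilter -> appended as a $match stage (trims results after scoring)
    """
    vs = {
        "index": "default",
        "path": "embedding",           # illustrative field holding the vectors
        "queryVector": query_vector,
        "numCandidates": 10 * limit,
        "limit": limit,
    }
    if prefilter:
        vs["filter"] = prefilter       # pre-filtering happens inside the stage
    pipeline = [{"$vectorSearch": vs}]
    if postfilter:
        pipeline.append({"$match": postfilter})  # post-filtering on the output
    return pipeline

pipeline = build_vector_pipeline([0.1, 0.2],
                                 prefilter={"year": {"$lt": 1990}},
                                 postfilter={"score": {"$gt": 0.76}})
```

Because the pre-filter shrinks the candidate set before distances are computed, it both speeds up the query and avoids wasting the `limit` budget on documents you would discard anyway.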
Therefore, with Cohere and MongoDB Atlas, we can demonstrate techniques where we can easily power a semantic search capability on your private dataset with very few lines of code. Additionally, you can enhance the existing ranking of your full-text search retrieval systems using the Cohere Rerank module. Both techniques are highly beneficial for building more complex GenAI applications, such as RAG- or LLM-powered summarization or data augmentation.
- Use the Cohere Embed Jobs to generate vector embeddings for the first time on large datasets in an asynchronous and scheduled manner.
- Add vector embeddings into MongoDB Atlas, which can store and index these vector embeddings alongside your other operational/metadata.
- Finally, prepare the indexes for both vector embeddings and full-text search on our private dataset.
- Write a simple Python function to accept search terms/phrases and pass it through the Cohere embed API again to get a query vector.
- Take these resultant query vector embeddings and perform a vector search query using the $vectorSearch operator in the MongoDB Aggregation Pipeline.
- Pre-filter documents using meta information to narrow the search across your dataset, thereby speeding up the performance of vector search results while retaining accuracy.
- The retrieved semantically similar documents can be post-filtered (e.g., on the relevance score) to demonstrate a higher degree of control over the semantic search behavior.
- Write a simple Python function to accept search terms/phrases and prepare a query using the $search operator and MongoDB Aggregation Pipeline.
- Take these resultant documents and perform a reranking operation of the retrieved documents to achieve higher accuracy with full-text search results using the Cohere rerank module.
This hands-on tutorial will introduce you to setting up MongoDB with the sample movies dataset (the link to the file is in the code snippets). You'll learn how to use the Cohere Embed Jobs API to schedule a batch job that processes all the documents and updates the dataset with a new field named embedding, stored alongside the other metadata/operational data. We will use this field to create a vector search index programmatically using the MongoDB Python driver. Once we have created this index, we can demonstrate how to query using vector embeddings as well as perform full-text search using the expressive and composable MongoDB Aggregation Pipeline (Query API).
- `pandas`: helps with data preprocessing and handling
- `cohere`: for the embedding model and rerank module
- `pymongo`: for the MongoDB Atlas vector store and full-text search
- `s3fs`: to load files directly from an S3 bucket
The following line of code is to be run on Jupyter Notebook to install the required packages.
```shell
!pip install cohere==4.57 pymongo pandas s3fs
```
If you have not created an API key on the Cohere platform, you can sign up for a Cohere account and create an API key, which you can generate from one of the following interfaces:
Also, if you have not created a MongoDB Atlas instance for yourself, you can follow the tutorial to create one. This will provide you with your `MONGODB_CONNECTION_STR`.

Run the following lines of code in Jupyter Notebook to initialize the Cohere API key and MongoDB Atlas connection string.
```python
import os
import getpass

# Cohere API key
try:
    cohere_api_key = os.environ["COHERE_API_KEY"]
except KeyError:
    cohere_api_key = getpass.getpass("Please enter your COHERE API KEY (hit enter): ")

# MongoDB connection string
try:
    MONGO_CONN_STR = os.environ["MONGODB_CONNECTION_STR"]
except KeyError:
    MONGO_CONN_STR = getpass.getpass("Please enter your MongoDB Atlas Connection String (hit enter): ")
```
Run the following lines of code in Jupyter Notebook to read data from an AWS S3 bucket directly to a pandas dataframe.
```python
import pandas as pd
import s3fs

# Read the sample movies dataset directly from S3 into a dataframe
df = pd.read_json("s3://ashwin-partner-bucket/cohere/movies_sample_dataset.jsonl", orient="records", lines=True)
df.to_json("./movies_sample_dataset.jsonl", orient="records", lines=True)
df[:3]
```
Here we will create a movies dataset in Cohere by uploading the sample movies dataset that we fetched from the S3 bucket and stored locally. Once we have created a dataset, we can use the Cohere Embed Jobs API to schedule a batch job to embed the entire dataset.
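As a sanity check on the input format, `dataset_type="embed-input"` expects JSON Lines: one JSON object per line, with the text fields to embed. A minimal sketch that writes and re-reads a toy record shaped like the movies dataset (the record content is illustrative):

```python
import json
import pandas as pd

# One toy record shaped like the movies dataset (title/year/overview)
record = {"title": "Example Movie", "year": 1985,
          "overview": "A toy record to illustrate the embed-input JSONL shape."}

with open("toy_movies.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")   # one JSON object per line

# Read it back the same way the tutorial reads the real dataset
df = pd.read_json("toy_movies.jsonl", orient="records", lines=True)
print(df.columns.tolist())
```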
You can run the following lines of code in your Jupyter Notebook to upload your dataset to Cohere and schedule an embedding job.
```python
import cohere

co_client = cohere.Client(cohere_api_key, client_name='mongodb')

# Create a dataset on the Cohere platform
dataset = co_client.create_dataset(name='movies',
                                   data=open("./movies_sample_dataset.jsonl", 'r'),
                                   keep_fields=["overview", "title", "year"],
                                   dataset_type="embed-input").wait()

# Schedule an embedding job to run on the entire movies dataset
embed_job = co_client.create_embed_job(dataset_id=dataset.id,
                                       input_type='search_document',
                                       model='embed-english-v3.0',
                                       truncate='END')
embed_job.wait()

# Collect the embedded documents from the job's output dataset
output_dataset = co_client.get_dataset(embed_job.output.id)
results = [{"text": x["text"], "embedding": x["embeddings"]["float"]} for x in output_dataset]
len(results)
```
Now that we have created the vector embeddings for our sample movies dataset, we can initialize the MongoDB client and insert the documents into our collection of choice by running the following lines of code in the Jupyter Notebook.
```python
from pymongo import MongoClient

mongo_client = MongoClient(MONGO_CONN_STR)

# Upload documents along with vector embeddings to a MongoDB Atlas collection
output_collection = mongo_client["sample_mflix"]["cohere_embed_movies"]
if output_collection.count_documents({}) > 0:
    output_collection.delete_many({})
e = output_collection.insert_many(results)
```
With the latest update to the PyMongo package, you can now create your vector search indexes as well as full-text search indexes from the Python client itself. You can also create vector indexes using the MongoDB Atlas UI or `mongosh`.

Run the following lines of code in your Jupyter Notebook to create search and vector search indexes on your new collection.
```python
output_collection.create_search_index(
    {"definition":
         {"mappings":
              {"dynamic": True,
               "fields": {
                   "embedding": {
                       "dimensions": 1024,
                       "similarity": "cosine",
                       "type": "vector"
                   },
                   "fullplot": {"type": "string"}
               }}},
     "name": "default"
     }
)
```
MongoDB Atlas brings the flexibility of using vector search alongside full-text search filters. Additionally, you can apply range, string, and numeric filters using the aggregation pipeline. This allows the end user to control the behavior of the semantic search response from the search engine. The lines of code below demonstrate how you can perform vector search along with pre-filtering on the year field to get movies released before 1990. Plus, for better control over the relevance of returned results, you can perform post-filtering on the response using the MongoDB Query API. In this demo, we filter on the score field, generated from the vector similarity between the query and the respective documents, using a heuristic to retain only the most accurate results.
Run the following lines of code in Jupyter Notebook to define a function that combines vector search with pre-filtering and post-filtering.
```python
def query_vector_search(q, prefilter={}, postfilter={}, path="embedding", topK=2):
    # Embed the query text with the same model used for the documents
    ele = co_client.embed(model="embed-english-v3.0", input_type="search_query", texts=[q])
    query_embedding = ele.embeddings[0]
    vs_query = {
        "index": "default",
        "path": path,
        "queryVector": query_embedding,
        "numCandidates": 10,
        "limit": topK,
    }
    if len(prefilter) > 0:
        vs_query["filter"] = prefilter  # pre-filter narrows candidates inside $vectorSearch
    new_search_query = {"$vectorSearch": vs_query}
    project = {"$project": {"score": {"$meta": "vectorSearchScore"}, "_id": 0, "title": 1, "release_date": 1, "overview": 1, "year": 1}}
    if len(postfilter.keys()) > 0:
        postFilter = {"$match": postfilter}  # post-filter trims results after scoring
        res = list(output_collection.aggregate([new_search_query, project, postFilter]))
    else:
        res = list(output_collection.aggregate([new_search_query, project]))
    return res
```
Run the following lines of code in a Jupyter Notebook cell to see the results.
Semantic search only:

```python
query_vector_search("romantic comedy movies", topK=5)
```

With a pre-filter on the year field:

```python
query_vector_search("romantic comedy movies", prefilter={"year": {"$lt": 1990}}, topK=5)
```

With both a pre-filter and a post-filter on the relevance score:

```python
query_vector_search("romantic comedy movies", prefilter={"year": {"$lt": 1990}}, postfilter={"score": {"$gt": 0.76}}, topK=5)
```
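The results come back as a list of projected documents, each carrying the `score` field added by the `$project` stage. A small helper to summarize them at a glance (pure Python; the sample document below is illustrative, not a real query result):

```python
def summarize_results(res):
    """Format vector search results as 'title (year): score' strings."""
    return [
        f"{doc.get('title', '?')} ({doc.get('year', '?')}): {doc['score']:.2f}"
        for doc in res
    ]

# Illustrative document in the shape produced by the $project stage above
sample = [{"title": "Some Movie", "year": 1988, "overview": "...", "score": 0.81}]
print(summarize_results(sample))
```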
Cohere Rerank is a module in the Cohere suite of offerings that enhances the quality of search results by leveraging semantic search. It elevates traditional search engine performance, which relies solely on keywords, by ranking the retrieved results based on their semantic relevance to the input query. This reranking pass helps achieve more appropriate and contextually similar search results.

To demonstrate how the Rerank module can be leveraged with MongoDB Atlas full-text search, follow along by running the following lines of code in your Jupyter Notebook.
```python
# Sample search query using the $search operator in an aggregation pipeline
def query_fulltext_search(q, topK=25):
    v = {"$search": {
        "text": {
            "query": q,
            "path": "overview"
        }
    }}
    project = {"$project": {"score": {"$meta": "searchScore"}, "_id": 0, "title": 1, "release_date": 1, "overview": 1}}
    docs = list(output_collection.aggregate([v, project, {"$limit": topK}]))
    return docs

# Results before reranking
docs = query_fulltext_search("romantic comedy movies", topK=10)
docs
```
```python
# After passing the search results through the Cohere rerank module
q = "romantic comedy movies"
docs = query_fulltext_search(q)
# Change top_n to change the number of results returned.
# If top_n is not passed, all results will be returned.
results = co_client.rerank(query=q, documents=list(map(lambda x: x["overview"], docs)), top_n=5, model='rerank-english-v2.0')
for idx, r in enumerate(results):
    print(f"Document Rank: {idx + 1}, Document Index: {r.index}")
    print(f"Document Title: {docs[r.index]['title']}")
    print(f"Document: {r.document['text']}")
    print(f"Relevance Score: {r.relevance_score:.2f}")
    print("\n")
```
Output after reranking the full-text search results:
```
Document Rank: 1, Document Index: 22
Document Title: Love Finds Andy Hardy
Document: A 1938 romantic comedy film which tells the story of a teenage boy who becomes entangled with three different girls all at the same time.
Relevance Score: 0.99

Document Rank: 2, Document Index: 12
Document Title: Seventh Heaven
Document: Seventh Heaven or De zevende zemel is a 1993 Dutch romantic comedy film directed by Jean-Paul Lilienfeld.
Relevance Score: 0.99

Document Rank: 3, Document Index: 19
Document Title: Shared Rooms
Document: A new romantic comedy feature film that brings together three interrelated tales of gay men seeking family, love and sex during the holiday season.
Relevance Score: 0.97

Document Rank: 4, Document Index: 3
Document Title: Too Many Husbands
Document: Romantic comedy adapted from a Somerset Maugham play.
Relevance Score: 0.97

Document Rank: 5, Document Index: 20
Document Title: Walking the Streets of Moscow
Document: "I Am Walking Along Moscow" aka "Ya Shagayu Po Moskve" (1963) is a charming lyrical comedy directed by Georgi Daneliya in 1963 that was nominated for Golden Palm at Cannes Film Festival. Daneliya proved that it is possible to create a masterpiece in the most difficult genre of romantic comedy. Made by the team of young and incredibly talented artists that besides Daneliya included writer/poet Gennady Shpalikov, composer Andrei Petrov, and cinematographer Vadim Yusov (who had made four films with Andrei Tarkovski), and the dream cast of the talented actors even in the smaller cameos, "I Am Walking Along Moscow" keeps walking victoriously through the decades remaining deservingly one of the best and most beloved Russian comedies and simply one of the best Russian movies ever made. Funny and gentle, dreamy and humorous, romantic and realistic, the film is blessed with the eternal youth and will always take to the walk on the streets of Moscow new generations of the grateful viewers.
Relevance Score: 0.96
```
In this tutorial, we were able to demonstrate the following:
- Using Cohere embeddings along with MongoDB Atlas Vector Search, we showed how easy it is to achieve semantic search functionality alongside your operational data.
- With Cohere Rerank, we retrieved results using MongoDB's full-text search capabilities and then ranked them by semantic relevance, delivering richer, more relevant results without replacing your existing search architecture.
- Both implementations were achieved with minimal lines of code, showcasing ease of use.
- Leveraging Cohere Embeddings and Rerank does not require a team of ML experts to develop and maintain, so monthly maintenance costs are kept to a minimum.
- Both solutions are cloud-agnostic and, hence, can be set up on any cloud platform.
The complete code can also be found in a notebook, which will help reduce the time and effort of following the steps in this blog.
To learn more about how MongoDB Atlas is helping build application-side ML integration in real-world applications, you can visit the MongoDB for AI page.