Perform Hybrid Search with the LangChain Integration
You can integrate Atlas Vector Search with LangChain to perform hybrid search. In this tutorial, you complete the following steps:
Set up the environment.
Use Atlas as a vector store.
Create an Atlas Vector Search index and an Atlas Search index on your data.
Run hybrid search queries.
Pass the query results into your RAG pipeline.
Tip
Work with a runnable version of this tutorial as a Python notebook.
Prerequisites
To complete this tutorial, you must have the following:
An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
An OpenAI API Key. You must have a paid OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
An environment to run interactive Python notebooks such as Colab.
Note
If you're using Colab, ensure that your notebook session's IP address is included in your Atlas project's access list.
Set Up the Environment
Set up the environment for this tutorial.
Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook allows you to run Python code snippets individually, and you'll use it to run the code in this tutorial.
To set up your notebook environment:
Set environment variables.
Run the following code to set the environment variables for this tutorial. Replace the placeholders with your OpenAI API Key and your Atlas cluster's SRV connection string.
import os

os.environ["OPENAI_API_KEY"] = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"
Note
Your connection string should use the following format:
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
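Before connecting, it can help to check that the string follows the format above and that no placeholder was left unreplaced. The following is a minimal sketch, not part of the LangChain or PyMongo APIs; the helper name check_connection_string is hypothetical, and the checks assume the exact format shown in this note.

```python
def check_connection_string(uri: str) -> list[str]:
    """Return a list of problems found in an Atlas SRV connection string."""
    problems = []
    prefix = "mongodb+srv://"
    if not uri.startswith(prefix):
        problems.append("must start with mongodb+srv://")
    if "<" in uri or ">" in uri:
        problems.append("contains an unreplaced <placeholder>")
    # Everything between the scheme and the first "/" or "?" is credentials + host.
    rest = uri[len(prefix):] if uri.startswith(prefix) else uri
    authority = rest.split("/", 1)[0].split("?", 1)[0]
    credentials, at, host = authority.rpartition("@")
    if not at or ":" not in credentials:
        problems.append("missing username:password@ credentials")
    if not host.endswith(".mongodb.net"):
        problems.append("host should end with .mongodb.net")
    return problems


# Example with made-up credentials and host name
print(check_connection_string("mongodb+srv://user:pass@cluster0.abcde.mongodb.net"))
```

An empty list means the string matches the expected shape. It does not confirm that the credentials are valid or that your IP address is on the access list.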
Use Atlas as a Vector Store
You must use Atlas as a vector store for your data. You can instantiate a vector store by using an existing collection in Atlas.
Load the sample data.
If you haven't already, complete the steps to load sample data into your Atlas cluster.
Instantiate the vector store.
Paste and run the following code in your notebook to create a vector store instance named vector_store from the sample_mflix.embedded_movies namespace in Atlas. This code uses the from_connection_string method to create the MongoDBAtlasVectorSearch vector store and specifies the following parameters:
Your Atlas cluster's connection string.
An OpenAI embedding model as the model used to convert text into vector embeddings. By default, this model is text-embedding-ada-002.
sample_mflix.embedded_movies as the namespace to use.
plot as the field that contains the text.
plot_embedding as the field that contains the embeddings.
dotProduct as the relevance score function.
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings

# Create the vector store
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string=ATLAS_CONNECTION_STRING,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    namespace="sample_mflix.embedded_movies",
    text_key="plot",
    embedding_key="plot_embedding",
    relevance_score_fn="dotProduct"
)
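To illustrate what a dotProduct relevance score measures, the sketch below computes the dot product of two toy vectors in plain Python. The three-dimensional vectors are made up for illustration (real embeddings from this tutorial have 1536 dimensions), and the helper names are hypothetical. For unit-length vectors, the dot product equals the cosine similarity, which is why dotProduct is typically paired with normalized embeddings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine(a, b):
    """Cosine similarity, computed without assuming unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for a query embedding and a plot embedding
query_vec = normalize([0.2, 0.9, 0.1])
plot_vec = normalize([0.1, 0.8, 0.3])

# For unit-length vectors, both values agree up to floating-point error
print(dot(query_vec, plot_vec), cosine(query_vec, plot_vec))
```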
Create the Indexes
Note
To create Atlas Vector Search or Atlas Search indexes, you must have Project Data Access Admin or higher access to the Atlas project.
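To enable hybrid search queries on your vector store, create an Atlas Vector Search index and an Atlas Search index on the collection. You can create the indexes by using either the LangChain helper methods or the PyMongo driver methods. The steps below show the LangChain helper methods first, followed by the equivalent PyMongo driver steps. Use one approach, not both.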
Create the Atlas Vector Search index.
Run the following code to create a vector search index that indexes the plot_embedding field in the collection.
# Use helper method to create the vector search index
vector_store.create_vector_search_index(
    dimensions=1536
)
Create the Atlas Search index.
Run the following code in your notebook to create a search index that indexes the plot field in the collection.
from langchain_mongodb.index import create_fulltext_search_index
from pymongo import MongoClient

# Connect to your cluster
client = MongoClient(ATLAS_CONNECTION_STRING)

# Use helper method to create the search index
create_fulltext_search_index(
    collection=client["sample_mflix"]["embedded_movies"],
    field="plot",
    index_name="search_index"
)
Create the Atlas Vector Search index.
Alternatively, to use the PyMongo driver, run the following code to create a vector search index that indexes the plot_embedding field in the collection.
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

# Connect to your cluster
client = MongoClient(ATLAS_CONNECTION_STRING)
collection = client["sample_mflix"]["embedded_movies"]

# Create your vector search index model, then create the index
vector_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "plot_embedding",
                "numDimensions": 1536,
                "similarity": "dotProduct"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch"
)
collection.create_search_index(model=vector_index_model)
Create the Atlas Search index.
Run the following code to create a search index that indexes the plot field in the collection.
# Create your search index model, then create the search index
search_index_model = SearchIndexModel(
    definition={
        "mappings": {
            "dynamic": False,
            "fields": {
                "plot": {
                    "type": "string"
                }
            }
        }
    },
    name="search_index"
)
collection.create_search_index(model=search_index_model)
Each index should take about one minute to build. While an index builds, it is in an initial sync state. When both indexes finish building, you can start querying the data in your collection.
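Rather than waiting a fixed amount of time, you can poll the index status. The following is a minimal sketch, assuming PyMongo 4.5 or later, where Collection.list_search_indexes is available and each returned index document includes a queryable field. The helper name wait_until_queryable and its default timeout values are hypothetical.

```python
import time


def wait_until_queryable(collection, index_name, timeout_s=300, poll_s=5):
    """Poll until the named search index reports queryable=True, or time out.

    `collection` is expected to be a PyMongo Collection (assumption: PyMongo 4.5+).
    Returns True if the index became queryable, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For example, you might call wait_until_queryable(collection, "vector_index") and wait_until_queryable(collection, "search_index") before running the queries in the next section.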
Run a Hybrid Search Query
Once Atlas builds your indexes, you can run hybrid search queries on your data.
The following code uses the MongoDBAtlasHybridSearchRetriever retriever to perform a hybrid search for the string time travel. It also specifies the following parameters:
vectorstore: The name of the vector store instance.
search_index_name: The name of the Atlas Search index.
top_k: The number of documents to return.
fulltext_penalty: The penalty for full-text search. The lower the penalty, the higher the full-text search score.
vector_penalty: The penalty for vector search. The lower the penalty, the higher the vector search score.
The retriever returns a list of documents sorted by the sum of the full-text search score and the vector search score. The final output of the code example includes the title, plot, and the different scores for each document.
To learn more about hybrid search query results, see About the Query.
from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever

# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
    vectorstore=vector_store,
    search_index_name="search_index",
    top_k=5,
    fulltext_penalty=50,
    vector_penalty=50
)

# Define your query
query = "time travel"

# Print results
documents = retriever.invoke(query)
for doc in documents:
    print("Title: " + doc.metadata["title"])
    print("Plot: " + doc.page_content)
    print("Search score: {}".format(doc.metadata["fulltext_score"]))
    print("Vector Search score: {}".format(doc.metadata["vector_score"]))
    print("Total score: {}\n".format(doc.metadata["fulltext_score"] + doc.metadata["vector_score"]))
Title: Timecop
Plot: An officer for a security agency that regulates time travel, must fend for his life against a shady politician who has a tie to his past.
Search score: 0.019230769230769232
Vector Search score: 0.01818181818181818
Total score: 0.03741258741258741

Title: The Time Traveler's Wife
Plot: A romantic drama about a Chicago librarian with a gene that causes him to involuntarily time travel, and the complications it creates for his marriage.
Search score: 0.0196078431372549
Vector Search score: 0
Total score: 0.0196078431372549

Title: Thrill Seekers
Plot: A reporter, learning of time travelers visiting 20th century disasters, tries to change the history they know by averting upcoming disasters.
Search score: 0
Vector Search score: 0.0196078431372549
Total score: 0.0196078431372549

Title: About Time
Plot: At the age of 21, Tim discovers he can travel in time and change what happens and has happened in his own life. His decision to make his world a better place by getting a girlfriend turns out not to be as easy as you might think.
Search score: 0
Vector Search score: 0.019230769230769232
Total score: 0.019230769230769232

Title: My iz budushchego
Plot: My iz budushchego, or We Are from the Future, is a movie about time travel. Four 21st century treasure seekers are transported back into the middle of a WWII battle in Russia. The movie's ...
Search score: 0.018867924528301886
Vector Search score: 0
Total score: 0.018867924528301886
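The individual scores in this sample output are consistent with a reciprocal-rank form, 1 / (rank + penalty + 1), where rank is the 0-based position of a document in each result list. For example, 0.0196078431372549 is 1/51 and 0.019230769230769232 is 1/52. This formula is inferred from the values shown here, not confirmed by the tutorial text, so treat the following as a sketch of the idea rather than the retriever's implementation. The function names and the placeholder titles other_a and other_b are hypothetical.

```python
def rrf_score(rank: int, penalty: float) -> float:
    """Score for a 0-based rank; the form 1 / (rank + penalty + 1) is an assumption."""
    return 1.0 / (rank + penalty + 1)


def fuse(fulltext_ranking, vector_ranking, fulltext_penalty=50, vector_penalty=50, top_k=5):
    """Combine two ranked lists of document ids by summing their per-list scores."""
    scores = {}
    for rank, doc_id in enumerate(fulltext_ranking):
        entry = scores.setdefault(doc_id, {"fulltext_score": 0.0, "vector_score": 0.0})
        entry["fulltext_score"] = rrf_score(rank, fulltext_penalty)
    for rank, doc_id in enumerate(vector_ranking):
        entry = scores.setdefault(doc_id, {"fulltext_score": 0.0, "vector_score": 0.0})
        entry["vector_score"] = rrf_score(rank, vector_penalty)
    # Sort by the summed score, highest first, and keep the top_k documents
    ranked = sorted(
        scores.items(),
        key=lambda item: item[1]["fulltext_score"] + item[1]["vector_score"],
        reverse=True,
    )
    return ranked[:top_k]


# Rankings inferred from the sample output; "other_a" and "other_b" are
# hypothetical placeholders for vector hits that the output does not show.
fulltext_ranking = ["The Time Traveler's Wife", "Timecop", "My iz budushchego"]
vector_ranking = ["Thrill Seekers", "About Time", "other_a", "other_b", "Timecop"]

for title, s in fuse(fulltext_ranking, vector_ranking):
    print(title, s["fulltext_score"], s["vector_score"])
```

Under this form, a lower penalty yields a larger score for the same rank, so lowering fulltext_penalty relative to vector_penalty gives full-text matches more weight in the combined ordering.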
Pass Results to a RAG Pipeline
You can pass your hybrid search results into your RAG pipeline to generate responses based on the retrieved documents. The sample code does the following:
Defines a LangChain prompt template to instruct the LLM to use the retrieved documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {query} variable.
Constructs a chain that specifies the following:
The hybrid search retriever you defined to retrieve relevant documents.
The prompt template that you defined.
An LLM from OpenAI to generate a context-aware response. By default, this is the gpt-3.5-turbo model.
Prompts the chain with a sample query and returns the response. The generated response might vary.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Define a prompt template
template = """
   Use the following pieces of context to answer the question at the end.
   {context}
   Question: Can you recommend some movies about {query}?
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI()

# Construct a chain to answer questions on your data
chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# Prompt the chain
query = "time travel"
answer = chain.invoke(query)
print(answer)
Based on the pieces of context provided, here are some movies about time travel that you may find interesting:

1. "Timecop" (1994) - A movie about a cop who is part of a law enforcement agency that regulates time travel, seeking justice and dealing with personal loss.
2. "The Time Traveler's Wife" (2009) - A romantic drama about a man with the ability to time travel involuntarily and the impact it has on his relationship with his wife.
3. "Thrill Seekers" (1999) - A movie about two reporters trying to prevent disasters by tracking down a time traveler witnessing major catastrophes.
4. "About Time" (2013) - A film about a man who discovers he can travel through time and uses this ability to improve his life and relationships.
5. "My iz budushchego" (2008) - A Russian movie where four treasure seekers from the 21st century are transported back to a WWII battle, exploring themes of action, drama, fantasy, and romance.

These movies offer a variety of perspectives on time travel and its impact on individuals and society.
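To make the role of the two template variables concrete, the following plain-Python sketch fills the same kind of template without LangChain. The Doc class is a hypothetical stand-in for a retrieved document with page_content and metadata attributes, and format_docs and build_prompt are illustrative helper names, not part of the tutorial's code.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = """Use the following pieces of context to answer the question at the end.
{context}
Question: Can you recommend some movies about {query}?
"""


def format_docs(docs):
    """Join each document's title and plot into one context string."""
    return "\n\n".join(
        f"{d.metadata.get('title', 'Untitled')}: {d.page_content}" for d in docs
    )


def build_prompt(docs, query):
    """Fill the {context} and {query} variables of the template."""
    return TEMPLATE.format(context=format_docs(docs), query=query)


docs = [
    Doc("An officer for a security agency that regulates time travel ...", {"title": "Timecop"}),
    Doc("At the age of 21, Tim discovers he can travel in time ...", {"title": "About Time"}),
]
print(build_prompt(docs, "time travel"))
```

A formatting function like this is a common way to control exactly which fields reach the LLM. In a chain, one option is to pipe the retriever into such a function before the prompt, for example {"context": retriever | format_docs, ...}, so that only titles and plots are passed as context.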