Hello,
I’m using MongoDBAtlasVectorSearch with LangChain’s RetrievalQA to fetch documents from my MongoDB Atlas collection, which is focused on MongoDB Atlas services. While the retrieval process generally works as expected, I’ve encountered some peculiar behavior.
Here’s the context: When querying topics directly related to the data in my collection, such as MongoDB Atlas features, the system performs well. However, when I query about unrelated topics, for example, “What is Google?”, it still returns documents, despite these topics being unrelated to the content of my database.
Moreover, I’ve noticed an unusual scenario where, after deleting all data from my collection, the RetrievalQA system still provided a relevant answer to the query “What is Google?”. This is perplexing because, with no data in the database, I expected no results or a response like “I don’t know.”
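To make the expectation concrete, here is a minimal sketch of the behavior I had in mind: when the retriever returns nothing, the answer is "I don't know" and the LLM is never called. The names `answer_with_guard`, `retrieve`, and `generate` are hypothetical stand-ins for the retriever and the LLM call; they are not LangChain APIs.

```python
from typing import Callable, List, Tuple

FALLBACK = "I don't know."


def answer_with_guard(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> Tuple[str, List[str]]:
    """Return (answer, sources); skip the LLM when nothing was retrieved."""
    docs = retrieve(query)
    if not docs:
        # No context at all: answer with the fallback instead of letting
        # the model respond from its own general knowledge.
        return FALLBACK, []
    return generate(query, docs), docs
```

With an empty collection, `retrieve` would return an empty list, so the function would return the fallback string and an empty source list.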
Here’s the code snippet used for the retrieval process:
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain_community.vectorstores import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings


def vector_search_from_connection_string(db_name, collection_name):
    # Connect to the Atlas collection that holds the embedded documents.
    vector_search = MongoDBAtlasVectorSearch.from_connection_string(
        "mongodb_connection_string",
        f"{db_name}.{collection_name}",
        OpenAIEmbeddings(),
        index_name="vector_index"
    )
    return vector_search


def perform_question_answering(query):
    vector_search = vector_search_from_connection_string("langchain_db", "test")
    # Fetch the 100 nearest neighbours, then keep only the first one
    # through the post-filter pipeline.
    qa_retriever = vector_search.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 100, "post_filter_pipeline": [{"$limit": 1}]}
    )
    prompt_template = """
    Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
    {context}
    Question: {question}
    """
    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
    # "stuff" inserts the retrieved documents directly into the prompt.
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(),
        chain_type="stuff",
        retriever=qa_retriever,
        return_source_documents=True,
        chain_type_kwargs={"prompt": PROMPT}
    )
    docs = qa({"query": query})
    return docs["result"], docs['source_documents']
output:
Are there best practices or additional configurations in MongoDB Atlas or LangChain that could help improve the accuracy of the search results, especially for unrelated queries?
Any insights, experiences, or recommendations on managing such search behavior would be greatly appreciated. I aim to refine the search results to ensure they are contextually relevant to the queries.
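One idea I am considering is a similarity cutoff. As far as I understand, a nearest-neighbour search returns the k closest vectors no matter how far away they are, so an unrelated query still gets hits. Below is a minimal plain-Python sketch of the idea, not a LangChain or Atlas API: `filter_by_score`, the 0.8 cutoff, and the toy two-dimensional vectors are all assumptions for illustration, and a suitable cutoff would have to be tuned for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_score(
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    min_score: float = 0.8,  # hypothetical cutoff, needs tuning
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return up to k (text, score) pairs whose score is at least min_score."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    kept = [pair for pair in scored if pair[1] >= min_score]
    # Highest similarity first, then truncate to k results.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]
```

If every candidate falls below the cutoff, the function returns an empty list, which could then trigger an "I don't know" response instead of passing weak context to the LLM.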
Thank you for your help and guidance!