Hello everyone,
I am currently developing a question-answering system using LangChain’s RetrievalQA with MongoDBAtlasVectorSearch for fetching documents from a MongoDB Atlas collection. I need to refine the document retrieval process so that only specific documents are used as the basis for answering questions.
Here is the current setup of my retrieval function:
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain_community.vectorstores import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings


def create_vector_search(db_name, collection_name):
    vector_search = MongoDBAtlasVectorSearch.from_connection_string(
        "mongodb_connection_string",
        f"{db_name}.{collection_name}",
        OpenAIEmbeddings(),
        index_name="vector_index"
    )
    return vector_search


def perform_question_answering(query):
    vector_search = create_vector_search("langchain_db", "test")
    qa_retriever = vector_search.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 100, "post_filter_pipeline": [{"$limit": 1}]}
    )
    prompt_template = """
    Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
    {context}
    Question: {question}
    """
    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(),
        chain_type="stuff",
        retriever=qa_retriever,
        return_source_documents=True,
        chain_type_kwargs={"prompt": PROMPT}
    )
    docs = qa({"query": query})
    return docs["result"], docs['source_documents']
I attempted to use pre_filter in search_kwargs to limit the documents based on a specific condition (e.g., page equals 555), but it didn’t work as expected.
qa_retriever = vector_search.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 100,
        "pre_filter": {"page": {"$eq": 555}},
        "post_filter_pipeline": [
            {"$limit": 1}
        ]
    }
)
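For reference, here is a minimal sketch of how I build these search arguments, together with the index definition I suspect may be relevant. The helper name build_search_kwargs is hypothetical. The index definition is an assumption on my part: as far as I understand, Atlas Vector Search only pre-filters on fields that are declared with type "filter" in the vector index, and I have not confirmed that my vector_index does this. The path "embedding" and the 1536 dimensions are assumed defaults, not values read from my cluster.

```python
def build_search_kwargs(page, k=100, limit=1):
    """Hypothetical helper: build the search_kwargs dict passed to as_retriever()."""
    return {
        "k": k,
        # Restrict candidates to documents whose `page` field equals `page`.
        "pre_filter": {"page": {"$eq": page}},
        # Keep only the top `limit` results after the vector search stage.
        "post_filter_pipeline": [{"$limit": limit}],
    }


# Assumed Atlas Vector Search index definition for "vector_index".
# The "filter" entry for `page` is the part I suspect is missing in my setup.
ASSUMED_INDEX_DEFINITION = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",    # assumed embedding field name
            "numDimensions": 1536,  # assumed embedding size
            "similarity": "cosine",
        },
        {"type": "filter", "path": "page"},
    ]
}
```

With this helper, the retriever call above would become vector_search.as_retriever(search_type="similarity", search_kwargs=build_search_kwargs(555)).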
So my question is: how can I effectively use pre_filter in MongoDBAtlasVectorSearch to limit retrieval to documents that match specific criteria?
I’m looking for insights, best practices, or examples that could help refine the document selection process within this system.
Thank you for your time and help!