I am trying to build a simple RAG using LangChain that answers basic questions about a list of authors and their books. The results should be split by author_gender: if I select "female", I should get answers only from books written by female authors, and likewise for "male".
Data looks something like this:
{"text":"{"title": "Pride and Prejudice", "author": "Jane Austen"}",
"author_gender":"female",
"publication_year":{"1813"}}
Below is the code I am using. It is a simple filter within the retriever, but it appears to do nothing: the results are the same with or without the filter.
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vector_search.as_retriever(filter={"author_gender": "female"}))
question = "List novels written by female authors"
result = qa({"query": question})
print(result["result"])
Based on the provided context, the novels written by male authors are:
- "The Great Gatsby" by F. Scott Fitzgerald
- "To Kill a Mockingbird" by Harper Lee
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vector_search.as_retriever())
question = "List novels written by male authors"
result = qa({"query": question})
print(result["result"])
Based on the provided context, the novels written by male authors are:
- "The Great Gatsby" by F. Scott Fitzgerald
- "To Kill a Mockingbird" by Harper Lee
Is there a reliable way to filter the results based on the available metadata?
Libraries used:
langchain==0.2.6
pymongo==4.8.0
Python version: 3.10
Hey, thanks for sharing this issue.
Based on the response, it looks like the model isn't fully interpreting your question, as the answer still doesn't distinguish male from female authors. However, you are right that adding the filter should narrow the available options regardless.
Could you confirm you are using the langchain-mongodb package? If so, you should be able to resolve your issue by passing the filter as a pre_filter argument within search_kwargs. Here's an example of how to add kwargs when calling as_retriever.
In your case, the code would be revised to look like this:
_search_kwargs = {"pre_filter": {"author_gender": "female"}}
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vector_search.as_retriever(search_kwargs=_search_kwargs)
)
question = "List novels written by female authors"
result = qa({"query": question})
print(result["result"])
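One more note, in case it helps: as far as I understand, the pre_filter ends up as the filter field of the $vectorSearch aggregation stage, and Atlas only applies it to fields that are indexed with type "filter" in the vector search index. Below is a rough sketch of what that stage and a matching index definition might look like. The helper name build_vector_search_stage, the index name "vector_index", the "embedding" path, the 1536 dimensions, and the numCandidates heuristic are all assumptions for illustration, not values taken from your setup.

```python
# Sketch only: approximates the $vectorSearch stage a pre_filter is expected
# to produce. All names and numbers below are illustrative assumptions.

def build_vector_search_stage(query_vector, index_name, path="embedding",
                              k=4, pre_filter=None):
    """Build a $vectorSearch stage, attaching the filter only when one is given."""
    params = {
        "index": index_name,
        "path": path,
        "queryVector": query_vector,
        "numCandidates": k * 10,  # assumed heuristic: oversample, then keep k
        "limit": k,
    }
    if pre_filter:
        # The filter is applied before the nearest-neighbour ranking.
        params["filter"] = pre_filter
    return {"$vectorSearch": params}


# Hypothetical index definition: author_gender must be declared as a
# "filter" field so that it can be used inside the pre_filter.
index_definition = {
    "fields": [
        {"type": "vector", "path": "embedding",
         "numDimensions": 1536, "similarity": "cosine"},
        {"type": "filter", "path": "author_gender"},
    ]
}

stage = build_vector_search_stage(
    [0.1, 0.2, 0.3],  # placeholder query embedding
    "vector_index",
    pre_filter={"author_gender": "female"},
)
print(stage)
```

If the filter seems to be rejected or ignored, checking that author_gender is declared as a filter field in your own index definition is probably the first thing to look at.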
I confirm that I am using the langchain-mongodb package:
langchain-mongodb==0.1.8
The pre_filter works like a charm and filters the results as expected:
_search_kwargs = {"pre_filter": {"author_gender": "male"}}
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vector_store.as_retriever(search_kwargs=_search_kwargs)
)
question = "List novels with authors name"
result = qa({"query": question})
print(result["result"])
Here are the novels with their respective authors' names:
1. "The Great Gatsby" by F. Scott Fitzgerald
2. "Moby-Dick" by Herman Melville
3. "The Catcher in the Rye" by J.D. Salinger
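Since the LLM's wording is not always a reliable indicator of what was retrieved, it may be worth checking the retrieved documents directly rather than the generated answer. Here is a minimal sketch of such a check. FakeDoc is a hypothetical stand-in for a LangChain Document (only page_content and metadata are used), mismatched_docs is a helper name made up for this example, and the sample data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document, used so the sketch runs offline."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def mismatched_docs(docs, key, expected):
    """Return the documents whose metadata[key] is missing or not the expected value."""
    return [d for d in docs if d.metadata.get(key) != expected]


# Invented sample: two documents that satisfy the filter and one that does not.
sample = [
    FakeDoc("The Great Gatsby", {"author_gender": "male"}),
    FakeDoc("Moby-Dick", {"author_gender": "male"}),
    FakeDoc("Pride and Prejudice", {"author_gender": "female"}),
]

leaks = mismatched_docs(sample, "author_gender", "male")
for doc in leaks:
    print("Unexpected document:", doc.page_content, doc.metadata)
```

With a real setup, the list would presumably come from something like vector_store.as_retriever(search_kwargs=_search_kwargs).invoke(question), assuming the retrieved documents expose author_gender in their metadata. An empty result from mismatched_docs would then suggest the filter is being applied at retrieval time.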
Thank you very much for your help.