Hello,
I created a Vector Search Index in my Atlas cluster, on the “embedding” field of an “embeddings” collection. It works well.
Now I want to filter the results so that I only retrieve entries for a specific “project”. I use LangChain, with MongoDBAtlasVectorSearch as the retriever. The documentation says I can add a filter, as explained here.
My code:
from langchain.vectorstores import MongoDBAtlasVectorSearch

vectorstore = MongoDBAtlasVectorSearch(
    collection=db.embeddings,
    embedding=get_embedding("azureopenai"),
    index_name="embedding_index",
)
retriever = vectorstore.as_retriever(
    search_kwargs={
        'k': 5,
        'filter': {'project': 'heroes'},
    }
)
I then use the retriever in a LangChain chain. I get 5 results, as expected, but the filter is not applied: the results come from all projects, not only the ‘heroes’ project.
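To make the symptom concrete, here is a minimal sketch of how the returned documents could be checked. It assumes each retrieved document exposes a `metadata` dict that carries the stored `project` field; the `Doc` class, the helper names, and the sample data are hypothetical stand-ins for illustration, not part of LangChain or of my actual data.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def count_by_project(docs):
    """Count retrieved documents per value of metadata['project']."""
    return Counter(doc.metadata.get("project") for doc in docs)


def off_project(docs, project):
    """Return the documents whose 'project' differs from the expected one."""
    return [doc for doc in docs if doc.metadata.get("project") != project]


# Made-up sample results, for illustration only.
sample = [
    Doc("Spider-Man ...", {"project": "heroes"}),
    Doc("Some unrelated text", {"project": "other"}),
]

print(count_by_project(sample))
print(len(off_project(sample, "heroes")), "document(s) outside the 'heroes' project")
```

With the real retriever, the same helpers could be applied to the list returned by `retriever.get_relevant_documents(...)`; if the filter were applied, `off_project(docs, "heroes")` would be empty.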
Other info for context:
Here is the index definition (I also added an index on the ‘project’ field, but it does not change the results):
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      },
      "project": {
        "type": "string"
      }
    }
  }
}
And here is an example of a document stored in the ‘embeddings’ collection:
{
  "_id": {
    "$oid": "64e379206cfcf8a7866bce8c"
  },
  "text": "Spider-Man, créé par Stan Lee et Steve Ditko, est un super-héros de Marvel Comics. Peter\nParker, un étudiant doué mais timide, est mordu par une araignée radioactive ...",
  "embedding": [
    0.0013639614901196446,
    -0.02883271683320636,
    0.014490925689774099,
    -0.012036416665376559,
    ....
  ],
  "source": "uploads/heroes/spiderman-short.pdf",
  "file": "spiderman-short.pdf",
  "project": "heroes"
}
Any hints or solutions?
Thanks a lot