Meera
(Meera Datey)
Hello,
I am using MongoDB Atlas Vector Search with LangChain. I would like to add metadata to each document
and use that metadata to filter the results.
Can someone guide me?
loader = WebBaseLoader(["http://mongodb.com"])
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=500)
docs = text_splitter.split_documents(data)
metadata = {"user-id": "your-user-id"}
# Add Metadata to all docs here
client = MongoClient(self.config.mongodb_uri)
MONGODB_COLLECTION = client[self.config.vector_db_name][self.config.collection_name]
MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=self.config.search_index_name,
    metadata=metadata
)
And in retrieval
# Add pre-filter here.
vector_search = MongoDBAtlasVectorSearch.from_connection_string(
    self.config.mongodb_uri,
    self.config.vector_db_name + "." + self.config.collection_name,
    OpenAIEmbeddings(disallowed_special=()),
    index_name=self.config.search_index_name,
)
retriever = vector_search.as_retriever()
Hello Meera,
Thanks for the question. You can absolutely filter on metadata using Atlas Vector Search. To do this, define the additional document fields you’d like to filter on as filter fields in the index definition.
This documentation shows how to set up that index and query with filters in the “Filter” example: https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#examples
And, if you’re using LangChain, the LangChain documentation also shows how to use the filter in LangChain syntax: MongoDB Atlas | 🦜️🔗 Langchain
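A minimal sketch of the two pieces described above: an index definition that declares a filter field, and the matching `search_kwargs` for a LangChain retriever. The field names `embedding` and `user_id`, the 1536 dimensions (typical for OpenAI's text-embedding-ada-002), and both helper names are assumptions for illustration, not taken from the thread. It also assumes the LangChain integration stores metadata keys as top-level document fields, so the filter path is simply `user_id`.

```python
def build_index_definition(filter_fields, embedding_path="embedding",
                           num_dimensions=1536, similarity="cosine"):
    """Return a vector search index definition with extra filter fields.

    All defaults are assumptions; adjust them to your embedding model
    and collection layout.
    """
    fields = [{
        "type": "vector",
        "path": embedding_path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }]
    # Each metadata field used in a pre-filter must be declared as a filter field.
    fields.extend({"type": "filter", "path": name} for name in filter_fields)
    return {"fields": fields}


def build_search_kwargs(user_id, k=4):
    """Return retriever kwargs that restrict results to one user's documents."""
    return {"k": k, "pre_filter": {"user_id": {"$eq": user_id}}}


index_definition = build_index_definition(["user_id"])
search_kwargs = build_search_kwargs("your-user-id")

# With a configured vector store (not constructed here), the retriever would be:
# retriever = vector_search.as_retriever(search_kwargs=search_kwargs)
```

The index definition is what you would paste into the Atlas UI (or pass to an index-creation call), while `search_kwargs` is forwarded to the similarity search on every retrieval.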
Meera
(Meera Datey)
Thanks!
I am working with LangChain, and the resource you provided worked for filtering the results during retrieval.
A follow-up question:
How do I populate the vector database with a custom metadata field?
This is how I am adding the metadata:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(data)
# Help me find a better way than iterating over all the documents
for doc in docs:
    doc.metadata["user_id"] = user_id
MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=self.config.search_index_name,
)
Now for the retriever
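# create_documents (unlike split_documents) accepts a metadatas list, one dict per input text
docs = text_splitter.create_documents(
    [doc.page_content for doc in data],
    metadatas=[{"user_id": user_id} for _ in data],
)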
The above should do the job.
Refer to Split by character | 🦜️🔗 Langchain (look for the metadata section).
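If the chunks are already Document objects, as returned by `split_documents`, the loop from the question is still a reasonable approach. A minimal sketch of wrapping it in a reusable helper follows; `Doc` is a stand-in for LangChain's Document so the snippet runs on its own, and `tag_documents` is a hypothetical helper name, not a LangChain function.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def tag_documents(docs, extra_metadata):
    """Merge extra_metadata into each document's metadata, in place."""
    for doc in docs:
        doc.metadata.update(extra_metadata)
    return docs


chunks = [
    Doc("first chunk", {"source": "http://mongodb.com"}),
    Doc("second chunk"),
]
tagged = tag_documents(chunks, {"user_id": "your-user-id"})
```

Existing keys such as `source` are preserved, and the tagged list can be passed to `from_documents` unchanged.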
Hello @Owais_Iqbal ,
I have a more complex requirement for metadata. I want to store a metadata field whose value is an array of strings.
To use the above example:
docs = text_splitter.create_documents(
    [doc.page_content for doc in data],
    metadatas=[{"user_id": ["user_id1", "user_id2", "user_id3"]} for _ in data],
)
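A minimal sketch of how array-valued metadata and a matching pre-filter might be built. It assumes `user_id` is declared as a filter field in the index, that the Atlas deployment supports filtering on arrays of strings, and that an `$in` filter matches when any array element equals one of the listed values; please verify these against the Atlas Vector Search documentation. Both helper names are hypothetical.

```python
def array_metadata(user_ids):
    """Return a metadata dict whose user_id value is a fresh list of strings."""
    # list(...) copies the input so documents do not share one mutable list.
    return {"user_id": list(user_ids)}


def pre_filter_for_user(user_id):
    """Return a pre-filter selecting documents visible to the given user."""
    return {"user_id": {"$in": [user_id]}}


allowed = ["user_id1", "user_id2", "user_id3"]
metadata = array_metadata(allowed)
pre_filter = pre_filter_for_user("user_id2")

# With a configured vector store (not constructed here):
# retriever = vector_search.as_retriever(search_kwargs={"pre_filter": pre_filter})
```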