Dec 2023

Hello,

I am using the MongoDB vector database with LangChain. I would like to add metadata to each document
and use that metadata to filter the results.
Can someone guide me?

loader = WebBaseLoader(["http://mongodb.com"])
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=500)
docs = text_splitter.split_documents(data)
metadata = {"user-id": "your-user-id"}
# Add Metadata to all docs here
client = MongoClient(self.config.mongodb_uri)
MONGODB_COLLECTION = client[self.config.vector_db_name][self.config.collection_name]
MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=self.config.search_index_name,
    metadata=metadata
)

And in retrieval:

# Add pre-filter here.
vector_search = MongoDBAtlasVectorSearch.from_connection_string(
    self.config.mongodb_uri,
    self.config.vector_db_name + "." + self.config.collection_name,
    OpenAIEmbeddings(disallowed_special=()),
    index_name=self.config.search_index_name,
)
retriever = vector_search.as_retriever()

Hello Meera,

Thanks for the question. You can absolutely filter on metadata using Atlas Vector Search. To do this, declare the additional document fields you want to filter on in the index definition.

This documentation shows how to set up that index and query with filters in the “Filter” example: https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#examples

And, if you’re using LangChain, the LangChain docs also show how to apply the filter in LangChain syntax: MongoDB Atlas | 🦜️🔗 Langchain
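
For reference, here is a minimal sketch of the two pieces described above: an index definition that declares a filter field, and a `$vectorSearch` stage that filters on it. The field names `embedding` and `user_id`, the index name `vector_index`, the 1536 dimensions, and the helper `build_vector_search_stage` are all assumptions for illustration; adjust them to your collection and embedding model.

```python
# Index definition with one vector field and one filter field.
# "embedding", "user_id", and numDimensions=1536 are assumed values.
INDEX_DEFINITION = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 1536,
            "similarity": "cosine",
        },
        {"type": "filter", "path": "user_id"},
    ]
}


def build_vector_search_stage(query_vector, user_id, index_name="vector_index", limit=4):
    """Build a $vectorSearch stage restricted to one user's documents (hypothetical helper)."""
    return {
        "$vectorSearch": {
            "index": index_name,
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": limit * 10,  # oversample candidates before taking `limit`
            "limit": limit,
            "filter": {"user_id": {"$eq": user_id}},
        }
    }


stage = build_vector_search_stage([0.1, 0.2, 0.3], "your-user-id")
print(stage["$vectorSearch"]["filter"])
```

In LangChain, the same kind of filter document is what you would pass as `pre_filter`, for example via `as_retriever(search_kwargs={"pre_filter": ...})`. The exact filter syntax can depend on the LangChain version and index type, so check it against the linked docs.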

Thanks!
I am working with LangChain, and the resource you provided worked for filtering the results at retrieval time.

A follow-up question:

How do I populate the vector database with a custom metadata field?
This is how I am adding the metadata:

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(data)
# Help me find a better way than iterating over all the documents
for i, doc in enumerate(docs):
    doc.metadata["user_id"] = user_id
MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=self.config.search_index_name,
)
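
One possible way to tidy up the loop, sketched with a stand-in `Document` class so it runs without LangChain installed: wrap the tagging in a small helper. `with_metadata` is a hypothetical helper, not a LangChain API. As far as I know, `split_documents` also copies each source document's metadata onto its chunks, so tagging the documents returned by the loader before splitting should mean fewer items to touch.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in mirroring the page_content/metadata attributes of LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def with_metadata(docs, **extra):
    """Return new documents whose metadata is merged with `extra`; inputs are not mutated."""
    return [Document(d.page_content, {**d.metadata, **extra}) for d in docs]


docs = [
    Document("chunk one", {"source": "http://mongodb.com"}),
    Document("chunk two"),
]
tagged = with_metadata(docs, user_id="your-user-id")
print(tagged[0].metadata)
```

The helper still iterates internally; it only keeps the tagging step in one place and leaves the original documents unchanged.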

Now for the retriever


Hello @Owais_Iqbal ,

I have a more complex requirement for metadata. I want to store a metadata field whose value is an array of strings.
To use the above example:

docs = text_splitter.split_documents([data], metadatas=[{"user_id": ["user_id1", "user_id2", "user_id3"]}])
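
A minimal sketch of one way to attach a list-valued field, reusing the same per-document `metadata` dict as the earlier posts. It uses a stand-in `Document` class so it runs without LangChain, and `tag_with_user_ids` is a hypothetical helper name, not a library function.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in mirroring the page_content/metadata attributes of LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def tag_with_user_ids(docs, user_ids):
    """Attach a list of user ids to every document, one independent copy per document."""
    for doc in docs:
        doc.metadata["user_id"] = list(user_ids)  # copy so documents do not share one list
    return docs


docs = tag_with_user_ids(
    [Document("chunk one"), Document("chunk two")],
    ["user_id1", "user_id2", "user_id3"],
)
print(docs[0].metadata["user_id"])
```

Whether an array-valued field can then be declared as a filter field in the index is something to confirm in the Atlas Vector Search documentation.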