I use LangChain to query documents stored in MongoDB, and I have created a search index. However, according to the logs, each query takes over one second, while my normal (non-vector) queries mostly finish within a few tens of milliseconds. I want to know if this is normal.

My code:

# Create vector store

vector_store = MongoDBAtlasVectorSearch(
    collection=collection,
    embedding=embedding,
    index_name=index_name,
)

# Create retriever
retriever = VectorStoreRetriever(
    vectorstore=vector_store,
    search_kwargs={
        "pre_filter": pre_filter,
        "k": k
    }
)

# Retrieve similar documents
docs = await retriever.ainvoke(query)

My instance runs on a shared vCPU.

Search performance involves multiple layers of optimization across data storage, indexing, algorithm selection, memory management, and hardware utilization.
So whether your query time is "normal" depends on all of these factors. I recommend reviewing the points below; they may help you improve your query time.

Indexing is at the heart of efficient search performance. Avoid indexing unnecessary fields, as additional indexes can consume more memory and slow down operations.

Use Efficient Filters (pre_filter): Apply filters to limit the search space to only the most relevant documents. For example, if you only need recent documents, apply a date filter to reduce the size of the search space. Ensure that the pre_filter uses indexed fields to avoid full collection scans.
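
As a rough sketch of what "indexed field" means here for Atlas Vector Search: the field used in pre_filter generally has to be declared with type `filter` in the vector search index definition. The field names `embedding` and `category`, the dimension count, and the filter value below are assumptions for illustration, not taken from the question. The helper `unindexed_filter_fields` is hypothetical and only inspects top-level keys of the filter.

```python
# Hypothetical Atlas Vector Search index definition (field names are assumed).
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",      # assumed name of the vector field
            "numDimensions": 1536,    # assumed; must match your embedding model
            "similarity": "cosine",
        },
        # Declaring the field as a filter lets it be used in pre_filter.
        {"type": "filter", "path": "category"},
    ]
}

# Hypothetical pre_filter passed through search_kwargs.
pre_filter = {"category": {"$eq": "news"}}


def filter_paths(definition):
    """Return the set of paths declared as filter fields in the index."""
    return {f["path"] for f in definition["fields"] if f["type"] == "filter"}


def unindexed_filter_fields(pre_filter, definition):
    """List top-level pre_filter fields that the index does not declare.

    Operator keys such as "$and" are skipped, so nested conditions
    are not checked by this simple sketch.
    """
    used = {key for key in pre_filter if not key.startswith("$")}
    return sorted(used - filter_paths(definition))
```

Calling `unindexed_filter_fields(pre_filter, index_definition)` returns an empty list when every top-level filter field is declared in the index.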

Approximate Nearest Neighbors (ANN): If exact precision is not required, using Approximate Nearest Neighbors (ANN) can drastically improve performance. ANN algorithms like HNSW are well-suited for high-dimensional vector searches and can return similar results faster than exact methods, making them ideal for large collections.
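
To make the ANN trade-off concrete, here is a minimal sketch of how a `$vectorSearch` aggregation stage could be built by hand. In the ANN case, `numCandidates` controls how many candidates are considered before the top `limit` results are returned; setting `exact` to true requests an exact search instead. The helper name `build_vector_search_stage`, the default path `embedding`, and the multiplier of 10 are assumptions chosen for illustration, not values from the question or from LangChain.

```python
def build_vector_search_stage(query_vector, index_name, path="embedding",
                              k=4, num_candidates=None, pre_filter=None,
                              exact=False):
    """Build a $vectorSearch stage as a plain dict (hypothetical helper)."""
    stage = {
        "index": index_name,
        "path": path,                      # assumed vector field name
        "queryVector": list(query_vector),
        "limit": k,
    }
    if exact:
        # Exact nearest-neighbor search: numCandidates is not used.
        stage["exact"] = True
    else:
        # ANN search: more candidates usually means better recall
        # but more work per query. The factor 10 is an arbitrary start.
        stage["numCandidates"] = (
            num_candidates if num_candidates is not None else k * 10
        )
    if pre_filter:
        stage["filter"] = pre_filter
    return {"$vectorSearch": stage}
```

With only a few hundred documents, either mode should be cheap on the database side, so comparing the two can help show whether the search itself is really the slow part.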

Upgrade to a Higher Tier: Since you are on a shared vCPU, the instance may be too weak for a heavy workload. For larger collections and high-frequency queries, consider upgrading to a higher-tier instance in MongoDB Atlas. Higher tiers come with more RAM and CPU resources, which can improve index performance by reducing disk I/O.

I think I already meet the first three points. I only use one key for filtering, and that key is indexed. I don't store any extra fields in the collection, and there are only a few hundred documents in it.

Could you tell me what the optimal query time is, and whether the query time can be seen in the Atlas dashboard? The total time I measure combines the text-to-embedding step and the MongoDB query, so I'm not sure which part takes the most time.
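
One way to find out which part dominates is to time the two steps separately instead of timing `retriever.ainvoke` as a whole. The sketch below uses stand-in coroutines (`fake_embed`, `fake_search`) so it runs on its own; in real code they would be replaced by something like `embedding.aembed_query(query)` and `vector_store.asimilarity_search_by_vector(vector, k=k, pre_filter=pre_filter)`, assuming your LangChain version exposes those methods with these arguments.

```python
import asyncio
import time


async def timed(label, coro, timings):
    """Await a coroutine and record its wall-clock time in milliseconds."""
    start = time.perf_counter()
    result = await coro
    timings[label] = (time.perf_counter() - start) * 1000.0
    return result


async def fake_embed(text):
    # Stand-in for the embedding call (usually a network request).
    await asyncio.sleep(0.02)
    return [0.0, 1.0]


async def fake_search(vector, k):
    # Stand-in for the vector search against MongoDB.
    await asyncio.sleep(0.01)
    return ["doc"] * k


async def main():
    timings = {}
    vector = await timed("embed_ms", fake_embed("my query"), timings)
    docs = await timed("search_ms", fake_search(vector, 3), timings)
    return timings, docs


if __name__ == "__main__":
    timings, docs = asyncio.run(main())
    for label, ms in timings.items():
        print(f"{label}: {ms:.1f} ms")
```

If the embedding step turns out to account for most of the second, the bottleneck is the embedding provider rather than MongoDB.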

Have you seen the docs about Analyze Slow Queries and Atlas Search Query Performance?

They cover several topics that can help with your investigation. I recommend upgrading your cluster tier to M10 or higher to get access to the query analytics, and then taking a look at how your search index behaves.