How to get access to ONLY the object in sub-document where there was a hit (match) in Atlas Search

Arjun_kochhar_leodeo · 2024-07-01T06:15:40.350Z

Hello All:

I am stuck with something that should be very simple to do, but am not able to find any straightforward way of doing using Mongo DB Atlas Search.

Firstly, the MongoDb videos said that the best way to structure no-sql databases like mongo db is not separating data into separate collections and linking via foreign keys, but by embedding and creating sub-documents, so sub-documents (or nested documents) are the recommended data structure of choice for Mongo db.

Now I have a USERS collection and this contains sub collections of related user data. I use the Atlas Search to do a text search on field in a nested or sub -document in the USER collection and using PROJECT I am able to get the fields for the USER where there was a match in the sub-document. However I cannot for the life of me get the _id of the sub document where the actual match took place!

In the Highlights I get the matched text. But i need the _id of the sub-document where the match happened. This should be very straightforward to get in any database system.

I did some research and in other threads there was a recommendation to do some kind of nasty workaround like $unwind the search results and do another $match on them. However I have multiple hits with the same or similar keyword (given the fuzzy aspect of the text search of Atlas search) and the subsequent running of $match as a workaround will result in serious bugs as other records could be matched rather than the one that was initially identified by the $Search.

I am pasting the aggregation code below:

[
       {
         '$search': {
          'index': "indexDeos",
           'text': {
             'query': 'trumpet', 
             'path': 'deos.text',
             'fuzzy': {}
           }, 
           'highlight': {
             'path': 'deos.text'
           }
         }
       }, {
         '$limit': 10
       }, {
         '$project': {
           '_id': 1, 
           'firstname': 1,
           'lastname': 1, 
           'thumbnail': 1,
           'deos': 1,
           'highlights': {
             '$meta': 'searchHighlights'
           }, 
           'score': {
             '$meta': 'searchScore'
           }
         }
       }
     ]

So the query is for the keyword “trumpet” this happens in a subdocument called deos.text. In my projection I have deos listed, and so I get all the deos back, including multiple ‘trumpets’, ‘trampets’ and so on, but I cannot find a way to get the _id of the record where there was the initial hit in the $search.

Please help. To get the _id of the document where the $search found a hit should be elementary.

Erik_Hatcher · 2024-07-01T15:03:38.808Z

First to dispel the myth that best way is to embed sub-documents - this is entirely dependent on the use case / workload. Data modeling and schema design in MongoDB are based on the details of how it works internally and what’s needed from the aggregation pipeline. With search, the main collection documents are what are ultimately matched and returned - not sub-documents independently. Consider making deos sub-documents their own collection and searching those instead - what would that look like scale-wise, aggregation pipeline-wise, performance-wise?

From your query, I’m deducing you’re using the document type for deos sub-documents. Could you share your index configuration too?

What’s the scale you’re working with? How many main collection documents? What’s the average number of deos sub-documents per main collection document?

amyjian · 2024-07-01T15:26:18.884Z

@Arjun_kochhar_leodeo it sounds like you’d like to return only the matching sub-documents of the array, is that right? As Erik mentioned, Atlas Search matches and returns parents documents only. +1 on being interested about the scale of your use case.

To request this functionality and get alerted on updates, I recommend voting on this item in our feedback portal. You can see in the most recent comment that it’s possible to use a similar or equivalent $filter expression to return matching sub-documents – I recreated an example that might work for you here: Atlas Search Playground

Arjun_kochhar_leodeo · 2024-08-29T06:49:47.266Z

Hello @amyjian @Erik_Hatcher :

This deserves some discussion. I came from the relational database world (Postgres/Mysql) Data is accessed via joining tables, which are separated using the forms of normalization. Coming to MongoDb, I watched the Data Modelling videos that MongoDB officially provides. These make a VERY STRONG case for not using relationships between collections/tables, rather embedding related tables as sub-documents.

The use case of a database is to store and access data. If following this modelling video’s advice I set up my data as sub-documents, and cannot search and get the id of a matching sub-document, then we have a serious problem. Either Mongo should not recommend sub-documents as the preferred data modelling design, or it should provide the ability to search sub documents.

In any case, now am not sure what to do. Here is my index and this index belongs to the Users Collection. Therefore the ‘deos’ is a sub-document in the Users collection, and it contains a field called ‘text’ which I am searching:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "deos": {
        "fields": {
          "text": {
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  }
}

`
The use case is as follows. I have a Users Collection. Each user has multiple Leos and Deos. I have set up sub-documents for Leos and Deos as follows:


leos Array (3)
deos Array (4)

firstname
Arjun 
lastname
Kochhar
.
.
.

Basically I need to match a text field in the leos sub-document to a text field in the deos sub-document. The match using Atlas Search works and I am able to obtain the matching text. However I cannot get the id of the deo document that matches a leo document or vice versa. This is basic and quite key. I need to know which document in leos matches which one in deos. There is no workaround and hack. It should be straightforward if you are recommending not to use separate tables but sub-documents to get the id of the items that match. Using filter in an aggregation pipeline is a hack, and quite a dirty way to do it. But if i were to use that approach how do I do it? Isnt there a more straightforward way?

To Reiterate:
Now I have a USERS collection and this contains sub collections of related user data. I use the Atlas Search to do a text search on field in a nested or sub -document in the USER collection and using PROJECT I am able to get the fields for the USER where there was a match in the sub-document. However I cannot for the life of me get the _id of the sub document where the actual match took place!

So then do I have to take out my Sub-Documents and create separate collections? This goes against the whole principle of Mongo Db data modelling that is detailed in your videos.

If I do take them out, and make separate collections for Leos and Deos, how do I relate these to my Users. Is this the right way to go from a data modelling perspective.

Currently I am building the system so I have a few Users, each with a few Leos and Deos. But when the system is live we hope to have millions of Users each with hundreds of Leos and Deos.

How do I approach this? Am stuck. Can you also point me to some videos /resources/sessions on data modelling in Mongo Db. I am not sure if using sub-documents is the way to go, when I need to search and match sub-documents with each other. This is a very basic requirement. If I was not using sub-documents, the way it is done in relational databases is that searches are done across tables. Here searches are done across sub-documents, only I cannot get the _id, then that defeats the whole purpose.

Google says the following about relating data in MongoDb:
In MongoDB, relationships between data can be managed using embedded documents and references : Embedded Documents: This approach stores related data within a single document, ideal for data that is frequently accessed together. It simplifies data
retrieval and ensures data locality.

Are embedded documents different from simply creating sub-documents as an array of objects nested in the parent document?
How does one do Atlas Search on fields in embedded/sub documents and return the id where there is a match rather than just the matching text?

This is then a question about how to model data in mongo db and how to do searches for sub-documents. Please address how data should be modelled in Mongo db given that search is almost always required.

Please suggest how to proceed.
Thanks.

Arjun_kochhar_leodeo · 2024-08-29T07:32:49.465Z

This is what Chatgpt has to say about this:

MongoDB Atlas Search provides powerful full-text search capabilities on documents within a MongoDB collection, leveraging the capabilities of Apache Lucene. However, when it comes to retrieving the _id of a sub-document that matches a search query, there are a few considerations and limitations to be aware of:

Querying Sub-Documents and Retrieving `_id`

Indexing Sub-Documents:
- To perform efficient searches on sub-documents, you need to create a suitable index in MongoDB Atlas Search. This often involves defining fields within sub-documents in your search index configuration.
Querying Sub-Documents:
- When querying sub-documents in Atlas Search, you generally query the entire document and include the criteria for matching sub-documents. For instance, you can use nested fields and various query operators to specify criteria.
Retrieving _id of Matching Sub-Documents:
- MongoDB Atlas Search does not directly support querying to return the _id of sub-documents that match a query. By default, Atlas Search returns documents that match the query criteria, but it does not directly expose the _id of sub-documents within those documents.

Workarounds

If you need to retrieve the _id of sub-documents or match specific sub-documents, you can consider the following approaches:

Post-Processing:

Perform the search query and retrieve the parent documents.
After retrieving the parent documents, you can iterate through the sub-documents in your application code to find and extract the matching sub-documents’ _id values.

Example:

// Example code to demonstrate the post-processing approach
const searchResults = await collection.aggregate([
  { $search: { /* Your search criteria */ } },
  { $project: { subDocuments: 1 } } // Project the sub-documents
]);

// Post-processing to extract _id from sub-documents
const matchingSubDocIds = searchResults.flatMap(parentDoc => 
  parentDoc.subDocuments
    .filter(subDoc => /* your match condition */)
    .map(matchingSubDoc => matchingSubDoc._id)
);

Aggregation Pipeline:
- Use MongoDB’s aggregation framework to process the results of an Atlas Search query. You can combine the $search stage with other aggregation stages to filter and transform documents as needed.
Example aggregation pipeline:
```
const results = await collection.aggregate([
  { $search: { /* Your search criteria */ } },
  { $project: { subDocuments: 1 } },
  { $unwind: "$subDocuments" },
  { $match: { /* Match criteria for sub-documents */ } },
  { $project: { "subDocuments._id": 1 } }
]).toArray();
```
In this pipeline:
- $unwind is used to deconstruct the sub-documents array.
- $match filters the documents based on your criteria.
- $project selects the _id of the matching sub-documents.
Custom Indexing:
- Consider custom indexing strategies if your search requirements involve frequent lookups of sub-documents. You can index fields of sub-documents and use aggregation to efficiently retrieve necessary data.

Summary

While MongoDB Atlas Search itself does not provide a direct way to get the _id of matching sub-documents, you can achieve this through a combination of search queries, aggregation pipelines, and application-level post-processing. The aggregation framework in MongoDB is particularly useful for this purpose, allowing you to manipulate and extract specific parts of your documents after performing the search.

Arjun_kochhar_leodeo · 2024-08-29T07:34:17.837Z

This is convoluted. If MongoDb is suggesting that modelling be done so that documents that would in the relational world be different tables, are not modelled as tables but are modelled as sub-documents, then it should be possible to get the _id of a sub-document in a search? Getting ids of matching documents in other tables is very basic, so it should be the case for sub-documents, if this is the recommended way to model other tables in mongo db.

Something to consider.

Also please advise how i should proceed from here.

The matching of leos to deos is very crucial for me. I need the ids clearly for every match.