To achieve what you’re looking for, you can combine MongoDB aggregation stages with text processing. MongoDB’s vector search ranks documents by embedding similarity, so it does not report which keywords in a field matched. To surface the terms that likely contributed to a result’s relevance, you need to analyze the content of the fields returned in the search result yourself.

Approach

  1. Perform Vector Search: You retrieve the relevant documents based on their vector proximity to the search query.
  2. Extract Relevant Terms: Post-process the results to extract terms (like “human health” and “human heart”) that are relevant to the query. You can do this by matching the query terms against the text in the relevant fields.

Here’s how you might implement this in your MongoDB aggregation pipeline:

[
  {
    $vectorSearch: {
      queryVector: embedding,
      path: 'plot_embedding',
      numCandidates: 10000,
      limit: 10,
      index: 'vector_index',
    },
  },
  {
    $group: {
      _id: null,
      docs: { $push: '$$ROOT' }, // $$ROOT is the full current document
    },
  },
  {
    $unwind: {
      path: '$docs',
      includeArrayIndex: 'rank',
    },
  },
  {
    $addFields: {
      vs_score: {
        $round: [{
          $divide: [1.0, { $add: ['$rank', vector_penalty, 1] }]
        }, 3]
      },
      matched_keywords: {
        $filter: {
          input: { $split: ['$docs.summary', ' '] }, // Split summary into words
          as: 'word',
          cond: { $regexMatch: { input: '$word', regex: searchKeyword, options: 'i' } }
        }
      }
    }
  },
  {
    $project: {
      vs_score: 1,
      _id: '$docs._id',
      name: '$docs.name',
      summary: '$docs.summary',
      website_url: '$docs.website_url',
      matched_keywords: 1,
    },
  }
]
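
The pipeline above uses searchKeyword without defining it. As a minimal sketch, assuming the search input is a free-text query string, it could be built like this. The helper name buildKeywordPattern is hypothetical and not part of any MongoDB API; it only produces a pattern string to pass as the regex option of $regexMatch.

```javascript
// Hypothetical helper: turn a free-text query into a regex pattern string
// that can be passed as the `regex` option of $regexMatch.
function buildKeywordPattern(query) {
  const terms = query
    .split(/\s+/)
    .filter((term) => term.length > 0)
    // Escape regex metacharacters so user input is matched literally.
    .map((term) => term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));

  if (terms.length === 0) {
    throw new Error('query must contain at least one term');
  }
  // Alternation: a word matches if it contains any one of the terms.
  return terms.join('|');
}

// Assumed example query; yields the pattern string 'human|health'.
const searchKeyword = buildKeywordPattern('human health');
```

Because the pattern is unanchored, it also matches longer words that contain a term (for example “healthy”), which may or may not be what you want.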

Explanation:

  1. $vectorSearch: This stage performs the vector search and retrieves the most relevant documents.
  2. $group and $unwind: $group collects the results into a single array in the order they arrive, and $unwind with includeArrayIndex turns them back into one document per result, storing each result’s zero-based position in rank.
  3. $addFields with $filter: This stage computes vs_score from the rank and extracts the words in the summary field that match searchKeyword. You can adjust the field (e.g., name, technologies, clients) depending on where you expect the keywords to appear. The $regexMatch operator tests each word against the pattern case-insensitively (options: 'i').
  4. $project: This stage projects the relevant fields, including the matched keywords, along with the vector search score.
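
To make the vs_score arithmetic concrete, here is a small sketch that mirrors the pipeline’s expression in plain JavaScript. The function name vsScore and the penalty value of 1 are assumptions for illustration; the original does not say what vector_penalty is set to.

```javascript
// Mirrors the pipeline's vs_score expression:
//   1 / (rank + vector_penalty + 1), rounded to 3 decimal places.
// `rank` is zero-based, as produced by includeArrayIndex.
function vsScore(rank, vectorPenalty) {
  const raw = 1 / (rank + vectorPenalty + 1);
  return Math.round(raw * 1000) / 1000;
}

// Assumed vector_penalty of 1, applied to the first four results.
const scores = [0, 1, 2, 3].map((rank) => vsScore(rank, 1));
// scores: [0.5, 0.333, 0.25, 0.2]
```

The score depends only on the position in the result list, not on the similarity value itself, and a larger penalty flattens the differences between neighboring ranks. Math.round and MongoDB’s $round may treat exact ties differently, so treat this as an approximation of the server-side value.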

Important Notes:

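  • Regex Matching: Because the summary is split on single spaces, each candidate is one word, so a multi-word phrase such as “human health” can never match as a unit, and punctuation stays attached to words. Refine the splitting or the regex if you need phrases or stricter keyword matching.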
  • Keyword Extraction: The current example only extracts keywords from the summary field. You can expand this to other fields like name, technologies, and clients by modifying the $filter logic.
  • Advanced Keyword Extraction: For more sophisticated keyword extraction, you may want to use natural language processing (NLP) techniques outside of MongoDB, such as using a separate service or application logic. This can help identify semantically similar terms or phrases.
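
As one way to cover several fields and simple two-word phrases, the matching can be done in application code after the pipeline returns. This is a minimal sketch, not a definitive implementation: the helper name extractMatchedTerms and the sample document are made up for illustration, and it uses plain substring matching rather than any NLP technique.

```javascript
// Hypothetical helper: report which query words and adjacent-word phrases
// occur in the selected fields of one result document.
function extractMatchedTerms(doc, query, fields = ['name', 'summary']) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);

  // Candidates: each word plus each pair of adjacent words ("human heart").
  const candidates = [...words];
  for (let i = 0; i + 1 < words.length; i++) {
    candidates.push(`${words[i]} ${words[i + 1]}`);
  }

  const matched = {};
  for (const field of fields) {
    const value = doc[field];
    if (value == null) continue;
    // Array fields (e.g. a list of technologies) are joined into one string.
    const text = (Array.isArray(value) ? value.join(' ') : String(value)).toLowerCase();
    // Substring match: "heart" would also match inside "heartbeat".
    const hits = candidates.filter((candidate) => text.includes(candidate));
    if (hits.length > 0) matched[field] = [...new Set(hits)];
  }
  return matched;
}

// Made-up sample document using the field names from the pipeline.
const sampleDoc = {
  name: 'CardioCare',
  summary: 'Research on human health and the human heart.',
  technologies: ['Python', 'MongoDB'],
};
const matchedTerms = extractMatchedTerms(
  sampleDoc, 'human heart', ['name', 'summary', 'technologies', 'clients']);
// matchedTerms: { summary: ['human', 'heart', 'human heart'] }
```

Running this over each document returned by the pipeline gives a per-field list of matches, and fields with no matches (or missing fields, such as clients here) are simply left out.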

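This pipeline gives you a starting point for surfacing the words in each result that overlap with the query. Because vector search ranks by embedding similarity rather than keyword matches, these words approximate, rather than reveal, why a document was considered relevant. Further tuning may be needed based on your data and search requirements.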