Yeah, I think we can go with scenario 1 for now, as there are only a limited number of filters we'd be looking to apply. Longer term, option 3 would definitely be nice!
In terms of the detail: because of the limitation of only being able to put a vector index on a non-list field, we store all the embeddings for the content in question in a separate collection. We chunk large pieces of content for better search performance, but the content object model also carries a set of metadata that we use to filter the results, so we duplicate those fields onto each chunk in the other collection to make the filtering work (rough sketch below). As you say, it's a small amount of duplication, and hopefully we can find a nicer solution in future.
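For what it's worth, here's a minimal sketch of the pattern I'm describing, purely illustrative and not tied to any particular vector DB client. The names (Content, make_chunk_records, chunk_size) are made up; the point is just that each chunk record carries a copy of the parent content's filterable metadata alongside the text that gets embedded.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    content_id: str
    body: str
    # Filterable metadata that lives on the parent content object.
    metadata: dict = field(default_factory=dict)


def make_chunk_records(content: Content, chunk_size: int = 500) -> list[dict]:
    """Split the body into chunks and copy the parent's metadata onto each one."""
    words = content.body.split()
    chunks = [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
    return [
        {
            "parent_id": content.content_id,
            "chunk_index": i,
            "text": chunk,          # this is what gets embedded
            **content.metadata,     # duplicated so chunk-level filters work
        }
        for i, chunk in enumerate(chunks)
    ]


# Example: every chunk of this article can now be filtered on "category"
# and "published_year" without joining back to the parent collection.
records = make_chunk_records(
    Content(
        content_id="doc-42",
        body="some long form article text ...",
        metadata={"category": "guides", "published_year": 2024},
    )
)
```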
I imagine this sort of pattern is quite common for anyone searching across long-form content? Really appreciate the advice, though; scenario 1 was roughly my thinking, so it's good to have that validated.