2 / 2
Jan 4

I have a pool collection and am trying to find a match for the Name field of the pool collection. I also want my query to work with case-insensitive and diacritic case-insensitive data.

I have a pool with Name=‘Maøé’ in my pool collection.
I have used the below queries:

The below query did not return any matching result when I am using regex.
db["Pool"].find({TenantId: BinData(3, "fjTbsQVA40CM5UsUlkTkDQ=="), OwnerId: BinData(3,"cDcPV5whymFeZKMg+8MN7w=="), Name: { $regex: "maoe", $options: "i" }}).collation({locale:"en", strength: 1})

The below query returned the expected pool.
db["Pool"].find({TenantId: BinData(3, "fjTbsQVA40CM5UsUlkTkDQ=="), OwnerId: BinData(3,"cDcPV5whymFeZKMg+8MN7w=="), Name: { $regex: "Maøé", $options: "i" }}).collation({locale:"en", strength: 1})

The below query returned the expected pool when and is working with the diacritic-insensitive search but here I have to use the exact match i.e. full-text search.

db["Pool"].find({TenantId: BinData(3, "fjTbsQVA40CM5UsUlkTkDQ=="), OwnerId: BinData(3,"cDcPV5whymFeZKMg+8MN7w=="), Name: "Maoe"}).collation({locale:"en", strength: 1})

The output of the query is given below

{ _id: Binary.createFromBase64('FI2HAMcF/E6etfRWWcZ7Fg==', 3), AggregateVersion: 2, AuditableEvents: [], LastModified: 2024-12-19T13:05:05.266Z, DatabaseVersion: 638302904038095700, TenantId: Binary.createFromBase64('fjTbsQVA40CM5UsUlkTkDQ==', 3), CreatedAt: 2024-12-19T13:04:47.661Z, EntityReferences: [ { Name: 'OwnerId', Key: '570f3770-219c-61ca-5e64-a320fbc30def' }, { Name: 'SharedOwnerships.InternalEmployeeId', Key: '56bf3fcb-e7f4-8c92-e1e1-370c26f5bcc7' }, { Name: 'SharedOwnerships.InternalEmployeeId', Key: '0d9c0a0b-f9ba-b92b-411f-a33312530aff' } ], Name: 'Maøé', OwnerId: Binary.createFromBase64('cDcPV5whymFeZKMg+8MN7w==', 3), SharedOwnerships: [ { InternalEmployeeId: Binary.createFromBase64('yz+/VvTnkozh4TcMJvW8xw==', 3), DateAdded: 2024-12-19T13:05:05.249Z }, { InternalEmployeeId: Binary.createFromBase64('CwqcDbr5K7lBH6MzElMK/w==', 3), DateAdded: 2024-12-19T13:05:05.254Z } ] }

It appears that collation is not functioning as expected when using regex in my query. Specifically:

  • Diacritic-Insensitive and Case-Insensitive Search: These features do not seem to work properly when a regex-based query is used, even though collation is applied.

  • Why does collation fail to enable diacritic-insensitive and case-insensitive search in combination with regex?
    Could this be a limitation of MongoDB’s current implementation of regex and collation?

  • Is there an alternative approach to implement diacritic-insensitive and case-insensitive search effectively in MongoDB, especially for scenarios where regex is required?

I’m not sure about all your issues, but I do know that ø is not o + diacritic, but a separate letter of the Scandinavian alphabets. E.g., see Ø - Wikipedia