This playground example shows (correct me if I'm doing something wrong) that the shingle token filter does not emit unigrams.
Index:
```json
{
  "analyzer": "my-analyzer",
  "searchAnalyzer": "lucene.keyword",
  "mappings": {
    "dynamic": true
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "my-analyzer",
      "tokenFilters": [
        { "type": "trim" },
        { "type": "lowercase" },
        {
          "minShingleSize": 2,
          "maxShingleSize": 5,
          "type": "shingle"
        }
      ],
      "tokenizer": { "maxTokenLength": 100, "type": "whitespace" }
    }
  ]
}
```
Data:
```json
[
  {
    "value": "marry"
  },
  {
    "value": "marry had a little lamb"
  }
]
```
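To make the index side concrete, here is a minimal sketch of the token stream `my-analyzer` would produce for the two sample documents. It assumes, as the results below suggest, that the shingle filter behaves like Lucene's `ShingleFilter` with `outputUnigrams` disabled, i.e. it emits only n-grams of 2 to 5 tokens. The `analyze` and `shingles` helpers are hypothetical and exist only for illustration; they are not part of any Atlas Search API.

```javascript
// Hypothetical re-implementation of the "my-analyzer" chain, for illustration only.
// Assumption: the shingle filter emits only n-grams of size minShingleSize..maxShingleSize
// and never the single tokens themselves (no unigrams).

// Build shingles: every run of minSize..maxSize consecutive tokens, joined by a space.
function shingles(tokens, minSize, maxSize) {
  const out = [];
  for (let start = 0; start < tokens.length; start++) {
    for (let size = minSize; size <= maxSize && start + size <= tokens.length; size++) {
      out.push(tokens.slice(start, start + size).join(" "));
    }
  }
  return out;
}

// whitespace tokenizer -> trim -> lowercase -> shingle(2..5)
// (maxTokenLength: 100 is ignored here; no sample token comes close to it)
function analyze(text) {
  const tokens = text
    .split(/\s+/)
    .filter((t) => t.length > 0)
    .map((t) => t.trim().toLowerCase());
  return shingles(tokens, 2, 5);
}

for (const value of ["marry", "marry had a little lamb"]) {
  console.log(JSON.stringify(value), "->", analyze(value));
}
```

Under this assumption the one-word document `"marry"` yields no tokens at all, because it has fewer tokens than `minShingleSize`, while the longer document yields shingles such as `marry had` and `had a little lamb` but never `marry` on its own.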
Searches:
```javascript
[
  {
    $search: {
      index: "default",
      text: {
        query: "marry had",
        path: {
          wildcard: "*"
        }
      }
    }
  }
]
```
→ one document found
```javascript
[
  {
    $search: {
      index: "default",
      text: {
        query: "had a little lamb",
        path: {
          wildcard: "*"
        }
      }
    }
  }
]
```
→ one document found
```javascript
[
  {
    $search: {
      index: "default",
      text: {
        query: "marry",
        path: {
          wildcard: "*"
        }
      }
    }
  }
]
```
→ no documents found
So it seems that unigrams are not in the index.
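The three results are consistent with a simple model of index-time versus query-time analysis, sketched below under two assumptions: (a) the index side emits only 2- to 5-token shingles and no unigrams, and (b) `lucene.keyword` keeps the whole query string as a single token, so a document matches only if that exact string was indexed as one token. All names here (`indexAnalyze`, `keywordAnalyze`, `search`) are hypothetical and only model the behaviour; they are not how Atlas Search is implemented.

```javascript
// Hypothetical model of the index and query analyzers, for illustration only.
// Assumptions: index side = whitespace -> lowercase -> shingle(2..5) with no unigrams;
// query side = lucene.keyword, i.e. the whole query string is one token.

function shingles(tokens, minSize, maxSize) {
  const out = [];
  for (let start = 0; start < tokens.length; start++) {
    for (let size = minSize; size <= maxSize && start + size <= tokens.length; size++) {
      out.push(tokens.slice(start, start + size).join(" "));
    }
  }
  return out;
}

const indexAnalyze = (text) =>
  shingles(text.split(/\s+/).filter(Boolean).map((t) => t.toLowerCase()), 2, 5);

// Keyword analyzer: no tokenization, the query is kept as a single token.
const keywordAnalyze = (text) => [text];

// Inverted index: token -> set of document ids (array positions).
const docs = ["marry", "marry had a little lamb"];
const index = new Map();
docs.forEach((value, id) => {
  for (const token of indexAnalyze(value)) {
    if (!index.has(token)) index.set(token, new Set());
    index.get(token).add(id);
  }
});

// A text query matches a document if any query token was indexed for it.
function search(query) {
  const hits = new Set();
  for (const token of keywordAnalyze(query)) {
    for (const id of index.get(token) ?? []) hits.add(id);
  }
  return [...hits];
}

for (const q of ["marry had", "had a little lamb", "marry"]) {
  console.log(JSON.stringify(q), "->", search(q).length, "document(s)");
}
```

In this model the `"marry"` query can only match if some analyzer on the indexed field produces the single token `marry`, which the shingle filter (with unigrams off) never does.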