
Relevant as-you-type suggestions

Add fast and relevant as-you-type suggestions to your application, incorporating user context and domain-specific ranking factors.
Solution overview

The quicker a user can navigate to desired, relevant content, the sooner they can leverage that knowledge, buy that product, help that customer, and make those critical decisions. Picture getting to “The Matrix” by only typing “matr,” or finding replacement air filters for the ones you bought a few months ago by typing only “fi” in the search box, and your previous order is the top suggestion. These are examples of as-you-type suggestions.

Vector Search and full-text search are great at matching content semantically and fuzzily when there is a complete query or very close word matches. As-you-type functionality, however, can return relevant results from even fewer characters, and where there is even more distance between the typed text and the target keyword. Only a lexical solution that supports partial matching, like as-you-type suggestions, can provide this level of relevant, context-sensitive results.

As-you-type functionality — also known as autocomplete, autosuggest, typeahead, search-ahead, and predictive search — often refers to low-level character matching, as opposed to a purpose-built, comprehensive solution. As demonstrated here, we use “as-you-type suggestions” to refer to a complete solution encompassing tunable relevancy, filtering, and highlighting.

With this solution, you’ll be able to add fast and relevant as-you-type suggestions to your application, incorporating user context and domain-specific ranking factors.

Reference architectures

This as-you-type suggestion solution is architecturally straightforward — as a user types, requests are sent to Atlas Search, which returns relevant results. The heart of the architecture is a specialized entities collection and the corresponding queries.

As-you-type solution architecture
Data model approach

Each suggestion presented to the user represents a unique entity of your domain. A requirement of this solution is that entities must be modeled as individual documents in a specialized collection tuned for as-you-type suggestibility.

It's often the case that your main collection represents one type of entity as documents, and other domain entities as metadata fields or embedded documents. For example, let’s take the sample movies data available within Atlas: as a user types, movie titles certainly should be suggested. But what about cast member names? Can a user find movies starring Keanu Reeves by typing only "kea"? What about documentaries by typing only "doc"?

The entities collection uses a simple model with the following basic schema:

  • _id: unique id for this collection in the form <type>-<natural id>
  • type: entity/object type, e.g. movie, brand, person, product, and category
  • name: the name or title of the entity, which would generally be unique per type

It’s important that entity documents have stable, unique identifiers, as the entities will be regularly refreshed from the main collection. Assigning a type to each entity allows for filtering (only suggest cast members, say, in an actor-specific lookup), grouping (organize the suggestions by type), or boosting by type (movie titles could have a higher weighting than cast member names).

Modeling entities directly as individual documents allows each to carry optional metadata fields to assist in ranking, displaying, filtering, or grouping them.
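As a minimal sketch of this data model, the snippet below builds entity documents in plain Python. The helper names make_entity and group_by_type are hypothetical, and weight stands in for any optional metadata field; only _id, type, and name come from the schema above.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def make_entity(entity_type: str, natural_id: str,
                name: Optional[str] = None, **metadata: Any) -> Dict[str, Any]:
    """Build an entity document whose _id has the form <type>-<natural id>."""
    doc: Dict[str, Any] = {
        "_id": f"{entity_type}-{natural_id}",  # stable and reproducible across refreshes
        "type": entity_type,
        "name": natural_id if name is None else name,
    }
    doc.update(metadata)  # optional fields for ranking, display, filtering, or grouping
    return doc


def group_by_type(entities: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Organize entity names by type, e.g. to render grouped suggestions."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["name"])
    return dict(grouped)


entities = [
    make_entity("title", "The Matrix", weight=10),  # weight is illustrative metadata
    make_entity("cast", "Keanu Reeves"),
    make_entity("title", "Adventure"),
    make_entity("genre", "Adventure"),  # same name, different type, different _id
]
```

Because the type is part of the identifier, the "Adventure" movie and the "Adventure" genre remain separate documents, and group_by_type(entities) can organize suggestions by entity type.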

At the heart of this solution, the straightforward document model feeds the name field through a sophisticated index configuration, which slices and dices the values in a multitude of ways, each suited to a different style of query. The power of this solution comes from the synergy of multiple indexing and querying strategies.

Building the solution

First, identify the entities in your data that are to be suggestible. In the movies scenario, these would include movie titles, cast member names, and perhaps genres and director names too.

The basis of this as-you-type suggestion system can be achieved in a few steps:

  1. Create an “entities” collection and populate it using the schema modeled above. As often as warranted, refresh the “entities” collection.
  2. Create an Atlas Search “entities_index” using an index configuration as described below.
  3. Craft a robust set of query clauses, along with any pertinent boosting factors, within a $search-using aggregation pipeline.
Importing entities

While there are multiple ways to populate the “entities” collection, one straightforward way is a short aggregation pipeline, run on the main collection, that brings in the unique titles across all movies.
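A minimal sketch of such a pipeline, written as the list of stage dictionaries that PyMongo's aggregate() accepts. The $match guard, the $group stage, and the constant name TITLE_IMPORT_PIPELINE are assumptions for illustration, not necessarily the exact stages of the original solution.

```python
# Assumed source: a movies collection whose documents carry a string "title" field.
TITLE_IMPORT_PIPELINE = [
    # Skip documents whose title is missing or not a string ($concat needs strings).
    {"$match": {"title": {"$type": "string"}}},
    # Collapse to one document per distinct title.
    {"$group": {"_id": "$title"}},
    # Reshape into the entities schema: _id, type, name.
    {
        "$project": {
            "_id": {"$concat": ["title-", "$_id"]},
            "type": {"$literal": "title"},
            "name": "$_id",
        }
    },
    # Insert new entities; leave documents that already exist untouched.
    {
        "$merge": {
            "into": "entities",
            "on": "_id",
            "whenMatched": "keepExisting",
            "whenNotMatched": "insert",
        }
    },
]
```

With a PyMongo database handle db pointing at the sample movies data, this would be run as db.movies.aggregate(TITLE_IMPORT_PIPELINE). Since existing documents are kept as-is, the import can be repeated whenever the main collection changes.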

The $project stage converts each unique movie title into the necessary “entities” schema. Because every document in this collection is typed, the type is encoded as a prefix of the generated _id, followed by the actual movie title, creating a reproducible identifier for each unique title. Including the type in entity identifiers keeps different types of entities with the same name independent of one another (there could be a movie named “Adventure” as well as the “Adventure” genre).

And finally, the handy $merge stage adds all new titles and leaves the existing ones untouched.

The resulting title-typed document for “The Matrix” comes out simply as a document with the _id title-The Matrix, the type title, and the name The Matrix.

Each entity type potentially needs its own technique for merging into the “entities” collection, as in the case of the "genre" and "cast" entities, which need to be unwound from their nested arrays using $unwind.
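One way to sketch this is a small pipeline builder for array-valued fields. The function name array_entity_pipeline is hypothetical, and the field names cast and genres are assumed to match the sample movies data.

```python
from typing import Any, Dict, List


def array_entity_pipeline(field: str, entity_type: str) -> List[Dict[str, Any]]:
    """Build an import pipeline for entities stored in an array field."""
    return [
        # Emit one document per array element; documents without the field drop out.
        {"$unwind": f"${field}"},
        # Keep only string elements so $concat below cannot fail.
        {"$match": {field: {"$type": "string"}}},
        # Collapse to one document per distinct value.
        {"$group": {"_id": f"${field}"}},
        # Reshape into the entities schema with a type-prefixed _id.
        {
            "$project": {
                "_id": {"$concat": [f"{entity_type}-", "$_id"]},
                "type": {"$literal": entity_type},
                "name": "$_id",
            }
        },
        # Insert new entities; leave existing ones untouched.
        {
            "$merge": {
                "into": "entities",
                "on": "_id",
                "whenMatched": "keepExisting",
                "whenNotMatched": "insert",
            }
        },
    ]


cast_pipeline = array_entity_pipeline("cast", "cast")
genre_pipeline = array_entity_pipeline("genres", "genre")
```

Each pipeline would be run against the main collection in the same way as the title import, for example db.movies.aggregate(cast_pipeline) with a PyMongo database handle db.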

The cast-specific entities import brings in “Keanu Reeves” as a document with the _id cast-Keanu Reeves, the type cast, and the name Keanu Reeves.

Indexing entities

The name field is indexed in a multitude of ways, which will facilitate partial matching and ranking at query time.

Multiple indexing strategies

Atlas Search index configuration enables a single document field to be indexed in multiple ways, using a feature called “multi” analyzers.

The type field is indexed as both a token field, for filtering with the equals or in operators, and a stringFacet field, which provides counts of the results for each entity type.

Any other fields added beyond _id, type, and name are handled by the index definition, either through dynamic mapping or through static definitions you provide. In this example, weight is a custom field that is mapped dynamically as a numeric type.
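A hedged sketch of what such an index definition might look like, expressed as a Python dictionary. The particular analyzers, the multi names exact and english, and the edgeGram settings are illustrative assumptions; the original solution may slice the name field differently.

```python
ENTITIES_INDEX_DEFINITION = {
    "mappings": {
        # Dynamic mapping picks up extra fields, such as a numeric weight.
        "dynamic": True,
        "fields": {
            "name": [
                {
                    # Default word-level analysis of the name.
                    "type": "string",
                    "analyzer": "lucene.standard",
                    # Alternate analyses of the same field ("multi" analyzers).
                    "multi": {
                        "exact": {"type": "string", "analyzer": "lucene.keyword"},
                        "english": {"type": "string", "analyzer": "lucene.english"},
                    },
                },
                {
                    # Prefix-friendly mapping used by the autocomplete operator.
                    "type": "autocomplete",
                    "tokenization": "edgeGram",
                    "minGrams": 2,
                    "maxGrams": 15,
                    "foldDiacritics": True,
                },
            ],
            # token supports equals/in filtering; stringFacet supports counts per type.
            "type": [{"type": "token"}, {"type": "stringFacet"}],
        },
    }
}
```

This definition would be supplied when creating the “entities_index” search index on the “entities” collection. Each alternate analysis is then addressable at query time with a path of the form {"value": "name", "multi": "exact"}.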

Searching for suggestions

The resulting specialized search index provides the foundation for as-you-type queries. The name field is indexed in a number of ways and matched against what the user types using various tunable query operators. The idea is to throw the query operators against these differently analyzed mappings and see what sticks: the more ways they stick, the higher the suggestion is ranked. Each query clause can be boosted independently, and the clause scores are summed to give a relevancy score for the matching entity. These scores can be boosted further by other factors, such as an optional entity weight field.
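The following sketch shows how such a query might be assembled. It assumes the name field is indexed as a string with a hypothetical exact multi (keyword analyzer) and as an autocomplete type, that type is indexed as a token, and that documents may carry a numeric weight; the function name and boost values are illustrative and would need tuning.

```python
from typing import Any, Dict, List, Optional


def suggestion_pipeline(typed: str, entity_type: Optional[str] = None,
                        limit: int = 10) -> List[Dict[str, Any]]:
    """Build a $search pipeline whose should clauses each add to the score."""
    should: List[Dict[str, Any]] = [
        # Whole-name match on the assumed "exact" multi: strongest signal.
        {"text": {"query": typed,
                  "path": {"value": "name", "multi": "exact"},
                  "score": {"boost": {"value": 10}}}},
        # Whole-word match on the default analysis of name.
        {"text": {"query": typed, "path": "name",
                  "score": {"boost": {"value": 5}}}},
        # Partial (prefix) match, multiplied by the entity's weight when present.
        {"autocomplete": {"query": typed, "path": "name",
                          "score": {"boost": {"path": "weight", "undefined": 1}}}},
    ]
    compound: Dict[str, Any] = {"should": should, "minimumShouldMatch": 1}
    if entity_type is not None:
        # Filters restrict the candidates without affecting the score.
        compound["filter"] = [{"equals": {"path": "type", "value": entity_type}}]
    return [
        {"$search": {"index": "entities_index", "compound": compound}},
        {"$limit": limit},
        {"$project": {"name": 1, "type": 1, "score": {"$meta": "searchScore"}}},
    ]


all_types = suggestion_pipeline("matr")
cast_only = suggestion_pipeline("kea", entity_type="cast", limit=5)
```

Either pipeline would be passed to aggregate() on the “entities” collection. An entity that satisfies several clauses accumulates the score of each, so it ranks above one that only matches as a prefix.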

Example query and relevancy scoring computation

Generally, selecting a suggestion triggers a targeted traditional search for the selected item, which in turn surfaces all matching items.

Key considerations

Model suggestible entities as documents with a specialized index configuration: This could be done as described above in a separate collection containing entities from any source. Or, if your main collection models all suggestible entities as top-level documents already, an index can be created or an existing one augmented to use the index configuration techniques described here.

Craft clever queries: Leverage the index structure, generating rich and nuanced queries to match entities and rank suggestions as desired.

Technologies and products used
MongoDB developer data platform:
Author
  • Erik Hatcher, MongoDB
Related resources

The Atlas Search ‘cene

Get started with Atlas Search with an introductory video series.


Partial matching techniques

Consider the various operators available for matching loosely.


Atlas Search relevancy explained

Learn more about how relevancy scores are computed.


GitHub: as-you-type suggestions

Apply this solution yourself using the examples here.
