Add US Postal Abbreviations to Your Atlas Search in 5 Minutes

Ksenia Samokhvalova, Amelia Short • 9 min read • Published Sep 29, 2022 • Updated Sep 29, 2022
There are cases when it helps to have synonyms set up to work with your Atlas Search index. For example, if the search in your application needs to work with addresses, it might help to set up a list of common synonyms for postal abbreviations, so one could type in "blvd" instead of "boulevard" and still find all places with "boulevard" in the address.
This tutorial will show you how to set up your Atlas Search index to recognize US postal abbreviations.

Prerequisites

To be successful with this tutorial, you will need:
  • Python, to use a script that scrapes a list of street suffix abbreviations helpfully compiled by the United States Postal Service (USPS). This tutorial was written using Python 3.10.15, but you could try it on earlier versions of 3, if you'd like.
  • A MongoDB Atlas cluster. Follow the Get Started with Atlas guide to create your account and a MongoDB cluster. For this tutorial, you can use your free-forever MongoDB Atlas cluster! Keep a note of your database username, password, and connection string as you will need those later.
  • Rosetta, if you're on macOS with an M1 chip. This will allow you to run MongoDB tools like mongoimport and mongosh.
  • mongosh for running commands in the MongoDB shell. If you don't already have it, install mongosh.
  • A copy of mongoimport. If you have MongoDB installed on your workstation, then you may already have mongoimport installed. If not, follow the instructions on the MongoDB website to install mongoimport.
  • The sample_restaurants dataset, which we use in this tutorial because it contains address data. For instructions on how to load sample data, see the documentation. You can also see all available sample datasets.
The examples shown here were all written on macOS but should run on any Unix-like system. If you're running on Windows, we recommend running the example commands inside the Windows Subsystem for Linux.
To learn about synonyms in Atlas Search, we suggest you start by checking out our documentation. Synonyms allow you to index and search your collection for words that have the same or nearly the same meaning, or, in the case of our tutorial, you can search using different ways to write out an address and still get the results you expect. To set up and use synonyms in Atlas Search, you will need to:
  1. Create a collection containing the synonyms in the same database as the collection you're indexing. Note that every document in the synonyms collection must have a specific format.
  2. Reference your synonyms collection in your search index definition via a synonym mapping.
  3. Reference your synonym mapping in the $search command with the text operator.
We will walk you through these steps in the tutorial, but first, let's start with creating the JSON documents that will form our synonyms collection.

Scrape the USPS postal abbreviations page

We will use the list of official street suffix abbreviations and a list of secondary unit designators from the USPS website to create a JSON document for each set of the synonyms.
All documents in the synonyms collection must have a specific format that specifies the type of synonyms: equivalent or explicit. Explicit synonyms have a one-way mapping. For example, if "boat" is explicitly mapped to "sail," we'd be saying that if someone searches "boat," we want to return all documents that include "sail" and "boat." However, if we search the word "sail," we would not get any documents that have the word "boat." In the case of postal abbreviations, however, one can use all abbreviations interchangeably, so we will use the "equivalent" type of synonym in the mappingType field.
Here is a sample document in the synonyms collection for all the possible abbreviations of "avenue":
"Avenue":
{
  "mappingType": "equivalent",
  "synonyms": ["AVENUE", "AV", "AVEN", "AVENU", "AVN", "AVNUE", "AVE"]
}
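To make the difference between the two mapping types concrete, here is a minimal Python sketch that builds one document of each kind. The helper names `equivalent_doc` and `explicit_doc` are hypothetical and are not part of the tutorial's repository. The sketch assumes that explicit mappings carry an additional "input" field listing the terms that trigger the mapping.

```python
import json


def equivalent_doc(synonyms):
    """Build a synonyms document in which every term matches every other term."""
    return {"mappingType": "equivalent", "synonyms": list(synonyms)}


def explicit_doc(input_terms, synonyms):
    """Build a one-way document: queries for input_terms also match synonyms."""
    return {
        "mappingType": "explicit",
        "input": list(input_terms),
        "synonyms": list(synonyms),
    }


# The "avenue" document from the tutorial, rebuilt with the hypothetical helper.
avenue = equivalent_doc(["AVENUE", "AV", "AVEN", "AVENU", "AVN", "AVNUE", "AVE"])

# The one-way "boat" example: searching "boat" also matches "sail", not vice versa.
boat = explicit_doc(["boat"], ["boat", "sail"])

print(json.dumps(avenue))
print(json.dumps(boat))
```

Because postal abbreviations are interchangeable, only the equivalent form is needed in the rest of this tutorial.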
We wrote the web scraping code for you in Python, and you can run it with the following commands to create a document for each synonym group:
git clone https://github.com/mongodb-developer/Postal-Abbreviations-Synonyms-Atlas-Search-Tutorial/
cd Postal-Abbreviations-Synonyms-Atlas-Search-Tutorial
python3 main.py
To see details of the Python code, read the rest of the section.
In order to scrape the USPS postal website, we will need to install the following packages with pip and then import them: requests, BeautifulSoup (the beautifulsoup4 package), and pandas. We'll also want to import json and re for formatting our data when we're ready:
import json
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
Let's start with the Street Suffix Abbreviations page. We want to create objects that represent both the URL and the page itself:
# Create a URL object
streetsUrl = 'https://pe.usps.com/text/pub28/28apc_002.htm'

# Create object page
headers = {
    "User-Agent": 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Mobile Safari/537.36'}

streetsPage = requests.get(streetsUrl, headers=headers)
Next, we want to get the information on the page. We'll start by parsing the HTML, and then get the table by its id:
# Obtain page's information
streetsSoup = BeautifulSoup(streetsPage.text, 'html.parser')

# Get the table by its id
streetsTable = streetsSoup.find('table', {'id': 'ep533076'})
Now that we have the table, we're going to want to transform it into a dataframe, and then format it in a way that's useful for us:
# Transform the table into a list of dataframes
streetsDf = pd.read_html(str(streetsTable))
One thing to take note of is that in the table provided on USPS's website, one primary name is usually mapped to multiple commonly used names.
[Image: Table with USPS names and abbreviations]
This means we need to dynamically group together commonly used names by their corresponding primary name and compile that into a list:
# Group together all "Commonly Used Street Suffix or Abbreviation" entries
streetsGroup = streetsDf[0].groupby(0)[1].apply(list)
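To see what this grouping produces without fetching the USPS page, the following sketch reproduces the same idea in plain Python on a few hand-typed rows. The rows are illustrative sample data in the shape (primary name, commonly used name), and `group_by_primary` is a hypothetical helper that stands in for the pandas `groupby(0)[1].apply(list)` call. It is not the tutorial's code.

```python
from collections import defaultdict


def group_by_primary(rows):
    """Collect every commonly used name under its primary street suffix name."""
    grouped = defaultdict(list)
    for primary, common in rows:
        grouped[primary].append(common)
    return dict(grouped)


# Illustrative rows only; the real table comes from the USPS page.
rows = [
    ("AVENUE", "AV"),
    ("AVENUE", "AVE"),
    ("AVENUE", "AVENUE"),
    ("BOULEVARD", "BLVD"),
    ("BOULEVARD", "BOUL"),
]

groups = group_by_primary(rows)
print(groups)
```

Each key becomes one synonyms document, and its list of values becomes that document's "synonyms" array.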
Once our names are all grouped together, we can loop through them and export them as individual JSON files.
for x in range(streetsGroup.size):

    dictionary = {
        "mappingType": "equivalent",
        # .iloc selects by position, since the series is indexed by name
        "synonyms": streetsGroup.iloc[x]
    }

    # export the JSON into a file
    with open(streetsGroup.index.values[x] + ".json", "w") as outfile:
        json.dump(dictionary, outfile)
Now, let's do the same thing for the Secondary Unit Designators page:
Just as before, we'll start with getting the page and transforming it to a dataframe:
# Create a URL object
unitsUrl = 'https://pe.usps.com/text/pub28/28apc_003.htm'

unitsPage = requests.get(unitsUrl, headers=headers)

# Obtain page's information
unitsSoup = BeautifulSoup(unitsPage.text, 'html.parser')

# Get the table by its id
unitsTable = unitsSoup.find('table', {'id': 'ep538257'})

# Transform the table into a list of dataframes
unitsDf = pd.read_html(str(unitsTable))
If we look at the table more closely, we can see that one of the values is blank. While it makes sense that the USPS would include this in the table, it's not something that we want in our synonyms list.
[Image: Table with USPS descriptions and abbreviations]
To take care of that, we'll simply remove all rows that have blank values:
unitsDf[0] = unitsDf[0].dropna()
Next, we'll take our new dataframe and turn it into a list:
# Create a 2D list that we will use for our synonyms
unitsList = unitsDf[0][[0, 2]].values.tolist()
You may have noticed that some of the values in the table have asterisks in them. Let's quickly get rid of them so they won't be included in our synonym mappings:
# Replace every character that is not a letter, digit, underscore, or space,
# then trim and lowercase (raw string avoids an invalid-escape warning for \w)
unitsList = [[re.sub(r"[^ \w]", " ", x).strip().lower() for x in y] for y in unitsList]
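As a quick check of what this cleanup does, the sketch below applies the same regular expression to a few made-up rows. The sample values are assumptions chosen to resemble the table (a description plus an abbreviation that may carry asterisks), and `clean` is a hypothetical helper name.

```python
import re


def clean(value):
    """Replace non-word, non-space characters with spaces, then trim and lowercase."""
    return re.sub(r"[^ \w]", " ", value).strip().lower()


# Made-up rows in the shape [description, abbreviation].
sample_rows = [["Apartment", "APT"], ["Basement", "BSMT**"], ["Penthouse", "PH*"]]

cleaned = [[clean(x) for x in row] for row in sample_rows]
print(cleaned)
# [['apartment', 'apt'], ['basement', 'bsmt'], ['penthouse', 'ph']]
```

Replacing the asterisks with spaces and then calling `strip()` removes them cleanly when they trail the abbreviation, which is the case this step is meant to handle.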
Now we can loop through them and export them as individual JSON files just as we did before. The one thing to note is that we want to restrict the range on which we're iterating to include only the relevant data we want:
# Restrict the range to only retrieve the results we want
for x in range(1, len(unitsList) - 1):

    dictionary = {
        "mappingType": "equivalent",
        "synonyms": unitsList[x]
    }

    # export the JSON into a file
    with open(unitsList[x][0] + ".json", "w") as outfile:
        json.dump(dictionary, outfile)
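Both export loops follow the same pattern: wrap a list of terms in an "equivalent" document and write it to a file named after the first term or group. A minimal sketch of that pattern as a reusable function is shown here. The name `write_synonym_file` and the temporary directory are assumptions for illustration, not part of the tutorial's repository.

```python
import json
import tempfile
from pathlib import Path


def write_synonym_file(directory, name, synonyms):
    """Write one equivalent-synonyms document to <directory>/<name>.json."""
    path = Path(directory) / f"{name}.json"
    with open(path, "w") as outfile:
        json.dump({"mappingType": "equivalent", "synonyms": synonyms}, outfile)
    return path


# Write a sample file into a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    written = write_synonym_file(tmp, "apartment", ["apartment", "apt"])
    written_name = written.name
    loaded = json.loads(written.read_text())

print(written_name, loaded)
```

Writing one document per file keeps each synonym group easy to inspect before it is imported.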

Create a synonyms collection with JSON schema validation

Now that we created the JSON documents for abbreviations, let's load them all into a collection in the sample_restaurants database. If you haven't already created a MongoDB cluster, now is a good time to do that and load the sample data in.
The first step is to connect to your Atlas cluster. We will use mongosh to do it. If you don't have mongosh installed, follow the instructions.
To connect to your Atlas cluster, you will need a connection string. Choose the "Connect with the MongoDB Shell" option and follow the instructions. Note that you will need to connect with a database user that has permissions to modify the database, since we will be creating a collection in the sample_restaurants database. The command you need to enter in the terminal will look something like:
mongosh "mongodb+srv://cluster0.XXXXX.mongodb.net/sample_restaurants" --apiVersion 1 --username <USERNAME>
When prompted for the password, enter the database user's password.
We created our synonym JSON documents in the right format already, but let's make sure that if we decide to add more documents to this collection, they will also have the correct format. To do that, we will create a synonyms collection with a validator that uses $jsonSchema. The commands below will create a collection with the name "postal_synonyms" in the sample_restaurants database and ensure that only documents with the correct format are inserted into the collection.
use('sample_restaurants')

db.createCollection("postal_synonyms", {
  validator: {
    $jsonSchema: {
      "bsonType": "object",
      "required": ["mappingType", "synonyms"],
      "properties": {
        "mappingType": {
          "type": "string",
          "enum": ["equivalent", "explicit"],
          "description": "must be either equivalent or explicit"
        },
        "synonyms": {
          "bsonType": "array",
          "items": { "type": "string" },
          "description": "must be an array with each item a string and is required"
        },
        "input": {
          "type": "array",
          "items": { "type": "string" },
          "description": "must be an array with each item a string"
        }
      },
      "anyOf": [
        { "not": { "properties": { "mappingType": { "enum": ["explicit"] } }, "required": ["mappingType"] } },
        { "required": ["input"] }
      ]
    }
  }
})
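The validator's rules can be hard to read in schema form, so here is a rough Python restatement of them. This is a sketch for understanding only: `is_valid_synonym_doc` is a hypothetical function that approximates the checks and does not run MongoDB's `$jsonSchema` engine. In particular, the `anyOf` clause amounts to "if mappingType is explicit, an input field is required."

```python
def is_valid_synonym_doc(doc):
    """Approximate the collection validator's rules in plain Python."""

    def is_string_list(value):
        return isinstance(value, list) and all(isinstance(v, str) for v in value)

    if not isinstance(doc, dict):
        return False
    # mappingType is required and must be one of the two allowed values.
    if doc.get("mappingType") not in ("equivalent", "explicit"):
        return False
    # synonyms is required and must be an array of strings.
    if not is_string_list(doc.get("synonyms")):
        return False
    # input is optional in general, but must be an array of strings if present.
    if "input" in doc and not is_string_list(doc["input"]):
        return False
    # Explicit mappings must also provide input.
    if doc["mappingType"] == "explicit" and "input" not in doc:
        return False
    return True


ok_equivalent = is_valid_synonym_doc(
    {"mappingType": "equivalent", "synonyms": ["boulevard", "blvd", "boul"]}
)
missing_input = is_valid_synonym_doc({"mappingType": "explicit", "synonyms": ["sail"]})
print(ok_equivalent, missing_input)
```

The documents generated by the scraping script all use the equivalent type, so they fall into the first case.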

Import the JSON files into the synonyms collection

We will use mongoimport to import all the JSON files we created.
You will need a connection string for your Atlas cluster to use in the mongoimport command. If you don't already have mongoimport installed, use the instructions in the MongoDB documentation.
In the terminal, navigate to the folder where all the JSON files for postal abbreviation synonyms were created.
cat *.json | mongoimport --uri 'mongodb+srv://<USERNAME>:<PASSWORD>@cluster0.pwh9dzy.mongodb.net/sample_restaurants?retryWrites=true&w=majority' --collection='postal_synonyms'
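If you would rather stay in Python than pipe files through mongoimport, one possible alternative is to load the files yourself and insert them with PyMongo. The sketch below is an assumption-laden alternative, not what the tutorial ran: `load_synonym_docs` and `import_synonyms` are hypothetical names, PyMongo must be installed separately, and the connection string is whatever you would otherwise pass to `--uri`.

```python
import glob
import json
import os
import tempfile


def load_synonym_docs(directory):
    """Read every *.json file in directory into a list of dictionaries."""
    docs = []
    for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
        with open(path) as infile:
            docs.append(json.load(infile))
    return docs


def import_synonyms(uri, directory):
    """Insert the loaded documents into sample_restaurants.postal_synonyms."""
    # Imported lazily so load_synonym_docs works without PyMongo installed.
    from pymongo import MongoClient

    docs = load_synonym_docs(directory)
    collection = MongoClient(uri)["sample_restaurants"]["postal_synonyms"]
    return len(collection.insert_many(docs).inserted_ids)


# Demonstrate the loader on a temporary folder holding one sample file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "avenue.json"), "w") as outfile:
        json.dump({"mappingType": "equivalent", "synonyms": ["avenue", "ave"]}, outfile)
    docs = load_synonym_docs(tmp)

print(docs)
```

Only the loader runs in this example. Calling `import_synonyms` requires a reachable cluster and a database user with write access.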
If you liked mongoimport, check out thisĀ very helpful mongoimport guide.
Take a look at the synonyms collection you just created in Atlas. You should see around 229 documents there.
[Image: Document with postal synonyms in Atlas UI]

Create a search index with synonyms mapping in JSON Editor

Now that we created the synonyms collection in our sample_restaurants database, let's put it to use.
Let's start by creating a search index. Navigate to the Search tab in your Atlas cluster and click the "CREATE INDEX" button.
Since the Visual Index builder doesn't support synonym mappings yet, we will choose JSON Editor and click Next:
[Image: The Create Search Index UI in Atlas]
In the JSON Editor, pick the restaurants collection in the sample_restaurants database and enter the following into the index definition. Here, the source collection name refers to the name of the collection with all the postal abbreviation synonyms, which we named "postal_synonyms."
{
  "mappings": {
    "dynamic": true
  },
  "synonyms": [
    {
      "analyzer": "lucene.standard",
      "name": "synonym_mapping",
      "source": {
        "collection": "postal_synonyms"
      }
    }
  ]
}
[Image: The Create Search Index JSON Editor UI in Atlas]
We are indexing the restaurants collection and creating a synonym mapping with the name "synonym_mapping" that references the synonyms collection "postal_synonyms."
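If you keep index definitions in code, the same structure can be generated rather than typed by hand. The following sketch builds the definition as a Python dictionary and prints JSON that could be pasted into the JSON Editor. The function name `build_index_definition` is hypothetical, and the defaults simply mirror the values used in this tutorial.

```python
import json


def build_index_definition(mapping_name, source_collection, analyzer="lucene.standard"):
    """Return a search index definition with dynamic mappings and one synonym mapping."""
    return {
        "mappings": {"dynamic": True},
        "synonyms": [
            {
                "analyzer": analyzer,
                "name": mapping_name,
                "source": {"collection": source_collection},
            }
        ],
    }


definition = build_index_definition("synonym_mapping", "postal_synonyms")
print(json.dumps(definition, indent=2))
```

The mapping name chosen here is the value that the synonyms option of a $search query has to reference.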
Click on Next and then on Create Search Index, and wait for the search index to build.
Once the index is active, weā€™re ready to test it out.

Test that synonyms are working (aggregation pipeline in Atlas or Compass)

Now that we have an active search index, we're ready to test that our synonyms are working. Let's head to the Aggregation pipeline in the Collections tab to test different calls to $search. You can also use Compass, the MongoDB GUI, if you prefer.
Choose $search from the list of pipeline stages. The UI gives us a helpful placeholder for the $search command's arguments.
Let's look for all restaurants that are located on a boulevard. We will search in the "address.street" field, so the arguments to the $search stage will look like this:
{
  index: 'default',
  text: {
    query: 'boulevard',
    path: 'address.street'
  }
}
Let's add a $count stage after the $search stage to see how many restaurants with an address that contains "boulevard" we found:
[Image: Screenshot showing results of an aggregation pipeline with two stages and "boulevard" query term.]
As expected, we found a lot of restaurants with the word "boulevard" in the address. But what if we don't want to have users type "boulevard" in the search bar? What would happen if we put in "blvd," for example?
{
  index: 'default',
  text: {
    query: 'blvd',
    path: 'address.street'
  }
}
[Image: Screenshot showing results of an aggregation pipeline with two stages and "blvd" query term.]
Looks like it found us restaurants with addresses that have "blvd" in them. What about the addresses with "boulevard," though? Those did not get picked up by the search.
And what if we weren't sure how to spell "boulevard" and just searched for "boul"? USPS's website tells us it's an acceptable abbreviation for boulevard, but our $search finds nothing.
[Image: Screenshot showing results of an aggregation pipeline with two stages and "boul" query term.]
This is where our synonyms come in! We need to add a synonyms option to the text operator in the $search command and reference the synonym mapping's name:
{
  index: 'default',
  text: {
    query: 'blvd',
    path: 'address.street',
    synonyms: 'synonym_mapping'
  }
}
[Image: Screenshot showing results of an aggregation pipeline with two stages, "boul" query term, and synonyms enabled.]
And there you have it! We found all the restaurants on boulevards, regardless of which way the address was abbreviated, all thanks to our synonyms.
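The same two-stage pipeline can also be run from a script. Here is a minimal sketch, assuming PyMongo is installed and that your index is named "default" as in the examples. `build_search_pipeline` and `count_matches` are hypothetical helpers, and the "total" field name for the $count stage is an arbitrary choice.

```python
def build_search_pipeline(query, synonyms=None, index="default", path="address.street"):
    """Build a $search stage (optionally with synonyms) followed by a $count stage."""
    text = {"query": query, "path": path}
    if synonyms is not None:
        text["synonyms"] = synonyms
    return [{"$search": {"index": index, "text": text}}, {"$count": "total"}]


def count_matches(uri, query, synonyms=None):
    """Run the pipeline against sample_restaurants.restaurants and return the count."""
    # Imported lazily so the pipeline builder works without PyMongo installed.
    from pymongo import MongoClient

    restaurants = MongoClient(uri)["sample_restaurants"]["restaurants"]
    results = list(restaurants.aggregate(build_search_pipeline(query, synonyms)))
    # $count emits no document at all when nothing matched.
    return results[0]["total"] if results else 0


pipeline = build_search_pipeline("blvd", synonyms="synonym_mapping")
print(pipeline)
```

Comparing `count_matches(uri, "boul")` with `count_matches(uri, "boul", synonyms="synonym_mapping")` against your own cluster is one way to confirm that the mapping is being applied.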

Conclusion

Synonyms are just one of many features Atlas Search offers to give you all the necessary search functionality in your application. All of these features are available right now on MongoDB Atlas. We just showed you how to add support for common postal abbreviations to your Atlas Search index. What can you do with Atlas Search next? Try it now on your free-forever MongoDB Atlas cluster and head over to the community forums if you have any questions!
