Combining Your Database With Azure Blob Storage Using Data Federation
Tim Kelly • 7 min read • Published Aug 29, 2024 • Updated Oct 08, 2024
For as long as you have been reviewing restaurants, you've been storing your data in MongoDB. The plethora of data you've gathered is so substantial, you decide to team up with your friends to host this data online, so other restaurant goers can decide where to eat, informed by your detailed insights. But your friend has been storing their data in Azure Blob Storage. They use JSON now, but they have reviews upon reviews stored as `.csv` files. How can you pool all this data together without the often arduous process of migrating databases or transforming data? With MongoDB's Data Federation, you can combine all your data into one unified view, allowing you to easily search for the best French diner in your borough.

This tutorial will walk you through the steps of combining your MongoDB database with your Azure Blob Storage, utilizing MongoDB's Data Federation.
Before you begin, you'll need a few prerequisites to follow along with this tutorial, including:
- A Microsoft Azure account with a storage account and container setup. If you don't have this, follow the steps in the Microsoft documentation for the storage account and the container.
- Azure CLI (or Azure PowerShell, but this tutorial uses Azure CLI). Sign in and configure your command-line tool following the steps in the documentation for Azure CLI and Azure PowerShell.
- Node.js 18 or higher and npm: Make sure you have Node.js and npm (Node.js package manager) installed. Node.js is the runtime environment required to run your JavaScript code server-side. npm is used to manage the dependencies.
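As an optional sanity check before you begin, you can confirm the tools are installed from your terminal. These version commands are standard for each tool and are not part of the original steps:

```bash
# Optional check: confirm the prerequisites are installed
az --version      # Azure CLI
node --version    # should report v18 or higher
npm --version
```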
To have something to view when your data stores are connected, let's add some reviews to your blob. First, you'll add a review for a new restaurant you just reviewed in Manhattan. Create a file called example1.json, and copy in the following:
```json
{
  "address": {
    "building": "518",
    "coord": [
      { "$numberDouble": "-74.006220" },
      { "$numberDouble": "40.733740" }
    ],
    "street": "Hudson Street",
    "zipcode": "10014"
  },
  "borough": "Manhattan",
  "cuisine": [
    "French",
    "Filipino"
  ],
  "grades": [
    {
      "date": { "$date": { "$numberLong": "1705403605904" } },
      "grade": "A",
      "score": { "$numberInt": "12" }
    }
  ],
  "name": "Justine's on Hudson",
  "restaurant_id": "40356020"
}
```
Upload this file as a blob to your container:
```bash
az storage blob upload --account-name <YourStorageAccountName> --container-name <YourContainerName> --name <BlobName> --file <PathToFile>
```
Here, `<BlobName>` is the name you want to assign to your blob (just use the same name as the file), and `<PathToFile>` is the path to the file you want to upload (`example1.json`).

But you're not restricted to JSON in your federated database. You're going to create another file, called `example2.csv`. Copy the following data into the file:
```csv
Restaurant ID,Name,Cuisine,Address,Borough,Latitude,Longitude,Grade Date,Grade,Score
40356030,Sardi's,Continental,"234 W 44th St, 10036",Manhattan,40.757800,-73.987500,1927-09-09,A,11
```
Upload `example2.csv` to your blob container using the same command as above.
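For example, with hypothetical account and container names (swap in your own), the upload command might look like this:

```bash
# Illustrative values only — replace the account and container names with your own
az storage blob upload --account-name mystorageaccount --container-name restaurant-reviews --name example2.csv --file example2.csv
```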
You can list the blobs in your container to verify that your file was uploaded:
```bash
az storage blob list --account-name <YourStorageAccountName> --container-name <YourContainerName> --output table
```
The first step is getting your MongoDB cluster set up. For this tutorial, you're going to create a free M0 cluster. Once this is created, click "Load Sample Dataset." In the sample dataset, you'll see a database called `sample_restaurants` with a collection called `restaurants`, containing thousands of restaurants with reviews. This is the collection you'll focus on.

Now that you have your Azure Storage and MongoDB cluster set up, you are ready to deploy your federated database instance.
- Select "Data Federation" from the left-hand navigation menu.
- Click "Create New Federated Database" and, from the dropdown, select "Set up manually."
- Choose Azure as your cloud provider and give your federated database instance a name.
- To add your data source, click "Add Data Source" and select Azure Blob Storage as your data store.
- Next, you need to select an Azure Service Principal. You can use an existing one, but you'll create a new one for this tutorial. From the dropdown, select "Authorize an Azure Service Principal" and click "Continue."
- To assign the relationships, follow the onscreen instructions. In this tutorial, you are going to use the Azure CLI.
- First, you need to get the tenant ID by running the command:
```bash
az account show --query tenantId
```
- Next, run the following command to create your new Service Principal for Atlas, and copy the "id" from the output:
```bash
az ad sp create --id 9efedfcc-2eca-4b27-a613-0cad1e114cb7
```
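If you later need that ID again, the Azure CLI's global `--query` and `--output` flags can print just the `id` field of the existing service principal. This is an optional convenience, not one of the required steps:

```bash
# Optional: look up the existing Atlas service principal and print only its id
az ad sp show --id 9efedfcc-2eca-4b27-a613-0cad1e114cb7 --query id --output tsv
```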
- Next, you need to grant access to your storage account. Again, there will be prompts on the screen to guide you through the steps, but you can follow along here.
- First, run `az storage account list --query "[].id"` to get the Storage Account Resource ID. This will auto-populate the command on the page that you can copy and run in your terminal to set up the credentials delegation.
- After this, set up your storage container access. Choose whether you want to grant "Read-only" or "Read and write" privileges. Select your storage container region, and enter your storage container name. This will auto-populate the command below on the page that you can copy and paste in.
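The page generates the exact command for you, but as a rough sketch, granting read-only access amounts to assigning Azure's built-in Storage Blob Data Reader role to the service principal over your storage account. The placeholders below are illustrative, not values from this tutorial:

```bash
# Sketch only — prefer the command Atlas generates for you.
# <ServicePrincipalId> is the "id" you copied earlier; <StorageAccountResourceId> comes from the list command above.
az role assignment create \
  --assignee <ServicePrincipalId> \
  --role "Storage Blob Data Reader" \
  --scope <StorageAccountResourceId>
```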
- Now, you need to provide sample pathways so you can query your data. If you've been following along with this simple example, copy the pathway `https://<YourStorageAccount>.blob.core.windows.net/<YourContainerName>/example1.json` into the box, replacing the placeholders with your own account and container names.
- To link your Azure Blob Storage path components, accept "any value (*)" from the dropdown and click "Next."
- You should see your container name under your data sources now. All you need to do is drag your dataset and drop it into your virtual collection.
Let's add your MongoDB dataset to your federated database now.
- Click "Add Data Sources" and choose “Atlas Cluster.”
- Select the cluster that contains the sample dataset.
- This will bring up all the databases in the cluster. Click the dropdown arrow next to `sample_restaurants` to view the collections inside that database, choose `restaurants`, and click "Next."
- Now, you should see `sample_restaurants.restaurants` under data sources. You can drag your dataset over to your virtual collection, just like before.
Now that you have both datasets in your federated database instance, click “Create,” and it's time to view your data.
There are many ways to view the data in the federated database. For this tutorial, you’ll create a simple Node.js application. To do this, you'll need a connection string. Once your federated database instance is created, you'll be able to see it under "Data Federation" on the left-hand navigation menu. On your instance, click “Connect.” Select “Driver” as your connection method and copy your connection string. Now, open up an IDE of your choice.
For this tutorial, only a simple Node.js application is needed. If you want to learn more about developing with MongoDB and JavaScript, check out Developer Center, where you’ll find a whole variety of tutorials, or explore MongoDB with other languages.
Before you start, make sure you have Node.js installed in your environment.
- Set up a new Node.js project:
- Create a new directory for your project.
- Initialize a new Node.js project by running `npm init -y` in your terminal within that directory.
- Install the MongoDB Node.js driver by running `npm install mongodb`.
- Create a JavaScript file:
- Create a file named searchApp.js in your project directory.
- Implement the application:
- Edit searchApp.js to include the following code, which connects to your MongoDB database and creates a client.
```javascript
const { MongoClient } = require('mongodb');

// Connection URL
const url = 'yourConnectionString';
// Database Name
const dbName = 'yourDatabaseName';
// Collection Name
const collectionName = 'yourCollectionName';

// Create a new MongoClient
const client = new MongoClient(url);
```
- Now, create a function called `searchDatabase` that takes a field name and a search string from the command line and searches for documents containing that string in the specified field.
```javascript
// Function to search for a string in the database
async function searchDatabase(fieldName, searchString) {
  try {
    await client.connect();
    console.log('Connected successfully to server');
    const db = client.db(dbName);
    const collection = db.collection(collectionName);

    // Dynamic query based on field name
    const query = { [fieldName]: { $regex: searchString, $options: "i" } };
    const foundDocuments = await collection.find(query).toArray();
    console.log('Found documents:', foundDocuments);
  } finally {
    await client.close();
  }
}
```
- Lastly, create a main function to control the flow of the application.
```javascript
// Main function to control the flow
async function main() {
  // Input from command line arguments
  const fieldName = process.argv[2];
  const searchString = process.argv[3];

  if (!fieldName || !searchString) {
    console.log('Please provide both a field name and a search string as arguments.');
    return;
  }

  searchDatabase(fieldName, searchString)
    .catch(console.error);
}

main().catch(console.error);
```
- Run your application with `node searchApp.js fieldName "searchString"`.
- The script expects two command line arguments: the field name and the search string. It constructs a dynamic query object using these arguments, where the field name is determined by the first argument, and the search string is used to create a regex query.
In the terminal, you can type the query `node searchApp.js "Restaurant ID" "40356030"` to find your `example2.csv` file as if it were stored in a MongoDB database. Or maybe `node searchApp.js borough "Manhattan"`, to find all restaurants in your virtual database (across all your data sources) in Manhattan. You're not just limited to simple queries. Most operators and aggregations are available on your federated database. There are some limitations and variations in the MongoDB Operators and Aggregation Pipeline Stages on your federated database that you can read about in our documentation.
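As an illustration of what's possible beyond simple find queries, the sketch below groups the combined reviews by borough with an aggregation pipeline. It reuses the `client`, `dbName`, and `collectionName` placeholders from `searchApp.js` and is an example under those assumptions, not part of the original application:

```javascript
// Sketch: count restaurants per borough across the federated (virtual) collection.
// Assumes the same `client`, `dbName`, and `collectionName` defined in searchApp.js.
async function countByBorough() {
  try {
    await client.connect();
    const collection = client.db(dbName).collection(collectionName);

    const pipeline = [
      { $group: { _id: '$borough', total: { $sum: 1 } } }, // group on the JSON/MongoDB field name
      { $sort: { total: -1 } },
    ];

    const results = await collection.aggregate(pipeline).toArray();
    console.log('Restaurants per borough:', results);
  } finally {
    await client.close();
  }
}
```

Note that the CSV's column headers (for example, `Borough`) don't match the lowercase `borough` field used in the JSON and MongoDB documents, so rows coming from `example2.csv` would fall into a `null` group here unless you rename the field or add a `$project` stage first.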
By following the steps outlined, you've learned how to set up Azure Blob Storage, upload diverse data formats like JSON and CSV, and connect these with your MongoDB dataset using a federated database. This tutorial highlights the potential of data federation in breaking down data silos, promoting data interoperability, and enhancing the overall data analysis experience. Whether you're a restaurant reviewer looking to share insights or a business seeking to unify disparate data sources, MongoDB's Data Federation along with Azure Blob Storage provides a robust, scalable, and user-friendly platform to meet your data integration needs.
Are you ready to start building with Atlas on Azure? Get started for free today with MongoDB Atlas on Azure Marketplace. If you found this tutorial useful, make sure to check out some more of our articles in Developer Center, like MongoDB Provider for EF Core Tutorial. Or pop over to our Community Forums to see what other people in the community are building!