Using Atlas Data Federation to Control Access to Your Analytics Node

Taylor Pacelli • 9 min read • Published Jan 31, 2023 • Updated Aug 28, 2024
MongoDB replica sets, analytics nodes, and read preferences are powerful tools that can help you ensure high availability, optimize performance, and control how your applications access and query data in a MongoDB database. This post covers how to use Atlas Data Federation to control access to your analytics node, customize read preferences, and set tag sets so that users and applications reach only the nodes you intend.

How do MongoDB replica sets work?

Deploying MongoDB as a replica set is a strategy for achieving high availability. It provides automatic failover and data redundancy for your applications. A replica set is a group of MongoDB servers that contain the same information, with one server designated as the primary node and the others as secondary nodes.
The primary node is the leader of a replica set and is the only node that can receive write operations. The secondary nodes continuously replicate the data from the primary node and can be used to serve read operations. If the primary node goes down, one of the secondaries can be promoted to primary, allowing the replica set to continue operating without downtime. This is often referred to as automatic failover. The new primary is chosen through an "election" process, in which the nodes in the replica set vote for the new primary.
However, in some cases, you may not want a particular secondary node to become the primary node in your replica set. For example, imagine that you have a primary node and two secondary nodes in your database cluster. The primary node handles all of the write operations, and the secondary nodes serve read operations. Now, suppose a heavy query is scanning a large amount of data over a long period of time on one of the secondary nodes. If the primary were to fail while this query is running, the secondary node running the query could be promoted to primary. Because that node is already busy with the heavy query, it may struggle to handle the additional load of write operations. As a result, the performance of the database may suffer, or the newly promoted node might fail entirely.
This is where analytics nodes come in.

Using MongoDB’s analytics nodes to isolate workloads

If a database performs complex or long-running operations, such as ETL or reporting, you may want to isolate these queries from the rest of your operational workload by running them on analytics nodes, which are completely dedicated to this kind of operation.
Analytics nodes are a type of secondary node in a MongoDB replica set that can be designated to handle special read-only workloads, and importantly, they cannot be promoted to primary. They can be scaled independently to handle complex analytical queries that involve large amounts of data. When you offload read-intensive workloads from the primary node in a MongoDB replica set, you are directing read operations to other nodes in the replica set, rather than to the primary node. This can help to reduce the load on the primary node and ensure it does not get overwhelmed.
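To make the "cannot be promoted" point concrete, here is a minimal sketch of a hypothetical four-member layout. The hostnames, the `MEMBERS` list, and the `election_candidates` helper are illustrative assumptions, not Atlas internals. The only MongoDB behavior the sketch relies on is that a replica set member with priority 0 cannot become primary.

```python
# Hypothetical replica set layout; hostnames and tags are made up for illustration.
MEMBERS = [
    {"host": "node-0.example.net", "priority": 1, "tags": {"nodeType": "ELECTABLE"}},
    {"host": "node-1.example.net", "priority": 1, "tags": {"nodeType": "ELECTABLE"}},
    {"host": "node-2.example.net", "priority": 1, "tags": {"nodeType": "ELECTABLE"}},
    # Assumption: the analytics node is modeled as a priority-0 member.
    {"host": "node-3.example.net", "priority": 0, "tags": {"nodeType": "ANALYTICS"}},
]


def election_candidates(members, failed_host):
    """Return hosts that could stand for election after `failed_host` goes down.

    Simplified model: a member is a candidate if it is still up and its
    priority is greater than 0. Real elections also weigh votes and
    replication progress, which this sketch ignores.
    """
    return [
        m["host"]
        for m in members
        if m["host"] != failed_host and m["priority"] > 0
    ]


candidates = election_candidates(MEMBERS, failed_host="node-0.example.net")
print(candidates)
```

In this model, a long-running report on `node-3.example.net` can never end up competing with write traffic, because that node is never a candidate.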
To use analytics nodes, you must configure your cluster with additional nodes that are designated as analytics nodes. This is done in the cluster configuration setup flow. Then, for your client application to use the analytics nodes, you must specify tag sets when connecting. These tag sets let you direct all read operations to the analytics nodes.
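One common way to pass a tag set is through the connection string, using the standard `readPreference` and `readPreferenceTags` URI options. The sketch below only builds such a string. The hostname and database name are placeholders, the `analytics_uri` helper is hypothetical, and credentials are deliberately left out.

```python
from urllib.parse import urlencode


def analytics_uri(host, database=""):
    """Build a connection string that asks for reads on analytics-tagged secondaries."""
    options = [
        ("readPreference", "secondary"),
        ("readPreferenceTags", "nodeType:ANALYTICS"),
    ]
    # Keep ":" and "," literal so the tag reads as key:value in the URI.
    query = urlencode(options, safe=":,")
    return f"mongodb+srv://{host}/{database}?{query}"


uri = analytics_uri("cluster0.example.mongodb.net", "sales")
print(uri)
```

Note that nothing stops a client from omitting these options, which is the gap the rest of this article addresses.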

What are read preferences and tag sets in MongoDB?

Read preferences in MongoDB allow you to control which node in a cluster your read operations are routed to.
MongoDB supports several read preference types that you can use to specify which member of a replica set you want to read from. Here are the most commonly used read preference types:
  1. Primary: Read operations are sent to the primary node. This is the default read preference.
  2. PrimaryPreferred: Read operations are sent to the primary node if it is available. Otherwise, they are sent to a secondary node.
  3. Secondary: Read operations are sent to a secondary node.
  4. SecondaryPreferred: Read operations are sent to a secondary node, if one is available. Otherwise, they are sent to the primary node.
  5. Nearest: Read operations are sent to the member of the replica set with the lowest network latency, regardless of whether it is the primary or a secondary node.
Note: Read preferences apply only to read operations, not to write operations.
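The fallback behavior of the five modes can be summarized in a few lines. This is a simplified model, not driver code: the `eligible_members` helper and the member dictionaries are hypothetical, only reachable members are assumed to be passed in, and the latency-based narrowing that drivers apply afterwards is not modeled.

```python
def eligible_members(mode, members):
    """Return the members a read may be routed to under `mode`.

    `members` is a list of dicts with "host" and "role" ("primary" or
    "secondary"), containing only members that are currently reachable.
    """
    primaries = [m for m in members if m["role"] == "primary"]
    secondaries = [m for m in members if m["role"] == "secondary"]
    if mode == "primary":
        return primaries
    if mode == "primaryPreferred":
        return primaries or secondaries  # fall back when no primary is up
    if mode == "secondary":
        return secondaries
    if mode == "secondaryPreferred":
        return secondaries or primaries  # fall back when no secondary is up
    if mode == "nearest":
        return primaries + secondaries  # role is ignored; latency decides later
    raise ValueError(f"unknown read preference mode: {mode}")


def hosts(members):
    return [m["host"] for m in members]


healthy = [
    {"host": "a", "role": "primary"},
    {"host": "b", "role": "secondary"},
    {"host": "c", "role": "secondary"},
]
no_primary = healthy[1:]  # the primary has just failed

print(hosts(eligible_members("primaryPreferred", healthy)))
print(hosts(eligible_members("primaryPreferred", no_primary)))
```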
Tag sets give you even finer control over which node you read from. MongoDB tag sets are a way to identify specific nodes in a replica set. You can think of them as labels. They allow the client application to specify which nodes in a replica set to use for read operations, based on the tags that have been applied to them.
MongoDB Atlas clusters are automatically configured with predefined tag sets for different member types depending on how you’ve configured your cluster. You can utilize these predefined replica set tags to direct queries from specific applications to your desired node types and regions. Here are some examples:
  1. Provider: Cloud provider on which the node is provisioned
    1. {"provider" : "AWS"}
    2. {"provider" : "GCP"}
    3. {"provider" : "AZURE"}
  2. Region: Cloud region in which the node resides
    1. {"region" : "US_EAST_2"}
  3. Node: Node type
    1. {"nodeType" : "ANALYTICS"}
    2. {"nodeType" : "READ_ONLY"}
    3. {"nodeType" : "ELECTABLE"}
  4. Workload Type: Tag to distribute your workload evenly among your non-analytics (electable or read-only) nodes.
    1. {"workloadType" : "OPERATIONAL"}
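As a rough illustration of how a list of tag sets is evaluated, the sketch below tries each tag set in order and returns the members matching the first one that matches anything. The `NODES` list and the `match_tag_sets` helper are hypothetical; the tag names and values are the predefined ones listed above.

```python
# Hypothetical members carrying Atlas-style predefined tags.
NODES = [
    {"host": "node-0", "tags": {"provider": "AWS", "region": "US_EAST_2",
                                "nodeType": "ELECTABLE", "workloadType": "OPERATIONAL"}},
    {"host": "node-1", "tags": {"provider": "AWS", "region": "US_EAST_2",
                                "nodeType": "READ_ONLY", "workloadType": "OPERATIONAL"}},
    {"host": "node-2", "tags": {"provider": "AWS", "region": "US_EAST_2",
                                "nodeType": "ANALYTICS"}},
]


def match_tag_sets(members, tag_sets):
    """Return hosts matching the first tag set that matches at least one member.

    A member matches a tag set when it carries every key/value pair in it,
    so an empty tag set {} matches any member. An empty list means no filtering.
    """
    if not tag_sets:
        return [m["host"] for m in members]
    for tag_set in tag_sets:
        matches = [
            m["host"]
            for m in members
            if all(m["tags"].get(key) == value for key, value in tag_set.items())
        ]
        if matches:
            return matches
    return []


print(match_tag_sets(NODES, [{"nodeType": "ANALYTICS"}]))
print(match_tag_sets(NODES, [{"provider": "GCP"}, {"workloadType": "OPERATIONAL"}]))
```

The second call shows the fallback: no node is on GCP, so the next tag set in the list is tried.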

Customer challenge

Read preferences and tag sets can be helpful in controlling which node gets utilized for a specific query. However, they may not be sufficient on their own to protect against certain types of risks or mistakes. For example, if you are concerned about other users or developers accidentally accessing the primary node of the cluster, read preferences and tag sets may not provide enough protection, as someone with a database user can forget to set the read preference or choose not to use a tag set. In this case, you might want to use additional measures to ensure that certain users or applications only have access to specific nodes of your cluster.
MongoDB Atlas Data Federation can be used as a view on top of your data that is tailored to the specific needs of the user or application. You can create database users in Atlas that are only provisioned to connect to specific clusters or federated database instances. Then, when you provide the endpoints for the federated database instances and the necessary database users, you can be sure that the end user is only able to connect to the nodes you want them to have access to. This can help to "lock down" a user or application to a specific node, allowing you to better control which data is accessible to them and ensuring that your data is being accessed in a safe and secure way.

How does Atlas Data Federation fit in?

Atlas Data Federation is an on-demand query engine that allows you to query, transform, and move data across multiple data sources, as if it were all in the same place and format. With Atlas Data Federation, you can create virtual collections that refer to your underlying Atlas cluster collections and lock them to a specific read preference or tag set. You can then restrict database users so that they can only connect to the federated database instance. This gives partners within your business live access to your cluster data without any risk that they connect to the primary, and it allows you to isolate different workloads and reduce the risk of contention between them.
For example, you could create a separate endpoint for analytics queries that is locked down to read-only access and restrict queries to only run on analytics nodes, while continuing to use the cluster connection for your operational application queries. This would allow you to run analytics queries with access to real-time data without affecting the performance of the cluster.
To do this, you would create a virtual collection, choose the source of a specific cluster collection, and specify a tag set for the analytics node. Then, a user can query their federated database instance, knowing it will always query the analytics node and that their primary cluster won’t be impacted. The only way to make a change would be in the storage configuration of the federated database instance, which you can prevent, ensuring that no mistakes happen.
In addition to restricting the federated database instance to only read from the analytics node, the database manager can also place restrictions on the user to only read from that specific federated database instance. Now, not only do you have a connection string for your federated database instance that will never query your primary node, but you can also ensure that your users are assigned the correct roles, and they can’t accidentally connect to your cluster connection string.
By locking down an analytics node to read-only access, you can protect your most sensitive workloads and improve security while still sharing access to your most valuable data.

How to lock down a user to access the analytics node

The following steps will allow you to set your read preferences in Atlas Data Federation to use the analytics node:
Step 1: Log into MongoDB Atlas.
Step 2: Select the Data Federation option on the left-hand navigation.
Step 3: Click “set up manually” in the "create new federated database" dropdown in the top right corner of the UI.
Step 4: Select the dataset for your federated database instance from the Data Sources section. Repeat this step for each of your data sources.
[Image: Atlas Data Federation UI showing where to select your data]
4a. Select your cluster and collection.
4b. Click “Cluster Read Preference.”
  • Select your “Read Preference Mode.” Data Federation enables ‘nearest’ as its default.
  • Type in your TagSets. For example:
    • [ [ { "name": "nodeType", "value": "ANALYTICS" } ] ]
[Image: Data Federation UI showing the selection of a data source and an example of setting your read preferences and TagSets]
4c. Select “Next.”
Step 5: Map your datasets from the Data Sources pane on the left to the Federated Database Instance pane on the right.
Step 6: Click “Save” to create the federated database instance.
To connect to your federated database instance, follow the instructions outlined in our documentation.
Note: If you have many databases and collections in your underlying cluster, you can use our “wildcard syntax” along with read preference to easily expose all your databases and collections from your cluster without enumerating each one. This can be set after you’ve configured read preference by going to the JSON editor view.
"databases" : [
  {
    "name" : "*",
    "collections" : [
      {
        "name" : "*",
        "dataSources" : [
          {
            "storeName" : "<atlas-store-name>"
          }
        ]
      }
    ]
  }
]
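To see how the read preference and the wildcard mapping fit together, the sketch below assembles a complete storage configuration as a Python dictionary and prints it as JSON. Treat the field names under `stores` (`provider`, `clusterName`, `projectId`, `readPreference`, `mode`, `tagSets`) as my reading of the storage configuration format and check them against what the JSON editor view shows for your instance. The store, cluster, and project values are placeholders, `analytics_storage_config` is a hypothetical helper, and the `secondary` mode is one possible choice rather than a requirement.

```python
import json


def analytics_storage_config(store_name, cluster_name, project_id):
    """Assemble a storage configuration pinned to analytics nodes (sketch)."""
    return {
        "stores": [
            {
                "name": store_name,
                "provider": "atlas",
                "clusterName": cluster_name,
                "projectId": project_id,
                "readPreference": {
                    # Assumption: "secondary" combined with the analytics tag set.
                    "mode": "secondary",
                    "tagSets": [[{"name": "nodeType", "value": "ANALYTICS"}]],
                },
            }
        ],
        # Wildcards expose every database and collection from the store.
        "databases": [
            {
                "name": "*",
                "collections": [
                    {"name": "*", "dataSources": [{"storeName": store_name}]}
                ],
            }
        ],
    }


config = analytics_storage_config("<atlas-store-name>", "<cluster-name>", "<project-id>")
print(json.dumps(config, indent=2))
```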

How to manage database access in Atlas and assign roles to users

You must create a database user to access your deployment. For security purposes, Atlas requires clients to authenticate as MongoDB database users to access federated database instances. To add a database user to your cluster, perform the following steps:
Step 1: In the Security section of the left navigation, click “Database Access.”
1a. Make sure the “Database Users” tab is displayed.
1b. Click “+ Add New Database User.”
[Image: Adding a new database user to assign roles]
Step 2: Select “Password” and enter user information.
[Image: How to add a new database user to assign roles]
Step 3: Assign user privileges, such as read/write access.
3a. Select a built-in role from the “Built-in Role” dropdown menu. You can select one built-in role per database user within the Atlas UI. If you delete the default option, you can click “Add Built-in Role” to select a new built-in role.
3b. If you have any custom roles defined, you can expand the “Custom Roles” section and select one or more roles from the “Custom Roles” dropdown menu. Click “Add Custom Role” to add more custom roles. You can also click the “Custom Roles” link to see the custom roles for your project.
3c. Expand the “Specific Privileges” section and select one or more privileges from the “Specific Privileges” dropdown menu. Click “Add Specific Privilege” to add more privileges. This assigns the user specific privileges on individual databases and collections.
[Image: Selecting user privileges to assign read/write access]
Step 4: Optional: Specify the resources in the project that the user can access.
By default, database users can access all the clusters and federated database instances in the project. You can restrict database users to specific clusters and federated database instances by doing the following:
  • Toggle “Restrict Access to Specific Clusters/Federated Database Instances” to “ON.”
  • Select the clusters and federated database instances to grant the user access to from the “Grant Access To” list.
Step 5: Optional: Save as a temporary user.
Step 6: Click “Add User.”
[Image: Restricting a user's access to specific clusters and federated database instances, or saving a user temporarily for a specified duration]
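If you would rather script the steps above than click through the UI, the same restriction can be described as a request body for the Atlas Administration API's database user resource. The sketch below only builds that body and sends nothing. The field names (`roles`, `scopes`) and the `DATA_LAKE` scope type for federated database instances reflect my understanding of that API and should be verified against the current reference. The helper name, username, and instance name are hypothetical.

```python
import json


def restricted_user_payload(username, password, federated_instance_name):
    """Describe a read-only user scoped to one federated database instance (sketch)."""
    return {
        "databaseName": "admin",  # authentication database for password users
        "username": username,
        "password": password,
        # Read-only access; no write roles are granted.
        "roles": [{"roleName": "readAnyDatabase", "databaseName": "admin"}],
        # Assumption: "DATA_LAKE" is the scope type for federated database instances.
        "scopes": [{"name": federated_instance_name, "type": "DATA_LAKE"}],
    }


payload = restricted_user_payload("analytics_reader", "<password>", "<federated-instance-name>")
print(json.dumps(payload, indent=2))
```

Because the only scope listed is the federated database instance, a user created this way would have no access to the cluster connection string, which mirrors the “Restrict Access to Specific Clusters/Federated Database Instances” toggle.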
By following these steps, you can use Atlas Data Federation to control access to your analytics node. This can be a useful way to ensure that only authorized users have access to the analytics node, and that the data on the node is protected.
Overall, setting read preferences and using analytics nodes can help you to better manage access to your data and improve the performance and scalability of your application.
To learn more about Atlas Data Federation and whether it would be the right solution for you, check out our documentation and tutorials.
