To optimize the performance of your queries, Atlas Data Federation does the following:
For Cloud Object Storage, it uses data partitioning to select only the files that it needs to process based on query parameters. To learn more, see Define Path File Syntax and Use Partition Attribute Types. Additionally, it uses Parquet metadata to reduce the amount of data it scans from Parquet files through row group selection or column projection. To learn more, see Parquet Data Format.
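The idea behind partition-based file selection can be sketched in a few lines of Python. This is only an illustration, not how Atlas Data Federation is implemented: the `/sales/<year>/<month>/` layout, the file names, the regular expression, and the `select_files` helper are all hypothetical and stand in for a path definition with two integer partition attributes.

```python
import re

# Hypothetical file layout: /sales/<year>/<month>/<file>.json
FILES = [
    "/sales/2023/11/a.json",
    "/sales/2023/12/b.json",
    "/sales/2024/01/c.json",
    "/sales/2024/02/d.json",
]

# Regex standing in for a path template with two integer partition attributes.
PATH_PATTERN = re.compile(r"^/sales/(?P<year>\d{4})/(?P<month>\d{2})/[^/]+$")


def select_files(files, filters):
    """Return only the files whose partition values match every equality filter."""
    selected = []
    for path in files:
        match = PATH_PATTERN.match(path)
        if match is None:
            continue  # path does not follow the partitioned layout
        values = {name: int(value) for name, value in match.groupdict().items()}
        if all(values.get(field) == wanted for field, wanted in filters.items()):
            selected.append(path)
    return selected


# A query that filters on year and month only needs to read one file.
to_scan = select_files(FILES, {"year": 2024, "month": 1})
print(to_scan)
```

Because the partition values are encoded in the path, files that cannot match the filter are skipped without being opened.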
For Atlas clusters, it tries to "push down" as much of the query to the cluster as possible. For example, if your aggregation pipeline has a $match stage and if it can be processed locally, Atlas Data Federation tries to process that stage in the Atlas cluster and returns only the resulting documents to the federated layer, which processes the subsequent stages. To learn more, see Querying Data in Your Atlas Cluster.
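As a rough illustration of the pushdown idea, the sketch below writes an aggregation pipeline as plain Python data and separates its leading `$match` stages from the rest. The field names and the `split_leading_match` helper are hypothetical, and the split is a deliberate simplification: the actual pushdown decision is made by Atlas Data Federation and may cover more than `$match`.

```python
# Aggregation pipeline written as plain Python data (a list of stage documents).
# The field names (status, customerId, amount) are hypothetical.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def split_leading_match(stages):
    """Split a pipeline into its leading $match stages and the remaining stages."""
    i = 0
    while i < len(stages) and "$match" in stages[i]:
        i += 1
    return stages[:i], stages[i:]


# Stages that are natural candidates for running in the cluster,
# followed by the stages left for the federated layer.
pushdown_candidates, remaining = split_leading_match(pipeline)
print(pushdown_candidates)
print([next(iter(stage)) for stage in remaining])
```

Placing the filter at the start of the pipeline means fewer documents have to travel from the cluster to the federated layer for the later stages.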
To learn more, see Optimize Query Performance.
You can connect to Atlas Data Federation using MongoDB language-specific drivers, mongosh, and Atlas SQL. To learn more, see Connect to Your Federated Database Instance.