How to Use the Union All Aggregation Pipeline Stage in MongoDB 4.4
Adrienne Tacke16 min read • Published Jan 31, 2022 • Updated Sep 09, 2024
Rate this tutorial
With the release of MongoDB 4.4 comes a new aggregation
pipeline
stage called
$unionWith
. This stage lets you combine multiple
collections into a single result set!Here's how you'd use it:
Simplified syntax, with no additional processing on the specified
collection
1 db.collection.aggregate([ 2 { $unionWith: "<anotherCollection>" } 3 ])
Extended syntax, using optional pipeline field
1 db.collection.aggregate([ 2 { $unionWith: { coll: "<anotherCollection>", pipeline: [ <stage1>, etc. ] } } 3 ])
⚠ If you use the pipeline field to process your collection before
combining, keep in mind that stages that write data, like
$out
and
$merge
, can't be used!Your resulting documents will merge your current collection's (or
pipeline's) stream of documents with the documents from the
collection/pipeline you specify. Keep in mind that this can include
duplicates!
If you've used the
UNION ALL
operation in SQL before, the $unionWith
stage's functionality may sound familiar to you, and you wouldn't be
wrong! Both combine the result sets from multiple queries and return the
merged rows, some of which may be duplicates. However, that's where the
similarities end. Unlike MongoDB's $unionWith
stage, you have to
follow a few
rules
in order to run a valid UNION ALL
operation in SQL:- Make sure your two queries have the same number of columns
- Make sure the order of columns are the same
- Make sure the matching columns are compatible data types.
It'd look something like this in SQL:
1 SELECT column1, expression1, column2 2 FROM table1 3 UNION ALL 4 SELECT column1, expression1, column2 5 FROM table2 6 WHERE [conditions]
With the
$unionWith
stage in MongoDB, you don't have to worry about
these stringent constraints.The most convenient difference between the
$unionWith
stage and other
UNION operations is that there's no matching schema restriction. This
flexible schema support means you can combine documents that may not
have the same type or number of fields. This is common in certain
scenarios, where the data we need to use comes from different sources:- TimeSeries data that's stored by month/quarter/some other unit of time
- IoT device data, per fleet or version
- Regional data
With MongoDB's
$unionWith
stage, combining these data sources is
possible.Ready to try the new
$unionWith
stage? Follow along by completing a
few setup steps first. Or, you can skip to the code
samples. 😉First, a general understanding of what the aggregation
framework
is and how to use it will be important for the rest of this tutorial. If
you are unfamiliar with the aggregation framework, check out this great
Introduction to the MongoDB Aggregation
Framework,
written by fellow dev advocate Ken Alger!
Next, based on your situation, you may already have a few prerequisites
setup or need to start from scratch. Either way, choose your scenario to
configure the things you need so that you can follow the rest of this
tutorial!
Choose your scenario:
I don't have an Atlas cluster set up yet:
- You'll need an Atlas account to play around with MongoDB Atlas! Create one if you haven't already done so. Otherwise, log into your Atlas account.
- Setup a free Atlas cluster (no credit card needed!). Be sure to select MongoDB 4.4 (may be Beta, which is OK) as your version in Additional Settings!💡 If you don't see the prompt to create a cluster: You may be prompted to create a project first before you see the prompt to create your first cluster. In this case, go ahead and create a project first (leaving all the default settings). Then continue with the instructions to deploy your first free cluster!
- Once your cluster is set up, add your IP address to your cluster's connection settings. This tells your cluster who's allowed to connect to it.
- Finally, create a database user for your cluster. Atlas requires anyone or anything accessing its clusters to authenticate as MongoDB database users for security purposes! Keep these credentials handy as you'll need them later on.
- Continue with the steps in Connecting to your cluster.
I have an Atlas cluster set up:
Great! You can skip ahead to Connecting to your cluster.
Connecting to your cluster
To connect to your cluster, we'll use the MongoDB for Visual Studio Code
extension (VS Code for short 😊). You can view your data directly,
interact with your collections, and much more with this helpful
extension! Using this also consolidates our workspace into a single
window, removing the need for us to jump back and forth between our code
and MongoDB Atlas!
💡 Though we'll be using the VS Code Extension and VS Code for the rest
of this tutorial, it's not a requirement to use the
$unionWith
pipeline stage! You can also use the
CLI, language-specific
drivers, or
Compass if you prefer!- Install the MongoDB for VS Code extension (or install VS Code first, if you don't already have it 😉).
- To connect to your cluster, you'll need a connection string. You can get this connection string from your cluster connection settings. Go to your cluster and select the "Connect" option:
- Select the "Connect using MongoDB Compass" option. This will give us a connection string in the DNS Seedlist Connection format that we can use with the MongoDB extension.💡 The MongoDB for VS Code extension also supports the standard connection string format. Using the DNS seedlist connection format is purely preference.
- Skip to the second step and copy the connection string (don't worry about the other settings, you won't need them):
- Switch back to VS Code. Press
Ctrl
+Shift
+P
(on Windows) orShift
+Command
+P
(on Mac) to bring up the command palette. This shows a list of all VS Code commands. - Start typing "MongoDB" until you see the MongoDB extension's list of available commands. Select the "MongoDB: Connect with Connection String" option.
- Paste in your copied connection string. 💡 Don't forget! You have to replace the placeholder password with your actual password!
- Press enter to connect! You'll know the connection was successful if you see a confirmation message on the bottom right. You'll also see your cluster listed when you expand the MongoDB extension pane.
With the MongoDB extension installed and your cluster connected, you can now use MongoDB Playgrounds to test out the
$unionWith
examples! MongoDB Playgrounds give us a nice sandbox to easily write and test Mongo queries. I love using it when prototying or trying something new because it has query auto-completion and syntax highlighting, something that you don't get in most terminals.Let's finally dive into some examples!
To follow along, you can use these MongoDB Playground
files I
have created to accompany this blog post or create your
own!
💡 If you create your own playground, remember to change the database
name and delete the default template's code first!
Right at the top, specify the database you'll be using. In this example,
I'm using a database also called
union-walkthrough
:1 use('union-walkthrough');
💡 I haven't actually created a database called
union-walkthrough
in
Atlas yet, but that's no problem! When the playground runs, it will see
that it does not yet exist and create a database of the specified name!Next, we need data! Particularly about some planets. And particularly
about planets in a certain movie series. 😉
Using the awesome SWAPI API, I've collected such
information on a few planets. Let's add them into two collections,
separated by popularity.
Any planets that appear in at least 2 or more films are considered
popular. Otherwise, we'll add them into the
lonely_planets
collection:1 // Insert a few documents into the lonely_planets collection. 2 db.lonely_planets.insertMany([ 3 { 4 "name": "Endor", 5 "rotation_period": "18", 6 "orbital_period": "402", 7 "diameter": "4900", 8 "climate": "temperate", 9 "gravity": "0.85 standard", 10 "terrain": "forests, mountains, lakes", 11 "surface_water": "8", 12 "population": "30000000", 13 "residents": [ 14 "http://swapi.dev/api/people/30/" 15 ], 16 "films": [ 17 "http://swapi.dev/api/films/3/" 18 ], 19 "created": "2014-12-10T11:50:29.349000Z", 20 "edited": "2014-12-20T20:58:18.429000Z", 21 "url": "http://swapi.dev/api/planets/7/" 22 }, 23 { 24 "name": "Kamino", 25 "rotation_period": "27", 26 "orbital_period": "463", 27 "diameter": "19720", 28 "climate": "temperate", 29 "gravity": "1 standard", 30 "terrain": "ocean", 31 "surface_water": "100", 32 "population": "1000000000", 33 "residents": [ 34 "http://swapi.dev/api/people/22/", 35 "http://swapi.dev/api/people/72/", 36 "http://swapi.dev/api/people/73/" 37 ], 38 "films": [ 39 "http://swapi.dev/api/films/5/" 40 ], 41 "created": "2014-12-10T12:45:06.577000Z", 42 "edited": "2014-12-20T20:58:18.434000Z", 43 "url": "http://swapi.dev/api/planets/10/" 44 }, 45 { 46 "name": "Yavin IV", 47 "rotation_period": "24", 48 "orbital_period": "4818", 49 "diameter": "10200", 50 "climate": "temperate, tropical", 51 "gravity": "1 standard", 52 "terrain": "jungle, rainforests", 53 "surface_water": "8", 54 "population": "1000", 55 "residents": [], 56 "films": [ 57 "http://swapi.dev/api/films/1/" 58 ], 59 "created": "2014-12-10T11:37:19.144000Z", 60 "edited": "2014-12-20T20:58:18.421000Z", 61 "url": "http://swapi.dev/api/planets/3/" 62 }, 63 { 64 "name": "Hoth", 65 "rotation_period": "23", 66 "orbital_period": "549", 67 "diameter": "7200", 68 "climate": "frozen", 69 "gravity": "1.1 standard", 70 "terrain": "tundra, ice caves, mountain ranges", 71 "surface_water": "100", 72 "population": "unknown", 73 "residents": [], 74 "films": [ 75 "http://swapi.dev/api/films/2/" 76 ], 77 "created": "2014-12-10T11:39:13.934000Z", 78 "edited": "2014-12-20T20:58:18.423000Z", 79 "url": "http://swapi.dev/api/planets/4/" 80 }, 81 { 82 "name": "Bespin", 83 "rotation_period": "12", 84 "orbital_period": "5110", 85 "diameter": "118000", 86 "climate": "temperate", 87 "gravity": "1.5 (surface), 1 standard (Cloud City)", 88 "terrain": "gas giant", 89 "surface_water": "0", 90 "population": "6000000", 91 "residents": [ 92 "http://swapi.dev/api/people/26/" 93 ], 94 "films": [ 95 "http://swapi.dev/api/films/2/" 96 ], 97 "created": "2014-12-10T11:43:55.240000Z", 98 "edited": "2014-12-20T20:58:18.427000Z", 99 "url": "http://swapi.dev/api/planets/6/" 100 } 101 ]); 102 103 // Insert a few documents into the popular_planets collection. 104 db.popular_planets.insertMany([ 105 { 106 "name": "Tatooine", 107 "rotation_period": "23", 108 "orbital_period": "304", 109 "diameter": "10465", 110 "climate": "arid", 111 "gravity": "1 standard", 112 "terrain": "desert", 113 "surface_water": "1", 114 "population": "200000", 115 "residents": [ 116 "http://swapi.dev/api/people/1/", 117 "http://swapi.dev/api/people/2/", 118 "http://swapi.dev/api/people/4/", 119 "http://swapi.dev/api/people/6/", 120 "http://swapi.dev/api/people/7/", 121 "http://swapi.dev/api/people/8/", 122 "http://swapi.dev/api/people/9/", 123 "http://swapi.dev/api/people/11/", 124 "http://swapi.dev/api/people/43/", 125 "http://swapi.dev/api/people/62/" 126 ], 127 "films": [ 128 "http://swapi.dev/api/films/1/", 129 "http://swapi.dev/api/films/3/", 130 "http://swapi.dev/api/films/4/", 131 "http://swapi.dev/api/films/5/", 132 "http://swapi.dev/api/films/6/" 133 ], 134 "created": "2014-12-09T13:50:49.641000Z", 135 "edited": "2014-12-20T20:58:18.411000Z", 136 "url": "http://swapi.dev/api/planets/1/" 137 }, 138 { 139 "name": "Alderaan", 140 "rotation_period": "24", 141 "orbital_period": "364", 142 "diameter": "12500", 143 "climate": "temperate", 144 "gravity": "1 standard", 145 "terrain": "grasslands, mountains", 146 "surface_water": "40", 147 "population": "2000000000", 148 "residents": [ 149 "http://swapi.dev/api/people/5/", 150 "http://swapi.dev/api/people/68/", 151 "http://swapi.dev/api/people/81/" 152 ], 153 "films": [ 154 "http://swapi.dev/api/films/1/", 155 "http://swapi.dev/api/films/6/" 156 ], 157 "created": "2014-12-10T11:35:48.479000Z", 158 "edited": "2014-12-20T20:58:18.420000Z", 159 "url": "http://swapi.dev/api/planets/2/" 160 }, 161 { 162 "name": "Naboo", 163 "rotation_period": "26", 164 "orbital_period": "312", 165 "diameter": "12120", 166 "climate": "temperate", 167 "gravity": "1 standard", 168 "terrain": "grassy hills, swamps, forests, mountains", 169 "surface_water": "12", 170 "population": "4500000000", 171 "residents": [ 172 "http://swapi.dev/api/people/3/", 173 "http://swapi.dev/api/people/21/", 174 "http://swapi.dev/api/people/35/", 175 "http://swapi.dev/api/people/36/", 176 "http://swapi.dev/api/people/37/", 177 "http://swapi.dev/api/people/38/", 178 "http://swapi.dev/api/people/39/", 179 "http://swapi.dev/api/people/42/", 180 "http://swapi.dev/api/people/60/", 181 "http://swapi.dev/api/people/61/", 182 "http://swapi.dev/api/people/66/" 183 ], 184 "films": [ 185 "http://swapi.dev/api/films/3/", 186 "http://swapi.dev/api/films/4/", 187 "http://swapi.dev/api/films/5/", 188 "http://swapi.dev/api/films/6/" 189 ], 190 "created": "2014-12-10T11:52:31.066000Z", 191 "edited": "2014-12-20T20:58:18.430000Z", 192 "url": "http://swapi.dev/api/planets/8/" 193 }, 194 { 195 "name": "Coruscant", 196 "rotation_period": "24", 197 "orbital_period": "368", 198 "diameter": "12240", 199 "climate": "temperate", 200 "gravity": "1 standard", 201 "terrain": "cityscape, mountains", 202 "surface_water": "unknown", 203 "population": "1000000000000", 204 "residents": [ 205 "http://swapi.dev/api/people/34/", 206 "http://swapi.dev/api/people/55/", 207 "http://swapi.dev/api/people/74/" 208 ], 209 "films": [ 210 "http://swapi.dev/api/films/3/", 211 "http://swapi.dev/api/films/4/", 212 "http://swapi.dev/api/films/5/", 213 "http://swapi.dev/api/films/6/" 214 ], 215 "created": "2014-12-10T11:54:13.921000Z", 216 "edited": "2014-12-20T20:58:18.432000Z", 217 "url": "http://swapi.dev/api/planets/9/" 218 }, 219 { 220 "name": "Dagobah", 221 "rotation_period": "23", 222 "orbital_period": "341", 223 "diameter": "8900", 224 "climate": "murky", 225 "gravity": "N/A", 226 "terrain": "swamp, jungles", 227 "surface_water": "8", 228 "population": "unknown", 229 "residents": [], 230 "films": [ 231 "http://swapi.dev/api/films/2/", 232 "http://swapi.dev/api/films/3/", 233 "http://swapi.dev/api/films/6/" 234 ], 235 "created": "2014-12-10T11:42:22.590000Z", 236 "edited": "2014-12-20T20:58:18.425000Z", 237 "url": "http://swapi.dev/api/planets/5/" 238 } 239 ]);
This separation is indicative of how our data may be grouped. Despite
the separation, we can use the
$unionWith
stage to combine these two
collections if we ever needed to analyze them as a single result set!Let's say that we needed to find out the total population of planets,
grouped by climate. Additionally, we'd like to leave out any planets
that don't have population data from our calculation. We can do this
using an aggregation:
1 // Run an aggregation to view total planet populations, grouped by climate type. 2 use('union-walkthrough'); 3 4 db.lonely_planets.aggregate([ 5 { 6 $match: { 7 population: { $ne: 'unknown' } 8 } 9 }, 10 { 11 $unionWith: { 12 coll: 'popular_planets', 13 pipeline: [{ 14 $match: { 15 population: { $ne: 'unknown' } 16 } 17 }] 18 } 19 }, 20 { 21 $group: { 22 _id: '$climate', totalPopulation: { $sum: { $toLong: '$population' } } 23 } 24 } 25 ]);
If you've followed along in your own MongoDB playground and have copied
the code so far, try running the aggregation!
And if you're using the provided MongoDB playground I created, highlight
lines 264 - 290 and then run the selected code.
💡 You'll notice in the code snippet above that I've added another
use('union-walkthrough');
method right above the aggregation code. I
do this to make the selection of relevant code within the playground
easier. It's also required so that the aggregation code can run against
the correct database. However, the same thing can be achieved by
selecting multiple lines, namely the original use('union-walkthrough')
line at the top and whatever additional example you'd like to run!You should see the results like so:
1 [ 2 { 3 _id: 'arid', 4 totalPopulation: 200000 5 }, 6 { 7 _id: 'temperate', 8 totalPopulation: 1007536000000 9 }, 10 { 11 _id: 'temperate, tropical', 12 totalPopulation: 1000 13 } 14 ]
Unsurprisingly, planets with "temperate" climates seem to have more
inhabitants. Something about that cool 75 F / 23.8 C, I guess 🌞
Let's break down this aggregation:
The first object we pass into our aggregation is also our first stage,
used here as our filter criteria. Specifically, we use the
$match
pipeline stage:
1 { 2 $match: { 3 population: { $ne: 'unknown' } 4 } 5 },
In this example, we filter out any documents that have
unknown
as
their population
value using the
$ne (not
equal) operator.The next object (and next stage) in our aggregation is our
$unionWith
stage. Here, we specifiy what collection we'd like to perform a union
with (including any duplicates). We also make use of the pipeline field
to similarly filter out any documents in our popular_planets
collection that have an unknown population:1 { 2 $unionWith: { 3 coll: 'popular_planets', 4 pipeline: [ 5 { 6 $match: { 7 population: { $ne: 'unknown' } 8 } 9 } 10 ] 11 } 12 },
Finally, we have our last stage in our aggregation. After combining our
lonely_planets
and popular_planets
collections (both filtering out
documents with no population data), we group the resulting documents
using a
$group
stage:1 { 2 $group: { 3 _id: '$climate', 4 totalPopulation: { $sum: { $toLong: '$population' } } 5 } 6 }
Since we want to know the total population per climate type, we first
specify
_id
to be the $climate
field from our combined result set.
Then, we calculate a new field called totalPopulation
by using a
$sum
operator to add each matching document's population values together.
You'll also notice that based on the data we have, we needed to use a
$toLong
operator to first convert our $population
field into a calculable
value!Now, if you don't need to run some additional processing on the
collection you're combining with, you don't have to! The
pipeline
field is optional and is only there if you need it.So, if you just need to work with the planet data as a unified set, you
can do that too:
1 // Run an aggregation with no pipeline 2 use('union-walkthrough'); 3 4 db.lonely_planets.aggregate([ 5 { $unionWith: 'popular_planets' } 6 ]);
Copy this aggregation into your own playground and run it!
Alternatively, select and run lines 293 - 297 if using the provided
MongoDB playground!
Tada! Now you can use this unified dataset for analysis or further
processing.
Combining the same schemas is great, but we can do that in regular SQL
too! The real convenience of the
$unionWith
pipeline stage is that it
can also combine collections with different schemas. Let's take a look!As before, we'll specifiy the database we want to use:
1 use('union-walkthrough');
This time, we'll use some acquired information about certain starships
and vehicles that are used in this same movie series. Let's add them to
their respective collections:
1 // Insert a few documents into the starships collection 2 db.starships.insertMany([ 3 { 4 "name": "Death Star", 5 "model": "DS-1 Orbital Battle Station", 6 "manufacturer": "Imperial Department of Military Research, Sienar Fleet Systems", 7 "cost_in_credits": "1000000000000", 8 "length": "120000", 9 "max_atmosphering_speed": "n/a", 10 "crew": 342953, 11 "passengers": 843342, 12 "cargo_capacity": "1000000000000", 13 "consumables": "3 years", 14 "hyperdrive_rating": 4.0, 15 "MGLT": 10, 16 "starship_class": "Deep Space Mobile Battlestation", 17 "pilots": [] 18 }, 19 { 20 "name": "Millennium Falcon", 21 "model": "YT-1300 light freighter", 22 "manufacturer": "Corellian Engineering Corporation", 23 "cost_in_credits": "100000", 24 "length": "34.37", 25 "max_atmosphering_speed": "1050", 26 "crew": 4, 27 "passengers": 6, 28 "cargo_capacity": 100000, 29 "consumables": "2 months", 30 "hyperdrive_rating": 0.5, 31 "MGLT": 75, 32 "starship_class": "Light freighter", 33 "pilots": [ 34 "http://swapi.dev/api/people/13/", 35 "http://swapi.dev/api/people/14/", 36 "http://swapi.dev/api/people/25/", 37 "http://swapi.dev/api/people/31/" 38 ] 39 }, 40 { 41 "name": "Y-wing", 42 "model": "BTL Y-wing", 43 "manufacturer": "Koensayr Manufacturing", 44 "cost_in_credits": "134999", 45 "length": "14", 46 "max_atmosphering_speed": "1000km", 47 "crew": 2, 48 "passengers": 0, 49 "cargo_capacity": 110, 50 "consumables": "1 week", 51 "hyperdrive_rating": 1.0, 52 "MGLT": 80, 53 "starship_class": "assault starfighter", 54 "pilots": [] 55 }, 56 { 57 "name": "X-wing", 58 "model": "T-65 X-wing", 59 "manufacturer": "Incom Corporation", 60 "cost_in_credits": "149999", 61 "length": "12.5", 62 "max_atmosphering_speed": "1050", 63 "crew": 1, 64 "passengers": 0, 65 "cargo_capacity": 110, 66 "consumables": "1 week", 67 "hyperdrive_rating": 1.0, 68 "MGLT": 100, 69 "starship_class": "Starfighter", 70 "pilots": [ 71 "http://swapi.dev/api/people/1/", 72 "http://swapi.dev/api/people/9/", 73 "http://swapi.dev/api/people/18/", 74 "http://swapi.dev/api/people/19/" 75 ] 76 }, 77 ]); 78 79 // Insert a few documents into the vehicles collection 80 db.vehicles.insertMany([ 81 { 82 "name": "Sand Crawler", 83 "model": "Digger Crawler", 84 "manufacturer": "Corellia Mining Corporation", 85 "cost_in_credits": "150000", 86 "length": "36.8 ", 87 "max_atmosphering_speed": 30, 88 "crew": 46, 89 "passengers": 30, 90 "cargo_capacity": 50000, 91 "consumables": "2 months", 92 "vehicle_class": "wheeled", 93 "pilots": [] 94 }, 95 { 96 "name": "X-34 landspeeder", 97 "model": "X-34 landspeeder", 98 "manufacturer": "SoroSuub Corporation", 99 "cost_in_credits": "10550", 100 "length": "3.4 ", 101 "max_atmosphering_speed": 250, 102 "crew": 1, 103 "passengers": 1, 104 "cargo_capacity": 5, 105 "consumables": "unknown", 106 "vehicle_class": "repulsorcraft", 107 "pilots": [], 108 }, 109 { 110 "name": "AT-AT", 111 "model": "All Terrain Armored Transport", 112 "manufacturer": "Kuat Drive Yards, Imperial Department of Military Research", 113 "cost_in_credits": "unknown", 114 "length": "20", 115 "max_atmosphering_speed": 60, 116 "crew": 5, 117 "passengers": 40, 118 "cargo_capacity": 1000, 119 "consumables": "unknown", 120 "vehicle_class": "assault walker", 121 "pilots": [], 122 "films": [ 123 "http://swapi.dev/api/films/2/", 124 "http://swapi.dev/api/films/3/" 125 ], 126 "created": "2014-12-15T12:38:25.937000Z", 127 "edited": "2014-12-20T21:30:21.677000Z", 128 "url": "http://swapi.dev/api/vehicles/18/" 129 }, 130 { 131 "name": "AT-ST", 132 "model": "All Terrain Scout Transport", 133 "manufacturer": "Kuat Drive Yards, Imperial Department of Military Research", 134 "cost_in_credits": "unknown", 135 "length": "2", 136 "max_atmosphering_speed": 90, 137 "crew": 2, 138 "passengers": 0, 139 "cargo_capacity": 200, 140 "consumables": "none", 141 "vehicle_class": "walker", 142 "pilots": [ 143 "http://swapi.dev/api/people/13/" 144 ] 145 }, 146 { 147 "name": "Storm IV Twin-Pod cloud car", 148 "model": "Storm IV Twin-Pod", 149 "manufacturer": "Bespin Motors", 150 "cost_in_credits": "75000", 151 "length": "7", 152 "max_atmosphering_speed": 1500, 153 "crew": 2, 154 "passengers": 0, 155 "cargo_capacity": 10, 156 "consumables": "1 day", 157 "vehicle_class": "repulsorcraft", 158 "pilots": [], 159 } 160 ]);
You may be thinking (as I first did), what's the difference between
starships and vehicles? You'll be pleased to know that starships are
defined as any "single transport craft that has hyperdrive capability".
Any other single transport craft that does not have hyperdrive
capability is considered a vehicle. The more you know! 😮
If you look at the two collections, you'll see that they have two key
differences:
- The
max_atmosphering_speed
field is present in both collections, but is astring
in thestarships
collection and anint
in thevehicles
collection. - The
starships
collection has two fields (hyperdrive_rating
,MGLT
) that are not present in thevehicles
collection, as it only relates to starships.
But you know what? That's not a problem for the
$unionWith
stage! You
can combine them just as before:1 // Run an aggregation with no pipeline and differing schemas 2 use('union-walkthrough'); 3 4 db.starships.aggregate([ 5 { $unionWith: 'vehicles' } 6 ]);
Try running the aggregation in your playground! Or if you're following
along in the MongoDB playground I've provided, select and run lines
185 - 189! You should get the following combined result set as your
output:
1 [ 2 { 3 _id: 5f306ddca3ee8339643f137e, 4 name: 'Death Star', 5 model: 'DS-1 Orbital Battle Station', 6 manufacturer: 'Imperial Department of Military Research, Sienar Fleet Systems', 7 cost_in_credits: '1000000000000', 8 length: '120000', 9 max_atmosphering_speed: 'n/a', 10 crew: 342953, 11 passengers: 843342, 12 cargo_capacity: '1000000000000', 13 consumables: '3 years', 14 hyperdrive_rating: 4, 15 MGLT: 10, 16 starship_class: 'Deep Space Mobile Battlestation', 17 pilots: [] 18 }, 19 { 20 _id: 5f306ddca3ee8339643f137f, 21 name: 'Millennium Falcon', 22 model: 'YT-1300 light freighter', 23 manufacturer: 'Corellian Engineering Corporation', 24 cost_in_credits: '100000', 25 length: '34.37', 26 max_atmosphering_speed: '1050', 27 crew: 4, 28 passengers: 6, 29 cargo_capacity: 100000, 30 consumables: '2 months', 31 hyperdrive_rating: 0.5, 32 MGLT: 75, 33 starship_class: 'Light freighter', 34 pilots: [ 35 'http://swapi.dev/api/people/13/', 36 'http://swapi.dev/api/people/14/', 37 'http://swapi.dev/api/people/25/', 38 'http://swapi.dev/api/people/31/' 39 ] 40 }, 41 // + 7 other results, omitted for brevity 42 ]
Can you imagine doing that in SQL? Hint: You can't! That kind of schema
restriction is something you don't need to worry about with MongoDB,
though!
So we can combine different schemas no problem. What if we need to do a
little extra work on our collection before combining it? That's where
the
pipeline
field comes in!Let's say that there's some classified information in our data about the
vehicles. Namely, any vehicles manufactured by Kuat Drive Yards (AKA a
division of the Imperial Department of Military Research).
By direct orders, you are instructed not to give out this information
under any circumstances. In fact, you need to intercept any requests for
vehicle information and remove these classified vehicles from the list!
We can do that like so:
1 use('union-walkthrough'); 2 3 db.starships.aggregate([ 4 { 5 $unionWith: { 6 coll: 'vehicles', 7 pipeline: [ 8 { 9 $redact: { 10 $cond: { 11 if: { $eq: [ "$manufacturer", "Kuat Drive Yards, Imperial Department of Military Research"] }, 12 then: "$$PRUNE", 13 else: "$$DESCEND" 14 } 15 } 16 } 17 ] 18 } 19 } 20 ]);
In this example, we're combining the
starships
and vehicles
collections as before, using the $unionWith
pipeline stage. We also
process the vehicle
data a bit more, using the $unionWith
's optional
pipeline
field:1 // Pipeline used with the vehicle collection 2 { 3 $redact: { 4 $cond: { 5 if: { $eq: [ "$manufacturer", "Kuat Drive Yards, Imperial Department of Military Research"] }, 6 then: "$$PRUNE", 7 else: "$$DESCEND" 8 } 9 } 10 }
In our case, we are evaluating whether or not the
manufacturer
field
holds a value of "Kuat Drive Yards, Imperial Department of Military
Research". If it does (uh oh, that's classified!), we use a system
variable called
$$PRUNE,
which lets us exclude all fields at the current document/embedded
document level. If it doesn't, we use another system variable called
$$DESCEND,
which will return all fields at the current document level, except for
any embedded documents.This works perfectly for our use case. Try running the aggregation
(lines 192 - 211, if using the provided MongoDB Playground). You should
see a combined result set, minus any Imperial manufactured vehicles:
1 [ 2 { 3 _id: 5f306ddca3ee8339643f137e, 4 name: 'Death Star', 5 model: 'DS-1 Orbital Battle Station', 6 manufacturer: 'Imperial Department of Military Research, Sienar Fleet Systems', 7 cost_in_credits: '1000000000000', 8 length: '120000', 9 max_atmosphering_speed: 'n/a', 10 crew: 342953, 11 passengers: 843342, 12 cargo_capacity: '1000000000000', 13 consumables: '3 years', 14 hyperdrive_rating: 4, 15 MGLT: 10, 16 starship_class: 'Deep Space Mobile Battlestation', 17 pilots: [] 18 }, 19 { 20 _id: 5f306ddda3ee8339643f1383, 21 name: 'X-34 landspeeder', 22 model: 'X-34 landspeeder', 23 manufacturer: 'SoroSuub Corporation', 24 cost_in_credits: '10550', 25 length: '3.4 ', 26 max_atmosphering_speed: 250, 27 crew: 1, 28 passengers: 1, 29 cargo_capacity: 5, 30 consumables: 'unknown', 31 vehicle_class: 'repulsorcraft', 32 pilots: [] 33 }, 34 // + 5 more non-Imperial manufactured results, omitted for brevity 35 ]
We did our part to restrict classified information! 🎶 Hums Imperial
March 🎶
Now that we know how the
$unionWith
stage works, it's important to
discuss its limits and restrictions.We've mentioned it already, but it's important to reiterate: using the
$unionWith
stage will give you a combined result set which may include
duplicates! This is equivalent to how the UNION ALL
operator works in
SQL
as well. As a workaround, using a $group
stage at the end of
your pipeline to remove duplicates is advised, but only when possible
and if the resulting data does not get inaccurately skewed.There are plans to add similar fuctionality to
UNION
(which combines
result sets but removes duplicates), but that may be in a future
release.If you use a
$unionWith
stage as part of a
$lookup
pipeline, the collection you specify for the $unionWith
cannot be
sharded. As an example, take a look at this aggregation:1 // Invalid aggregation (tried to use sharded collection with $unionWith) 2 db.lonely_planets.aggregate([ 3 { 4 $lookup: { 5 from: "extinct_planets", 6 let: { last_known_population: "$population", years_extinct: "$time_extinct" }, 7 pipeline: [ 8 // Filter criteria 9 { $unionWith: { coll: "questionable_planets", pipeline: [ { pipeline } ] } }, 10 // Other pipeline stages 11 ], 12 as: "planetdata" 13 } 14 } 15 ])
The coll
questionable_planets
(located within the $unionWith
stage)
cannot be sharded. This is enforced to prevent a significant decrease in
performance due to the shuffling of data around the cluster as it
determines the best execution plan.Aggregation pipelines can't use the
$unionWith
stage inside
transactions because a rare, but possible 3-thread deadlock can occur in
very niche scenarios. Additionally, in MongoDB 4.4, there is a
first-time definition of a view that would restrict its reading from
within a transaction.The
$out
and
$merge
stages cannot be used in a
$unionWith
pipeline. Since both $out
and
$merge
are stages that write data to a collection, they need to be
the last stage in a pipeline. This conflicts with the usage of the
$unionWith
stage as it outputs its combined result set onto the next
stage, which can be used at any point in an aggregation pipeline.If your aggregation includes a
collation,
that collation is used for the operation, ignoring any other collations.
However, if your aggregation doesn't include a collation, it will use
the collation for the top-level collection/view on which the aggregation
is run:
- If the
$unionWith
coll is a collection, its collation is ignored. - If the
$unionWith
coll is a view, then its collation must match that of the top-level collection/view. Otherwise, the operation errors.
We've discussed what the
$unionWith
pipeline stage is and how you can
use it in your aggregations to combine data from multiple collections.
Though similar to SQL's UNION ALL
operation, MongoDB's $unionWith
stage distinguishes itself through some convenient and much-needed
characteristics. Most notable is the ability to combine collections with
different schemas! And as a much needed improvement, using a
$unionWith
stage eliminates the need to write additional code, code
that was required because we had no other way to combine our data!If you have any questions about the
$unionWith
pipeline stage or this
blog post, head over to the MongoDB Community
forums or Tweet
me!