Docs Home → MongoDB Spark Connector
Write to MongoDB
To create a DataFrame, first create a SparkSession object, then use the object's createDataFrame()
function.
In the following example, createDataFrame()
takes
a list of tuples containing names and ages, and a list of column names:
people = spark.createDataFrame([("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178), ("Kili", 77), ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", None)], ["name", "age"])
Write the people
DataFrame to the MongoDB database and collection
specified in the spark.mongodb.output.uri option
by using the write
method:
people.write.format("mongo").mode("append").save()
The above operation writes to the MongoDB database and collection
specified in the spark.mongodb.output.uri option
when you connect to the pyspark
shell.
To read the contents of the DataFrame, use the show()
method.
people.show()
In the pyspark
shell, the operation prints the following output:
+-------------+----+ | name| age| +-------------+----+ |Bilbo Baggins| 50| | Gandalf|1000| | Thorin| 195| | Balin| 178| | Kili| 77| | Dwalin| 169| | Oin| 167| | Gloin| 158| | Fili| 82| | Bombur|null| +-------------+----+
The printSchema()
method prints out the DataFrame's schema:
people.printSchema()
In the pyspark
shell, the operation prints the following output:
root |-- _id: struct (nullable = true) | |-- oid: string (nullable = true) |-- age: long (nullable = true) |-- name: string (nullable = true)
If you need to write to a different MongoDB collection,
use the .option()
method with .write()
.
To write to a collection called contacts
in a database called
people
, specify people.contacts
in the output URI option.
people.write.format("mongo").mode("append").option("database", "people").option("collection", "contacts").save()