Read from MongoDB
You can create a Spark DataFrame to hold data from the MongoDB collection specified in the spark.mongodb.input.uri option that your SparkSession uses.
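For example, you can set this option when constructing the SparkSession yourself rather than relying on pyspark shell arguments. This is a minimal sketch, assuming a mongod instance on 127.0.0.1 and a hypothetical test.fruit namespace (the connector package must also be on the classpath, for example via the --packages launch option):

from pyspark.sql import SparkSession

# Minimal sketch: bind the default input URI to the (assumed)
# test.fruit namespace on a local mongod instance.
spark = SparkSession.builder \
    .appName("read-from-mongodb") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.fruit") \
    .getOrCreate()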
Consider a collection named fruit that contains the following documents:
{ "_id" : 1, "type" : "apple", "qty" : 5 } { "_id" : 2, "type" : "orange", "qty" : 10 } { "_id" : 3, "type" : "banana", "qty" : 15 }
Assign the collection to a DataFrame with spark.read from within the pyspark shell:
df = spark.read.format("mongo").load()
Spark samples the records to infer the schema of the collection.
df.printSchema()
The above operation produces the following shell output:
root
 |-- _id: double (nullable = true)
 |-- qty: double (nullable = true)
 |-- type: string (nullable = true)
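Because the schema is inferred from a sample of documents, a small or skewed sample can miss fields or mistype them. The sketch below assumes the connector's sampleSize read option, which controls how many documents are sampled (the connector's default is 1000):

# Sketch: sample more documents when inferring the schema, assuming
# the sampleSize read option (default 1000 in the 2.x connector).
df = spark.read.format("mongo").option("sampleSize", 5000).load()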
If you need to read from a different MongoDB collection, use the .option method when reading data into a DataFrame.
To read from a collection called contacts in a database called people, specify people.contacts in the input URI option:
df = spark.read.format("mongo").option("uri", "mongodb://127.0.0.1/people.contacts").load()
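If you prefer not to embed the namespace in the URI, a sketch assuming the connector's separate database and collection read options achieves the same result:

# Sketch: equivalent read using separate database and collection
# read options (assumed) instead of a people.contacts URI path.
df = spark.read.format("mongo") \
    .option("uri", "mongodb://127.0.0.1/") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .load()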