Frequently Asked Questions
On this page
- Is PyMongo Thread-Safe?
- Is PyMongo Fork-Safe?
- Can I Use PyMongo with Multiprocessing?
- Can PyMongo Load the Results of a Query as a Pandas DataFrame?
- How Does Connection Pooling Work in PyMongo?
- Why Does PyMongo Add an _id Field to All My Documents?
- How Do I Change the Timeout Value for Cursors?
- How Can I Store Decimal Instances?
- Why Does PyMongo Convert 9.99 to 9.9900000000000002?
- Does PyMongo Support Attribute-style Access for Documents?
- Does PyMongo Support Asynchronous Frameworks?
- Does PyMongo Work with mod_wsgi?
- Does PyMongo Work with PythonAnywhere?
- How Can I Encode My Documents to JSON?
- Does PyMongo Behave Differently in Python 3?
- Can I Share Pickled ObjectIds Between Python 2 and Python 3?
Is PyMongo Thread-Safe?
Yes. PyMongo is thread-safe and provides built-in connection pooling for threaded applications.
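For example, the following sketch shares one client across several threads. It assumes a MongoDB server running locally; the test.example namespace is hypothetical.

import threading

from pymongo import MongoClient

# One client, shared by all threads; PyMongo pools connections internally.
client = MongoClient()

def worker(n):
    # Each thread can safely reuse the shared client.
    client.test.example.insert_one({"thread": n})

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()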
Is PyMongo Fork-Safe?
No. If you use the fork() method to create a new process, don't pass an instance of the MongoClient class from the parent process to the child process. This creates a high probability of deadlock among MongoClient instances in the child process. Instead, create a new MongoClient instance in the child process.
Note
PyMongo tries to issue a warning if this deadlock might occur.
Can I Use PyMongo with Multiprocessing?
Yes. However, on Unix systems, the multiprocessing module spawns processes by using the fork() method. This carries the same risks described in Is PyMongo Fork-Safe?
To use multiprocessing with PyMongo, write code similar to the following example:
import multiprocessing

import pymongo

# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()
Important
Do not copy an instance of the MongoClient class from the parent process to a child process.
Can PyMongo Load the Results of a Query as a Pandas DataFrame?
You can use the PyMongoArrow library to work with numerical or columnar data. PyMongoArrow lets you load MongoDB query result sets as Pandas DataFrames, NumPy ndarrays, or Apache Arrow Tables.
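For example, the following minimal sketch loads a query result set into a Pandas DataFrame. It assumes PyMongoArrow is installed and uses a hypothetical test.example namespace.

from pymongo import MongoClient
from pymongoarrow.api import find_pandas_all

client = MongoClient()
collection = client.test.example  # hypothetical namespace

# Load every document matching the query directly into a Pandas DataFrame.
df = find_pandas_all(collection, {"x": {"$gt": 0}})
print(df.head())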
How Does Connection Pooling Work in PyMongo?
Every MongoClient instance has a built-in connection pool for each server in your MongoDB topology. Connection pools open sockets on demand to support concurrent requests to MongoDB in your application.
The maximum size of each connection pool is set by the maxPoolSize option, which defaults to 100. If the number of in-use connections to a server reaches the value of maxPoolSize, the next request to that server waits until a connection becomes available.
In addition to the sockets needed to support your application's requests, each MongoClient instance opens two more sockets per server in your MongoDB topology for monitoring the server's state. For example, a client connected to a three-node replica set opens six monitoring sockets. If the application uses the default setting for maxPoolSize and queries only the primary (default) node, there can be at most 106 total connections in the connection pool. If the application uses a read preference to query the secondary nodes, those connection pools grow and there can be 306 total connections.
To support a high number of concurrent MongoDB requests within one process, you can increase maxPoolSize.
Connection pools are rate-limited. The maxConnecting option determines the number of connections that the pool can create in parallel at any time. For example, if the value of maxConnecting is 2, the third request that attempts to concurrently check out a connection succeeds only when one of the following cases occurs:
- The connection pool finishes creating a connection, and there are fewer than maxPoolSize connections in the pool.
- An existing connection is checked back into the pool.
Rate-limiting connection creation also improves the driver's ability to reuse existing connections.
You can set the minimum number of concurrent connections to each server with the minPoolSize option, which defaults to 0. The driver initializes the connection pool with this number of sockets. If sockets are closed, causing the total number of sockets (both in use and idle) to drop below the minimum, more sockets are opened until the minimum is reached.
You can set the maximum number of milliseconds that a connection can remain idle in the pool by setting the maxIdleTimeMS option. Once a connection has been idle for maxIdleTimeMS, the connection pool removes and replaces it. This option defaults to 0 (no limit).
The following default configuration for a MongoClient works for most applications:
client = MongoClient(host, port)
MongoClient supports multiple concurrent requests. For each process, create a client once and reuse it for all operations. This practice is more efficient than creating a new client for each request.
The driver does not limit the number of requests that can wait for sockets to become available; it is the application's responsibility to limit the size of its pool to bound queuing during a load spike. Requests wait for the amount of time specified in the waitQueueTimeoutMS option, which defaults to 0 (no limit).
A request that waits longer than the time defined by waitQueueTimeoutMS for a socket raises a ConnectionFailure error. Use this option if it is more important to bound the duration of operations during a load spike than it is to complete every operation.
When MongoClient.close() is called by any request, the driver closes all idle sockets and closes all sockets that are in use as they are returned to the pool. Calling MongoClient.close() closes only inactive sockets, so you cannot interrupt or terminate any ongoing operations by using this method. The driver closes these sockets only when the process completes.
For more information, see the Connection Pool Overview in the MongoDB Server documentation.
Why Does PyMongo Add an _id Field to All My Documents?
When you use the Collection.insert_one() method, Collection.insert_many() method, or Collection.bulk_write() method to insert a document into MongoDB, and that document does not include an _id field, PyMongo automatically adds this field for you. It also sets the value of the field to an instance of ObjectId.
The following code example inserts a document without an _id field into MongoDB, then prints the document. After it's inserted, the document contains an _id field whose value is an instance of ObjectId.
>>> my_doc = {'x': 1}
>>> collection.insert_one(my_doc)
InsertOneResult(ObjectId('560db337fba522189f171720'), acknowledged=True)
>>> my_doc
{'x': 1, '_id': ObjectId('560db337fba522189f171720')}
PyMongo adds an _id field in this manner for a few reasons:
- All MongoDB documents must have an _id field.
- If PyMongo inserts a document without an _id field, MongoDB adds one itself, but doesn't report the value back to PyMongo for your application to use.
- Copying the document before adding the _id field is prohibitively expensive for most high-write-volume applications.
Tip
If you don't want PyMongo to add an _id field to your documents, insert only documents to which your application has already added an _id field, as shown in the sketch below.
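For example, a minimal sketch, assuming an existing collection object:

from bson.objectid import ObjectId

# Pre-assign _id so PyMongo doesn't modify the document during insert.
doc = {"_id": ObjectId(), "x": 1}
collection.insert_one(doc)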
How Do I Change the Timeout Value for Cursors?
MongoDB doesn't support custom timeouts for cursors, but you can turn off cursor timeouts. To do so, pass the no_cursor_timeout=True option to the find() method.
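For example, the following minimal sketch, assuming an existing collection object, opens a cursor that never times out on the server:

# Cursors opened with no_cursor_timeout=True must be closed explicitly.
cursor = collection.find({}, no_cursor_timeout=True)
try:
    for doc in cursor:
        print(doc)
finally:
    cursor.close()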
How Can I Store Decimal Instances?
MongoDB v3.4 introduced the Decimal128 BSON type, a 128-bit decimal-based floating-point value capable of emulating decimal rounding with exact precision. PyMongo versions 3.4 and later also support this type.
Earlier MongoDB versions, however, support only IEEE 754 floating-point values, equivalent to the Python float type. PyMongo can store Decimal instances in these versions of MongoDB only by converting them to the float type. You must perform this conversion explicitly.
For more information, see the PyMongo API documentation for decimal128.
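For example, the following minimal sketch, assuming an existing collection object, stores a Decimal exactly by wrapping it in Decimal128:

from decimal import Decimal

from bson.decimal128 import Decimal128

# Wrap the Decimal so it's stored as the exact Decimal128 BSON type.
collection.insert_one({"price": Decimal128(Decimal("9.99"))})

# Convert back to a Python Decimal when reading.
doc = collection.find_one()
price = doc["price"].to_decimal()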
Why Does PyMongo Convert 9.99 to 9.9900000000000002?
MongoDB represents 9.99 as an IEEE floating-point value, which can't represent the value precisely. This is also true in some versions of Python. In this regard, PyMongo behaves the same way as the JavaScript shell, all other MongoDB drivers, and the Python language itself.
Does PyMongo Support Attribute-style Access for Documents?
No. PyMongo doesn't implement this feature, for the following reasons:
- Adding attributes pollutes the attribute namespace for documents and could lead to subtle bugs or confusing errors when using a key with the same name as a dictionary method.
- PyMongo uses SON objects instead of regular dictionaries only to maintain key ordering, because the server requires this for certain operations. Adding this feature would complicate the SON class and could break backwards compatibility if PyMongo ever reverts to using dictionaries.
- Documents behave just like dictionaries, which makes them relatively simple for new PyMongo users to understand. Changing the behavior of documents adds a barrier to entry for these users.
For more information, see the relevant Jira case.
Does PyMongo Support Asynchronous Frameworks?
Yes. For more information, see the Third-Party Tools guide.
Does PyMongo Work with mod_wsgi?
Yes. See mod_wsgi in the Tools guide.
Does PyMongo Work with PythonAnywhere?
No. PyMongo creates Python threads, which PythonAnywhere does not support.
For more information, see the relevant Jira ticket.
How Can I Encode My Documents to JSON?
PyMongo supports some special types, like ObjectId and DBRef, that aren't supported in JSON. Therefore, Python's json module won't work with all documents in PyMongo. Instead, PyMongo includes the json_util module, a tool for using Python's json module with BSON documents and MongoDB Extended JSON.
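For example, a minimal sketch of round-tripping a document that contains an ObjectId:

from bson import ObjectId
from bson.json_util import dumps, loads

doc = {"_id": ObjectId(), "x": 1}
json_str = dumps(doc)       # encodes the ObjectId as Extended JSON
restored = loads(json_str)  # restores it as an ObjectId instance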
python-bsonjs is another BSON-to-MongoDB-Extended-JSON converter, built on top of libbson. python-bsonjs doesn't depend on PyMongo and might offer a performance improvement over json_util in certain cases.
Tip
python-bsonjs works best with PyMongo when using the RawBSONDocument type, as shown in the sketch below.
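For example, the following hedged sketch converts raw BSON bytes to Extended JSON. It assumes python-bsonjs is installed and uses a hypothetical test.example namespace.

import bsonjs
from bson.raw_bson import RawBSONDocument
from pymongo import MongoClient

# Ask PyMongo to return raw BSON instead of parsed dictionaries.
client = MongoClient(document_class=RawBSONDocument)
doc = client.test.example.find_one()
if doc is not None:
    json_str = bsonjs.dumps(doc.raw)  # convert raw BSON bytes to Extended JSON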
Does PyMongo Behave Differently in Python 3?
PyMongo encodes instances of the bytes class as BSON type 5 (binary data) with subtype 0. In Python 2, these instances are decoded to Binary with subtype 0. In Python 3, they are decoded back to bytes.
The following code examples use PyMongo to insert a bytes instance into MongoDB, and then find the instance. In Python 2, the byte string is decoded to Binary. In Python 3, the byte string is decoded back to bytes.
# Python 2
>>> import pymongo
>>> c = pymongo.MongoClient()
>>> c.test.bintest.insert_one({'binary': b'this is a byte string'}).inserted_id
ObjectId('4f9086b1fba5222021000000')
>>> c.test.bintest.find_one()
{u'binary': Binary('this is a byte string', 0), u'_id': ObjectId('4f9086b1fba5222021000000')}
# Python 3
>>> import pymongo
>>> c = pymongo.MongoClient()
>>> c.test.bintest.insert_one({'binary': b'this is a byte string'}).inserted_id
ObjectId('4f9086b1fba5222021000000')
>>> c.test.bintest.find_one()
{'binary': b'this is a byte string', '_id': ObjectId('4f9086b1fba5222021000000')}
Similarly, Python 2 and Python 3 behave differently when PyMongo parses JSON binary values with subtype 0. In Python 2, these values are decoded to instances of Binary with subtype 0. In Python 3, they're decoded into instances of bytes.
The following code examples use the json_util module to decode a JSON binary value with subtype 0. In Python 2, the byte string is decoded to Binary. In Python 3, the byte string is decoded back to bytes.
# Python 2
>>> from bson.json_util import loads
>>> loads('{"b": {"$binary": "dGhpcyBpcyBhIGJ5dGUgc3RyaW5n", "$type": "00"}}')
{u'b': Binary('this is a byte string', 0)}
# Python 3
>>> from bson.json_util import loads
>>> loads('{"b": {"$binary": "dGhpcyBpcyBhIGJ5dGUgc3RyaW5n", "$type": "00"}}')
{'b': b'this is a byte string'}
Can I Share Pickled ObjectIds Between Python 2 and Python 3?
If you use Python 2 to pickle an instance of ObjectId, you can always unpickle it with Python 3. To do so, you must pass the encoding='latin-1' option to the pickle.loads() method.
The following code example shows how to pickle an ObjectId in Python 2.7, and then unpickle it in Python 3.7:
# Python 2.7
>>> import pickle
>>> from bson.objectid import ObjectId
>>> oid = ObjectId()
>>> oid
ObjectId('4f919ba2fba5225b84000000')
>>> pickle.dumps(oid)
'ccopy_reg\n_reconstructor\np0\n(cbson.objectid\...'

# Python 3.7
>>> import pickle
>>> pickle.loads(b'ccopy_reg\n_reconstructor\np0\n(cbson.objectid\...', encoding='latin-1')
ObjectId('4f919ba2fba5225b84000000')
Conversely, if you pickle an ObjectId in Python 3 and want to unpickle it in Python 2, you must pass the protocol argument with a value of 2 or less to the pickle.dumps() method.
The following code example shows how to pickle an ObjectId in Python 3.7, and then unpickle it in Python 2.7:
# Python 3.7
>>> import pickle
>>> from bson.objectid import ObjectId
>>> oid = ObjectId()
>>> oid
ObjectId('4f96f20c430ee6bd06000000')
>>> pickle.dumps(oid, protocol=2)
b'\x80\x02cbson.objectid\nObjectId\nq\x00)\x81q\x01c_codecs\nencode\...'

# Python 2.7
>>> import pickle
>>> pickle.loads('\x80\x02cbson.objectid\nObjectId\nq\x00)\x81q\x01c_codecs\nencode\...')
ObjectId('4f96f20c430ee6bd06000000')