Docs Menu
Docs Home
/ / /
PyMongo
/

Custom Types

On this page

  • Overview
  • Encode a Custom Type
  • Define a Type Codec Class
  • Add Codec to the Type Registry
  • Get a Collection Reference
  • Encode a Subtype
  • Define a Fallback Encoder
  • Encode Unknown Types
  • Limitations
  • API Documentation

This guide explains how to use PyMongo to encode and decode custom types.

You might need to define a custom type if you want to store a data type that the driver doesn't understand. For example, the BSON library's Decimal128 type is distinct from Python's built-in Decimal type. Attempting to save an instance of Decimal with PyMongo results in an InvalidDocument exception, as shown in the following code example:

from decimal import Decimal
num = Decimal("45.321")
db["coll"].insert_one({"num": num})
Traceback (most recent call last):
...
bson.errors.InvalidDocument: cannot encode object: Decimal('45.321'), of
type: <class 'decimal.Decimal'>

The following sections show how to define a custom type for this Decimal type.

To encode a custom type, you must first define a type codec. A type codec describes how an instance of a custom type is converted to and from a type that the bson module can already encode.

When you define a type codec, your class must inherit from one of the base classes in the codec_options module. The following table describes these base classes, and when and how to implement them:

Base Class
When to Use
Members to Implement
codec_options.TypeEncoder
Inherit from this class to define a codec that encodes a custom Python type to a known BSON type.
  • python_type attribute: The custom Python type that's encoded by this type codec

  • transform_python() method: Function that transforms a custom type value into a type that BSON can encode

codec_options.TypeDecoder
Inherit from this class to define a codec that decodes a specified BSON type into a custom Python type.
  • bson_type attribute: The BSON type that's decoded by this type codec

  • transform_bson() method: Function that transforms a standard BSON type value into the custom type

codec_options.TypeCodec
Inherit from this class to define a codec that can both encode and decode a custom type.
  • python_type attribute: The custom Python type that's encoded by this type codec

  • bson_type attribute: The BSON type that's decoded by this type codec

  • transform_bson() method: Function that transforms a standard BSON type value into the custom type

  • transform_python() method: Function that transforms a custom type value into a type that BSON can encode

Because the example Decimal custom type can be converted to and from a Decimal128 instance, you must define how to encode and decode this type. Therefore, the Decimal type codec class must inherit from the TypeCodec base class:

from bson.decimal128 import Decimal128
from bson.codec_options import TypeCodec
class DecimalCodec(TypeCodec):
python_type = Decimal
bson_type = Decimal128
def transform_python(self, value):
return Decimal128(value)
def transform_bson(self, value):
return value.to_decimal()

After defining a custom type codec, you must add it to PyMongo's type registry, the list of types the driver can encode and decode. To do so, create an instance of the TypeRegistry class, passing in an instance of your type codec class inside a list. If you create multiple custom codecs, you can pass them all to the TypeRegistry constructor.

The following code examples adds an instance of the DecimalCodec type codec to the type registry:

from bson.codec_options import TypeRegistry
decimal_codec = DecimalCodec()
type_registry = TypeRegistry([decimal_codec])

Note

Once instantiated, registries are immutable and the only way to add codecs to a registry is to create a new one.

Finally, define a codec_options.CodecOptions instance, passing your TypeRegistry object as a keyword argument. Pass your CodecOptions object to the get_collection() method to obtain a collection that can use your custom type:

from bson.codec_options import CodecOptions
codec_options = CodecOptions(type_registry=type_registry)
collection = db.get_collection("test", codec_options=codec_options)

You can then encode and decode instances of the Decimal class:

import pprint
collection.insert_one({"num": Decimal("45.321")})
my_doc = collection.find_one()
pprint.pprint(my_doc)
{'_id': ObjectId('...'), 'num': Decimal('45.321')}

To see how MongoDB stores an instance of the custom type, create a new collection object without the customized codec options, then use it to retrieve the document containing the custom type. The following example shows that PyMongo stores an instance of the Decimal class as a Decimal128 value:

import pprint
new_collection = db.get_collection("test")
pprint.pprint(new_collection.find_one())
{'_id': ObjectId('...'), 'num': Decimal128('45.321')}

You might also need to encode one or more types that inherit from your custom type. Consider the following subtype of the Decimal class, which contains a method to return its value as an integer:

class DecimalInt(Decimal):
def get_int(self):
return int(self)

If you try to save an instance of the DecimalInt class without first registering a type codec for it, PyMongo raises an error:

collection.insert_one({"num": DecimalInt("45.321")})
Traceback (most recent call last):
...
bson.errors.InvalidDocument: cannot encode object: Decimal('45.321'),
of type: <class 'decimal.Decimal'>

To encode an instance of the DecimalInt class, you must define a type codec for the class. This type codec must inherit from the parent class's codec, DecimalCodec, as shown in the following example:

class DecimalIntCodec(DecimalCodec):
@property
def python_type(self):
# The Python type encoded by this type codec
return DecimalInt

You can then add the sublcass's type codec to the type registry and encode instances of the custom type:

import pprint
from bson.codec_options import CodecOptions
decimal_int_codec = DecimalIntCodec()
type_registry = TypeRegistry([decimal_codec, decimal_int_codec])
codec_options = CodecOptions(type_registry=type_registry)
collection = db.get_collection("test", codec_options=codec_options)
collection.insert_one({"num": DecimalInt("45.321")})
my_doc = collection.find_one()
pprint.pprint(my_doc)
{'_id': ObjectId('...'), 'num': Decimal('45.321')}

Note

The transform_bson() method of the DecimalCodec class results in these values being decoded as Decimal, not DecimalInt.

You can also register a fallback encoder, a callable to encode types not recognized by BSON and for which no type codec has been registered. The fallback encoder accepts an unencodable value as a parameter and returns a BSON-encodable value.

The following fallback encoder encodes Python's Decimal type to a Decimal128:

def fallback_encoder(value):
if isinstance(value, Decimal):
return Decimal128(value)
return value

After declaring a fallback encoder, perform the following steps:

  • Construct a new instance of the TypeRegistry class. Use the fallback_encoder keyword argument to pass in the fallback encoder.

  • Construct a new instance of the CodecOptions class. Use the type_registry keyword argument to pass in the TypeRegistry instance.

  • Call the get_collection() method. Use the codec_options keyword argument to pass in the CodecOptions instance.

The following code example shows this process:

type_registry = TypeRegistry(fallback_encoder=fallback_encoder)
codec_options = CodecOptions(type_registry=type_registry)
collection = db.get_collection("test", codec_options=codec_options)

You can then use this reference to a collection to store instances of the Decimal class:

import pprint
collection.insert_one({"num": Decimal("45.321")})
my_doc = collection.find_one()
pprint.pprint(my_doc)
{'_id': ObjectId('...'), 'num': Decimal128('45.321')}

Note

Fallback encoders are invoked after attempts to encode the given value with standard BSON encoders and any configured type encoders have failed. Therefore, in a type registry configured with a type encoder and fallback encoder that both target the same custom type, the behavior specified in the type encoder takes precedence.

Because fallback encoders don't need to declare the types that they encode beforehand, you can use them in cases where a TypeEncoder doesn't work. For example, you can use a fallback encoder to save arbitrary objects to MongoDB. Consider the following arbitrary custom types:

class MyStringType(object):
def __init__(self, value):
self.__value = value
def __repr__(self):
return "MyStringType('%s')" % (self.__value,)
class MyNumberType(object):
def __init__(self, value):
self.__value = value
def __repr__(self):
return "MyNumberType(%s)" % (self.__value,)

You can define a fallback encoder that pickles the objects it receives and returns them as Binary instances with a custom subtype. This custom subtype allows you to define a TypeDecoder class that identifies pickled artifacts upon retrieval and transparently decodes them back into Python objects:

import pickle
from bson.binary import Binary, USER_DEFINED_SUBTYPE
from bson.codec_options import TypeDecoder
def fallback_pickle_encoder(value):
return Binary(pickle.dumps(value), USER_DEFINED_SUBTYPE)
class PickledBinaryDecoder(TypeDecoder):
bson_type = Binary
def transform_bson(self, value):
if value.subtype == USER_DEFINED_SUBTYPE:
return pickle.loads(value)
return value

You can then add the PickledBinaryDecoder to the codec options for a collection and use it to encode and decode your custom types:

from bson.codec_options import CodecOptions,TypeRegistry
codec_options = CodecOptions(
type_registry=TypeRegistry(
[PickledBinaryDecoder()], fallback_encoder=fallback_pickle_encoder
)
)
collection = db.get_collection("test", codec_options=codec_options)
collection.insert_one(
{"_id": 1, "str": MyStringType("hello world"), "num": MyNumberType(2)}
)
my_doc = collection.find_one()
print(isinstance(my_doc["str"], MyStringType))
print(isinstance(my_doc["num"], MyNumberType))
True
True

PyMongo type codecs and fallback encoders have the following limitations:

  • You can't customize the encoding behavior of Python types that PyMongo already understands, like int and str. If you try to instantiate a type registry with one or more codecs that act upon a built-in type, PyMongo raises a TypeError. This limitation also applies to all subtypes of the standard types.

  • You can't chain type encoders. A custom type value, once transformed by a codec's transform_python() method, must result in a type that is either BSON-encodable by default, or can be transformed by the fallback encoder into something BSON-encodable. It cannot be transformed a second time by a different type codec.

  • The Database.command() method doesn't apply custom type decoders while decoding the command response document.

  • The gridfs class doesn't apply custom type encoding or decoding to any documents it receives or returns.

For more information about encoding and decoding custom types, see the following API documentation:

  • TypeCodec

  • TypeEncoder

  • TypeDecoder

  • TypeRegistry

  • CodecOptions

Back

Specialized Data Formats