Implementing Right to Erasure with CSFLE

Tom McCarthy, Pierre Petersson • 7 min read • Published Feb 22, 2023 • Updated Mar 08, 2023
Flask • MongoDB • Python
The right to erasure, also known as the right to be forgotten, is a right granted to individuals under laws and regulations such as GDPR. This means that companies storing an individual's personal data must be able to delete it on request. Because this data can be spread across several systems, it can be technically challenging for these companies to identify and remove it from all places. Even if this is properly executed, there is also a risk that deleted data can be restored from backups in the future, potentially contributing to legal and financial risks.
This blog post addresses those challenges, demonstrating how you can make use of MongoDB's Client-Side Field Level Encryption to strengthen procedures for removing sensitive data.
Disclaimer: We provide no guarantees that the solution and techniques described in this article will fulfill regulatory requirements around the right to erasure. Each organization needs to make its own determination on appropriate or sufficient measures to comply with various regulatory requirements such as GDPR.

What is crypto shredding?

Crypto shredding is a data destruction technique that consists of destroying the encryption keys that allow the data to be decrypted, thus making the data undecipherable. The example below gives a more in-depth explanation.
Imagine you are storing data for multiple users. You start by giving each user their own unique data encryption key (DEK), and mapping it to that customer. This is represented in the below diagram, where "User A" and "User B" each have their own key in the key store. This DEK can then be used to encrypt and decrypt any data related to the user in question.
Diagram showing different data encryption keys for User A and User B
Let's assume that we want to remove all data for User B. If we remove User B's DEK, we can no longer decrypt any of the data that was encrypted with it; all we have left in our data store is "junk" cipher text. As the diagram below illustrates, User A's data is unaffected, but we can no longer read User B's data.
Diagram showing what happens when user B's data encryption key is deleted from the key store
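The idea can be sketched in a few lines of Python. This is a toy illustration only: the `keystream_xor` cipher, the in-memory `key_store` and `data_store`, and the user names are all assumptions made for the example, and the cipher is not suitable for real data. It only shows that destroying one user's key leaves that user's cipher text unreadable while other users are unaffected.

```python
import hashlib
import hmac
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR data with an HMAC-SHA256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


key_store = {}   # hypothetical key store: user -> DEK
data_store = []  # hypothetical data store: holds cipher text only


def store(user: str, plaintext: str) -> None:
    # Create the user's DEK on first use, then encrypt with it
    dek = key_store.setdefault(user, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)
    data_store.append(
        {"user": user, "nonce": nonce, "ciphertext": keystream_xor(dek, nonce, plaintext.encode())}
    )


def read(record: dict) -> str:
    dek = key_store[record["user"]]  # raises KeyError once the DEK has been shredded
    return keystream_xor(dek, record["nonce"], record["ciphertext"]).decode()


store("user_a", "shoe size: 10")
store("user_b", "shoe size: 9")

# Crypto shredding: destroy only User B's key, leave the stored data untouched
del key_store["user_b"]
```

After the deletion, `read(data_store[0])` still returns User A's value, while `read(data_store[1])` fails because the key no longer exists, even though the record is still present in `data_store`.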

What is CSFLE?

With MongoDB’s Client-Side Field Level Encryption (CSFLE), applications can encrypt sensitive fields in documents prior to transmitting data to the server. This means that even when data is being used by the database in memory, it is never in plain text. The database only sees the encrypted data but still enables you to query it.
MongoDB CSFLE utilizes envelope encryption, which is the practice of encrypting plaintext data with a data key, which itself is in turn encrypted by a top level envelope key (also known as a "master key").
Diagram showing an envelope key being used to encrypt a data key, which in turn encrypts data
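As a rough sketch of the envelope pattern, the snippet below wraps a data key with a master key and stores only the wrapped form. The `toy_cipher` function and all variable names are assumptions made for illustration; a real deployment would delegate the wrapping to a KMS and use a vetted cipher rather than this toy one.

```python
import hashlib
import hmac
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (illustration only); applying it twice restores the input."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


master_key = secrets.token_bytes(32)  # envelope key: would live in the KMS
data_key = secrets.token_bytes(32)    # DEK: encrypts the actual data

# Wrap the data key with the master key; only the wrapped form is stored
wrap_nonce = secrets.token_bytes(16)
wrapped_data_key = toy_cipher(master_key, wrap_nonce, data_key)

# Encrypt a field with the (unwrapped) data key
field_nonce = secrets.token_bytes(16)
ciphertext = toy_cipher(data_key, field_nonce, b"shoe size: 10")

# Read path: unwrap the data key first, then decrypt the field
unwrapped_data_key = toy_cipher(master_key, wrap_nonce, wrapped_data_key)
plaintext = toy_cipher(unwrapped_data_key, field_nonce, ciphertext)
```

Reading the field requires both the wrapped data key and access to the master key, so removing either one makes the cipher text unrecoverable.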
Envelope keys are usually managed by a Key Management Service (KMS). MongoDB CSFLE supports multiple KMSs, such as AWS KMS, GCP KMS, Azure Key Vault, and key stores supporting the KMIP standard (e.g., HashiCorp Vault).
CSFLE can be used in either automatic mode or explicit mode — or a combination of both. Automatic mode enables you to perform encrypted read and write operations based on a defined encryption schema, avoiding the need for application code to specify how to encrypt or decrypt fields. This encryption schema is a JSON document that defines what fields need to be encrypted. Explicit mode refers to using the MongoDB driver's encryption library to manually encrypt or decrypt fields in your application.
In this article, we are going to use the explicit encryption technique to showcase how we can use crypto shredding techniques with CSFLE to implement (or augment) procedures to "forget" sensitive data. We'll be using AWS KMS to demonstrate this.

Bringing it all together

With MongoDB as our database, we can use CSFLE to implement crypto shredding, so we can provide stronger guarantees around data privacy.
To demonstrate how you could implement this, we'll walk you through a demo application. The demo application is a Python (Flask) web application with a front end, which exposes functionality for signup, login, and a data entry form. We have also added an "admin" page to showcase the crypto shredding related functionality. If you want to follow along, you can run the application yourself; you'll find the necessary code and instructions in GitHub.
demo application home page
When a user signs up, our application generates a DEK for the user, then stores the ID of that DEK along with the other user details. Key generation is done via the create_data_key method on the ClientEncryption class, which we initialized earlier as app.mongodb_encryption_client. This encryption client generates the DEK, which is itself encrypted by the envelope key. In our case, the client is configured to use an envelope key from AWS KMS.
    # flaskapp/db_queries.py

    @aws_credential_handler
    def create_key(userId):
        # Create a DEK for this user; the username doubles as the key's alternate name
        data_key_id = app.mongodb_encryption_client.create_data_key(
            kms_provider, master_key, key_alt_names=[userId]
        )
        return data_key_id
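The kms_provider and master_key values are not shown in this excerpt. As a hedged sketch, assuming static AWS credentials, the configuration PyMongo expects for AWS KMS has roughly the following shape. The helper name `build_aws_kms_config` and all placeholder values are hypothetical, and the demo application may well obtain its credentials differently (for example, through the aws_credential_handler decorator).

```python
def build_aws_kms_config(access_key_id, secret_access_key, region, key_arn):
    """Return (kms_providers, master_key) in the shape PyMongo uses for AWS KMS."""
    # Passed to ClientEncryption when the encryption client is constructed
    kms_providers = {
        "aws": {
            "accessKeyId": access_key_id,
            "secretAccessKey": secret_access_key,
        }
    }
    # Identifies the envelope key in AWS KMS that will encrypt each DEK
    master_key = {"region": region, "key": key_arn}
    return kms_providers, master_key


# Placeholder values only; substitute your own credentials and key ARN
kms_providers, master_key = build_aws_kms_config(
    "<access key id>", "<secret access key>", "<aws region>", "<ARN of the KMS key>"
)

# With these in place, the provider name passed to create_data_key is the string "aws":
# data_key_id = client_encryption.create_data_key("aws", master_key, key_alt_names=["tom"])
```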
We can then call create_key when saving the user.
    # flaskapp/user.py

    from datetime import datetime

    def save(self):
        # Create the user's DEK first, then store its ID alongside the user record
        dek_id = db_queries.create_key(self.username)
        result = app.mongodb[db_name].user.insert_one(
            {
                "username": self.username,
                "password_hash": self.password_hash,
                "dek_id": dek_id,
                "createdAt": datetime.now(),
            }
        )
        if result:
            self.id = result.inserted_id
            return True
        else:
            return False
Once signed up, the user can then log in, after which they can enter data via a form shown in the screenshot below. This data has a "name" and a "value", allowing the user to store arbitrary key-value pairs.
demo application showing a form to add data
In the database, we'll store this data in a MongoDB collection called “data,” in documents structured like this:
    {
        "name": "shoe size",
        "value": "10",
        "username": "tom"
    }
For the sake of this demonstration, we have chosen to encrypt the value and username fields from this document. Those fields will be encrypted using the DEK that was created on signup for the logged-in user.
    # flaskapp/db_queries.py

    from pymongo.encryption import Algorithm

    # Fields to encrypt, and the algorithm to encrypt them with
    ENCRYPTED_FIELDS = {
        # Deterministic encryption for username, because we need to search on it
        "username": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
        # Random encryption for value, as we don't need to search on it
        "value": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
    }
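Deterministic encryption matters for the username because equality queries have to match on cipher text. The sketch below shows one way such a lookup might be written: the query value is encrypted with the same key and algorithm, then compared against the stored cipher text. The function name `fetch_data_for_user` and its parameters are assumptions for illustration and are not taken from the demo application. The `encrypt` call follows the same ClientEncryption signature used elsewhere in this article.

```python
# String value of pymongo.encryption.Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"


def fetch_data_for_user(collection, encryption_client, username):
    """Find a user's documents by matching on the encrypted username (hypothetical helper)."""
    # Deterministic encryption yields the same cipher text for the same
    # plaintext and key, so the encrypted value can be used as an equality filter
    encrypted_username = encryption_client.encrypt(
        username,
        DETERMINISTIC,
        key_alt_name=username,
    )
    return list(collection.find({"username": encrypted_username}))
```

With random encryption, the same plaintext produces different cipher text on every call, so a filter like this would never match. That is why the value field, which is never searched on, can use the random algorithm.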
The insert_data function then loops over the fields we want to encrypt and the algorithm we're using for each.
    # flaskapp/db_queries.py

    def insert_data(document):
        document["username"] = current_user.username
        # Loop over the field names (and associated algorithm) we want to encrypt
        for field, algo in ENCRYPTED_FIELDS.items():
            # if the field exists in the document, encrypt it
            if document.get(field):
                document[field] = encrypt_field(document[field], algo)
        # Insert document (now with encrypted fields) to the data collection
        app.data_collection.insert_one(document)
If the specified fields exist in the document, this will call our encrypt_field function to perform the encryption using the specified algorithm.
    # flaskapp/db_queries.py

    import pymongo.errors

    # Encrypt a single field with the given algorithm
    @aws_credential_handler
    def encrypt_field(field, algorithm):
        try:
            field = app.mongodb_encryption_client.encrypt(
                field,
                algorithm,
                key_alt_name=current_user.username,
            )
            return field
        except pymongo.errors.EncryptionError as ex:
            # Catch this error in case the DEK doesn't exist. Log a warning and
            # re-raise the exception
            if "not all keys requested were satisfied" in str(ex):
                app.logger.warning(
                    f"Encryption failed: could not find data encryption key for user: {current_user.username}"
                )
            raise
Once data is added, it will be shown in the web app:
demo application showing the data added in the previous step
Now let's see what happens if we delete the DEK. To do this, we can head over to the admin page. This admin page should only be provided to individuals that have a need to manage keys, and we have some choices:
demo application admin page, showing the "delete data encryption key" button
We're going to use the "Delete data encryption key" option, which will remove the DEK but leave all data entered by the user intact. After that, the application will no longer be able to retrieve the data that was stored via the form. When trying to retrieve the data for the logged-in user, an error will be thrown.
demo application showing an error message when trying to retrieve data
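The code behind the "Delete data encryption key" button is not shown in this article. A minimal sketch of how it could look is given below, assuming PyMongo 4.2 or later, where ClientEncryption provides get_key_by_alt_name and delete_key. The function name `delete_user_key` is hypothetical, and the demo application may implement the deletion differently.

```python
def delete_user_key(encryption_client, username):
    """Delete the DEK whose alternate name is the username (hypothetical helper).

    Returns True if a key was deleted, False if no matching key existed.
    """
    # Look up the key document in the key vault by its alternate name
    key_document = encryption_client.get_key_by_alt_name(username)
    if key_document is None:
        return False
    # Remove the key from the key vault; the user's cipher text stays in place
    result = encryption_client.delete_key(key_document["_id"])
    return result.deleted_count == 1
```

In the demo application, this could be called as `delete_user_key(app.mongodb_encryption_client, username)`, since the DEK was created with the username as its alternate name.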
Note: After we delete the data key, the web application may still be able to decrypt and show the data for a short period of time before its cache expires. This takes a maximum of 60 seconds.
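To see why a deleted key can remain usable for a short while, consider the simplified model below of a client that caches keys for a fixed time. The `CachedKeyStore` class and its parameters are assumptions made for illustration and do not reflect the driver's actual internals. Only the 60-second lifetime is taken from the note above.

```python
import time


class CachedKeyStore:
    """Toy model of a client-side key cache with a fixed time-to-live."""

    def __init__(self, key_vault, ttl_seconds=60.0, clock=time.monotonic):
        self._key_vault = key_vault  # stand-in for the real key vault
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key name -> (key, time it was fetched)

    def get(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._ttl:
            # Served from the cache, even if the key vault no longer has the key
            return cached[0]
        # Cache miss or expired entry: drop it and go back to the key vault
        self._cache.pop(name, None)
        key = self._key_vault[name]  # raises KeyError if the key was deleted
        self._cache[name] = (key, now)
        return key
```

In this model, a key fetched just before its deletion keeps being returned until the entry is 60 seconds old. After that, the next lookup goes back to the key vault and fails.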
But what is actually left in the database? To get a view of this, you can go back to the Admin page and choose "Fetch data for all users." In this view, we won't throw an exception if we can't decrypt the data. We'll just show exactly what we have stored in the database. Even though we haven't actually deleted the user's data, because the data encryption key no longer exists, all we can see now is cipher text for the encrypted fields "username" and "value".
demo application showing raw cipher text instead of decrypted data
And here is the code we're using to fetch the data in this view. As you can see, we use very similar logic to the encryption code shown earlier. We perform a find operation without any filters to retrieve all the data from our data collection. We'll then loop over our ENCRYPTED_FIELDS dictionary to see which fields need to be decrypted.
    # flaskapp/db_queries.py

    def fetch_all_data_unencrypted(decrypt=False):
        # No filter: fetch every document in the data collection
        results = list(app.data_collection.find())

        if decrypt:
            for field in ENCRYPTED_FIELDS.keys():
                for result in results:
                    if result.get(field):
                        result[field], result["encryption_succeeded"] = decrypt_field(result[field])
        return results
The decrypt_field function is called for each field to be decrypted, but in this case we'll catch the error if we cannot successfully decrypt it due to a missing DEK.
    # flaskapp/db_queries.py

    # Try to decrypt a field, returning a tuple of (value, status). This will be
    # either (decrypted_value, True), or (raw_cipher_text, False) if we couldn't decrypt
    def decrypt_field(field):
        try:
            # We don't need to pass the DEK or algorithm to decrypt a field
            field = app.mongodb_encryption_client.decrypt(field)
            return field, True
        # Catch this error in case the DEK doesn't exist.
        except pymongo.errors.EncryptionError as ex:
            if "not all keys requested were satisfied" in str(ex):
                app.logger.warning(
                    "Decryption failed: could not find data encryption key to decrypt the record."
                )
                # If we can't decrypt due to missing DEK, return the "raw" value.
                return field, False
            raise
We can also use the mongosh shell to check directly in the database, just to prove that there's nothing there we can read.
mongosh
At this point, savvy readers may be asking the question, "But what if we restore the database from a backup?" If we want to prevent this, we can use two separate database clusters in our application: one for storing data and one for storing DEKs (the "key vault"). This approach is applied in the sample application, which requires you to specify two MongoDB connection strings, one for data and one for the key vault. If we use separate clusters, it decouples the restoration of backups for application data and the key vault; restoring a backup on the data cluster won't restore any DEKs which have been deleted from the key vault cluster.
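The effect of this separation can be modelled with two independent in-memory stores. This is a deliberately simplified sketch: the `Cluster` class and the placeholder values are assumptions for illustration and stand in for real MongoDB clusters and their backup tooling.

```python
import copy


class Cluster:
    """Toy stand-in for a database cluster with its own independent backup."""

    def __init__(self):
        self.docs = {}
        self._backup = {}

    def backup(self):
        self._backup = copy.deepcopy(self.docs)

    def restore(self):
        self.docs = copy.deepcopy(self._backup)


data_cluster = Cluster()
key_vault_cluster = Cluster()

key_vault_cluster.docs["tom"] = "<DEK for tom>"
data_cluster.docs["tom"] = "<cipher text encrypted with tom's DEK>"

# A routine backup of the data cluster, taken before the erasure request
data_cluster.backup()

# Erasure request: remove the DEK from the key vault cluster
del key_vault_cluster.docs["tom"]

# Later, the data cluster is restored from its backup
data_cluster.docs.clear()
data_cluster.restore()
```

After the restore, the cipher text is back in the data cluster, but the key vault cluster still has no DEK for the user, so the restored data remains unreadable.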

Conclusion

In this blog post, we've demonstrated how MongoDB's Client-Side Field Level Encryption can be used to simplify the task of "forgetting" certain data. With a single "delete data key" operation, we can effectively forget data which may be stored across different databases, collections, backups, and logs. In a real production application, we may wish to delete all the user's data we can find, on top of removing their DEK. This "defense in depth" approach helps us to ensure that the data is really gone. By implementing crypto shredding, the impact is much smaller if a delete operation fails, or misses some data that should have been wiped.
You can find more details about MongoDB's Client-Side Field Level Encryption in our documentation. If you have questions, feel free to make a post on our community forums.
