Implementing Right to Erasure with CSFLE
Tom McCarthy, Pierre Petersson • Published Feb 22, 2023 • Updated Mar 08, 2023
The right to erasure, also known as the right to be forgotten, is a right granted to individuals under laws and regulations such as GDPR. This means that companies storing an individual's personal data must be able to delete it on request. Because this data can be spread across several systems, it can be technically challenging for these companies to identify and remove it from all places. Even if this is properly executed, there is also a risk that deleted data can be restored from backups in the future, potentially contributing to legal and financial risks.
This blog post addresses those challenges, demonstrating how you can make use of MongoDB's Client-Side Field Level Encryption to strengthen procedures for removing sensitive data.
Disclaimer: We provide no guarantees that the solution and techniques described in this article will fulfill regulatory requirements around the right to erasure. Each organization needs to make their own determination on appropriate or sufficient measures to comply with various regulatory requirements such as GDPR.
Crypto shredding is a data destruction technique that consists of destroying the encryption keys that allow the data to be decrypted, thus making the data undecipherable. The example below gives a more in-depth explanation.
Imagine you are storing data for multiple users. You start by giving each user their own unique data encryption key (DEK) and mapping it to that user. This is represented in the diagram below, where "User A" and "User B" each have their own key in the key store. This DEK can then be used to encrypt and decrypt any data related to the user in question.
Let's assume that we want to remove all data for User B. If we remove User B's DEK, we can no longer decrypt any of the data that was encrypted with it; all we have left in our data store is "junk" cipher text. As the diagram below illustrates, User A's data is unaffected, but we can no longer read User B's data.
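To make the idea concrete, here is a minimal, self-contained sketch of crypto shredding in plain Python. It is a toy model, not CSFLE: the dictionaries key_store and data_store are hypothetical stand-ins for the key store and the data store, and toy_encrypt/toy_decrypt are a deliberately simple hash-based construction written only for illustration. It is not the AEAD_AES_256_CBC_HMAC_SHA_512 algorithm CSFLE uses and must not be used to protect real data.

```python
import hashlib
import hmac
import secrets

# Toy authenticated cipher for illustration only (NOT secure, NOT what CSFLE uses):
# XOR with a SHAKE-256 keystream, plus an HMAC-SHA256 tag over nonce + body.
def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag

def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key or tampered data")
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

key_store = {}   # hypothetical stand-in for the key store: user -> DEK
data_store = []  # hypothetical stand-in for the data store

def create_user_key(user: str) -> None:
    key_store[user] = secrets.token_bytes(32)  # one unique DEK per user

def store(user: str, value: str) -> None:
    # Each record remembers which key encrypted it (CSFLE cipher text does too)
    data_store.append({"key_ref": user, "value": toy_encrypt(key_store[user], value.encode())})

def read_all() -> list:
    results = []
    for record in data_store:
        dek = key_store.get(record["key_ref"])
        # Without the DEK, the stored bytes are just "junk" cipher text
        results.append(toy_decrypt(dek, record["value"]).decode() if dek else None)
    return results

create_user_key("User A")
create_user_key("User B")
store("User A", "10")
store("User B", "42")
del key_store["User B"]  # crypto shredding: destroy only User B's DEK
print(read_all())  # ['10', None]
```

User A's record still decrypts, while User B's record is still physically present but can no longer be read.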
With MongoDB’s Client-Side Field Level Encryption (CSFLE), applications can encrypt sensitive fields in documents before transmitting data to the server. This means that even when the database is using the data in memory, it is never in plain text. The database only sees the encrypted data, yet you can still query it; deterministically encrypted fields, for example, support equality queries.
MongoDB CSFLE uses envelope encryption: plaintext data is encrypted with a data key, which is in turn encrypted by a top-level envelope key (also known as a "master key").
Envelope keys are usually managed by a Key Management Service (KMS). MongoDB CSFLE supports multiple KMSs, such as AWS KMS, GCP KMS, Azure Key Vault, and key stores supporting the KMIP standard (e.g., HashiCorp Vault).
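The envelope pattern can be sketched as a toy model too. In this hypothetical example, master_key plays the role of the envelope key held in the KMS, and the key vault only ever stores the DEK in wrapped (encrypted) form. The cipher is a deliberately simple hash-based construction for illustration only; real CSFLE delegates the wrapping to the KMS and uses AES-based encryption.

```python
import hashlib
import hmac
import secrets

# Toy authenticated cipher for illustration only (NOT secure, NOT what CSFLE uses)
def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body + hmac.new(key, nonce + body, hashlib.sha256).digest()

def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + body, hashlib.sha256).digest()):
        raise ValueError("authentication failed: wrong key or tampered data")
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

# Hypothetical envelope key; with CSFLE this lives in the KMS
master_key = secrets.token_bytes(32)

def create_wrapped_dek(master_key: bytes) -> bytes:
    dek = secrets.token_bytes(32)
    # Only the wrapped DEK is persisted in the key vault
    return toy_encrypt(master_key, dek)

def encrypt_value(master_key: bytes, wrapped_dek: bytes, value: str) -> bytes:
    dek = toy_decrypt(master_key, wrapped_dek)  # unwrap the DEK, then use it
    return toy_encrypt(dek, value.encode())

def decrypt_value(master_key: bytes, wrapped_dek: bytes, ciphertext: bytes) -> str:
    dek = toy_decrypt(master_key, wrapped_dek)
    return toy_decrypt(dek, ciphertext).decode()

wrapped = create_wrapped_dek(master_key)
ciphertext = encrypt_value(master_key, wrapped, "shoe size 10")
print(decrypt_value(master_key, wrapped, ciphertext))  # shoe size 10
```

Reading the data requires both the envelope key and that user's wrapped DEK, so deleting a single wrapped DEK from the key vault shreds one user's data while the envelope key and every other DEK remain usable.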
CSFLE can be used in either automatic mode or explicit mode — or a combination of both. Automatic mode enables you to perform encrypted read and write operations based on a defined encryption schema, avoiding the need for application code to specify how to encrypt or decrypt fields. This encryption schema is a JSON document that defines what fields need to be encrypted. Explicit mode refers to using the MongoDB driver's encryption library to manually encrypt or decrypt fields in your application.
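For comparison, here is a sketch of the kind of encryption schema automatic mode relies on, built as a plain Python dictionary in MongoDB Extended JSON form. The namespace demo.data and the helper names are hypothetical. In PyMongo, such a schema would typically be converted with bson.json_util.loads and passed as schema_map to AutoEncryptionOpts. Note that this particular schema ties every matching document to one fixed DEK, which is arguably one reason explicit mode suits a per-user-key design better.

```python
import base64
import json
import uuid

# String names of the two CSFLE algorithms
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random"

def key_id_to_extended_json(key_id: uuid.UUID) -> dict:
    # DEK IDs are UUIDs, stored as BSON binary subtype 4
    return {"$binary": {"base64": base64.b64encode(key_id.bytes).decode("ascii"), "subType": "04"}}

def build_schema_map(namespace: str, key_id: uuid.UUID) -> dict:
    schema = {
        "bsonType": "object",
        # Default DEK for every encrypted field in this schema
        "encryptMetadata": {"keyId": [key_id_to_extended_json(key_id)]},
        "properties": {
            # Deterministic: equal inputs give equal cipher text, so it can be queried
            "username": {"encrypt": {"bsonType": "string", "algorithm": DETERMINISTIC}},
            # Random: not queryable, but reveals nothing about repeated values
            "value": {"encrypt": {"bsonType": "string", "algorithm": RANDOM}},
        },
    }
    return {namespace: schema}

# Hypothetical key ID; a real one comes from create_data_key
schema_map = build_schema_map("demo.data", uuid.UUID("00000000-0000-4000-8000-000000000001"))
print(json.dumps(schema_map, indent=2))
```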
In this article, we are going to use the explicit encryption technique to showcase how we can use crypto shredding techniques with CSFLE to implement (or augment) procedures to "forget" sensitive data. We'll be using AWS KMS to demonstrate this.
With MongoDB as our database, we can use CSFLE to implement crypto shredding, so we can provide stronger guarantees around data privacy.
To demonstrate how you could implement this, we'll walk you through a demo application: a Python (Flask) web application with a front end that exposes functionality for signup, login, and a data entry form. We have also added an "admin" page to showcase the crypto shredding functionality. If you want to follow along, you can run the application yourself; you'll find the necessary code and instructions on GitHub.
When a user signs up, our application generates a DEK for the user, then stores the ID of the DEK along with the other user details. Key generation is done via the create_data_key method of the ClientEncryption class, an instance of which the application holds as app.mongodb_encryption_client. This encryption client is responsible for generating a DEK, which in this case is encrypted by the envelope key. Here, the encryption client is configured to use an envelope key from AWS KMS.

```python
# flaskapp/db_queries.py

def create_key(userId):
    # Create a DEK wrapped by the AWS KMS envelope key, and register the
    # user ID as its key alt name so the key can be looked up by user later
    data_key_id = app.mongodb_encryption_client.create_data_key(
        kms_provider, master_key, key_alt_names=[userId]
    )
    return data_key_id
```
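The create_key function relies on module-level kms_provider and master_key values that are not shown. As a hedged sketch of what they might look like for AWS KMS in PyMongo: kms_providers maps the provider name to credentials, and the master key document names the AWS region and the key ARN. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard AWS credential variables; AWS_KMS_REGION, AWS_KMS_KEY_ARN, the key vault namespace in the comment, and the function name are hypothetical.

```python
import os

def load_aws_kms_settings(env=None):
    """Build CSFLE settings for AWS KMS from environment variables.

    Returns (kms_providers, kms_provider, master_key).
    """
    env = os.environ if env is None else env
    kms_providers = {
        "aws": {
            "accessKeyId": env["AWS_ACCESS_KEY_ID"],
            "secretAccessKey": env["AWS_SECRET_ACCESS_KEY"],
        }
    }
    kms_provider = "aws"
    # The envelope ("master") key in AWS KMS that wraps every DEK
    master_key = {"region": env["AWS_KMS_REGION"], "key": env["AWS_KMS_KEY_ARN"]}
    return kms_providers, kms_provider, master_key

# These would typically be passed to PyMongo along the lines of:
#   ClientEncryption(kms_providers, "encryption.__keyVault", key_vault_client,
#                    CodecOptions(uuid_representation=STANDARD))
#   client_encryption.create_data_key(kms_provider, master_key, key_alt_names=[username])
```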
We can then use this method when saving the user.
```python
# flaskapp/user.py

def save(self):
    dek_id = db_queries.create_key(self.username)
    result = app.mongodb[db_name].user.insert_one(
        {
            "username": self.username,
            "password_hash": self.password_hash,
            "dek_id": dek_id,
            "createdAt": datetime.now(),
        }
    )
    if result:
        self.id = result.inserted_id
        return True
    else:
        return False
```
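Because the username doubles as the DEK's key alt name, it helps if the key vault cannot hold two keys with the same alt name. MongoDB's CSFLE documentation recommends a unique index on keyAltNames in the key vault collection, with a partial filter so it only applies to key documents that have the field. A minimal sketch, assuming key_vault_collection is a PyMongo Collection for the key vault namespace; ensure_key_alt_name_index is a hypothetical helper name.

```python
def ensure_key_alt_name_index(key_vault_collection):
    """Create a unique, partial index on keyAltNames in the key vault.

    With this index in place, creating a second DEK with an existing key alt
    name (here: an existing username) should fail instead of silently adding
    a duplicate key.
    """
    return key_vault_collection.create_index(
        "keyAltNames",
        unique=True,
        # Only index key documents that actually have alt names
        partialFilterExpression={"keyAltNames": {"$exists": True}},
    )
```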
Once signed up, the user can then log in, after which they can enter data via a form shown in the screenshot below. This data has a "name" and a "value", allowing the user to store arbitrary key-value pairs.
In the database, we'll store this data in a MongoDB collection called “data,” in documents structured like this:
```json
{
  "name": "shoe size",
  "value": "10",
  "username": "tom"
}
```
For the sake of this demonstration, we have chosen to encrypt the value and username fields of this document. Those fields will be encrypted using the logged-in user's DEK, which was created at signup.
```python
# flaskapp/db_queries.py
from pymongo.encryption import Algorithm

# Fields to encrypt, and the algorithm to encrypt them with
ENCRYPTED_FIELDS = {
    # Deterministic encryption for username, because we need to search on it
    "username": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
    # Random encryption for value, as we don't need to search on it
    "value": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
}
```
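Deterministic encryption is what makes the username searchable: encrypting the same value with the same DEK always produces the same cipher text, so the application can encrypt the search term and match it against the stored value. A hedged sketch of such a lookup, where client_encryption is a PyMongo ClientEncryption and data_collection is the data collection; find_data_for_user is a hypothetical helper, and the algorithm is passed by its string name, which is the value the Algorithm constant holds.

```python
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"

def find_data_for_user(client_encryption, data_collection, username):
    # Encrypt the search term exactly as it was encrypted on insert:
    # same DEK (looked up via its key alt name) and the deterministic algorithm
    encrypted_username = client_encryption.encrypt(
        username, DETERMINISTIC, key_alt_name=username
    )
    # Equality match on cipher text; a randomly encrypted field can't be matched this way
    return list(data_collection.find({"username": encrypted_username}))
```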
The insert_data function then loops over the fields we want to encrypt and the algorithm we're using for each.
```python
# flaskapp/db_queries.py

def insert_data(document):
    document["username"] = current_user.username
    # Loop over the field names (and associated algorithm) we want to encrypt
    for field, algo in ENCRYPTED_FIELDS.items():
        # If the field exists in the document, encrypt it
        if document.get(field):
            document[field] = encrypt_field(document[field], algo)
    # Insert the document (now with encrypted fields) into the data collection
    app.data_collection.insert_one(document)
```
If the specified fields exist in the document, this will call our encrypt_field function to perform the encryption using the specified algorithm.
```python
# flaskapp/db_queries.py

# Encrypt a single field with the given algorithm
def encrypt_field(field, algorithm):
    try:
        field = app.mongodb_encryption_client.encrypt(
            field,
            algorithm,
            key_alt_name=current_user.username,
        )
        return field
    except pymongo.errors.EncryptionError as ex:
        # Catch this error in case the DEK doesn't exist. Log a warning and
        # re-raise the exception
        if "not all keys requested were satisfied" in str(ex):
            app.logger.warning(
                f"Encryption failed: could not find data encryption key for user: {current_user.username}"
            )
        raise
```
Once data is added, it will be shown in the web app:
Now let's see what happens if we delete the DEK. To do this, we can head over to the admin page. This admin page should only be available to individuals who need to manage keys, and it gives us some choices:
We're going to use the "Delete data encryption key" option, which removes the DEK but leaves all data entered by the user intact. After that, the application can no longer retrieve the data that was stored via the form, and trying to retrieve the data for the logged-in user results in an error.
Note: After the data key has been deleted, the web application may still be able to decrypt and show the data for a short time, until its key cache expires; this takes at most 60 seconds.
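As a sketch (not necessarily how the demo's admin page implements it), deleting a user's DEK with PyMongo 4.2 or later could look the key up by its alt name and then delete it by ID. delete_user_dek is a hypothetical helper name; client_encryption is assumed to be a ClientEncryption instance.

```python
def delete_user_dek(client_encryption, username):
    """Crypto-shred a user's data by deleting their DEK from the key vault.

    Returns True if a key was deleted, False if none was found.
    """
    # Each DEK was created with key_alt_names=[username]
    key_doc = client_encryption.get_key_by_alt_name(username)
    if key_doc is None:
        return False
    # Removes the key document; data encrypted with it becomes unreadable
    # (clients may still hold a cached copy of the key for up to 60 seconds)
    client_encryption.delete_key(key_doc["_id"])
    return True
```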
But what is actually left in the database? To get a view of this, you can go back to the Admin page and choose "Fetch data for all users." In this view, we won't throw an exception if we can't decrypt the data. We'll just show exactly what we have stored in the database. Even though we haven't actually deleted the user's data, because the data encryption key no longer exists, all we can see now is cipher text for the encrypted fields "username" and "value".
And here is the code we're using to fetch the data in this view. As you can see, the logic is very similar to the insert_data function shown earlier. We perform a find operation without any filter to retrieve all the documents in our data collection, then loop over our ENCRYPTED_FIELDS dictionary to see which fields need to be decrypted.
```python
# flaskapp/db_queries.py

def fetch_all_data_unencrypted(decrypt=False):
    results = list(app.data_collection.find())

    if decrypt:
        for field in ENCRYPTED_FIELDS.keys():
            for result in results:
                if result.get(field):
                    result[field], result["encryption_succeeded"] = decrypt_field(result[field])
    return results
```
The decrypt_field function is called for each field to be decrypted, but in this case we'll catch the error if we cannot successfully decrypt it due to a missing DEK.
```python
# flaskapp/db_queries.py

# Try to decrypt a field, returning a tuple of (value, status). This will be either
# (decrypted_value, True), or (raw_cipher_text, False) if we couldn't decrypt
def decrypt_field(field):
    try:
        # We don't need to pass the DEK or algorithm to decrypt a field
        field = app.mongodb_encryption_client.decrypt(field)
        return field, True
    # Catch this error in case the DEK doesn't exist.
    except pymongo.errors.EncryptionError as ex:
        if "not all keys requested were satisfied" in str(ex):
            app.logger.warning(
                "Decryption failed: could not find data encryption key to decrypt the record."
            )
            # If we can't decrypt due to missing DEK, return the "raw" value.
            return field, False
        raise
```
We can also use the mongosh shell to check directly in the database, just to prove that there's nothing there we can read.

At this point, savvy readers may be asking, "But what if we restore the database from a backup?" To prevent this, we can use two separate database clusters in our application: one for storing data and one for storing DEKs (the "key vault"). The sample application follows this approach and requires you to specify two MongoDB connection strings, one for data and one for the key vault. Using separate clusters decouples the restoration of backups for application data and the key vault: restoring a backup on the data cluster won't restore any DEKs that have been deleted from the key vault cluster.
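The effect of separating the clusters can be modeled in a few lines. In this toy sketch, the two dictionaries are hypothetical stand-ins for the data cluster and the key vault cluster, a backup is simply a deep copy, and the cipher is an illustration-only hash-based construction, not the one CSFLE uses. Restoring the data cluster from a backup taken before the DEK was deleted brings back the cipher text, but not the key.

```python
import copy
import hashlib
import hmac
import secrets

# Toy authenticated cipher for illustration only (NOT secure, NOT what CSFLE uses)
def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body + hmac.new(key, nonce + body, hashlib.sha256).digest()

def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + body, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

# Hypothetical clusters: key ID -> DEK, and the data collection
key_vault_cluster = {"dek-tom": secrets.token_bytes(32)}
data_cluster = {"data": [{"key_id": "dek-tom",
                          "value": toy_encrypt(key_vault_cluster["dek-tom"], b"10")}]}

def read_values(data, key_vault):
    values = []
    for doc in data["data"]:
        dek = key_vault.get(doc["key_id"])
        values.append(toy_decrypt(dek, doc["value"]).decode() if dek else None)
    return values

data_backup = copy.deepcopy(data_cluster)  # backup of the data cluster only
del key_vault_cluster["dek-tom"]           # delete the DEK in the key vault cluster
data_cluster["data"].clear()               # also delete the data itself
data_cluster = copy.deepcopy(data_backup)  # later, the data cluster is restored
print(read_values(data_cluster, key_vault_cluster))  # [None]
```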
In this blog post, we've demonstrated how MongoDB's Client-Side Field Level Encryption can simplify the task of "forgetting" certain data. With a single "delete data key" operation, we can effectively forget data that may be stored across different databases, collections, backups, and logs. In a real production application, we may wish to delete all of the user's data we can find, on top of removing their DEK. This "defense in depth" approach helps ensure that the data is really gone. With crypto shredding in place, a delete operation that fails, or that misses some data that should have been wiped, has a much smaller impact.
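A defense-in-depth deletion could combine both steps. One ordering detail matters in this design: because username is deterministically encrypted with the user's own DEK, the application needs the DEK to build the query that finds the user's documents, so the documents should be deleted before the key. A hedged sketch with PyMongo 4.2+ objects passed in (client_encryption, data_collection); forget_user is a hypothetical helper, and it only touches the data collection, leaving other collections, backups, and logs to the DEK deletion.

```python
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"

def forget_user(client_encryption, data_collection, username):
    """Delete a user's documents, then crypto-shred whatever remains.

    Returns the number of documents deleted from data_collection.
    """
    key_doc = client_encryption.get_key_by_alt_name(username)
    if key_doc is None:
        return 0  # DEK already gone: any remaining data is already unreadable
    # Step 1: while the DEK still exists, encrypt the username to find the documents
    encrypted_username = client_encryption.encrypt(
        username, DETERMINISTIC, key_alt_name=username
    )
    deleted = data_collection.delete_many({"username": encrypted_username}).deleted_count
    # Step 2: delete the DEK, covering copies the query could not reach
    # (other collections, backups, logs)
    client_encryption.delete_key(key_doc["_id"])
    return deleted
```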
You can find more details about MongoDB's Client-Side Field Level Encryption in our documentation. If you have questions, feel free to make a post on our community forums.