How to Set Up HashiCorp Vault KMIP Secrets Engine with MongoDB CSFLE or Queryable Encryption
Encryption is proven and trusted and has been around for close to 60 years, but there are gaps. Most databases cover data in motion (TLS encryption) and data at rest (storage encryption). But as soon as data is in use, processed by the database, it is in plain text and more vulnerable to insider access and active breaches. Most databases do not cover this.
With MongoDB’s Client-Side Field Level Encryption (CSFLE) and Queryable Encryption, applications can encrypt sensitive plain text fields in documents prior to transmitting data to the server. This means that data processed by the database (in use) is never in plain text, because it is always encrypted, and, most importantly, it can still be queried. The encryption keys used are typically stored in a key management service.
Organizations with a multi-cloud strategy face the challenge of how to manage encryption keys across cloud environments in a standardized way, as the public cloud KMS services (e.g., AWS KMS, Azure Key Vault, or GCP KMS) use proprietary APIs to manage encryption keys. Organizations wanting a standardized way of managing the lifecycle of encryption keys can utilize the Key Management Interoperability Protocol (KMIP).
As shown in the diagram above, KMIP eliminates the sprawl of encryption key management services across multiple cloud providers by utilizing a KMIP-enabled key provider. MongoDB CSFLE and Queryable Encryption support KMIP as a key provider.
In this article, I will showcase how to use MongoDB Queryable Encryption and CSFLE with the HashiCorp Vault KMIP Secrets Engine to have a standardized way of managing the lifecycle of encryption keys regardless of cloud provider.
Before I dive deeper into how to actually use MongoDB CSFLE and Queryable Encryption, I will explain encryption terminology and the common practice to encrypt plain text data.
The Customer Master Key (CMK) sits at the top level of the encryption hierarchy. It is the encryption key used to protect (encrypt) the Data Encryption Keys.
The Data Encryption Key (DEK) is used to encrypt the data that is plain text. Once plain text is encrypted by the DEK, it will be in cipher text.
Plain text data is unencrypted information that you wish to protect.
Cipher text is encrypted information unreadable by a human or computer without decryption.
Envelope encryption is the practice of encrypting plain text data with a data encryption key (DEK) and then encrypting the data key using the customer master key.
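To make the hierarchy above concrete, here is a minimal sketch of envelope encryption using only the Python standard library. The cipher is a toy XOR keystream built from HMAC-SHA256, chosen only so the example runs without extra packages; it is an assumption for illustration and not what the MongoDB driver uses. The function names (toy_cipher, envelope_encrypt, envelope_decrypt) are hypothetical.

```python
import hashlib
import hmac
import secrets


def toy_cipher(key: bytes, data: bytes, nonce: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 keystream.

    Illustration only, NOT secure for real use. XOR is symmetric,
    so the same call both encrypts and decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        stream += block.digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(cmk: bytes, plaintext: bytes) -> dict:
    """Encrypt plaintext with a fresh DEK, then wrap the DEK with the CMK."""
    dek = secrets.token_bytes(32)          # Data Encryption Key
    data_nonce = secrets.token_bytes(16)
    key_nonce = secrets.token_bytes(16)
    return {
        "ciphertext": toy_cipher(dek, plaintext, data_nonce),
        "data_nonce": data_nonce,
        "wrapped_dek": toy_cipher(cmk, dek, key_nonce),  # DEK is stored encrypted
        "key_nonce": key_nonce,
    }


def envelope_decrypt(cmk: bytes, envelope: dict) -> bytes:
    """Unwrap the DEK with the CMK, then decrypt the data with the DEK."""
    dek = toy_cipher(cmk, envelope["wrapped_dek"], envelope["key_nonce"])
    return toy_cipher(dek, envelope["ciphertext"], envelope["data_nonce"])


cmk = secrets.token_bytes(32)              # stands in for the key held by the KMS
envelope = envelope_encrypt(cmk, b"123-45-6789")
print(envelope_decrypt(cmk, envelope))
```

The point of the design is that only the small wrapped DEK depends on the CMK, so the CMK can stay in the key management system while the encrypted DEK is stored next to the data.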
The prerequisites to enable querying in CSFLE or Queryable Encryption mode are:
- A running key management system that supports the KMIP standard, e.g., HashiCorp Vault, and an application configured to use its KMIP endpoint.
- Data Encryption Keys (DEK) created and an encryption JSON schema that is used by a MongoDB driver to know which fields to encrypt.
- An authenticated MongoDB connection with CSFLE/Queryable Encryption enabled.
- A supported server version and a compatible driver version. For this tutorial we are going to use MongoDB Atlas version 6.0. Refer to the documentation to see which driver versions are required for CSFLE or Queryable Encryption.
Once the above are fulfilled, this is what happens when a query is executed.
Step 1: Upon receiving a query, the MongoDB driver checks to see if any encrypted fields are involved using the JSON encryption schema that is configured when connecting to the database.
Step 2: The MongoDB driver requests the Customer Master Key (CMK) from the KMIP key provider. In our setup, this is HashiCorp Vault.
Step 3: The MongoDB driver decrypts the data encryption keys using the CMK. The encrypted data encryption keys are stored in a key vault collection in your MongoDB cluster. The decrypted DEK is then used to encrypt/decrypt the plain text fields, and the JSON encryption schema defines which fields those are.
Step 4: The driver submits the query to the MongoDB server with the encrypted fields rendered as ciphertext.
Step 5: MongoDB returns the encrypted results of the query to the MongoDB driver, still as ciphertext.
Step 6: The MongoDB driver decrypts the encrypted fields to plain text using the DEK and returns them to the authenticated client.
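Steps 4 to 6 can be sketched in a few lines of plain Python. This is a toy model under stated assumptions: the deterministic cipher below is a hand-rolled stand-in (HMAC-derived IV plus an XOR keystream), not the algorithm the MongoDB driver uses, and server_collection is a plain list standing in for the server. It only illustrates why an equality match can work when the server sees nothing but ciphertext, which is the CSFLE deterministic case; Queryable Encryption uses a different mechanism.

```python
import hashlib
import hmac


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Derive `length` keystream bytes from key and IV (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_deterministic(dek: bytes, value: str) -> bytes:
    """Same key and value always give the same ciphertext. NOT secure, demo only."""
    data = value.encode("utf-8")
    iv = hmac.new(dek, b"iv" + data, hashlib.sha256).digest()[:16]
    stream = _keystream(dek, iv, len(data))
    return iv + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(dek: bytes, blob: bytes) -> str:
    """Reverse encrypt_deterministic using the IV stored in the first 16 bytes."""
    iv, body = blob[:16], blob[16:]
    stream = _keystream(dek, iv, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


dek = bytes(range(32))  # fixed demo key; a real DEK is random and wrapped by the CMK

# "Server side": only ciphertext is ever stored.
server_collection = [
    {"name": "Alice", "ssn": encrypt_deterministic(dek, "123-45-6789")},
    {"name": "Bob", "ssn": encrypt_deterministic(dek, "987-65-4321")},
]


def find_by_ssn(ssn: str) -> list:
    needle = encrypt_deterministic(dek, ssn)                      # step 4: query value sent as ciphertext
    matches = [d for d in server_collection if d["ssn"] == needle]  # step 5: server compares ciphertext only
    return [{**d, "ssn": decrypt(dek, d["ssn"])} for d in matches]  # step 6: driver decrypts the result


print(find_by_ssn("123-45-6789"))
```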
Next, we will set up and configure the prerequisites needed to query MongoDB in CSFLE or Queryable Encryption mode.
Let's look at what we need to install, configure, and run to implement what's described in the section above.
- MongoDB Atlas cluster: MongoDB Atlas is a fully managed data platform for modern applications. Storing data the way it’s accessed, as documents, makes developers more productive. It provides a document-based database that is cost-efficient and resizable while automating time-consuming administration tasks such as hardware provisioning, database setup, patching, and backups, so you can focus on your applications. This tutorial uses MongoDB Atlas version 6.0.
- HashiCorp Vault Enterprise: Run and configure the HashiCorp Vault KMIP Secrets Engine, along with Scopes, Roles, and Certificates.
- Python application: This showcases how CSFLE and Queryable Encryption can be used with HashiCorp Vault. I will show you how to configure the DEKs, the JSON schema, and an authenticated MongoDB client that connects to a database and runs queries against encrypted data stored in a collection in MongoDB Atlas.
First off, we need an Atlas account to provision an Atlas cluster, and somewhere to run our automation. You can get an Atlas account for free at mongodb.com. If you want to take this tutorial for a spin, take the time and create your Atlas account now.
You will also need Docker installed, as we are using a Docker container based on a prebaked image that contains all needed dependencies, such as HashiCorp Vault, the MongoDB driver, and the crypto library. For more information on how to install Docker, see Get Started with Docker. Also, install the latest version of MongoDB Compass, which we will use to check that the fields in the collection have been encrypted.
Now we are almost ready to get going. You’ll need to clone this tutorial’s GitHub repository. You can clone the repo by using the below command:
    git clone https://github.com/mongodb-developer/mongodb-kmip-fle-queryable
There are four main steps to get this tutorial running:
- Retrieve a trial license key for HashiCorp Vault
- Update the database connection string
- Start the Docker container, embedded with HashiCorp Vault
- Run the Python application, showcasing CSFLE and Queryable Encryption
Next, request a trial license key for HashiCorp Vault Enterprise from the HashiCorp product page, and copy the generated license key.
Replace the content of license.txt with the license key generated in the step above. The file is located in the cloned GitHub repository at kmip-with-hashicorp-key-vault/vault/license.txt.
You will need to update the connection string so the Python application can connect to your MongoDB Atlas cluster. It’s best to update both configuration files as this tutorial will demonstrate both CSFLE and Queryable Encryption.
For CSFLE: Open file kmip-with-hashicorp-key-vault/configuration_fle.py line 3, and update connection_uri.
    encrypted_namespace = "DEMO-KMIP-FLE.users"
    key_vault_namespace = "DEMO-KMIP-FLE.datakeys"
    connection_uri = "mongodb+srv://<USER>:<PASSWORD>@<CLUSTER-NAME>?retryWrites=true&w=majority"
    # Configure the "kmip" provider.
    kms_providers = {
        "kmip": {
            "endpoint": "localhost:5697"
        }
    }
    kms_tls_options = {
        "kmip": {
            "tlsCAFile": "vault/certs/FLE/vv-ca.pem",
            "tlsCertificateKeyFile": "vault/certs/FLE/vv-client.pem"
        }
    }
Replace <USER>, <PASSWORD>, and <CLUSTER-NAME> with your Atlas cluster connection details. After the update, you should have something looking like this:
    encrypted_namespace = "DEMO-KMIP-FLE.users"
    key_vault_namespace = "DEMO-KMIP-FLE.datakeys"
    connection_uri = "mongodb+srv://admin:mPassword@demo-cluster.tcrpd.mongodb.net/myFirstDatabase?retryWrites=true&w=majority"
    # Configure the "kmip" provider.
    kms_providers = {
        "kmip": {
            "endpoint": "localhost:5697"
        }
    }
    kms_tls_options = {
        "kmip": {
            "tlsCAFile": "vault/certs/FLE/vv-ca.pem",
            "tlsCertificateKeyFile": "vault/certs/FLE/vv-client.pem"
        }
    }
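If your Atlas user name or password contains characters such as @ or /, they need to be percent-encoded before they go into connection_uri. Here is a minimal sketch, assuming a hypothetical helper named build_connection_uri that is not part of the tutorial repository. The default database name and query options simply mirror the example above.

```python
from urllib.parse import quote_plus


def build_connection_uri(user: str, password: str, cluster_host: str,
                         database: str = "myFirstDatabase") -> str:
    """Assemble a mongodb+srv URI with percent-encoded credentials."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{cluster_host}/{database}?retryWrites=true&w=majority"
    )


# Hypothetical credentials; the password contains characters that must be escaped.
connection_uri = build_connection_uri("admin", "p@ss/word", "demo-cluster.tcrpd.mongodb.net")
print(connection_uri)
```

The resulting string can be pasted as the value of connection_uri in the configuration file.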
For Queryable Encryption: Open file kmip-with-hashicorp-key-vault/configuration_queryable.py in the cloned GitHub repository and update line 3, replacing <USER>, <PASSWORD>, and <CLUSTER-NAME> with your Atlas cluster connection details. After the update, you should have something looking like this:
    encrypted_namespace = "DEMO-KMIP-QUERYABLE.users"
    key_vault_namespace = "DEMO-KMIP-QUERYABLE.datakeys"
    connection_uri = "mongodb+srv://admin:mPassword@demo-cluster.tcrpd.mongodb.net/myFirstDatabase?retryWrites=true&w=majority"

    # Configure the "kmip" provider.
    kms_providers = {
        "kmip": {
            "endpoint": "localhost:5697"
        }
    }
    kms_tls_options = {
        "kmip": {
            "tlsCAFile": "vault/certs/QUERYABLE/vv-ca.pem",
            "tlsCertificateKeyFile": "vault/certs/QUERYABLE/vv-client.pem"
        }
    }
A prebaked Docker image has been prepared with HashiCorp Vault and the MongoDB shared library installed. The MongoDB shared library is the translation layer that takes an unencrypted query and translates it into an encrypted format that the server understands. Thanks to it, you don't need to rewrite all of your queries with explicit encryption calls. You don't need to build the Docker image, as it’s already published on Docker Hub.
Start the container from the root of this repo. The current folder will be mounted to kmip in the running container. Port 8200 is mapped so you will be able to access the HashiCorp Vault console running in the Docker container. ${PWD} resolves to the path you are running the command from. If you are running this tutorial in a Windows shell, replace ${PWD} with the full path to the root of the cloned GitHub repository.
    docker run -p 8200:8200 -it -v ${PWD}:/kmip piepet/mongodb-kmip-vault:latest
Running the below commands within the started Docker container will start the HashiCorp Vault server and configure the HashiCorp Vault KMIP Secrets Engine. Scopes, Roles, and Certificates (vv-client.pem, vv-ca.pem, vv-key.pem) will be generated separately for CSFLE and Queryable Encryption.
    cd kmip
    ./start_and_configure_vault.sh -a
Wait until you see the below output in your command console:
You can now access the HashiCorp Vault console by going to the URL http://localhost:8200/. You should see this in your browser:
Let’s sign in to the HashiCorp Vault console to see what has been configured. Use the “Root token” printed in your shell console. Once you are logged in, you should see this:
The script that you just executed, ./start_and_configure_vault.sh -a, uses the HashiCorp Vault CLI to create all configurations needed, such as Scopes, Roles, and Certificates. You can explore what's created by clicking demo/kmip. If you want to utilize the HashiCorp Vault server from outside the Docker container, you will also need to publish port 5697, for example by adding -p 5697:5697 to the docker run command.
A sample Python application will be used to showcase the capabilities of CSFLE where the encryption schema is defined on the database. Let's start by looking at the main method of the Python application in the file located at kmip-with-hashicorp-key-vault/vault_encrypt_with_csfle_kmip.py.
    def main():
        reset()
        #1,2 Configure your KMIP Provider and Certificates
        kmip_provider_config = configure_kmip_provider()
        #3 Configure Encryption Data Keys
        data_keys_config = configure_data_keys(kmip_provider_config)
        #4 Create collection with Validation Schema for CSFLE defined, will be stored in
        create_collection_with_schema_validation(data_keys_config)
        #5 Configure Encrypted Client
        secure_client = configure_csfle_session()
        #6 Run Query
        create_user(secure_client)

    if __name__ == "__main__":
        main()
Row 118: Drops the database, just to simplify rerunning this tutorial. In a production setup, this would be removed.
Row 120: Configures the MongoDB driver to use the HashiCorp Vault KMIP Secrets Engine as the key provider. This means that the CMK will be managed by the HashiCorp Vault KMIP Secrets Engine.
Row 122: Creates the Data Encryption Keys used to encrypt/decrypt fields in the collection. The encrypted data encryption keys will be stored in the database DEMO-KMIP-FLE in collection datakeys.
Row 124: Creates the collection and attaches the encryption JSON schema that defines which fields need to be encrypted.
Row 126: Creates a MongoClient that enables CSFLE and uses the HashiCorp Vault KMIP Secrets Engine as the key provider.
Row 128: Inserts a user into database DEMO-KMIP-FLE and collection users, using the MongoClient that is configured at row 126. It then does a lookup on the SSN field to validate that the MongoDB driver can query on encrypted data.
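To give a feel for what the encryption JSON schema attached at row 124 might look like, here is a hedged sketch built with the standard library only. The field names (ssn, contact.mobile, contact.email) come from this tutorial, while the choice of algorithm per field and the all-zero placeholder key id are my assumptions, not values taken from the repository. Deterministic encryption is used for ssn because CSFLE equality lookups require it.

```python
import base64
import json
import uuid

DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random"


def key_id_ext_json(key_uuid: uuid.UUID) -> dict:
    """Render a DEK id as an Extended JSON UUID binary (subtype 04)."""
    encoded = base64.b64encode(key_uuid.bytes).decode("ascii")
    return {"$binary": {"base64": encoded, "subType": "04"}}


def encrypted_string(key_uuid: uuid.UUID, algorithm: str) -> dict:
    """Schema fragment marking a string field as encrypted with the given DEK."""
    return {
        "encrypt": {
            "bsonType": "string",
            "algorithm": algorithm,
            "keyId": [key_id_ext_json(key_uuid)],
        }
    }


# Placeholder id (assumption); the real id is that of the DEK created at row 122.
placeholder_key = uuid.UUID("00000000-0000-0000-0000-000000000000")

schema = {
    "bsonType": "object",
    "properties": {
        "ssn": encrypted_string(placeholder_key, DETERMINISTIC),  # queried by equality
        "contact": {
            "bsonType": "object",
            "properties": {
                "mobile": encrypted_string(placeholder_key, RANDOM),
                "email": encrypted_string(placeholder_key, RANDOM),
            },
        },
    },
}

print(json.dumps(schema, indent=2))
```

When the schema is stored on the database, as in this tutorial, a document of this shape would typically be supplied as the $jsonSchema validator of the collection.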
Let's start the Python application by executing the below commands in the running docker container:
    cd /kmip/kmip-with-hashicorp-key-vault/
    python3.8 vault_encrypt_with_csfle_kmip.py
Start MongoDB Compass, connect to your database DEMO-KMIP-FLE, and review the collection users. The fields that should be encrypted are ssn, contact.mobile, and contact.email. In Compass, the values of encrypted fields are masked and shown as ******, as seen in the picture below:
A sample Python application will be used to showcase the capabilities of Queryable Encryption, currently in Public Preview, with the schema defined on the server. Let's start by looking at the main method of the Python application in the file located at kmip-with-hashicorp-key-vault/vault_encrypt_with_queryable_kmip.py.
    def main():
        reset()
        #1,2 Configure your KMIP Provider and Certificates
        kmip_provider_config = configure_kmip_provider()
        #3 Configure Encryption Data Keys
        data_keys_config = configure_data_keys(kmip_provider_config)
        #4 Create Schema for Queryable Encryption, will be stored in database
        encrypted_fields_map = create_schema(data_keys_config)
        #5 Configure Encrypted Client
        secure_client = configure_queryable_session(encrypted_fields_map)
        #6 Run Query
        create_user(secure_client)

    if __name__ == "__main__":
        main()
Row 121: Drops the database, just to simplify rerunning the application. In a production setup, this would be removed.
Row 123: Configures the MongoDB driver to use the HashiCorp Vault KMIP Secrets Engine as the key provider. This means that the CMK will be managed by the HashiCorp Vault KMIP Secrets Engine.
Row 125: Creates the Data Encryption Keys used to encrypt/decrypt fields in the collection. The encrypted data encryption keys will be stored in the database DEMO-KMIP-QUERYABLE in collection datakeys.
Row 127: Creates the encryption schema that defines which fields need to be encrypted. It’s important to note that this encryption schema has a different format from the CSFLE encryption schema.
Row 129: Creates a MongoClient that enables Queryable Encryption and uses the HashiCorp Vault KMIP Secrets Engine as the key provider.
Row 131: Inserts a user into database DEMO-KMIP-QUERYABLE and collection users, using the MongoClient that is configured at row 129. It then does a lookup on the SSN field to validate that the MongoDB driver can query on encrypted data.
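To illustrate how the Queryable Encryption schema from row 127 differs from the CSFLE one, here is a hedged sketch of an encrypted fields map. The helper build_encrypted_fields_map and the random placeholder key ids are hypothetical; in the real application the ids are those of the DEKs created at row 125. I am assuming one DEK per field, which, as far as I know, is what Queryable Encryption expects, and that only ssn is declared queryable.

```python
import uuid


def build_encrypted_fields_map(namespace: str, key_ids: dict) -> dict:
    """Build a Queryable Encryption field map; key_ids maps field path -> DEK id."""
    return {
        namespace: {
            "fields": [
                {
                    "keyId": key_ids["ssn"],
                    "path": "ssn",
                    "bsonType": "string",
                    # Only ssn supports equality lookups in this sketch.
                    "queries": {"queryType": "equality"},
                },
                {
                    "keyId": key_ids["contact.mobile"],
                    "path": "contact.mobile",
                    "bsonType": "string",
                },
                {
                    "keyId": key_ids["contact.email"],
                    "path": "contact.email",
                    "bsonType": "string",
                },
            ]
        }
    }


# Placeholder ids (assumption); real ids come from the created data keys.
demo_key_ids = {path: uuid.uuid4() for path in ("ssn", "contact.mobile", "contact.email")}
encrypted_fields_map = build_encrypted_fields_map("DEMO-KMIP-QUERYABLE.users", demo_key_ids)

for field in encrypted_fields_map["DEMO-KMIP-QUERYABLE.users"]["fields"]:
    print(field["path"], "queryable" if "queries" in field else "encrypted only")
```

Compared with CSFLE, there is no per-field algorithm here: a field is either encrypted only or encrypted and queryable.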
Let's start the Python application to test Queryable Encryption.
    cd /kmip/kmip-with-hashicorp-key-vault/
    python3.8 vault_encrypt_with_queryable_kmip.py
Start MongoDB Compass, connect to your database DEMO-KMIP-QUERYABLE, and review the collection users. The fields that should be encrypted are ssn, contact.mobile, and contact.email. In Compass, the values of encrypted fields are masked and shown as ******, as seen in the picture below.
If you want to rerun the tutorial, run the following in the root of this git repository outside the docker container.
    ./cleanup.sh
In this blog, you have learned how to set up and configure CSFLE and Queryable Encryption with the HashiCorp Vault KMIP Secrets Engine. By utilizing KMIP, you will have a standardized way of managing the lifecycle of encryption keys, regardless of public cloud KMS services. Learn more about CSFLE and Queryable Encryption.