MongoDB and AWS: Simplifying OSDU Metadata Management

Dhruv Vashisth and Robert Oberhofer


In the 2020s, the energy sector is undergoing two major changes at once: the transition from fossil fuels to renewables, and a digital transformation that changes the way businesses operate through better applications and tools that streamline and automate processes.

To support both of these challenges, the Open Group OSDU Forum has created a new data platform standard for the energy industry that seeks to reduce data silos and enable transformational workflows via an open, standards-based API set and supporting ecosystem. OSDU (Open Subsurface Data Universe) is an industry-defining initiative that provides a unified approach to store and retrieve data in a standardized way in order to allow reductions in infrastructure cost, simplify the integration of separate business areas, and adopt new energy verticals within the same architectural principles.

Amazon Web Services (AWS) — as an early supporter of OSDU — provides a premier, cloud-first offering available across more than 87 availability zones and 27 regions. MongoDB — an OSDU member since 2019 — and AWS are collaborating to leverage MongoDB as part of the AWS OSDU platform for added flexibility and to provide a robust multi-region OSDU offering to major customers.

Why MongoDB for OSDU?

OSDU presents a unique challenge: its architecture must support varied data sets originating from the oil and gas industry while remaining extensible enough to support the expanding requirements of new energy and renewables. It must support single-user operation on a laptop for beginning practitioners, yet scale to the needs of experts across varying deployment scenarios (on-premises, in-field, and cloud), from a single tenant in one region to multi-region, multi-tenant applications. Furthermore, OSDU architectural principles separate raw object data from the metadata that describes it, which places additional demands on the flexibility needed to manage OSDU metadata while supporting all of the above requirements.

Enter MongoDB

Since 2008, MongoDB has championed the document model as a data store that supports a flexible JSON-like structure, which can be considered a superset of existing data types, from tabular, key-value, and text to geospatial, graph, and time series. Thus, MongoDB has the flexibility not only to support the main metadata services in OSDU but also to adapt to the needs of domain-specific services as OSDU evolves. This flexibility allows users to model and query data in a variety of ways within the same architecture, without proliferating disparate databases for each data type, which would add overhead in deployment, cost, scale, and the ability to query across them. The schema flexibility inherent in the document model allows developers to adapt and make changes quickly, without the operational burden that schema changes carry in traditional tabular databases.

MongoDB can also scale from the smallest environment to massive, multi-region deployments, with cross-regional data replication support that is available today across more than 90 regions with MongoDB Atlas. With the addition of MongoDB’s cluster-to-cluster sync, MongoDB can easily support hybrid deployments bridging on-premises or edge to the cloud, a requirement that is increasingly important for energy supermajors or for regions where data sovereignty is paramount.

Example: LegalTag

An example of the benefit of MongoDB’s document model is OSDU’s LegalTag Compliance Service, which governs the legal status of data in the OSDU data ecosystem. A LegalTag is a collection of JSON properties that govern how data can be consumed and ingested. With MongoDB, these properties are stored directly, indexed, and made available for querying, including full-text search for more advanced use cases. The schema flexibility simplifies integrating additional data derived from ingested data sources, which is used to further enrich the LegalTag metadata. Here the JSON document can simply accommodate more nodes to hold this data, without new tables and data structures having to be created and managed.
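As a rough illustration of this schema flexibility, the following Python sketch models a LegalTag as a plain dictionary and adds derived enrichment data as a new nested node. The tag name and the field names (`properties`, `countryOfOrigin`, `enrichment`, and so on) are assumptions chosen for illustration, not the official OSDU LegalTag schema. The filter and update dictionaries follow MongoDB's query syntax, so with a driver such as PyMongo they could be passed to `collection.find()` and `collection.update_one()`; here they are only constructed and applied locally so the example runs without a database.

```python
import copy

# Hypothetical LegalTag document; the name and field names are illustrative only.
legal_tag = {
    "name": "opendes-public-usa-dataset-1",
    "description": "Example tag for publicly shareable data",
    "properties": {
        "countryOfOrigin": ["US"],
        "dataType": "Public Domain Data",
        "securityClassification": "Public",
        "expirationDate": "2099-01-01",
    },
}

# MongoDB-style filter: dot notation reaches nested properties, and an
# equality match on an array field matches documents containing that value.
find_public_us_tags = {
    "properties.countryOfOrigin": "US",
    "properties.securityClassification": "Public",
}

# MongoDB-style update: $set adds a new nested node to the document.
# No table or schema migration is involved.
enrichment_update = {
    "$set": {
        "enrichment": {
            "sourceSystem": "ingest-pipeline-a",  # hypothetical value
            "reviewedBy": "compliance-team",      # hypothetical value
        }
    }
}


def apply_set(document, update):
    """Local stand-in for the effect of a top-level $set (illustration only)."""
    updated = copy.deepcopy(document)
    updated.update(update["$set"])
    return updated


enriched_tag = apply_set(legal_tag, enrichment_update)
```

The original properties are untouched by the update; the enrichment data simply lives alongside them in the same document.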

AWS OSDU with MongoDB

MongoDB and AWS collaborated to provide a MongoDB-based metadata implementation (Figure 1), which is available for all main OSDU services: Partition, Entitlements, Legal, Schema, and Storage.

The AWS default OSDU Partition service leverages MongoDB due to its simple replication capabilities (auto-deployable via CloudFormation, Terraform, and Kubernetes), which simplify identifying, at runtime, the correct connection information for the correct OSDU partition in a multi-region and multi-cluster deployment.

The OSDU Entitlements service uses groups to manage authorization and permissions for access to OSDU services and their data. The most recent OSDU reference implementation for Entitlements leverages a graph model to manage the relationships between groups, members, and owners. Thus, AWS again chose MongoDB, whose document model offers inherent graph capabilities, to simplify the implementation without integrating a further dedicated database technology into the architecture.
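To illustrate how a group hierarchy can be traversed within a document database, the sketch below builds an aggregation pipeline around MongoDB's `$graphLookup` stage, which performs a recursive lookup over a collection. The collection name `groups`, the `memberOf` field, and the example group ID are assumptions made for this illustration; they are not taken from the OSDU Entitlements implementation.

```python
def build_group_ancestry_pipeline(group_id, max_depth=10):
    """Build a pipeline that finds every group the given group belongs to,
    directly or transitively.

    Assumes a hypothetical `groups` collection in which each document has an
    `_id` and a `memberOf` array holding the `_id`s of its parent groups.
    """
    return [
        # Start from the single group of interest.
        {"$match": {"_id": group_id}},
        # Recursively follow memberOf -> _id links through the same collection.
        {
            "$graphLookup": {
                "from": "groups",
                "startWith": "$memberOf",
                "connectFromField": "memberOf",
                "connectToField": "_id",
                "as": "ancestors",
                "maxDepth": max_depth,
                "depthField": "depth",
            }
        },
        # Keep only the IDs of the groups that were reached.
        {"$project": {"_id": 1, "ancestorIds": "$ancestors._id"}},
    ]


# Hypothetical group identifier, for illustration only.
pipeline = build_group_ancestry_pipeline("data.default.viewers@opendes.example.com")
```

With a driver such as PyMongo, a pipeline like this could be run with `db.groups.aggregate(pipeline)`. Whether a recursive lookup or a precomputed membership list is the better fit depends on how deep and how frequently changing the group hierarchy is.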

Figure 1: MongoDB metadata service options with AWS OSDU.

Other potential benefits for OSDU

MongoDB also offers workload isolation: instances can be dedicated solely to reporting workloads against the operational dataset. This makes it possible to build real-time observability of the system based on metadata activity. Triggers and aggregation pipelines allow the creation of an alternate view of activity in real time, which can easily be visualized via MongoDB Charts (part of Atlas) without the need for a dedicated visualization system.
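A minimal sketch of how a reporting client might target dedicated nodes is shown below. It assumes an Atlas cluster with analytics nodes, which Atlas labels with the replica set tag `nodeType:ANALYTICS`; this should be verified against the actual cluster configuration. The host name, database name, and the `service` and `timestamp` fields in the activity pipeline are hypothetical.

```python
from urllib.parse import urlencode


def build_reporting_uri(host, database):
    """Build a connection string that routes reads to tagged reporting nodes.

    The tag value assumes Atlas analytics nodes; adjust it for other setups.
    """
    options = urlencode(
        {
            "readPreference": "secondary",
            "readPreferenceTags": "nodeType:ANALYTICS",
        },
        safe=":",  # keep the tag's colon readable instead of percent-encoding it
    )
    return f"mongodb+srv://{host}/{database}?{options}"


# Hypothetical host and database names.
reporting_uri = build_reporting_uri("cluster0.example.mongodb.net", "osdu_metadata")

# Aggregation pipeline that counts metadata events per service and hour.
# Assumes hypothetical `service` and `timestamp` fields on each activity
# document; $dateTrunc requires MongoDB 5.0 or later.
activity_per_hour_pipeline = [
    {
        "$group": {
            "_id": {
                "service": "$service",
                "hour": {"$dateTrunc": {"date": "$timestamp", "unit": "hour"}},
            },
            "events": {"$sum": 1},
        }
    },
    {"$sort": {"_id.hour": 1}},
]
```

Running the pipeline through a client created from `reporting_uri` would keep the reporting load off the nodes that serve operational traffic.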

Flexibility and consistency

A major use case for both the energy industry and the direction of OSDU is the ability to capture and preprocess data close to where it originates. For remote locations where direct connections to the cloud are impractical or prohibitively expensive, such as Arctic or off-shore sites, this approach is often the only option. Additionally, certain countries have data sovereignty laws that require an alternative deployment option outside of the public cloud.

A MongoDB-based OSDU implementation can provide a distinct advantage here, because MongoDB as a data platform supports deployment in the field (e.g., off-shore), on-premises, in private cloud (e.g., Kubernetes, Terraform), in public cloud (e.g., AWS), and as a SaaS offering (e.g., Atlas).

Adoption of MongoDB for OSDU provides consistency across different deployment/cloud scenarios, thereby reducing the overhead for managing and operating a disparate set of technologies where multiple scenarios are required.

Conclusion

OSDU has been created to change the way data is collected and shared across the oil and gas and energy industry. Its intent is to accelerate digital transformation within the industry. The range of use cases and deployment scenarios requires a solution that provides flexibility in the supported datasets, flexibility for the developer to innovate without additional schema and operational burden, as well as flexibility to be deployable in various environments.

Through the collaboration of AWS and MongoDB, an additional metadata storage option is available for OSDU that provides a modern technology stack with the performance and scalability for the most demanding scenarios in the energy industry.