ORiGAMi: A Machine Learning Architecture for the Document Model
The document model has proven to be an excellent fit for modern application data. At MongoDB, we've long understood that semi-structured data formats like JSON offer superior expressiveness compared to traditional tabular and relational representations. Their flexible schema accommodates dynamic and nested data structures, naturally representing complex relationships between data entities.
However, the machine learning (ML) community has faced persistent challenges when working with semi-structured formats. Traditional ML algorithms, as implemented in popular libraries like scikit-learn and pandas, operate on the assumption of fixed-dimensional tabular data consisting of rows and columns. This fundamental mismatch forces data scientists to manually convert JSON documents into tabular form, a time-consuming process that requires significant domain expertise.
Recent advances in natural language processing (NLP) demonstrate the power of Transformers in learning from unstructured data, but their application to semi-structured data has been understudied. To bridge this gap, MongoDB's ML research group has developed a novel Transformer-based architecture designed for supervised learning on semi-structured data (e.g., JSON data in a document model database).
We call this new architecture ORiGAMi (Object Representation through Generative, Autoregressive Modelling), and we're excited to make it available to the community at github.com/mongodb-labs/origami. It includes components that make training a Transformer model feasible on datasets with as few as 200 labeled samples. By combining this data efficiency with the flexibility of Transformers, ORiGAMi enables prediction directly from semi-structured documents, without the cumbersome flattening and manual feature extraction required for tabular data representations. You can read more about our model on arXiv.
Technical innovation
The key insight behind ORiGAMi lies in its tokenization strategy: documents are transformed into sequences of key-value pairs and special structural tokens that encode nested types like arrays and subdocuments.
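For illustration, here is a minimal Python sketch of the idea. The structural token names (DOC_START, ARRAY_START, and so on) are ours, chosen for readability; ORiGAMi's actual vocabulary is defined in the paper and repository.

def tokenize(value, key=None):
    # Recursively flatten a JSON value into a token sequence.
    # Structural tokens mark the boundaries of nested types.
    tokens = ["KEY:" + key] if key is not None else []
    if isinstance(value, dict):
        tokens.append("DOC_START")
        for k, v in value.items():
            tokens.extend(tokenize(v, key=k))
        tokens.append("DOC_END")
    elif isinstance(value, list):
        tokens.append("ARRAY_START")
        for item in value:
            tokens.extend(tokenize(item))
        tokens.append("ARRAY_END")
    else:
        tokens.append("VALUE:" + str(value))
    return tokens

tokenize({"plan": "pro", "features_used": ["analytics", "api_access"]})
# ['DOC_START', 'KEY:plan', 'VALUE:pro', 'KEY:features_used',
#  'ARRAY_START', 'VALUE:analytics', 'VALUE:api_access', 'ARRAY_END', 'DOC_END']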
These token sequences serve as input to the Transformer model, which is trained to predict the next token given a portion of the document, similar to how large language models (LLMs) are trained on text tokens.
What’s more, our modifications to the standard Transformer architecture include guardrails to ensure that the model only generates valid, well-formed documents, and a novel position encoding strategy that respects the order invariance of key-value pairs in JSON. These modifications also allow for much smaller models compared to LLMs, which can thus be trained on consumer hardware in minutes to hours depending on dataset size and complexity, versus days to weeks for LLMs.
By reformulating classification as a next-token prediction task, ORiGAMi can predict any field within a document, including complex types like arrays and nested subdocuments. This unified approach eliminates the need for separate models or preprocessing pipelines for different prediction tasks.
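To make that concrete, here is a sketch of the idea, reusing the tokenize function above. Both predict_field and model.generate are hypothetical stand-ins, not ORiGAMi's actual API: serialize the document without the target field, append the target key, and let the model decode the value.

def predict_field(model, document, target_field):
    # Prompt with everything except the target, then append the target key
    # so that the model's next tokens are the predicted value.
    prompt_doc = {k: v for k, v in document.items() if k != target_field}
    prompt = tokenize(prompt_doc)[:-1]      # drop the trailing DOC_END
    prompt.append("KEY:" + target_field)    # cue the model for the value
    return model.generate(prompt)           # autoregressive decoding (hypothetical)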
Example use case
Our initial focus has been supervised learning: training models from labeled data to make predictions on unseen documents. Let's explore a practical example of user segmentation.
Consider a collection where each document represents a user profile, containing both simple fields and complex nested structures:
{
  "_id": "user_7842",
  "email": "sarah.chen@example.com",
  "signup_date": "2024-01-15",
  "device_history": [
    {
      "device": "mobile_ios",
      "first_seen": "2024-01-15",
      "last_seen": "2024-02-11"
    },
    {
      "device": "desktop_chrome",
      "first_seen": "2024-01-16",
      "last_seen": "2024-02-10"
    }
  ],
  "subscription": {
    "plan": "pro",
    "billing_cycle": "annual",
    "features_used": ["analytics", "api_access", "team_sharing"],
    "usage_metrics": {
      "storage_gb": 45.2,
      "api_calls_per_day": 1250,
      "active_projects": 8
    }
  },
  "user_segment": "enterprise_power_user"  // <-- target field
}
Suppose you want to automatically classify users into segments like "enterprise_power_user", "smb_growth", or "early_stage_startup" based on their behavior and characteristics. Some documents in your collection already have correct labels, perhaps assigned through manual analysis or customer interviews.
Traditional ML approaches would require flattening this rich document structure, leading to very sparse tables and potentially losing important hierarchical relationships. With ORiGAMi, you can:
Train directly on the raw documents with existing labels (see the sketch after this list)
Preserve the full context of nested structures and arrays
Make predictions for the "user_segment" field on new users immediately after signup
Update predictions as user behavior evolves without rebuilding feature pipelines
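As a rough sketch of that workflow, it could look something like the following. The pymongo calls are real; the origami training and prediction calls are hypothetical placeholders for the actual Python API, which the repository documents.

from pymongo import MongoClient
import origami  # hypothetical package name; see the repo for the real interface

# Pull labeled user profiles straight from MongoDB; no flattening step.
client = MongoClient("<mongo-uri>")
labeled = list(client["app"]["users"].find({"user_segment": {"$exists": True}}))

# Hypothetical training and prediction calls, for illustration only.
model = origami.train(labeled)
new_signup = client["analytics"]["signups"].find_one()
segment = origami.predict(model, new_signup, target="user_segment")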
Getting started with ORiGAMi
We're excited to be open-sourcing ORiGAMi (github.com/mongodb-labs/origami), and you can read more about our model on arXiv. We've also included a command-line interface that lets users make predictions without writing any code.
Training a model is as simple as pointing ORiGAMi to your MongoDB collection (here, the users collection in the app database):
origami train <mongo-uri> -d app -c users
Once trained, you can generate predictions and seamlessly integrate them back into your MongoDB workflow. For example, to predict user segments for new signups (from the analytics.signups collection) and write the resulting predictions back into an analytics.predicted collection:

origami predict <mongo-uri> -d analytics -c signups --target user_segment --json | mongoimport -d analytics -c predicted
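To sanity-check the round trip, you can query the target collection with pymongo (assuming the imported documents carry the predicted user_segment field):

from pymongo import MongoClient

client = MongoClient("<mongo-uri>")
for doc in client["analytics"]["predicted"].find().limit(3):
    print(doc.get("user_segment"))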
For those looking to dive deeper, we've also included several Jupyter notebooks in the repository that demonstrate advanced features and customization options. Model performance can often be improved further by tuning the hyperparameters.
We're just scratching the surface of what's possible with document-native machine learning, and have many more use cases in mind. We invite you to explore the repository, contribute to the project, and share how you use ORiGAMi to solve real-world problems.
Head over to the ORiGAMi GitHub repo, play around with it, and tell us about new ways of applying it and problems it’s well-suited to solving.
March 11, 2025