Atlas Stream Processing


Atlas Stream Processing Now Supports Azure and Azure Private Link

Today, we're excited to announce that Atlas Stream Processing now supports Microsoft Azure! This update opens new possibilities for developers building on Azure's cloud ecosystem, offering a way to:

- Seamlessly integrate MongoDB Atlas and Apache Kafka
- Effortlessly handle complex and rapidly changing data structures
- Use the familiar MongoDB Query API to process streaming data
- Benefit from a fully managed service that eliminates operational overhead

Azure support in four regions

At launch, we're supporting four Azure regions spanning the U.S. and Europe:

Azure Region    Location
US East         Virginia, US
US East 2       Virginia, US
US West         California, US
West Europe     Netherlands

We'll continue adding more regions across cloud providers in the future. Let us know which regions you need next in UserVoice.

Atlas Stream Processing simplifies integrating MongoDB with Apache Kafka to build event-driven applications. New to Atlas Stream Processing? Watch our 3-minute explainer.

How it works

Working with Atlas Stream Processing on Azure feels just like working with it on AWS today. When selecting your Stream Processing Instance (SPI) tier in the Atlas UI or CLI, select Azure as your provider and then choose your region.

Figure 1: Stream Processing instance setup via Atlas UI

    $ atlas streams instances create AzureSPI --provider AZURE --region westus --tier SP10

Figure 2: Stream Processing instance setup via the Atlas CLI

Secure networking for Azure Event Hubs via Azure Private Link

In addition to supporting Azure in multiple regions, we're introducing Azure Private Link support for developers using Azure Event Hubs, Azure's native, Kafka-compatible data streaming service. As a reminder, Atlas Stream Processing supports any service that uses the Kafka wire protocol, including Azure Event Hubs, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Redpanda, and Confluent Cloud.
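Because Event Hubs speaks the Kafka wire protocol, connecting to it looks like connecting to any Kafka cluster that uses SASL authentication. The sketch below assembles the client settings Azure documents for the Event Hubs Kafka endpoint (port 9093, SASL_SSL with the PLAIN mechanism, the literal username `$ConnectionString`, and the namespace connection string as the password), expressed with librdkafka-style property names. The namespace and connection string are hypothetical placeholders, and how these values map onto fields in the Atlas connection registry is not shown here; check the Atlas documentation for that form.

```javascript
// Hypothetical Event Hubs namespace and connection string: replace with your own.
const namespace = "my-eventhubs-namespace";
const connectionString =
  "Endpoint=sb://my-eventhubs-namespace.servicebus.windows.net/;" +
  "SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<key>";

// Event Hubs exposes its Kafka-compatible endpoint on port 9093 over TLS and
// authenticates with SASL PLAIN: the username is the literal string
// "$ConnectionString" and the password is the namespace connection string.
function eventHubsKafkaSettings(ns, connStr) {
  return {
    "bootstrap.servers": `${ns}.servicebus.windows.net:9093`,
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "$ConnectionString",
    "sasl.password": connStr,
  };
}

const settings = eventHubsKafkaSettings(namespace, connectionString);
console.log(settings["bootstrap.servers"]); // my-eventhubs-namespace.servicebus.windows.net:9093
```

When the connection goes over Azure Private Link instead of the public endpoint, the broker address and authentication stay Kafka-style; what changes is the network path to the namespace.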
As we have written before, security is critical for data services, and it's especially important for stream processing systems, which must connect to technologies such as Apache Kafka that sit outside a database like MongoDB Atlas. For this reason, we're engineering Atlas Stream Processing to use the advanced networking capabilities offered by the major cloud providers (AWS, Azure, and GCP).

Networking

To understand the value of private link support, let's summarize the three main ways developers connect data services:

- Public networking
- Private networking through VPC peering
- Private networking through private link

Public networking connects services using public IP addresses. It is the easiest approach to set up, but also the least secure of the three.

Private networking through VPC peering connects services across two virtual private clouds (VPCs). It improves on public networking by keeping traffic off the public internet, and it is commonly used for testing and development.

Private networking through private link is more secure still, because it restricts connections to specific endpoints. Whereas VPC peering lets resources in one VPC connect to all of the resources in the other VPC, private link ensures that each resource can connect only to defined services through their associated endpoints. This makes it important for use cases involving sensitive data.

Figure 3: Private Link allows for connecting to specific endpoints

Ready to get started?

With support for Azure Private Link, Atlas Stream Processing makes it simple to implement the most secure networking method between MongoDB and Kafka on Azure Event Hubs. Log in today to get started, or check out our documentation to create your first private link connection.
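The difference in reach between VPC peering and private link can be modeled in a few lines. This toy sketch uses hypothetical resource names and ignores routing tables, security groups, and DNS; it only encodes the rule stated in the networking comparison: peering exposes every resource in the peer VPC, while private link exposes only the services published behind defined endpoints.

```javascript
// Toy model: which resources can a client reach under each private networking method?
// All resource names are hypothetical.
const peerVpcResources = new Set(["kafka-broker-1", "kafka-broker-2", "billing-db", "admin-api"]);
const privateLinkEndpoints = new Set(["kafka-broker-1", "kafka-broker-2"]);

function canReach(method, target) {
  switch (method) {
    case "vpc-peering":
      // Peering opens the whole peer VPC, not just the Kafka brokers.
      return peerVpcResources.has(target);
    case "private-link":
      // Only services published behind a defined endpoint are reachable.
      return privateLinkEndpoints.has(target);
    default:
      throw new Error(`unknown networking method: ${method}`);
  }
}

for (const target of ["kafka-broker-1", "billing-db"]) {
  console.log(target, canReach("vpc-peering", target), canReach("private-link", target));
}
// kafka-broker-1 true true
// billing-db true false
```

The `billing-db` row is the point: under peering, a stream processing client could reach it even though it only needs the brokers.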

December 10, 2024

Atlas Stream Processing: A Cost-Effective Way to Integrate Kafka and MongoDB

Developers around the world use Apache Kafka and MongoDB together to build responsive, modern applications. There are two primary interfaces for integrating Kafka and MongoDB. In this post, we'll introduce both and highlight how Atlas Stream Processing offers an easy developer experience, cost savings, and performance advantages when using Apache Kafka in your applications. First, some background.

The Kafka Connector

For many years, MongoDB has offered the MongoDB Connector for Apache Kafka (Kafka Connector). The Kafka Connector moves data between Apache Kafka and MongoDB, and thousands of development teams use it. While it supports simple message transformations, developers largely handle data processing with separate downstream tools.

Atlas Stream Processing

More recently, we announced Atlas Stream Processing, a native stream processing solution in MongoDB Atlas. Atlas Stream Processing is built on the document model and extends the MongoDB Query API to give developers a powerful, familiar way to connect to streams of data and process them continuously. The simplest stream processors behave much like the primary Kafka Connector use case, moving data from one place to another, whether from Kafka to MongoDB or vice versa. Here's an example:

    // Read change events from a MongoDB Atlas collection using $source.
    const s = { $source: { connectionName: 'myAtlasCluster', db: 'myDB', coll: 'myCollection' } }

    // Write the events to a Kafka topic using $emit.
    const e = { $emit: { connectionName: 'myKafkaConnection', topic: 'myTopic' } }

    // Create the processor and start it (the name must match in both calls).
    sp.createStreamProcessor('mongoDBToKafka', [s, e])
    sp.mongoDBToKafka.start()

Beyond making data movement easy, Atlas Stream Processing enables advanced stream processing use cases that the Kafka Connector does not support. One common use case is enriching your event data by adding a $lookup stage to your stream processor.
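A minimal sketch of that enrichment, extending the $source and $emit example with a $lookup stage between them. The $lookup shape (a `from` object naming an Atlas connection, database, and collection) follows our reading of the Atlas Stream Processing stage reference and should be checked against it; the `customers` collection and `customerId` field are hypothetical. The create and start calls are guarded so the file also runs outside mongosh, where `sp` does not exist.

```javascript
// Hypothetical names: myAtlasCluster, myDB, myCollection, customers, customerId.
const source = { $source: { connectionName: "myAtlasCluster", db: "myDB", coll: "myCollection" } };

// Join each change event to the matching customer document in another collection.
// Change events carry the changed document under fullDocument.
const enrich = {
  $lookup: {
    from: { connectionName: "myAtlasCluster", db: "myDB", coll: "customers" },
    localField: "fullDocument.customerId",
    foreignField: "_id",
    as: "customer",
  },
};

const emit = { $emit: { connectionName: "myKafkaConnection", topic: "myTopic" } };

// Source, then enrichment, then sink.
const pipeline = [source, enrich, emit];

// `sp` exists only in mongosh connected to a Stream Processing Instance.
if (typeof sp !== "undefined") {
  sp.createStreamProcessor("enrichToKafka", pipeline);
  sp.enrichToKafka.start();
}
```

For update events, a change stream only populates `fullDocument` when the source is configured to include it, so the join field may need that option in practice.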
Starting from the $source and $emit example above, a developer can perform this enrichment by adding a $lookup stage to the pipeline between the source and the sink. While the Kafka Connector can perform some single message transformations, Atlas Stream Processing offers an easier overall experience and lets teams perform much more complex processing.

Choosing the right solution for your needs

It's important to note that Atlas Stream Processing was built to simplify complex, continuous processing and streaming analytics, not to replace the Kafka Connector. Even so, for the basic data movement use cases described above, it offers a new alternative. The right choice depends on your data movement and processing needs. Teams commonly weigh three considerations: ease of use, performance, and cost.

Ease of use

The Kafka Connector runs on Kafka Connect. If your team already relies heavily on Kafka Connect across many systems beyond MongoDB, that may be a good reason to keep it in place. However, many teams find configuring, monitoring, and maintaining connectors costly and cumbersome. In contrast, Atlas Stream Processing is a fully managed service integrated into MongoDB Atlas. It prioritizes ease of use by using the MongoDB Query API to process your event data continuously. Atlas Stream Processing balances simplicity (no servers to manage, no other cloud platforms to adopt, no new tools to learn) with processing power, reducing development time, lowering infrastructure and maintenance costs, and helping teams build applications faster.

Performance

High performance is increasingly a priority for all data infrastructure, and it's often a must-have for use cases that rely on streams of event data (commonly from Apache Kafka) to deliver an application feature. Many of our early customers have found Atlas Stream Processing more performant than equivalent data movement in their Kafka Connector configurations.
By connecting directly to your data in Kafka and MongoDB and acting on it as needed, Atlas Stream Processing eliminates the need for an intermediate tool.

Cost

Finally, managing costs is a critical consideration for every development team. We've priced Atlas Stream Processing competitively compared to typical Kafka Connector configurations. Most hosted Kafka providers charge per connector task, so each additional source and sink adds its own data transfer and storage cost, which scales linearly as you expand. Atlas Stream Processing charges per Stream Processing Instance (SPI) worker, and each worker supports up to four stream processors. This can mean cost savings when running configurations similar to those you would run with the Kafka Connector. See more details in the documentation.

Atlas Stream Processing launched just a few months ago, and developers are already using it for a wide range of use cases, such as managing real-time inventories, serving contextually relevant recommendations, and optimizing yields in industrial manufacturing facilities. We can't wait to see what you build and hear about your experience!

Ready to get started? Log in to Atlas today. Already a Kafka Connector user? Dig into more details and get started with our tutorial.
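The pricing difference comes down to what gets counted. The sketch below counts billable units, not prices, for N simple data-movement pipelines. It assumes each pipeline is one stream processor in Atlas Stream Processing and one connector running a single task (tasks.max = 1) on a hosted Kafka Connect service; both assumptions, and the helper name, are illustrative. Unit prices differ by tier and provider, so the counts alone do not settle which option is cheaper.

```javascript
// From the pricing description: each SPI worker runs up to four stream processors.
const PROCESSORS_PER_WORKER = 4;

// Hypothetical helper: billable units for `pipelines` data-movement pipelines.
// tasksPerConnector models a per-task billing scheme (assumed 1 by default).
function billableUnits(pipelines, tasksPerConnector = 1) {
  return {
    // Processors are packed onto workers, four at a time.
    spiWorkers: Math.ceil(pipelines / PROCESSORS_PER_WORKER),
    // Each connector bills every task it runs.
    connectorTasks: pipelines * tasksPerConnector,
  };
}

console.log(billableUnits(10)); // spiWorkers: 3, connectorTasks: 10
```

Per-task billing grows one unit per pipeline, while worker billing grows one unit per four pipelines, which is where the potential savings come from.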

September 9, 2024

Atlas Stream Processing Adds AWS Regions, VPC Peering, & More!

Since announcing the general availability of Atlas Stream Processing, MongoDB's native stream processing solution, it's been exciting to see development teams across technology, retail, and manufacturing begin running production stream processing workloads critical to their businesses. Today, we're announcing four key updates to Atlas Stream Processing.

Support for AWS regions across the US, Europe, and APAC

First, we're thrilled to announce that Atlas Stream Processing now supports eight new AWS regions. This expansion broadens deployment flexibility across the US, Europe, and APAC, and we're committed to adding more regions and cloud providers in the future. The regions launched today are:

Region                AWS Region Name
Oregon, USA           us-west-2
São Paulo, Brazil     sa-east-1
Ireland               eu-west-1
London, England       eu-west-2
Frankfurt, Germany    eu-central-1
Mumbai, India         ap-south-1
Singapore             ap-southeast-1
Sydney, Australia     ap-southeast-2

Adding these AWS regions for Atlas Stream Processing is the latest example of the close partnership between MongoDB and AWS. Over the past year, MongoDB announced integrations with Amazon Bedrock and Amazon Q Developer; MongoDB was named an AWS Generative AI Competency Partner; we launched the MongoDB AI Applications Program, which helps customers rapidly build AI applications, with AWS and other tech leaders; and MongoDB was named the AWS Technology Partner of the Year at the AWS Partner Summit Taiwan.

Support for VPC peering

Next, Atlas Stream Processing now supports VPC peering for self-hosted Apache Kafka on AWS and for Amazon Managed Streaming for Apache Kafka (Amazon MSK). VPC peering is a secure method for connecting virtual private clouds.
Because stream processing solutions like Atlas Stream Processing inherently connect to data sources outside MongoDB, the ability to make these connections as if your resources were on the same private network is a critical security requirement for many organizations. Users can select any VPC peer configured within an Atlas project when setting up Kafka connections. Because peering is configured at the stream processing connection level, developers can configure Atlas Stream Processing to consume events from one Kafka cluster and produce them to another in a different VPC. Note that this feature has an additional cost. You can learn more in our documentation.

Expanded support for Apache Kafka

Third, we're expanding Apache Kafka capabilities in this release. Kafka is one of the two key data sources Atlas Stream Processing supports today. One of Kafka's strengths is its flexibility, which lets developers tailor configurations to many use cases, including those that rely on continuous stream processing. That flexibility can also create complexity, so Atlas Stream Processing focuses on making Kafka's critical features easily accessible through the MongoDB Query API. With support for Kafka keys, developers can now read and write keys on their events, which enables filtering, partitioning, and aggregating based on key values. This gives greater control over how processed data is routed and is powerful for many stream processing use cases.

Expanded Atlas Admin API support

Lastly, we've added support for creating and deleting stream processors, and for fetching their operational stats, through the Atlas Admin API. Developers who rely on the Admin API as their primary interface to Atlas will find this a welcome addition for managing stream processors. Learn more in the documentation.
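To see why access to Kafka keys matters for partitioning and aggregation, the toy sketch below groups hypothetical keyed events and maps each key to a partition. It is not Atlas Stream Processing syntax, and the hash is a simple stand-in for Kafka's default murmur2-based partitioner; the event fields are invented for illustration. What it shows is the property that matters: a given key always maps to the same partition, so per-key ordering and per-key aggregation hold.

```javascript
// Toy 32-bit string hash. Kafka's default partitioner uses murmur2 instead;
// this stand-in only illustrates that the key-to-partition mapping is deterministic.
function toyHash(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // wrap to an unsigned 32-bit value
  }
  return h;
}

// Same key, same partition: events for one key stay in order within that partition.
function partitionFor(key, numPartitions) {
  return toyHash(key) % numPartitions;
}

// Hypothetical keyed events, e.g. inventory changes keyed by store ID.
const events = [
  { key: "store-1", qty: 2 },
  { key: "store-2", qty: 5 },
  { key: "store-1", qty: 1 },
];

// Aggregate by key, the kind of grouping that reading keys enables.
const totalsByKey = {};
for (const { key, qty } of events) {
  totalsByKey[key] = (totalsByKey[key] ?? 0) + qty;
}

console.log(totalsByKey); // { 'store-1': 3, 'store-2': 5 }
console.log(events.map((e) => partitionFor(e.key, 6)));
```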
With these new capabilities—additional AWS region support, VPC peering, the ability to use Kafka keys, and improved stream processing support for the Atlas Admin API—we've made it easier than ever for developers to integrate stream processing into their applications. We're excited to see the innovative ways you'll use these features. Ready to unlock the full potential of Atlas Stream Processing? Log in to Atlas today and start exploring the new features. We're eager to hear your feedback, so don't hesitate to share it with us on UserVoice . Your insights help us continue to improve and innovate.

August 7, 2024

Atlas Stream Processing is Now Generally Available!

We're thrilled to announce that Atlas Stream Processing, the MongoDB-native way to process streaming data, is now generally available, empowering developers to quickly build responsive, event-driven applications! Our team spent the last two years defining a vision and building a product that leans into MongoDB's strengths to overcome the hard challenges in stream processing. After a decade of building stream processing products outside of MongoDB, we are using everything that makes MongoDB unique and differentiated (the Query API and powerful aggregation framework, as well as the document model and its schema flexibility) to create an awesome developer experience. It's a new approach to stream processing, and based on feedback from so many of you in our community, it's the best way for most developers using MongoDB to do it. Let's get into what's new.

This post is also available in: Deutsch, Français, Español, Português, Italiano, 한국어, 简体中文.

What's new in general availability?

Production readiness: Ready to support your production workloads, with reliable and scalable stream processing for your mission-critical applications.

Time series collection support: Emit processor results into time series collections. Pre-process data continuously while saving it for later historical access in a MongoDB Atlas collection type built to efficiently store and query time series data.

Development and production tiers: In addition to the SP30 cluster tier available during the public preview, we're introducing an SP10 tier, a flexible, cost-effective option for exploratory use cases and low-traffic stream processing workloads.

Improved Kafka support: Support for Kafka headers lets applications provide additional metadata alongside event data. Headers are helpful for many stream processing use cases, such as routing messages and conditional processing.
Least privilege access: Atlas database users can be granted access to Stream Processing Instances, so access is limited to only those who need it. Read our tutorial for more information.

Stream processor alerting: Gain insight into the health of your stream processors by creating alerts for when a failure occurs. Supported alerting methods include email, SMS, monitoring platforms like Datadog, and more.

Why Atlas Stream Processing?

Atlas Stream Processing brings the power and flexibility of MongoDB's document model and Query API to the challenging stream processing space. With Atlas Stream Processing, developers can:

- Effortlessly handle complex and rapidly changing data structures
- Use the familiar MongoDB Query API to process streaming data
- Seamlessly integrate with MongoDB Atlas
- Benefit from a fully managed service that eliminates operational overhead

Customer highlights

Read what developers are saying about Atlas Stream Processing:

"At Acoustic, our key focus is to empower brands with behavioral insights that enable them to create engaging, personalized customer experiences. To do so, our Acoustic Connect platform must be able to efficiently process and manage millions of marketing, behavioral, and customer signals as they occur. With Atlas Stream Processing, our engineers can leverage the skills they already have from working with data in Atlas to process new data continuously, ensuring our customers have access to real-time customer insights."
John Riewerts, EVP, Engineering at Acoustic

"Atlas Stream Processing enables us to process, validate, and transform data before sending it to our messaging architecture in AWS, powering event-driven updates throughout our platform. The reliability and performance of Atlas Stream Processing has increased our productivity, improved developer experience, and reduced infrastructure cost."
Cody Perry, Software Engineer, Meltwater

What's ahead for Atlas Stream Processing?
We're rapidly introducing new features and functionality to ensure MongoDB delivers a world-class stream processing experience for all development teams. Over the next few months, you can expect to see:

- Advanced networking support: VPC peering to Kafka clusters for teams requiring additional networking capabilities
- Expanded cloud region support: all cloud regions available in Atlas Data Federation
- Expanded cloud provider support: Microsoft Azure
- Expanded data source and sink support: we plan to expand beyond Kafka and Atlas databases in the coming months. Let us know which sources and sinks you need, and we will factor that into our planning.
- Richer metrics and observability: expanded visibility into your stream processors to simplify monitoring and troubleshooting
- Expanded deployment flexibility: support for deploying stream processors with Terraform. This integration will help enable seamless CI/CD pipelines, improving operational efficiency with infrastructure as code. Look out for a dedicated blog post in the near future on how to get started with Atlas Stream Processing and Terraform.

So whether you're looking to process high-velocity sensor data, continuously analyze customer data to deliver personalized experiences, or perform predictive maintenance to increase yields and reduce costs, Atlas Stream Processing has you covered. Join the hundreds of development teams already building with Atlas Stream Processing. Stay tuned to hear more from us soon, and good luck building! Log in today or check out our introductory tutorial to get started.

May 2, 2024

Atlas Stream Processing is Now in Public Preview

Update May 2, 2024: Atlas Stream Processing is now generally available. Read our blog to learn more.

This post is also available in: Deutsch, Français, Español, Português, Italiano, 한국어, 简体中文.

Today, we're excited to announce that Atlas Stream Processing is now in public preview. Any developer on Atlas interested in giving it a try has full access. Learn more in our docs or get started today.

Listen to the MongoDB Podcast to learn about the Atlas Stream Processing public preview from Head of Streaming Products, Kenny Gorman.

Developers love the flexibility and ease of use of the document model, alongside the Query API, which allows them to work with data as code in MongoDB Atlas. With Atlas Stream Processing, we are bringing these same foundational principles to stream processing. A report on the topic published by S&P Global Market Intelligence 451 Research had this to say: "A unified approach to leveraging data for application development, the direction of travel for MongoDB, is particularly valuable in the context of stream processing where operational and development complexity has proven a significant barrier to adoption."

First announced at .local NYC 2023, Atlas Stream Processing is redefining the experience of aggregating and enriching streams of high-velocity, rapidly changing event data, and unifying how developers work with data in motion and at rest.

How are developers using the product so far? And what have we learned?

During the private preview, thousands of development teams requested access, and we gathered useful feedback from hundreds of engaged teams. One of those teams is the marketing technology leader Acoustic:

"At Acoustic, our key focus is to empower brands with behavioral insights that enable them to create engaging, personalized customer experiences. To do so, our Acoustic Connect platform must be able to efficiently process and manage millions of marketing, behavioral, and customer signals as they occur.
With Atlas Stream Processing, our engineers can leverage the skills they already have from working with data in Atlas to process new data continuously, ensuring our customers have access to real-time customer insights."
John Riewerts, EVP, Engineering at Acoustic

Other interesting use cases include:

- A leading global airline using complex aggregations to rapidly process maintenance and operations data, ensuring on-time flights for its thousands of daily customers
- A large manufacturer of energy equipment using Atlas Stream Processing to continuously monitor high-volume pump data, avoiding outages and optimizing yields
- An innovative enterprise SaaS provider using the rich processing capabilities of Atlas Stream Processing to deliver timely, contextual in-product alerts that drive product engagement

These are just a few of the use cases we're seeing across industries. Beyond them, developers are giving us plenty of insight into what they'd like us to add in the future. In addition to continuous processing of data in Atlas databases through change streams, it's exciting to see developers using Atlas Stream Processing with Kafka data hosted by valued partners like Confluent, Amazon MSK, Azure Event Hubs, and Redpanda. Our aim with developer data platform capabilities in Atlas has always been to create a better experience across the key technologies developers rely on.

What's new in the public preview?

That brings us to what's new. As we scale to more teams, we're expanding functionality to include the features most requested during the private preview.
From the feedback we received, three common themes emerged:

- Refining the developer experience
- Expanding advanced features and functionality
- Improving operations and security

Refining the developer experience

In private preview, we established the core developer experience that makes Atlas Stream Processing a natural solution for development teams. In public preview, we're doubling down with two additional enhancements:

VS Code integration: The MongoDB VS Code extension now supports connecting to Stream Processing Instances. Developers already using the extension can create and manage processors in a familiar development environment, spending less time switching between tools and more time building applications.

Improved dead letter queue (DLQ) capabilities: DLQ support is a key element of powerful stream processing, and in public preview we're expanding it. DLQ messages now appear when executing pipelines with sp.process() and when running .sample() on running processors, which streamlines development because you no longer need to set up a target collection to act as a DLQ.

Expanding advanced features and functionality

Atlas Stream Processing already supported many of the key aggregation operators developers know from using the Query API with data at rest. We've now added powerful windowing capabilities and the ability to easily merge and emit data to an Atlas database or a Kafka topic. Public preview adds even more functionality demanded by the most advanced teams relying on stream processing to deliver customer experiences:

$lookup: Developers can now enrich documents being processed in a stream processor with data from remote Atlas clusters, performing joins between fields in the document and the target collection.
Change stream pre- and post-images: Many developers use Atlas Stream Processing to continuously process data in Atlas databases, using change streams as a source. In public preview, we've enhanced the change stream $source with support for pre- and post-images. This enables common use cases where developers need to calculate deltas between fields in documents, as well as use cases that require the full contents of a deleted document.

Conditional routing with dynamic expressions in merge and emit stages: Conditional routing lets developers use field values in the documents being processed to dynamically send specific messages to different Atlas collections or Kafka topics. The $merge and $emit stages now support dynamic expressions, making it possible to use the Query API to fork messages to different collections or topics as needed.

Idle stream timeouts: Streams whose watermarks stop advancing because of a lack of inbound data can now be configured to close after a period of time, emitting the results of their windows. This can be critical for streaming sources with inconsistent data flows.

Improving operations and security

Finally, we have invested heavily over the past few months in improving other operational and security aspects of Atlas Stream Processing. A few highlights:

Checkpointing: Atlas Stream Processing now performs checkpoints to save state while processing. Stream processors are continuously running processes, so whether a failure comes from a data issue or an infrastructure problem, they need an intelligent recovery mechanism. Checkpoints make it easy to resume stream processors from wherever data stopped being collected and processed.

Terraform provider support: Support for creating connections and Stream Processing Instances (SPIs) is now available with Terraform.
This allows infrastructure to be authored as code for repeatable deployments.

Security roles: Atlas Stream Processing has added a project-level role that gives users just enough permission to perform their stream processing tasks. Stream processors can run under the context of a specific role, supporting a least-privilege configuration.

Auditing: Atlas Stream Processing can now audit authentication attempts and actions within your Stream Processing Instance, giving you insight into security-related events.

Kafka consumer group support: Stream processors now use Kafka consumer groups for offset tracking. This lets users easily change the processor's position in the stream for operational purposes and monitor for potential processor lag.

A final note on what's new: in public preview, we will begin charging for Atlas Stream Processing using preview pricing (subject to change). You can learn more about pricing in our documentation.

Build your first stream processor today

Public preview is a huge step forward as we expand the developer data platform and enable more teams with a stream processing solution that reduces the operational complexity of building reactive, responsive, event-driven applications while offering an improved developer experience. We can't wait to see what you build! Log in today or get started with the tutorial, view our resources, or follow the Learning Byte on MongoDB University.
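Processor lag, as tracked through Kafka consumer groups, is conventionally measured per partition as the gap between the log end offset and the group's committed offset; Kafka's own `kafka-consumer-groups.sh --describe` tool reports the same figure in its LAG column. A minimal sketch with hypothetical offsets (the helper name is illustrative, not an Atlas API):

```javascript
// endOffsets: offset of the next message to be written to each partition.
// committedOffsets: offset of the next message the consumer group will read.
function consumerGroupLag(endOffsets, committedOffsets) {
  const perPartition = {};
  let total = 0;
  for (const [partition, end] of Object.entries(endOffsets)) {
    // Simplification: a partition with no committed offset is treated as
    // starting from offset 0 (after retention, the log start may be higher).
    const committed = committedOffsets[partition] ?? 0;
    const lag = Math.max(0, end - committed);
    perPartition[partition] = lag;
    total += lag;
  }
  return { perPartition, total };
}

// Hypothetical three-partition topic.
const lag = consumerGroupLag({ 0: 1200, 1: 980, 2: 1500 }, { 0: 1200, 1: 900, 2: 1450 });
console.log(lag.total); // 130
```

A total that keeps growing suggests the processor is falling behind its input; one that stays near zero suggests it is keeping up.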

February 13, 2024

The Challenges and Opportunities of Processing Streaming Data

Let's consider a fictitious bank that offers a credit card to its customers. Transactional data might land in its database from various sources, such as a REST API call from a web application or a serverless function call made by a cash machine. Regardless of how the data was written, the database did its job and made the data available for querying by the end user or application. The mechanics are database-specific, but the end goal of all databases is the same: once data is in a database, the bank can query it and obtain business value from it.

In the beginning, this architecture worked well, but as customer usage grew, the bank found it difficult to manage the volume of transactions. The bank did what many companies in this scenario do and adopted an event streaming platform like Apache Kafka to queue this event data. Kafka provides a highly scalable event streaming platform capable of managing large data volumes without putting debilitating pressure on traditional databases. With this new design, the bank could scale to support more customers and product offerings.

Life was great until some customers started complaining about unrecognized transactions on their cards. Customers refused to pay these charges, and the bank began spending significant resources figuring out how to manage the fraud. After all, by the time the data was written into the database and then batch loaded into the systems that could analyze it, the customer's card had already been charged, perhaps several times over.

However, hope is not lost. The bank realized that if it could query the transactional event data as it flowed into the database, it could compare each transaction with the customer's historical spending data, as well as geolocation information, to determine in real time whether the transaction was suspicious and warranted confirmation by the customer.
This ability to continuously query a stream of data is what stream processing is all about.

From a developer's perspective, building applications that work with streaming data is challenging. Developers need to consider:

- Different serialization formats: Data arriving in the stream may use different serialization formats, such as JSON, Avro, Protobuf, or even raw binary.
- Different schemas: Data originating from a variety of sources may have slightly different schemas. A field like CustomerID could be customerId in one source and CustID in another, and a third source might omit it entirely.
- Late-arriving data: Data could arrive late because of network latency, or arrive completely out of order.
- Operational complexity: Developers need to handle application state changes, like failed connections to data sources, and scale the application efficiently to meet the demands of the business.
- Security: In larger enterprises, developers usually don't have access to production data, which makes troubleshooting and building queries against that data difficult.

Stream processing can help address these challenges and enable real-time use cases, such as fraud detection, hyper-personalization, and predictive maintenance, that are otherwise difficult or extremely costly to implement. While many stream processing solutions exist, the flexibility of the document model and the power of the aggregation framework are naturally well suited to the challenges of complex event data.

Discover MongoDB Atlas Stream Processing

Read the MongoDB Atlas Stream Processing announcement and check out Atlas Stream Processing tutorials on the MongoDB Developer Center.

Request private preview access to Atlas Stream Processing

Request access today to participate in the private preview. New to MongoDB? Get started for free today by signing up for MongoDB Atlas.
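The different-schemas challenge can be handled in an aggregation pipeline by coalescing the variant field names into one. The sketch below shows a hypothetical $addFields stage using $ifNull with more than two arguments (supported since MongoDB 5.0), plus a plain JavaScript mirror of the same rule so it can be unit-tested outside the database. The field names come from the CustomerID example; the stage and helper names are illustrative.

```javascript
// Hypothetical aggregation stage: coalesce variant spellings into one field.
// $ifNull returns the first argument that is neither null nor missing;
// the final argument (null) is the fallback when all of them are absent.
const normalizeCustomerIdStage = {
  $addFields: {
    customerId: { $ifNull: ["$CustomerID", "$customerId", "$CustID", null] },
  },
};

// Plain JavaScript mirror of the same rule. `??` treats both null and
// undefined (a missing field) as absent, matching $ifNull's behavior.
function normalizeCustomerId(doc) {
  return { ...doc, customerId: doc.CustomerID ?? doc.customerId ?? doc.CustID ?? null };
}

const samples = [{ CustomerID: "c-1" }, { customerId: "c-2" }, { CustID: "c-3" }, { amount: 12 }];
console.log(samples.map((d) => normalizeCustomerId(d).customerId)); // [ 'c-1', 'c-2', 'c-3', null ]
```

Note that $addFields keeps the original variant fields; dropping them would take an additional stage.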

August 30, 2023

Introducing Atlas Stream Processing - Simplifying the Path to Reactive, Responsive, Event-Driven Apps

Update May 2, 2024: Atlas Stream Processing is now generally available. Read our blog to learn more.

Atlas Stream Processing is now in public preview. Learn more about what’s new!

Today, we’re excited to announce the private preview of Atlas Stream Processing! The world is increasingly fast-paced, and your applications need to keep up. Responsive, event-driven applications bring digital experiences to life for your customers and accelerate time to insight and action for the business. Think:

Notifying your users as soon as their delivery status changes
Blocking fraudulent transactions during payment processing
Analyzing sensor telemetry as it is generated to detect and remediate potential equipment failures before they cause costly outages

In each of these examples, data loses its value as the seconds tick by. It needs to be queried and acted on continuously and with low latency. To do this, developers are increasingly turning to event-driven applications fueled by streaming data so that they can instantly react and respond to the constantly changing world around them. Atlas Stream Processing will help developers make the shift to event-driven apps faster.

Over the years, developers have adopted the MongoDB database because they love the flexibility and ease of use of the document model, along with the MongoDB Query API, which allows them to work with data as code. These foundational principles dramatically reduce the friction of developing software and applications. Now, we are bringing those same principles to streaming data. Atlas Stream Processing is redefining the developer experience for working with complex streams of high-velocity, rapidly changing data, and it unifies how developers work with data in motion and at rest.
While existing products and technologies have brought many innovations to streaming and stream processing, we think MongoDB is naturally well suited to help developers with some key remaining challenges. These include the difficulty of working with variable, high-volume, high-velocity data; the contextual overhead of learning new tools, languages, and APIs; and the additional operational maintenance and fragmentation that point technologies can introduce into complex application stacks.

Introducing Atlas Stream Processing

Atlas Stream Processing enables processing high-velocity streams of complex data, with a few unique advantages for the developer experience:

It’s built on the document model, allowing for flexibility when dealing with the nested and complex data structures common in event streams. This removes the need for pre-processing steps and lets developers work naturally with complex data structures, just as they do in the database.
It unifies the experience of working across all data, offering a single platform (across API, query language, and tools) to process rich, complex streaming data alongside the critical application data in your database.
It’s fully managed in MongoDB Atlas, building on an already robust set of integrated services. With just a few API calls and lines of code, you can stand up a stream processor, database, and API serving layer across any of the major cloud providers.

Watch the MongoDB.local NYC keynote to see Atlas Stream Processing announced by our Chief Product Officer, Sahir Azam. He covers the emergence of streaming data and how it powers a variety of use cases, key streaming challenges, and how Atlas Stream Processing can help you build modern, event-driven applications. Head of Streaming Products Kenny Gorman then walks through a live demo of Atlas Stream Processing in action.

How does Atlas Stream Processing work?
Atlas Stream Processing connects to your critical data, whether it lives in MongoDB (through change streams) or in an event streaming platform like Apache Kafka. Developers can connect to Confluent Cloud, Amazon MSK, Redpanda, Azure Event Hubs, or self-managed Kafka using the Kafka wire protocol. And by integrating with the native Kafka driver, Atlas Stream Processing offers low-latency native performance at its foundation. In addition to our long-standing strategic partnership with Confluent, we are also excited to announce partnerships with AWS, Microsoft, Redpanda, and Google at launch.

Atlas Stream Processing then provides three key capabilities required to turn your firehose of streaming data into differentiated customer experiences. Let’s go through them one by one.

Continuous processing

First, developers can now use MongoDB’s aggregation framework to continuously process rich and complex streams of data from event streaming platforms such as Apache Kafka. This unlocks powerful new ways to continuously query, analyze, and react to streaming data without the delays inherent in batch processing. With the aggregation framework, you can filter and group data, aggregating high-velocity event streams into actionable insights over stateful time windows to power richer, real-time application experiences.

Continuous validation

Next, Atlas Stream Processing offers developers robust, native mechanisms to handle incorrect data that can otherwise wreak havoc in applications. Potential issues include passing inaccurate results to the app, data loss, and application downtime. Atlas Stream Processing addresses these problems so that streaming data can be reliably processed and shared between event-driven applications.
Atlas Stream Processing:

Provides continuous schema validation to check that events are properly formed before processing, for example by rejecting events with missing fields or with values outside valid ranges
Detects message corruption
Detects late-arriving data that has missed a processing window

Atlas Stream Processing pipelines can be configured with an integrated Dead Letter Queue (DLQ) that receives events failing validation. This saves developers from building and maintaining their own custom implementations. Issues can be debugged quickly, and the risk of missing or corrupt data bringing down the entire application is minimized.

Continuous merge

Your processed data can then be continuously materialized into views maintained in Atlas database collections. We can think of this as a push query. Applications can retrieve results (via pull queries) from the view using either the MongoDB Query API or the Atlas SQL interface. Continuously merging updates into collections is an efficient way to maintain fresh analytical views of data that support automated and human decision-making and action. In addition to materialized views, developers also have the flexibility to publish processed events back into streaming systems like Apache Kafka.

Creating a Stream Processor

Let’s show you how easy it is to build a stream processor in MongoDB Atlas. With Atlas Stream Processing, you can use the same aggregation pipeline syntax for a stream processor that you’re familiar with from the database. Below, we walk through a simple stream processor from start to finish. It takes just a few lines of code.

First, we’ll write an aggregation pipeline that does four things. It defines a source for your data and performs validation to ensure the data is not coming from the localhost (127.0.0.1) IP address. It then creates a tumbling window to collect grouped message data every minute. Finally, it merges the newly processed data into a MongoDB collection in Atlas.
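A pipeline like the one just described might look roughly like this in mongosh (JavaScript) syntax. This is a minimal sketch, and it assumes the Atlas Stream Processing stages `$source`, `$validate`, `$tumblingWindow`, and `$merge` behave as described in the MongoDB documentation. The connection names (`kafkaProd`, `atlasCluster`), the topic, the database and collection names, and the `ip_source` field are all hypothetical. They would have to match connections in your Stream Processing instance's connection registry and your own event schema. The localhost check is written as a `$validate` stage so that rejected events go to the DLQ; a plain `$match` would simply drop them.

```javascript
// Hypothetical sketch of the "netattacks" pipeline in mongosh syntax.
// Connection, topic, database, collection, and field names are assumptions.

// mongosh provides NumberInt(); use it when available so the window size is
// an integer, and fall back to a plain number so this file also runs in Node.
const toInt = typeof NumberInt === "function" ? NumberInt : (n) => n;

// Source: read events from a Kafka topic through an existing connection.
const s = {
  $source: {
    connectionName: "kafkaProd", // hypothetical connection registry entry
    topic: "network_events", // hypothetical topic
  },
};

// Validation: require an ip_source field that is not the loopback address.
// With validationAction "dlq", failing events are routed to the Dead Letter Queue.
const v = {
  $validate: {
    validator: { ip_source: { $exists: true, $nin: ["127.0.0.1", "localhost"] } },
    validationAction: "dlq",
  },
};

// Tumbling window: every minute, count the events seen per source IP.
const t = {
  $tumblingWindow: {
    interval: { size: toInt(1), unit: "minute" },
    pipeline: [{ $group: { _id: "$ip_source", count: { $sum: 1 } } }],
  },
};

// Merge: continuously write each window's results to an Atlas collection.
const m = {
  $merge: {
    into: { connectionName: "atlasCluster", db: "security", coll: "netattacks" },
  },
};

// The full pipeline, plus the DLQ option used when creating the processor.
const p = [s, v, t, m];
const dlq = { dlq: { connectionName: "atlasCluster", db: "security", coll: "netattacks_dlq" } };
```

In mongosh, when connected to a Stream Processing instance, `p` and `dlq` would then be passed to `sp.createStreamProcessor("netattacks", p, dlq)`. The processor would then be started with `sp.netattacks.start()`.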
Then, we’ll create our stream processor called “netattacks”, passing our newly defined pipeline p and dlq as arguments. This performs our desired processing. Because it uses a Dead Letter Queue (DLQ), it also stores any invalid data safely for inspection, debugging, or reprocessing later. Lastly, we can start it. That’s all it takes to build a stream processor in MongoDB Atlas.

Request private preview

We’re excited to get this product into your hands and see what you build with it. Learn more about Atlas Stream Processing and request access to participate in the private preview once it opens to developers. New to MongoDB? Get started for free today by signing up for MongoDB Atlas. Head to the MongoDB.local hub to see where we'll be showing up next.

Safe Harbor

The development, release, and timing of any features or functionality described for our products remain at our sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision, nor is this a commitment, promise, or legal obligation to deliver any material, code, or functionality.

June 22, 2023