Atlas Stream Processing: A Cost-Effective Way to Integrate Kafka and MongoDB
Developers around the world use Apache Kafka and MongoDB together to build responsive, modern applications. There are two primary interfaces for integrating Kafka and MongoDB.
In this post, we’ll introduce these interfaces and highlight how Atlas Stream Processing offers an easy developer experience, cost savings, and performance advantages when using Apache Kafka in your applications. First, we will provide some background.
The Kafka Connector
For many years, MongoDB has offered the MongoDB Connector for Kafka (Kafka Connector). The Kafka Connector enables the movement of data between Apache Kafka and MongoDB, and thousands of development teams use it. While it supports simple message transformation, developers largely handle data processing with separate downstream tools.
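For teams new to it, a representative sink configuration gives a feel for the setup involved. This is a minimal sketch, not a production config; the connection string, database, collection, and topic names below are placeholders:

{
  "name": "mongo-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "myTopic",
    "connection.uri": "mongodb+srv://<user>:<password>@mycluster.mongodb.net",
    "database": "myDB",
    "collection": "myCollection"
  }
}

Deployed to a Kafka Connect cluster, a config like this moves records from the myTopic topic into the myDB.myCollection collection; the corresponding MongoSourceConnector handles movement in the opposite direction.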
Atlas Stream Processing
More recently, we announced Atlas Stream Processing—a native stream processing solution in MongoDB Atlas. Atlas Stream Processing is built on the document model and extends the MongoDB Query API to give developers a powerful, familiar way to connect to streams of data and perform continuous processing. The simplest stream processors act similarly to the primary Kafka Connector use case, helping developers move data from one place to another, whether from Kafka to MongoDB or vice versa. Check out an example:
// Connect to MongoDB Atlas database using $source.
s = { $source: { connectionName: 'myAtlasCluster', db: 'myDB', coll: 'myCollection' } }

// Write your data to a Kafka topic using $emit.
e = { $emit: { connectionName: 'myKafkaConnection', topic: 'myTopic' } }

// Create your processor and start it!
sp.createStreamProcessor("mongoDBToKafka", [s, e])
sp.mongoDBToKafka.start()

Beyond making data movement easy, Atlas Stream Processing enables advanced stream processing use cases that aren’t possible in the Kafka Connector. One common use case is enriching event data by using $lookup as a stage in your stream processor. In the example above, a developer can perform this enrichment by simply adding a lookup stage to the pipeline between source and sink. While the Kafka Connector can perform some single-message transformations, Atlas Stream Processing both offers an easier overall experience and gives teams the ability to perform much more complex processing.
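For illustration, here is a minimal sketch of that enrichment pattern: it reads events from a Kafka topic, joins each one against an Atlas collection with $lookup, and writes the enriched results to MongoDB with $merge. The topic, collection names, and join fields here are hypothetical:

// Read events from a Kafka topic.
s = { $source: { connectionName: 'myKafkaConnection', topic: 'orders' } }

// Enrich each event with the matching customer document from Atlas.
l = { $lookup: { from: { connectionName: 'myAtlasCluster', db: 'myDB', coll: 'customers' }, localField: 'customerId', foreignField: '_id', as: 'customer' } }

// Write the enriched events to a MongoDB collection.
m = { $merge: { into: { connectionName: 'myAtlasCluster', db: 'myDB', coll: 'enrichedOrders' } } }

sp.createStreamProcessor("enrichOrders", [s, l, m])
sp.enrichOrders.start()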
Choosing the right solution for your needs
It’s important to note that Atlas Stream Processing was built to simplify complex, continuous processing and streaming analytics rather than to replace the Kafka Connector. However, even for the more basic data movement use cases referenced above, it provides a new alternative to the Kafka Connector. The right choice will depend on your data movement and processing needs. Three considerations we commonly see teams weigh when making this choice are ease of use, performance, and cost.
Ease of use
The Kafka Connector runs on Kafka Connect. If your team already heavily uses Kafka Connect across many systems beyond MongoDB, this may be a good reason to keep it in place. However, many teams find configuring, monitoring, and maintaining connectors costly and cumbersome.
In contrast, Atlas Stream Processing is a fully managed service integrated into MongoDB Atlas. It prioritizes ease of use by leveraging the MongoDB Query API to process your event data continuously. Atlas Stream Processing balances simplicity (no servers to manage, no additional cloud platforms to operate, no new tools to learn) with processing power, helping teams reduce development time, cut infrastructure and maintenance costs, and build applications more quickly.
Performance
High performance is increasingly a priority for all data infrastructure, but it’s often a must-have for use cases that rely on streams of event data (commonly from Apache Kafka) to deliver an application feature. Many of our early customers have found Atlas Stream Processing more performant than similar data movement in their Kafka Connector configurations. By connecting directly to your data in Kafka and MongoDB and acting on it as needed, Atlas Stream Processing eliminates the need for a tool in between.
Cost
Finally, managing costs is a critical consideration for all development teams. We’ve priced Atlas Stream Processing competitively when compared to typical Kafka Connector configurations.
Most hosted Kafka providers charge per task. That means each additional source and sink generates separate data transfer and storage costs that scale linearly as you expand. Atlas Stream Processing charges per Stream Processing Instance (SPI) worker, and each worker supports up to four stream processors. This means potential cost savings when running configurations similar to the Kafka Connector’s. See more details in the documentation.
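As a simple illustration: moving data between four Kafka topics and four MongoDB collections would typically require four separately billed connector tasks on a hosted Kafka platform, while the equivalent four stream processors can share a single SPI worker. Actual savings will depend on your provider’s pricing and your workload.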
Atlas Stream Processing launched just a few months ago. Developers are already using it for a wide range of use cases, like managing real-time inventories, serving contextually relevant recommendations, and optimizing yields in industrial manufacturing facilities.
We can’t wait to see what you build and hear about your experience!