Leveraging BigQuery JSON for Optimized MongoDB Dataflow Pipelines

Venkatesh Shanbhag and Zi Wang
December 17, 2024 | Updated: March 5, 2025
#Partners

We're delighted to introduce a major enhancement to our Google Cloud Dataflow templates for MongoDB Atlas. With direct support for the JSON data type, users can now load their MongoDB Atlas data into BigQuery without complex data transformations.

This streamlined approach not only saves users time and resources, but it also empowers customers to unlock the full potential of their data through advanced data analytics and machine learning.

Figure 1: JSON feature for user options on Dataflow Templates. (Screenshot of the Dataflow Templates dashboard showing the option for picking the JSON format.)

Limitations without JSON support

Traditionally, Dataflow pipelines that handle MongoDB Atlas data have had to convert the data into JSON strings, or flatten complex structures to a single level of nesting, before loading it into BigQuery. Although this approach works, it has several drawbacks:

Increased latency: The multiple data conversions add processing steps that slow down overall pipeline execution.
Higher operational costs: The extra data transformations and storage requirements associated with this approach increase operational costs.
Reduced query performance: Flattening complex document structures, or storing them as JSON strings, can degrade query performance and make nested data difficult to analyze.
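
To make the flattening drawback concrete, here is a minimal Python sketch of what single-level flattening does to a nested document. It is an illustration only, not the template's actual code: the `flatten_one_level` helper and the sample `order` document are hypothetical.

```python
import json


def flatten_one_level(doc):
    """Keep top-level scalars as-is; serialize nested values to JSON strings.

    Hypothetical helper for illustration, not the Dataflow template's code.
    """
    row = {}
    for key, value in doc.items():
        if isinstance(value, (dict, list)):
            # Nested structure is collapsed into an opaque string column.
            row[key] = json.dumps(value, sort_keys=True)
        else:
            row[key] = value
    return row


# Hypothetical MongoDB-style document with nested and array fields.
order = {
    "_id": "o-1001",
    "customer": {"name": "Ada", "address": {"city": "Pune"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    "total": 42.5,
}

flat_row = flatten_one_level(order)

# Reading a nested value back now requires re-parsing the string.
city = json.loads(flat_row["customer"])["address"]["city"]
```

After flattening, `customer` and `items` are plain strings, so every query that needs a nested value has to parse them again. That is the extra work the drawbacks above describe.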

So, what’s new?

BigQuery's Native JSON format addresses these challenges by enabling users to directly load nested JSON data from MongoDB Atlas into BigQuery without any intermediate conversions.

This approach offers numerous benefits:

Reduced operating costs: By eliminating additional data transformations, users can reduce operational expenses, including those associated with infrastructure, storage, and compute resources.
Enhanced query performance: BigQuery's optimized storage and query engine is designed to process data in Native JSON format efficiently, resulting in faster query execution.
Improved data flexibility: Users can easily query and analyze complex data structures, including nested and hierarchical data, without time-consuming and error-prone flattening or normalization.

A significant advantage of this pipeline is that BigQuery's JSON functions can be applied directly to the MongoDB data once it is loaded, with no separate transformation step. The JSON data within BigQuery can be queried and analyzed using standard BigQuery SQL queries.
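
As a minimal sketch of such a query, the Python snippet below builds a SQL string that uses BigQuery's `JSON_VALUE` function on a JSON column. The table name, the `source_data` column name, and the document fields (`customer.address.city`, `status`) are assumptions made for illustration; check them against the schema your pipeline actually produces.

```python
def build_json_query(table, json_column="source_data"):
    """Build a BigQuery SQL query that reads nested fields from a JSON column.

    The column name and JSON paths are illustrative assumptions.
    Only pass trusted table and column names; they are interpolated directly.
    """
    return f"""
SELECT
  JSON_VALUE({json_column}, '$.customer.address.city') AS city,
  COUNT(*) AS order_count
FROM `{table}`
WHERE JSON_VALUE({json_column}, '$.status') = 'DELIVERED'
GROUP BY city
ORDER BY order_count DESC
""".strip()


# Hypothetical fully qualified table name.
query = build_json_query("my-project.my_dataset.orders")
print(query)
```

The resulting string can be pasted into the BigQuery console or submitted through a client library, for example with `google.cloud.bigquery.Client().query(query)`. The nested `city` value is read in place, with no flattening step beforehand.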

Whether you prefer a streamlined cloud-based approach or a hands-on, customizable solution, the Dataflow pipeline can be deployed either through the Google Cloud console or by running the code from the GitHub repository.

Enabling data-driven decision-making

To summarize, Google’s Dataflow template provides a flexible solution for transferring data from MongoDB to BigQuery. It can process entire collections or capture incremental changes using MongoDB's Change Stream functionality. The pipeline's output format can be customized to suit your specific needs. Whether you prefer a raw JSON representation or a flattened schema with individual fields, you can easily configure it through the userOption parameter. Additionally, data transformation can be performed during template execution using User-Defined Functions (UDFs).
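
The configuration described above can be sketched as a small Python helper that assembles the launch parameters and the matching `gcloud dataflow flex-template run` arguments. Treat it as a sketch under assumptions: only `userOption` is named in this post, while the other parameter names (`mongoDbUri`, `database`, `collection`, `outputTableSpec`) and the accepted values `JSON`, `FLATTEN`, and `NONE` reflect our reading of the template documentation and should be verified there. The template path, URI, and table below are placeholders.

```python
ALLOWED_USER_OPTIONS = {"JSON", "FLATTEN", "NONE"}  # assumed accepted values


def build_template_parameters(mongo_uri, database, collection,
                              output_table, user_option="JSON"):
    """Collect template parameters; names other than userOption are assumptions."""
    if user_option not in ALLOWED_USER_OPTIONS:
        raise ValueError(f"unsupported userOption: {user_option!r}")
    return {
        "mongoDbUri": mongo_uri,
        "database": database,
        "collection": collection,
        "outputTableSpec": output_table,
        "userOption": user_option,
    }


def to_gcloud_args(job_name, region, template_path, parameters):
    """Turn the parameters into an argument list for the gcloud CLI.

    Values containing commas would need gcloud's escaping rules,
    which this sketch does not handle.
    """
    joined = ",".join(f"{key}={value}" for key, value in parameters.items())
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        f"--region={region}",
        f"--template-file-gcs-location={template_path}",
        f"--parameters={joined}",
    ]


params = build_template_parameters(
    mongo_uri="mongodb+srv://user:password@cluster0.example.mongodb.net",  # placeholder
    database="sales",
    collection="orders",
    output_table="my-project:my_dataset.orders",  # placeholder
    user_option="JSON",
)
args = to_gcloud_args(
    "mongodb-to-bq-json",
    "us-central1",
    "gs://YOUR_BUCKET/path/to/template",  # placeholder, not a real template path
    params,
)
print(" ".join(args))
```

Changing `user_option` to `FLATTEN` would request the flattened schema instead of the raw JSON representation. Building the argument list separately makes it easy to review the command before running it, for example with `subprocess.run(args, check=True)`.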

By adopting BigQuery Native JSON format in your Dataflow pipelines, you can significantly enhance the efficiency, performance, and cost-effectiveness of your data processing workflows. This powerful combination empowers you to extract valuable insights from your data and make data-driven decisions.

Follow the Google Documentation to learn how to set up the Dataflow templates for MongoDB Atlas and BigQuery.

Get started with MongoDB Atlas on Google Marketplace.

Learn more about MongoDB Atlas on Google Cloud on our product page.
