Apache Spark is one of the fastest-growing big data projects in the history of the Apache Software Foundation. With its memory-oriented architecture, flexible processing libraries and ease of use, Spark has emerged as a leading distributed computing framework for real-time analytics.
Combining Spark with MongoDB enables organizations to operationalize sophisticated, real-time analytics. Spark jobs can be executed directly against operational data managed by MongoDB, without the time and expense of ETL processes. MongoDB can then efficiently index the analytics results and serve them back to live operational processes.
This white paper discusses the analytics capabilities offered by MongoDB and Apache Spark, and provides an overview of when and how to combine them into a real-time analytics engine. The paper concludes with example use cases.