MongoDB Days UK: MongoDB and Spark
Presented by Ross Lawley, Software Engineer, MongoDB
Experience level: Advanced
Modern architectures are moving away from "one size fits all" solutions. The best tool needs to be put to each job, and given the large number of options available today, chances are that you’ll end up using MongoDB for your operational workload and Spark for your high-speed data processing needs. When modeling documents or data structures, there are some key aspects that need attention: the distribution of data across nodes, streaming capabilities, performance, aggregation and query options, and how to integrate data processing software, such as Spark, that can benefit from subtle but substantial model changes. A clear example is deciding when to embed or reference documents, and what that choice implies for high-speed processing. Over the course of this talk we’ll detail the benefits of a good document model for the operational workload and the types of transformations that should be incorporated into the document model to suit the high-speed processing capabilities of Spark. We’ll look into the different ways to connect these two systems, how to model for the different workloads, which operators to be aware of for top performance, and what kind of design and architecture should be put in place to make sure that all of these systems work well together.
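
To make the embed-versus-reference question concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the order data, field names, and helper functions are all hypothetical, and ordinary dicts and lists stand in for MongoDB documents and collections. It shows that an embedded document can be flattened into processing-friendly rows in a single pass, while a referenced model needs a lookup (in effect a join) to produce the same rows.

```python
# Hypothetical order data; all names and fields are illustrative assumptions.

# Embedded model: line items live inside the order document.
embedded_order = {
    "_id": 1,
    "customer": "alice",
    "items": [
        {"sku": "A100", "qty": 2, "price": 5.0},
        {"sku": "B200", "qty": 1, "price": 12.5},
    ],
}

# Referenced model: the order only stores the ids of its items,
# which live in a separate collection (here, a plain list).
referenced_order = {"_id": 1, "customer": "alice", "item_ids": [10, 11]}
items_collection = [
    {"_id": 10, "sku": "A100", "qty": 2, "price": 5.0},
    {"_id": 11, "sku": "B200", "qty": 1, "price": 12.5},
]


def make_row(order, item):
    """Build one flat row from an order and one of its items."""
    return {
        "order_id": order["_id"],
        "customer": order["customer"],
        "sku": item["sku"],
        "qty": item["qty"],
        "price": item["price"],
    }


def flatten_embedded(order):
    """One row per embedded item; a single pass over one document."""
    return [make_row(order, item) for item in order["items"]]


def join_referenced(order, items):
    """Same rows, but each item must first be looked up by id (a join)."""
    by_id = {item["_id"]: item for item in items}
    return [make_row(order, by_id[item_id]) for item_id in order["item_ids"]]


def order_total(rows):
    """Aggregate over the flat rows."""
    return sum(row["qty"] * row["price"] for row in rows)


embedded_rows = flatten_embedded(embedded_order)
referenced_rows = join_referenced(referenced_order, items_collection)
```

Both paths end with the same flat rows, so the difference lies in the work needed to get there. In a distributed processing engine, the lookup step of the referenced model typically turns into a join across datasets, which is one reason the choice of model matters for high-speed processing.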