Modernization


Application Modernization with gravity9 And MongoDB Atlas: How Digital Decoupling Supports the Customer Offering

The goal of most organizations is pretty clear: to improve customer offerings and become more operationally efficient, streamlined and profitable. But is it possible for organizations to excel in an agile fashion when they are reliant upon legacy systems? It's the age-old dilemma between risk and innovation. How can you mitigate the former while accelerating the latter? Nearly all organizations operate with some type of legacy system in place that, often, is central to the operation of the business or the customer offering—i.e., one that would be highly costly and disruptive to move away from. To solve this predicament, the process of digital decoupling enables organizations to detach incrementally from legacy systems while acknowledging the critical role they often play. In this blog, we'll further explore the value of digital decoupling and introduce how gravity9 with MongoDB Atlas delivers the smoothest transition possible.

Why Not Simply Upgrade?

Digital decoupling is not a "big-bang" upgrade where one system is fully replaced by another overnight; rather, it allows for the continued existence of your legacy system as part of your digital architecture while simultaneously unlocking innovation. But why not simply upgrade? Isn't "out with the old and in with the new" the faster route to take? Not always. When a big-bang upgrade focuses on the replacement of a legacy system that is central to a customer offering or business operations, it becomes a much more complex, risky, and time-intensive undertaking. Often, many months or years can pass before any value is delivered to customers. But while your organization is focusing time and effort on a long-term, large-scale system replacement initiative, your customers' needs will be changing and your competitors will be continuing to innovate. Once your new system is finally ready to be deployed to the marketplace, there's a good chance it may already be obsolete.

Digital decoupling offers a faster, less risky, and more flexible alternative. The legacy system is maintained as the core of your business, but strategic portions are exposed through modern microservices to allow for the rapid creation of new digital products and offerings. The organization can utilize new, modern technologies to maintain the functionality of the legacy system while building a more advanced, digital architecture around it. By maintaining the existing legacy system, the organization significantly reduces disruption and risk while unlocking the ability to innovate new products on a rapid timescale.

How It Works

Applying a digital decoupling approach makes it possible to quickly innovate new digital products and services on top of your data by way of microservices and an event-driven architecture.

Microservice Architecture after Digital Decoupling

By utilizing event-driven architecture, individual systems and capabilities can be built as fully scalable microservices, each with its own database. Solutions can then be built around each microservice and combined to provide limitless additional capabilities and services for customers in a rapid and agile fashion. Digital decoupling creates a customer experience delivered via a modern, feature-rich UI or website that is intuitive, user-friendly and continuously evolving, while the legacy system still operates behind the scenes.
After years of working with large organizations, the solutions architects at gravity9 have a deep understanding of event-driven architecture as a solution to digital decoupling.

"Our adherence to domain-driven design is in our DNA; it is how we build solutions and is core to the way we work... We build event-driven microservices on top of monolithic legacy architecture." (Noel Ady, gravity9 Founding Partner)

By utilizing domain-driven design, system actions are communicated or triggered by way of an event, with colloquialized messages sent between the legacy application and the new architecture via a bus. An adaptor is created to sit in front of the legacy system and speak to your "new IT" in the language of events. This adaptor looks at the data in your legacy system and raises events when changes occur, then optionally writes back changes raised by other systems, allowing your legacy system to participate in the event-driven architecture. The use of APIs ensures the traffic is two-way and non-intrusive to the legacy application so that it can continue to operate as expected.

One of the key technology concerns related to adaptors for legacy systems is the concept of a "delta store." Events in an event-driven architecture should contain the context for the event, often including the previous value, to help receiving systems properly respond to the event. In more modern systems it's possible to get this data from webhooks or similar alternatives, but these mechanisms won't exist in older legacy systems, so a different approach via a delta store is needed. A delta store contains the history of changes on a value (the "deltas"), allowing the adaptor to properly construct the event context and ensuring that events are only raised for true changes in values.

Why MongoDB?

MongoDB's flexible data schema makes it an excellent implementation technology for a delta store, allowing a dynamic mechanism that can flex to new data and event types on demand. gravity9 partners with MongoDB Atlas, MongoDB's multi-cloud, secure and flexible database service, as an integral technology enabler of digital decoupling to increase the flexibility of the resulting architecture. Importantly, Atlas also enhances reliability for mission-critical production databases with continuous backups and point-in-time recovery. It's secure for sensitive data and automates key processes like infrastructure provisioning, setup and deployment so teams can access the database resources they need, when they need them. Best of all, MongoDB's features and benefits help free up developer time so they can focus their talent on more innovative tasks.

What Should I Do Next?

The logical model is just as important as the physical one in digital decoupling when it comes to modelling your events. Utilizing best-practice, domain-driven design alongside a proven approach is the key to success. Together, gravity9 and MongoDB have replicated this success time and time again, enabling organizations to lay the foundations for a newer, more modern architecture without the disruption of removing their legacy systems. Interested in learning more about MongoDB's Modernization Program? Contact us today!
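To make the delta-store idea more concrete, here is a minimal sketch in Python with PyMongo. The connection string, collection name, field names and event shape are hypothetical illustrations, not gravity9's actual implementation.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

# Hypothetical connection string and namespace for the delta store.
client = MongoClient("mongodb+srv://<username>:<password>@cluster0.example.mongodb.net")
deltas = client["ods"]["customer_deltas"]

def record_change(entity_id, field, new_value):
    """Store a delta and return an event only when the value truly changed."""
    last = deltas.find_one({"entityId": entity_id, "field": field}, sort=[("at", -1)])
    previous = last["value"] if last else None
    if previous == new_value:
        return None  # no real change, so no event is raised
    deltas.insert_one({"entityId": entity_id, "field": field,
                       "value": new_value, "at": datetime.now(timezone.utc)})
    # The event carries the context (previous value) that receiving systems need.
    return {"type": "FieldChanged", "entityId": entity_id,
            "field": field, "previous": previous, "current": new_value}
```

An adaptor polling the legacy system could call record_change for each field it watches and publish any returned event to the bus.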

May 27, 2021

Infosys Media Platform & MongoDB: Metadata Management and Workflow Orchestration Across Media Supply Chains

Capitalize on current and innovative technologies in the media supply chain with the Infosys Media Platform (IMP). As a part of the cloud-based Infosys Cobalt™ portfolio, Infosys' unifying framework, built on MongoDB Atlas and MongoDB Enterprise Advanced, helps you facilitate creative collaboration, enable productions on an industrial scale, and monetize customer relationships. How? By integrating the various ecosystems involved and providing a common platform to connect you to services and technology solutions for the media content value chain.

Infosys' intelligently woven media and metadata management framework, leveraging MongoDB's document model, enables smart workflows and incorporates ML/AI to create, manage and moderate content metadata. This allows the orchestration of workflows across different business functions. Additionally, the platform delivers the benefits of productivity, scalability, and agility via the cloud and streamlines collaboration among the ecosystem of partners and technology solutions.

Why Infosys Media Platform (IMP)?

The Infosys Media Platform consists of various modules that serve critical business functions, such as:

- The Curation & Digitization Module -- provides the master workflow and ingests content from internal archives and multiple sources, using AI/ML to create a composite index of frame-level and time-coded metadata of recognized elements (such as celebrities/known personalities, objects, brands, text, and images). It also enables intelligent ad spot identification and includes functions like automated QC, editing, review and approval, and censor editing.
- The Custom AI Module -- ensures that newly introduced elements such as celebrities, brands, etc., can be continuously trained and recognized. This can also be used to custom-train the AI models to recognize specific content per the customer's needs.
- The Localization Module -- enables collaboration across multiple locations and global vendors through automatic generation of closed captions and subtitles in multiple languages.
- The Metadata Management and Distribution Module -- enables global distribution to digital platforms at scale through standard workflow models and a state-of-the-art dashboard by orchestrating the accumulation of asset-level and descriptive time-based metadata from production to delivery.

With the above modules, the Infosys Media Platform provides the following capabilities:

- Content Enrichment -- leverages AI models to process video files and generate time-coded metadata for post-production and distribution processes
- Closed Captioning, Subtitling and Localization -- processes audio (dialogs from a video, lyrics from a song, speech from a podcast) and converts it into closed captions and subtitles
- Content Moderation -- recognizes the presence of mature content (profanity, violence, gore, etc.) using the video content and speech detection capabilities
- Image Processing -- identifies various attributes of an image file, similar to the capabilities on video/music content
- Metadata Packaging & Distribution -- manages the end-to-end supply chain of digital metadata creation, updates, packaging and distribution
- NLP-Based Analytics -- using the natural language processing capabilities of the platform, users can review any string of text (dialogs, lyrics, conversations) to determine the context of the conversation as well as the sentiment

Why MongoDB for Infosys Media Platform?

What company wouldn't want a database platform that increases developer productivity and data-driven operational efficiency?
MongoDB offers both. With MongoDB Atlas, you can reduce the time developers spend managing data and databases, so they can focus on value-added tasks like developing new apps. MongoDB's document model and query language provide easier access to data, allowing developers to work quickly and efficiently to support new data structures and data types, as well as leverage database-supported roll-ups for analysis. Additionally, MongoDB Atlas provides 100+ metrics and monitoring capabilities with its complete data platform built to improve operational productivity, so you can work smarter, not harder.

A main feature of the Infosys Media Platform is its cloud-agnostic nature; and as a cloud-agnostic and multi-cloud data platform, MongoDB is the only platform that satisfies IMP, with its ability to run seamlessly across the globe. Through a mixed workload of real-time and transactional analytics, IMP also offers a roadmap on analytics, text search and data visualization capabilities -- and MongoDB provides all of these features.

How MongoDB powers the data platform for Infosys Media Platform (IMP)

- Data for all the modules previously described is powered by the MongoDB data platform
- Details like profile data, accounts, ratings, translations, and country/region details are stored in MongoDB
- Audit transactional data currently runs on SQL, with a roadmap for moving to the MongoDB data platform

In the future, MongoDB's core capabilities will further enhance the Infosys Media Platform and customer experience. Our roadmap includes utilizing MongoDB ACID transactional capabilities to store audit details, as well as using MongoDB functions and triggers for creating cloud-agnostic serverless functions. Additionally, MongoDB Atlas may be leveraged for full-text search capabilities applied to media data, and to create charts for dashboarding and real-time analytics of user subscriber data.

Download the Modernization Guide
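To make the document-model point concrete, here is a minimal, hedged sketch of how frame-level, time-coded metadata for an asset could be stored and queried with PyMongo. The collection and field names are illustrative assumptions, not IMP's actual schema.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<username>:<password>@cluster0.example.mongodb.net")
assets = client["media"]["assets"]

# One asset document embeds its time-coded metadata as an array of sub-documents.
assets.insert_one({
    "title": "Sample Episode",
    "language": "en",
    "timecodedMetadata": [
        {"start": "00:01:12", "end": "00:01:19", "type": "celebrity", "label": "Jane Doe"},
        {"start": "00:04:03", "end": "00:04:10", "type": "brand", "label": "Acme Cola"},
    ],
})

# Find assets in which a given brand appears, without joining any tables.
for doc in assets.find(
    {"timecodedMetadata": {"$elemMatch": {"type": "brand", "label": "Acme Cola"}}},
    {"title": 1, "_id": 0},
):
    print(doc)
```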

May 19, 2021

How DataSwitch And MongoDB Atlas Can Help Modernize Your Legacy Workloads

Data modernization is here to stay, and DataSwitch and MongoDB are leading the way forward. Research strongly indicates that the future of the Database Management System (DBMS) market is in the cloud, and the ideal way to shift from an outdated, legacy DBMS to a modern, cloud-friendly data warehouse is through data modernization.

There are a few key factors driving this shift. Increasingly, companies need to store and manage unstructured data in a cloud-enabled system, as opposed to a legacy DBMS, which is designed only for structured data. Moreover, the amount of data generated by a business is increasing at a rate of 55% to 65% every year, and the majority of it is unstructured. A modernized database that can improve data quality and availability provides tremendous benefits in performance, scalability, and cost optimization. It also provides a foundation for improving business value through informed decision-making. Additionally, cloud-enabled databases support greater agility, so you can upgrade current applications and build new ones faster to meet customer demand.

Gartner predicts that by 2022, 75% of all databases will be on the cloud – either by direct deployment or through data migration and modernization. But research shows that over 40% of migration projects fail. This is due to challenges such as:

- Inadequate knowledge of legacy applications and their data design
- Complexity of code and design from different legacy applications
- Lack of automation tools for transforming from legacy data processing to cloud-friendly data and processes

It is essential to take a strategic approach and choose the right partner for your data modernization journey. We're here to help you do just that.

Why MongoDB?

MongoDB is built for modern application developers and for the cloud era. As a general-purpose, document-based, distributed database, it facilitates high productivity and can handle huge volumes of data. The document database stores data in JSON-like documents and is built on a scale-out architecture that is optimal for any developer who builds scalable applications through agile methodologies. Ultimately, MongoDB fosters business agility, scalability and innovation.

Key MongoDB advantages include:

- Rich JSON documents
- Powerful query language
- Multi-cloud data distribution
- Security of sensitive data
- Quick storage and retrieval of data
- Capacity for huge volumes of data and traffic
- Design that supports greater developer productivity
- Extremely reliable for mission-critical workloads
- Architected for optimal performance and efficiency

Key advantages of MongoDB Atlas, MongoDB's hosted database as a service, include:

- Multi-cloud data distribution
- Security for sensitive data
- Designed for developer productivity
- Reliable for mission-critical workloads
- Built for optimal performance
- Managed for operational efficiency

To be clear, JSON documents are the most productive way to work with data, as they support nested objects and arrays as values. They also support schemas that are flexible and dynamic. MongoDB's powerful query language enables sorting and filtering on any field, regardless of how nested it is in a document. Moreover, it provides support for aggregations as well as modern use cases including graph search, geo-based search and text search. Queries are in JSON and are easy to compose. MongoDB provides support for joins in queries. MongoDB supports two types of relationships, with the ability to reference and embed. It has all the power of a relational database and much, much more.
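As a small illustration of the querying and embedding described above, here is a hedged PyMongo sketch using a hypothetical orders collection; the connection string and field names are illustrative only.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<username>:<password>@cluster0.example.mongodb.net")
orders = client["shop"]["orders"]

# An order embeds its line items (one-to-many) and references the customer by id.
orders.insert_one({
    "customerId": "C1001",                      # reference to another collection
    "status": "shipped",
    "items": [                                   # embedded sub-documents
        {"sku": "A-100", "qty": 2, "price": 9.99},
        {"sku": "B-200", "qty": 1, "price": 24.50},
    ],
})

# Filter and sort on a nested field, however deep it sits in the document.
expensive = orders.find({"items.price": {"$gt": 20}}).sort("status", 1)

# Aggregate: total order value per status, computed inside the database.
pipeline = [
    {"$unwind": "$items"},
    {"$group": {"_id": "$status",
                "total": {"$sum": {"$multiply": ["$items.price", "$items.qty"]}}}},
]
for row in orders.aggregate(pipeline):
    print(row)
```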
Companies of all sizes can use MongoDB, as it operates on a large and mature platform ecosystem. Developers enjoy a great user experience with the ability to provision MongoDB Atlas clusters and commence coding instantly. A global community of developers and consultants makes it easy to get the help you need, if and when you need it. In addition, MongoDB supports all major languages and provides enterprise-grade support.

Why DataSwitch as a partner for MongoDB?

Automated schema re-design, data migration & code conversion

DataSwitch is a trusted partner for cost-effective, accelerated solutions for digital data transformation, migration and modernization through a modern database platform. Our no-code and low-code solutions, along with cloud data expertise and unique, automated schema generation, accelerate time to market. We provide end-to-end data, schema and process migration with automated replatforming and refactoring, thereby delivering:

- 50% faster time to market
- 60% reduction in total cost of delivery
- Assured quality with built-in best practices, guidelines and accuracy

Data modernization: How "DataSwitch Migrate" helps you migrate from RDBMS to MongoDB

DataSwitch Migrate ("DS Migrate") is a no-code and low-code toolkit that leverages advanced automation to provide intuitive, predictive and self-serviceable schema redesign from a traditional RDBMS model to MongoDB's document model, with built-in best practices. Based on data volume, performance, and criticality, DS Migrate automatically recommends the appropriate ETTL (Extract, Transfer, Transform & Load) data migration process. DataSwitch delivers data engineering solutions and transformations in half the timeframe of typical existing data modernization solutions. Consider these key areas:

- Schema redesign – construct a new framework for data management. DS Migrate provides automated data migration and transformation based on your redesigned schema, as well as no-touch code conversion from legacy data scripts to MongoDB Atlas APIs. Users can simply drag and drop the schema for redesign, and the platform converts it to a document-based JSON structure by applying MongoDB modeling best practices. The platform then automatically migrates data to the new, re-designed JSON structure. It also converts the legacy database script for MongoDB. This automated, user-friendly data migration is faster than anything you've ever seen. Here's a look at how the schema designer works.
- Refactoring – change the data structure to match the new schema. DS Migrate handles this through auto code generation for migrating the data. This is far beyond a mere lift and shift. DataSwitch takes care of refactoring and replatforming (moving from the legacy platform to MongoDB) automatically. It is a game-changing, unique capability to perform all these tasks within a single platform.
- Security – mask and tokenize data while moving it from on-premises to the cloud. As the data is moving to a potentially public cloud, you must keep it secure. DataSwitch's tool can configure and apply security measures automatically while migrating the data.
- Data quality – ensure that data is clean, complete, trustworthy and consistent. DataSwitch allows you to configure your own quality rules and automatically apply them during data migration.

In summary: first, the DataSwitch tool automatically extracts the data from an existing database, like Oracle. It then exports the data and stores it locally before zipping and transferring it to the cloud.
Next, DataSwitch transforms the data by altering the data structure to match the re-designed schema and applying data security measures during the transform step. Lastly, DS Migrate loads the data and processes it into MongoDB in its entirety.

Process Conversion

Process conversion, where scripts and process logic are migrated from a legacy DBMS to a modern DBMS, is made easier thanks to a high degree of automation. Minimal coding and manual intervention are required, and the journey is accelerated. It involves:

- DML – Data Manipulation Language
- CRUD – typical application functionality (Create, Read, Update & Delete)
- Converting to the equivalent MongoDB Atlas API

Degree of automation DataSwitch provides during migration

Schema Migration Activities (DS Automation Capabilities):
- Application Data Usage Analysis: 70%
- 3NF to NoSQL Schema Recommendation: 60%
- Schema Re-Design Self Services: 50%
- Predictive Data Mapping: 60%

Process Migration Activities (DS Automation Capabilities):
- CRUD-based SQL conversion (Oracle, MySQL, SQL Server, Teradata, DB2) to MongoDB API: 70%

Data Migration Activities (DS Automation Capabilities):
- Migration Script Creation: 90%
- Historical Data Migration: 90%
- Catch-up Load: 90%

DataSwitch Legacy Modernization as a Service (LMaaS): Our consulting expertise combined with the DS Migrate tool allows us to harness the power of the cloud for data transformation of RDBMS legacy data systems to MongoDB. Our solution delivers legacy transformation in half the time frame through pay-per-usage. Key strengths include:

● Data Architecture Consulting
● Data Modernization Assessment and Migration Strategy
● Specialized Modernization Services

DS Migrate Architecture Diagram

Contact us to learn more.
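To make the extract-transform-mask-load flow described above more tangible, here is a hedged Python sketch (not DataSwitch's actual code); the source tables, field names and masking rule are hypothetical, and SQLite stands in for the legacy RDBMS.

```python
import hashlib
import sqlite3  # stands in for the legacy RDBMS in this sketch
from pymongo import MongoClient

legacy = sqlite3.connect("legacy.db")
target = MongoClient("mongodb+srv://<username>:<password>@cluster0.example.mongodb.net")["modern"]["customers"]

def mask(value: str) -> str:
    """Tokenize a sensitive value before it leaves the data center."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

# Extract parent and child rows, transform them into one embedded document, and load.
for cust_id, name, ssn in legacy.execute("SELECT id, name, ssn FROM customers"):
    addresses = [
        {"type": addr_type, "city": city}
        for addr_type, city in legacy.execute(
            "SELECT type, city FROM addresses WHERE customer_id = ?", (cust_id,)
        )
    ]
    target.insert_one({"legacyId": cust_id, "name": name,
                       "ssnToken": mask(ssn),      # masked, not the raw value
                       "addresses": addresses})     # parent-child rows embedded
```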

May 13, 2021

Exploring Data with MongoDB Atlas, Databricks, and Google Cloud

>> Announcement: Some features mentioned below will be deprecated on Sep. 30, 2025. Learn more.

MongoDB Atlas supports Google Cloud (GC), enabling you to easily spin up managed MongoDB clusters within GC in minutes. We're excited to share that Databricks recently launched Databricks on GC, giving customers the freedom to move and analyze their data within GC and MongoDB Atlas. With the latest update to Databricks, it's now easier to get started with a cloud-first approach on GC that leverages MongoDB Atlas, with its flexible data model designed for modern applications, and Databricks for more advanced analytics use cases.

The following tutorial illustrates how to use MongoDB Atlas on GC and Databricks. We'll use sample sales data in MongoDB Atlas and calculate the rolling average using Databricks on GC. This tutorial covers the following:

- How to read data from MongoDB Atlas on GC into Spark
- How to run the MongoDB Connector for Spark as a library in Databricks
- How to use the PySpark libraries to perform rolling averages of sales data
- How to write these averages back to MongoDB so they are accessible to applications

Create a Databricks Workspace

To provision a new Databricks workspace, you will need to have a GC project already created. If you do not already have a Databricks cluster deployed on GC, follow the online documentation to create one. Note: It is important to follow the documentation, because there are a few key settings you will need to make in your GC project, such as enabling the container.googleapis.com, storage.googleapis.com, and deploymentmanager.googleapis.com services and adjusting certain Google Cloud quotas before creating your Databricks cluster.

In this example we have already created the Google Cloud project mongodb-supplysales and are ready to go to the Google Marketplace and add Databricks to our project. Within your Google project, click on "Marketplace" and enter "Databricks" in the search box. Click on the resulting tile and follow the instructions. Once your Databricks cluster is created, navigate to the Databricks cluster with the URL provided. Here you can create a new workspace. Once you've created your workspace, you will be able to launch it from the URL provided. Logging into your workspace brings up the following welcome screen:

In this article, we will create a notebook to read data from MongoDB and use the PySpark libraries to perform the rolling average calculation. We can create our Databricks cluster by selecting the "+ Create Cluster" button from the Clusters menu. Note: For the purposes of this walkthrough we chose only one worker and preemptible instances; in a production environment you would want to include more workers and autoscaling.

Before we create our cluster, we have the option under Advanced Options to provide Spark configuration variables. One of the common settings for Spark config is to define spark.mongodb.output.uri and spark.mongodb.input.uri. First we need to create the MongoDB Atlas cluster so we have a connection string to enter for these values. At this point, open a new browser tab and navigate to MongoDB Atlas.

Prepare a MongoDB Atlas Instance

Once in the MongoDB Atlas portal, you will need to do the following before you can use Atlas with Databricks:

- Create your MongoDB Atlas cluster
- Define user credentials for use in the Spark connector
- Define network access
- Add sample data (optional for this article)

Create Your MongoDB Atlas Cluster

If you already have a MongoDB Atlas account, log in and create a new Atlas cluster.
If you do not have an account, you can set up a free cluster at the following URL: https://www.mongodb.com/cloud . Once your account is set up, you can create a new Atlas cluster by using the "+ New Cluster" dialog. MongoDB provides a free tier for Google Cloud. Once you provide a cluster name and click on "create," Atlas will take approximately five to seven minutes to create your Atlas cluster.

Define Database Access

By default there are no users created in an Atlas cluster. To create an identity for our Spark cluster to connect to MongoDB Atlas, launch the "Add New Database User" dialog from the Database Access menu item. Notice that there are three options for authentication to MongoDB Atlas: Password, Certificate, and AWS IAM authentication. Select "Password," and enter a username and password. Atlas provides granular access control: for example, you could restrict this user account to work only with a specific Atlas cluster, or define the account as temporary and have Atlas expire it after a specific time period.

Define Network Access

MongoDB Atlas does not allow any connection from the internet by default. You need to include MongoDB Atlas as part of a VPC peering or AWS PrivateLink configuration. If you do not have that set up with your cloud provider, you need to specify from which IP addresses Atlas can accept incoming connections. You can do this via the "Add IP Address" dialog in the Network Access menu. In this article, we will add "0.0.0.0/0," allowing access from anywhere, because we don't know specifically which IP our Databricks cluster will be running on. MongoDB Atlas can also make this IP access list entry temporary, which is great for situations where you need to allow access from anywhere.

Add Sample Data

Now that we have added our user account and allowed network access to our Atlas cluster, we need to add some sample data. Atlas provides several sample collections that are accessible from the menu item on the cluster. In this example, we will use the sales collection within the sample_supplies database.

Update Spark Configuration with Atlas Connection String

Copy the MongoDB Atlas connection string by clicking on the Connect button and selecting "Connect your application." Copy the contents of the connection string and note the placeholders for username and password. You will have to change those to your own credentials. Return to your Databricks workspace. Under Advanced Options in your Databricks cluster configuration, paste the connection string for both the spark.mongodb.output.uri and spark.mongodb.input.uri variables. Note that you will need to update the credentials in the MongoDB Atlas connection string with those you defined previously. For simplicity in your PySpark code, change the default database in the connection string from MyFirstDatabase to sample_supplies. (This is optional, because you can always define the database name via Spark configuration options at runtime.)

Start the Databricks Cluster

Now that your Spark config is set, start the cluster. Note: If the cluster fails to start, check the event log and view the JSON tab. This is an example error message you will receive if you forgot to increase the SSD storage quota:

Add the MongoDB Spark Connector

Once the cluster is up and running, click on "Install New" from the Libraries menu. Here we have a variety of ways to create a library, including uploading a JAR file or downloading the Spark connector from Maven.
In this example, we will use Maven and specify org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 as the coordinates. Click on "Install" to add the MongoDB Spark Connector library to the cluster. Note: If you get the error message "Maven libraries are only supported on Databricks Runtime version 7.3 LTS, and versions >= 8.1," you can download the MongoDB Spark Connector JAR file from https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.12/3.0.1/ and then upload it to Databricks by using the Upload menu option.

Create a New Notebook

Click on the Databricks home icon from the menu and select "Create a blank notebook." Attach this new notebook to the cluster you created in the previous step. Because we defined our MongoDB connection string as part of the cluster's Spark configuration, your notebook already has the MongoDB Atlas connection context. In the first cell, paste the following:

from pyspark.sql import SparkSession

# Aggregation pipeline, pushed down to MongoDB: keep only "printer paper" line items,
# unwind the items array, compute each line's sale amount, and project the fields we need.
pipeline = "[{'$match': { 'items.name':'printer paper' }}, {'$unwind': { path: '$items' }}, {'$addFields': { totalSale: { '$multiply': [ '$items.price', '$items.quantity' ] } }}, {'$project': { saleDate:1,totalSale:1,_id:0 }}]"

salesDF = (spark.read.format("mongo")
           .option("collection", "sales")
           .option("pipeline", pipeline)
           .option("partitioner", "MongoSinglePartitioner")
           .load())

Run the cell to make sure you can connect to the Atlas cluster. Note: If you get an error such as "MongoTimeoutException," make sure your MongoDB Atlas cluster has the appropriate network access configured.

The notebook gives us a schema view of what the data looks like. Although we could have continued to transform the data in the Mongo pipeline before it reached Spark, let's use PySpark to transform it. Create a new cell and enter the following:

from pyspark.sql.window import Window
from pyspark.sql import functions as F

# Group total sales by day, then compute a rolling average over the previous seven rows.
salesAgg = (salesDF.withColumn('saleDate', F.col('saleDate').cast('date'))
            .groupBy("saleDate").sum("totalSale").orderBy("saleDate"))
w = Window.orderBy('saleDate').rowsBetween(-7, 0)
df = salesAgg.withColumn('rolling_average', F.avg('sum(totalSale)').over(w))
df.show(truncate=False)

Once the code is executed, the notebook will display our new dataframe with the rolling averages column. It is in this cell that we provide some additional transformation of the data, such as grouping the data by saleDate and providing a summation of the totalSale per day. Once the data is in our desired format, we define a window of time as the past seven entries and then add a column to our data frame that is a rolling average of the total sales data.

Once we have performed our analytics, we can write the data back to MongoDB for additional reporting, analytics, or archiving. In this scenario, we are writing the data back to a new collection called sales-averages:

df.write.format("mongo").option("collection", "sales-averages").save()

You can see the data by using the Collections tab within the MongoDB Atlas cluster UI. With the data in MongoDB Atlas, you can now leverage many of the services available, including Atlas Online Archive, Atlas Search, and Atlas Data Lake.

Summary

The integration between MongoDB Atlas, Google Cloud, and Databricks enables you to gain deep insights into your data and gives you the freedom to move and analyze data as your needs evolve. Check out the resources below for more information:

- Getting started with MongoDB Atlas
- MongoDB Spark Connector
- MongoDB Atlas on Google Cloud
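One last configuration note: as mentioned above, the database and connection details can also be supplied per-read at runtime instead of in the cluster-level Spark conf. A hedged sketch, assuming the same notebook and connector version, with the connection string as a placeholder:

```python
# A sketch, assuming the MongoDB Spark Connector 3.0.1 is attached to the cluster;
# the URI below is a placeholder for your own Atlas connection string.
salesDF = (spark.read.format("mongo")
           .option("uri", "mongodb+srv://<username>:<password>@cluster0.example.mongodb.net")
           .option("database", "sample_supplies")
           .option("collection", "sales")
           .load())
```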

May 11, 2021

Accelerate Data Modernization with Infosys Data Model Converter

Are you in the process of migrating applications from a relational database to MongoDB? If so, you're likely trying to best understand and decide how your enterprise data needs to be modeled. Our previous blog discussed how Infosys Data Services Suite can help enterprises move data seamlessly from legacy relational databases to MongoDB. But moving data is only one part of the puzzle. The more significant step is choosing the target data model, or schema design, a process that usually requires several hours of highly skilled talent. That's why we created this follow-up blog to help you get started.

Rethinking Schema Design

Ultimately, schema design can be the difference between an inefficient, disorganized database and a strategic one that empowers the entire company. Schema design in MongoDB requires a change in perspective for data architects, developers, and database administrators. They have to:

- Rethink the legacy relational data model. This model flattens data into rigid two-dimensional tabular structures of rows and columns. The new data model is a rich and dynamic one with embedded sub-documents and arrays.
- Rethink how the data platform works. In relational databases, it is extremely difficult to change the data platform as the application evolves. However, in MongoDB, the apps and APIs come first and the data platform dynamically accommodates the data.

Getting Schema Design Right

Begin the schema design process by considering the application's requirements. You'll want to model the data in a way that leverages the flexibility of the document model. In schema migrations, it may seem easy at first to simply mirror the flat schema of the relational database in the document model. However, this negates the advantages enabled by the rich and embedded data structures of the document model. For example, data that belongs to a parent-child relationship in two RDBMS tables can be collapsed (embedded) into a single document in MongoDB, as the sketch after this section illustrates.

The application data access patterns should also drive schema design, with a specific focus on:

- The read/write ratio of database operations and whether it is more important to optimize the performance of one operation over another
- The types of queries and updates performed by the databases
- The lifecycle of the data and growth rate of documents

Simplifying Schema Design with Infosys Data Model Converter

Infosys has developed a solution called Infosys Data Model Converter that processes the source relational schema and the above-mentioned signals as inputs and automatically provides target MongoDB schema suggestions. Infosys Data Model Converter is available as part of the Infosys Modernization Suite, which accelerates enterprises' modernization journeys. Each schema suggestion is accompanied by a detailed analysis report. The data modeler can use this as a starting point and iterate over the schema to arrive at the final MongoDB schema. The Infosys Data Model Converter reduces 50-60% of the effort typically spent on schema design.
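As a hedged illustration of the parent-child embedding mentioned above (the table and field names are hypothetical and not output of the converter), here is how two relational tables can collapse into one document:

```python
# Relational source: a parent row and its child rows, keyed by a foreign key.
customer_row = {"id": 42, "name": "Acme Corp", "segment": "enterprise"}
address_rows = [
    {"customer_id": 42, "type": "billing", "city": "Springfield"},
    {"customer_id": 42, "type": "delivery", "city": "Shelbyville"},
]

# Document target: the child rows are embedded, so one read returns the whole aggregate.
customer_doc = {
    "name": customer_row["name"],
    "segment": customer_row["segment"],
    "addresses": [{"type": a["type"], "city": a["city"]} for a in address_rows],
}
```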
Key Features

- Boosts productivity by augmenting the migration of RDBMS to NoSQL databases
- Saves time by automatically extracting schema, query and data patterns from an existing RDBMS
- Comprehensively analyzes the RDBMS entity relations, data, and read-and-write patterns
- Applies a rich set of rules and generates a fully compliant NoSQL target-state data model
- Offers flexibility by externalizing the rules for organization-specific customizations
- Connects and deploys the model to the target NoSQL platform with sample data

Discover more ways in which Infosys can help you unlock value from modernization. Contact us for any modernization questions.

April 15, 2021

Announcing the MongoDB SI Architect Certification Program for Modernization to the Cloud

The product names referenced in this blog are outdated. Realm Sync refers to what is currently known as Atlas Device Sync.

You know the value of modernization as a strategic initiative. It's not only about refreshing your portfolio of legacy applications with the latest innovations simply for the sake of moving to the cloud. This is much more than just "lift and shift". True modernization is about realizing your company's full potential and gaining a competitive edge through development methodologies, architectural patterns and technologies. And by modernizing with MongoDB, you can build new business functionality 3-5x faster, scale to millions of users wherever they are on the planet, and cut costs by 70% or more.

If you're familiar with our technology and our Modernization Program, you already understand the benefits. But do your customers? And, if not, how do you tell them? To help you get started, the MongoDB Partner team has created the MongoDB SI Architect Certification, a full-scale kit of assets related to modernization. This free, self-paced certification helps you improve the modernization experience for a variety of customer types as well as drive conversations with customers around data center exit plans and application qualification for assessing cloud data platforms. Consider this certification the next step in deepening your expertise so you can expand your business opportunities and help customers modernize to the cloud.

Customized for System Integrator partners, our certification teaches you how to discuss the benefits of modernization with various customers on a cloud journey. It enables architects to have deep discussions on vertical-based stories, migration tools, best practices, and architecture guidelines. System Integrator partners will also learn the fundamental value of offerings, messaging, objection handling, and more. Most importantly, this certification program equips SI architects with the ability to communicate key takeaways to the customer in a language they understand.

Program Structure

The free SI Architect certification program is self-paced, takes approximately 20 hours, and is divided into six key sections, complete with a final certification exam.

- Introduction allows partners to access the modernization webinars and modernization program offerings.
- Top use cases focus on how MongoDB is used in business-wide strategic initiatives, like legacy modernization, cloud data strategy, microservices and more vertical-based stories.
- Customer case studies highlight how MongoDB is deployed and leveraged through real-life customer case studies and proof points.
- University classes allow participants to leverage MongoDB University online as well as on-demand courses relevant to architects.
- Competitive edge helps architects understand the true value of MongoDB in comparison to the competition.
- Final certification culminates the program with a "Talk to the experts" session and a final certification exam where participants take a real-world industry use case or customer project and assess how to migrate it to the cloud.

Our "Talk to the experts" session provides users with the opportunity to ask experts questions about the final certification exam. It also introduces the messaging around "MongoDB: The Intelligent Operational Data Platform" and details an Atlas TCO and sizing exercise. In addition to these assets, partners also have access to self-paced developer training and database administrator training here.
Note: Download the enhanced Modernization Guide to refresh your knowledge on MongoDB modernization.

Dive Deeper into MongoDB Cloud Technology

What's one key lesson we know for certain? The data management platform you choose is a key factor in successfully migrating legacy applications to the cloud. The MongoDB Cloud section of our Architecture Guide discusses the unique value MongoDB can bring to organizations making the transition to the cloud.

Note: Download the Architecture Guide to refresh your knowledge on MongoDB Cloud.

The key components of the MongoDB cloud platform are:

- At its core is MongoDB, the general-purpose operational database for modern applications. Nearly every application needs a fast database that can deliver single-digit-millisecond response times; and when it comes to speed, MongoDB delivers. With our flexible document data model, transactional guarantees, rich and expressive query language, and native support for both vertical and horizontal scaling, MongoDB can be used for practically any use case, reducing the need for specialized databases even as your requirements change.
- With multi-cloud clusters on MongoDB Atlas, customers can realize the benefits of a multi-cloud strategy with true data portability and a simplified management experience. Multi-cloud clusters provide best-in-class technology across multiple clouds in parallel, migrate workloads across cloud providers seamlessly, and improve high availability with cross-cloud redundancy.
- Realm Mobile Database extends this data foundation to the edge. Realm is a lightweight database embedded on the client side. Realm helps solve the unique challenges of building for mobile, making it simple to store data on-device while also enabling data access when offline. Realm Sync is seamlessly integrated and keeps data up to date across devices and users by automatically syncing data between the client and a backend Atlas cluster.

Ready to boost your knowledge and expertise? The Modernization Guide, Architecture Guide, and SI Architect certification program are waiting for you. Get started today. Start the free MongoDB SI Architect certification program today!

March 24, 2021

Optimize Data Modeling and Schema Design with Hackolade and MongoDB

Development teams are constantly searching for new ways to quickly enhance applications and satisfy the rapid progression of customer needs. The dynamic schema evolution in MongoDB enables such a reality through the power and flexibility of storing data in a JSON document format instead of in relational tables. Notably, developers love the flexibility and schema-less nature of the JSON document format. But as application complexity and scale increase in an enterprise environment, this flexibility must be skillfully organized to harness the power of the solution, maximize developer productivity, and lower total cost of ownership.

For large enterprises and government agencies, the key is to leverage the benefits of modern applications running on MongoDB Atlas while also ensuring proper data management and governance. This is where a data modeling tool designed specifically for MongoDB will greatly help. Enter Hackolade.

For decades, Entity-Relationship Diagrams (ERDs) have been used to visually represent the data structures of relational databases. But ERDs were originally designed for flat structures only. Hackolade, a MongoDB certified technology partner, has enhanced ERD capabilities to accommodate the representation of JSON hierarchical structures with nested objects and arrays. Hackolade is pioneering data modeling and schema design for NoSQL databases and REST APIs.

Why it Matters

A data model is an abstraction describing and documenting an organization's information system. It is a collection of Entity-Relationship diagrams, descriptions, constraints, and metadata representing data structures:

Hackolade data model for MongoDB

A schema, on the other hand, is a "consumable" scope contract describing the layout or structure of a file, a transaction, or a database. It is an authoritative source for producers and consumers to agree on the structure being exchanged or accessed. While data models are useful for humans to understand structure, schemas are the technical artifact necessary for systems to interact. Hackolade provides both, allowing MongoDB customers to easily visualize the data model, intuitively create and enforce schema with MongoDB's JSON Schema Validator, and iteratively change the schema as the applications evolve.

Automatically generated JSON Schema Validator

Customer Benefits

Increase data agility with forward-engineering

An ERD provides an easy-to-understand picture of your data. As a communication tool, it helps facilitate dialog between application stakeholders like business analysts, designers, architects, developers, and DBAs. With an ERD, you can evaluate different "what if" scenarios, identify the ideal way to denormalize data, and leverage the benefits of MongoDB Atlas technology. Simply apply a query-driven design of the schema after analyzing the access patterns of the application. You can then visualize and evaluate the impacts without writing a line of code—obviously, this is a more productive approach than coding first, then realizing that much needs to be rewritten to accommodate everyone's needs.

The Hackolade software generates several artifacts, such as: collection creation with a validator script requiring no knowledge of JSON Schema syntax, sample JSON documents, Mongoose schemas, documentation in HTML, Markdown or PDF, plotter output of ERD pictures, document and index sizing estimates, and more. The process is easily integrated into a Jenkins CI/CD pipeline by invoking a flexible Command-Line Interface.
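For readers unfamiliar with what such a validator script does, here is a minimal, hedged sketch of a MongoDB JSON Schema validator applied with PyMongo; the collection and fields are hypothetical examples, not Hackolade output.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<username>:<password>@cluster0.example.mongodb.net")
db = client["retail"]

# Create a collection whose documents must conform to the declared schema.
db.create_collection(
    "customers",
    validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["name", "addresses"],
            "properties": {
                "name": {"bsonType": "string"},
                "addresses": {                      # embedded array of sub-documents
                    "bsonType": "array",
                    "items": {
                        "bsonType": "object",
                        "required": ["type", "city"],
                        "properties": {
                            "type": {"enum": ["billing", "delivery"]},
                            "city": {"bsonType": "string"},
                        },
                    },
                },
            },
        }
    },
)
```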
Ensure data quality and compliance through schema reverse-engineering

Deriving a data model from an existing MongoDB instance is not as easy as fetching a DDL from a relational database. Schemas must be inferred from a representative sample of documents in each collection. Hackolade has perfected its schema inference algorithms to accommodate the flexibility and polymorphism of JSON hierarchical structures. The derived models become a trusted source to feed data dictionaries and data governance suites. Reverse-engineering helps ensure data quality and compliance, with the use of an automated Command-Line Interface process.

Facilitate application modernization with the denormalization of legacy data structures

Hackolade can import a variety of structures from relational DDLs, logical data models in XSD format, JSON documents and schemas, and Excel templates. To leverage the benefits of MongoDB, these structures should evolve to embed information where applicable and avoid slow JOINs. This should not be done blindly, but based on a proper analysis of the application access patterns in the context of data volume estimates and relationship cardinality. Hackolade provides a handy feature to quickly evolve a relational data model towards a denormalized schema, thereby leveraging the benefits of MongoDB's document model and facilitating modernization. The process easily hooks into the forward-engineering process described above, generating pictures, scripts, and documentation.

Implement continuous evolution and data management

The lifecycle of modernized applications does not stop after the initial data migration step. Applications must be successfully operated, and will continue to evolve, resulting in likely schema changes. Hackolade is designed to facilitate agile development approaches and the full lifecycle of modern software. It provides the necessary tooling to design and manage data models and schemas for successful application modernization on MongoDB Atlas.

Learn how to maximize developer productivity and lower total cost of ownership using data modeling with Hackolade, and the MongoDB University data modeling advanced course. Download the joint solution brief: MongoDB and Hackolade: Visual Data Modeling for MongoDB Schemas.

March 11, 2021

4 Steps to Success: From Surviving with Legacy Systems to Thriving with MongoDB

Legacy data migrations imply a change in the status quo. More often than not, when an organization finally undertakes a thorough analysis of its technology landscape, it arrives at the same decision: to do nothing. It is an understandably daunting task to upgrade or replace 20+ year-old applications and their database counterparts. But there are good reasons, beyond the tri-annual hardware upgrade, to propel those legacy monoliths of the 1990s into the 21st century.

Companies that prevailed—and even triumphed—in the volatile spring of 2020 were those that transitioned to a more flexible usage model and were therefore able to adjust their business models more rapidly and reliably. MongoDB's client Sanoma was one of the winners. Sanoma was able to scale from 3,000 to 150,000 users within 24 hours, without any service interruption. Innovation and modernization go hand in hand. However, while modernization can sadly occur without innovation, the opposite is simply not possible.

A bit of history

The concept of bringing data together through online data layers (ODL) or operational data stores (ODS) isn't new or specific to MongoDB. Accessing legacy systems, bringing data together, and making it all more easily accessible was a common goal even 20 years ago, and led to the search for the golden source of truth (i.e., the definitive master source for any given entity). This search proved elusive early on due to the hurdles involved with bringing data from diverse, over-structured relational constructs to a sole target called an Operational Data Store (ODS) or Online Data Layer (ODL).

The industry's first attempts began with object-oriented databases, then with the dead end of XML data stores. (In my personal opinion, XQuery and XPath were never meant for real developers.) After both endeavors failed, then came the wave of Apache efforts I like to call "Hadoop Solves the Planet," in which companies dumped all their structured data onto a big-data treasure trove. Unfortunately, this resulted in a data desert rather than the data lake everybody was hoping for, since organizations then had to scramble to build a concept for secondary indexing, data dictionaries, and more, on top of having to rebuild the sensible structures they lost.

In the 2010s, the document model, in conjunction with JSON notation, emerged as the new de facto standard. MongoDB release 3.x introduced the combination of ACID (atomicity, consistency, isolation, durability) compliance and support for a broad range of data types (in BSON, for those in the know). Soon, the MongoDB team started implementing additional features of relational heritage: secondary indexing, ACID transactions, aggregations and manipulations of data in situ, materialized views, joins, unions... the list goes on.

Where we are now

MongoDB documents can be enriched through different means and channels without touching the content — the consistency of all data and data lineage is implicitly guaranteed. A typical example is the extraction of a delivery address through a supply chain application and a billing address through an enterprise resource planning system. In many cases, those two systems have different requirements. MongoDB documents simply keep both instantiations intact and can even hold multiples of each attached to one single client profile, without the need for complete loads and transformations, foreign keys, and all the other ingredients of the relational past. MongoDB simply adds and leverages other sources without destroying their context.
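As a hedged sketch of that client-profile example (the collection and field names are hypothetical), two source systems can each contribute their own view of an address to the same document without disturbing the other's data:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<username>:<password>@cluster0.example.mongodb.net")
profiles = client["odl"]["client_profiles"]

# The supply chain application contributes its delivery address...
profiles.update_one(
    {"clientId": "C-7421"},
    {"$set": {"supplyChain.deliveryAddress": {"street": "9 Dock Rd", "city": "Rotterdam"}}},
    upsert=True,
)

# ...and the ERP system later adds its billing address to the same profile.
# Neither update touches the fields owned by the other system.
profiles.update_one(
    {"clientId": "C-7421"},
    {"$set": {"erp.billingAddress": {"street": "1 Main St", "city": "Amsterdam"}}},
)
```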
MongoDB delivers an ODS and ODL experience while streamlining the time-consuming journey of replacing legacy application code. The data platform of true modernization and innovation has arrived!

How your company can get here

The entire journey can be summarized in four simple steps:

- Analysis: Where do I start my data journey to drive the fastest value?
- Scaffolding: How do I get my data out of the existing platform and bridge it to the new platform?
- Coding: How do I enter the world of adjusting and adapting my applications landscape?
- Innovation: Which are the easiest targets for my company to start achieving true innovation?

The following sections answer these four questions and provide you with a starting point for your journey toward a new and improved solution landscape.

Step 1: Analysis of your existing solution landscape

Data provisioning

Data provisioning—the act of bringing data from source system(s) to target system—is actually the easy part of this step. Opinions may vary as to the very best approach, but most existing models for streaming data in real time make the process elegant and allow for a business-driven decision, from real-time replication on one end to batch exchange of .CSV files on the other end.

Application onboarding

More exciting is the application onboarding phase, inclusive of the selection and design of initial data domains. Here, simple mechanisms derived from classic priority concepts can assist—and yes, they existed long before computers. Data domains already exist as objects in the business logic, represented through their objects in the various programming languages. But even the most talented application developer deals with constant changes, which lead to compromises in those objects and can obfuscate the original clarity in their design, so the objects may hide in plain sight. Unearthing those gems and aligning them to the ODS is the most important step towards true legacy modernization.

The simplest solution is actually the most practical one: load an object with the existing software and persist it into a MongoDB collection. The effort of persisting the object results in two lines of code that can be easily added. The location of the two lines of code (the first opens a connection to the database; the second persists the object) does not matter, as long as it comes after the object is built out. This is the first time you will see the beauty of MongoDB and MQL at work. You really have to do nothing for the object itself—e.g., no decomposition or abstraction layer. MongoDB takes care of it for you. (A minimal sketch of this two-line persistence step appears at the end of this article.) When looking at the object in the MongoDB database, e.g. using MongoDB Compass, you will realize that it already looks a lot like the domain object you wanted. The actual task of mapping objects to domains, or subsets of domains, is now mostly driven by the application use case.

Tip: How to leverage application mapping to accelerate onboarding

In the model below, which was taken from the financial industry but can easily be adapted across industries, we identify the data domains in various applications and map their behavior to the effort it takes to locate them as well as their importance to the app. First, each domain gets a rating for its object complexity, where "complexity" is defined by the implementation team. This is similar to the concept of "poker" in a development sprint. Second, each data domain must be located in the application content. Then, it's tally time.
As we can see in the example above, the concept of schedules looks quite easy but is superseded by the client profiles, which have a touch more application context (spoiler: those always come out on top). Based on the combination of complexity and the number of data domains affecting an application, we can now easily arrive at the model below. Agile is your friend and, assuming a certain "point capacity," the applications fall into place for their conversion schedule in a quite neutral fashion. The development team will then start with the low-hanging fruit. As soon as applications 1, 6 and 7 are ported, we're in business in a new, modern landscape. Along the journey, the domains will get cleaned up naturally, as we no longer have the static corset of the RDBMS table designs.

Step 2: Scaffolding

Scaffolding is the art of building a bridge that can hold people as they cross it, then immediately dissipate once they step off. But for that critical time, it needs to hold. The same is true for the connectivity between a legacy system and a new data platform. Starting with the first sprint, we have data residing in the MongoDB data platform. If the data is limited to new applications and resides exclusively in MongoDB, nothing needs to be done. However, as shown in the client profiles example above, there may be dependencies to consider. The synchronization between the legacy database and the new MongoDB platform can be easily arranged using microservices and the same concepts used for the initial loading of data. Synchronization can also be achieved through "the gate" if only READ data is needed during the first sprint, or if you're already dealing with WRITE and the requirement to synchronize those writes back to a legacy system.

- Streaming: A streaming-based solution is a great option for uni-directional operations that allow read-only access in the most simple way.
- Service: Selecting a simple, tiny microservice is a good option for the use case where data needs to be selectively written. It works using the document model on the MongoDB side, but can still push necessary updates back to the legacy system, and vice versa. The great news is that this service potentially exists already, as it requires nothing more than using the old database interface from the legacy application on one side and the new, easy-to-digest JSON document format on the MongoDB side. If both databases are ACID-compliant, any transaction is automatically treated as a normal application interaction on both sides.
- "Y-Loader": Another option is a true "Y-loader," where all transactions are written in sync to both databases in parallel, and the actual transaction is only considered committed when both systems report their commit and completion. Simple two-phase protocols (write to both, wait five seconds, read both to validate and, if in sync, commit to application) are available as ready-made services through various distributed transaction coordinators, but often it's easier to use the existing data access in the application. In that case, the new data path to MongoDB runs in parallel, and a simple redundant checkpoint (which the application logic would have had for the legacy path anyway) is expanded for this purpose.

Step 3: Coding

The coding with the new domain data model, as well as the MongoDB flexible document model as the underlying base, will immediately impact the coding for the business logic and application development. The operative word is immediately.
Step 3: Coding

Coding against the new domain data model, with MongoDB's flexible document model as the underlying base, immediately impacts business logic and application development. The operative word is immediately. As the data gets unlocked with the initial persistence of the code object to the MongoDB collection, the developer is simultaneously able to code against business requirements. Developers are no longer hindered by the requirements of object mappers. Because the objects are represented through MongoDB's idiomatic drivers, each programming object resides directly in the data collection; in reverse, any change to the business logic object is naturally reflected—code-free—in the MongoDB collection.

A single blog post can't resolve all open questions and edge cases. Each application, client, and data interface is unique. Databases carry historical technical debt and implicit assumptions that get lost across generations of developers. "Do not touch this section—not sure what it does but last time we tried all hell broke loose…" is often-heard advice around the organizational water cooler. But the key lesson? There are many templates available and very simple methods for quickly turning a first step into significant success. For example, a German client, stuck in a combination of IBM DB2 (mainframe and distributed) with a significant Hadoop footprint, was amazed to realize they could "lift" their data one microservice at a time. Within a single week of a proof of concept, business requirements shifted from "impossible to do" for some requested queries to "completed in under one second." This is no exception. Cases and changes like these happen daily, reinforcing Mark Twain's sage advice that "The secret of getting ahead is getting started."

Step 4: Innovation

As the migration from the legacy environment continues, innovation becomes the new focus. Unlocking previously siloed data allows real-time data to be coupled immediately with machine learning platforms for various purposes: scoring for financial decision-making, personalization for retail, or optimization of production processes in an IoT context. New applications and solutions can easily be created on top of the unleashed data, in various programming languages and paradigms (again, MongoDB's idiomatic drivers do the magic!), with direct real-time dashboards created in MongoDB Charts. At this point, the discussion with the product owners in your squads and tribes (trying to be real modern here) begins with the questions "What is the highest-priority component to change?" and "What function is required to enable this change?"

Is it worth waiting much longer? The real question is: why did we all not start sooner? It's time to begin integrating the list of features you always dreamed of having but never dared to pursue. The MongoDB team is here to help you get started. Reach out today and let's discuss the best path forward. To learn more about modernizing to MongoDB, click here .

January 27, 2021

Legacy Modernization with MongoDB and Confluent

>> Announcement: Some features mentioned below will be deprecated on Sep. 30, 2025. Learn more .

In many organizations, crucial enterprise data is locked in dozens or hundreds of silos that may be controlled by different teams and stuck in systems that aren't able to serve new workloads or access patterns. This is a blocker for innovation and insight, ultimately hampering the business. For example, imagine building a new mobile app for your customers that enables them to view their account data in a single view. Designing the app could require months simply to navigate the internal processes necessary to gain access to the legacy systems, and even more time to figure out how to integrate them.

An Operational Data Layer, or ODL, can offer a "best of both worlds" approach, providing the benefits of modernization without the risk of a full rip and replace. Legacy systems are left intact—at least at first—meaning that existing applications can continue to work as usual without interruption. New or improved data consumers access the ODL rather than the legacy data stores, protecting those stores from new workloads that may strain their capacity and expose single points of failure. At the same time, building an ODL offers a chance to redesign the application's data model, allowing for new development and features that aren't possible with the rigid tabular structure of existing relational systems. With an ODL, it's possible to combine data from multiple legacy sources into a single repository where new applications, such as a customer single view or artificial intelligence processes, can access the entire corpus of data. Existing workloads can gradually shift to the ODL, delivering value at each step. Eventually, the ODL can be promoted to a system of record and the legacy systems can be decommissioned. Read our blog covering DaaS with MongoDB and Confluent to learn more.

There's also a push today for applications and databases to be entirely cloud-based, but the reality is that current business applications are often too complex to be migrated easily or completely. Instead, many businesses are opting to move application data between on-premises and cloud deployments in an effort to leverage the full advantage of public cloud computing without having to undertake a complete, massive data lift-and-shift. Confluent can be used for both one-time and real-time data synchronization between legacy data sources and modern data platforms like MongoDB, whose fully managed global cloud database service, MongoDB Atlas , is supported across AWS, Google Cloud, and Azure. Confluent Platform can be self-managed in your own data center, while Confluent Cloud can be used on the public clouds.
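As a minimal illustration of that synchronization idea, the sketch below consumes change events from a Kafka topic fed by a relational source and upserts them into an Atlas collection acting as the ODL. In practice this wiring is handled declaratively by the MongoDB Connector for Apache Kafka (as in the demo referenced below); the topic, field, and connection names here are assumptions for illustration only.

```java
// Illustrative sketch: consume row-change events (assumed to arrive as JSON strings) from
// a Kafka topic and upsert them into a MongoDB Atlas collection serving as the ODL.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.bson.Document;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OdlSyncConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "<confluent-bootstrap-servers>");
        props.put("group.id", "odl-sync");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             MongoClient mongo = MongoClients.create("mongodb+srv://<your-atlas-uri>")) {

            MongoCollection<Document> accounts =
                    mongo.getDatabase("odl").getCollection("accounts");
            consumer.subscribe(List.of("legacy.accounts"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Each event value is assumed to be a JSON representation of the changed row.
                    Document doc = Document.parse(record.value());
                    accounts.replaceOne(Filters.eq("accountId", doc.get("accountId")),
                                        doc, new ReplaceOptions().upsert(true));
                }
            }
        }
    }
}
```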
Whether leaving your application on-premises is a personal choice or a corporate mandate, there are many good reasons to integrate with MongoDB Atlas:

- Bring your data closer to your users in more than 70 regions with Atlas's global clusters
- Address your most intense workloads with one-click, automated sharding for scale-out and zero-downtime scale-up
- Quickly provision TBs of database storage, all on high-performance SSDs with dedicated I/O bandwidth
- Natively query and analyze data across AWS S3 and MongoDB Atlas with MongoDB Atlas Data Lake
- Perform full-text search queries with MongoDB Atlas Search
- Build native mobile applications that seamlessly synchronize data with MongoDB Realm
- Create powerful visualizations and dashboards of your MongoDB data with MongoDB Charts
- Off-load older data to cost-effective storage with MongoDB Atlas Online Archive

In this video, we show a one-time migration and real-time, continuous data synchronization from a relational system to MongoDB Atlas using Confluent Platform and the MongoDB Connector for Apache Kafka , and we discuss different ways to store and consume the data within MongoDB Atlas. The Git repository for the demo is here . Learn more about the MongoDB and Confluent partnership here and download the joint Reference Architecture here . Click here to learn more about modernizing to MongoDB.

January 7, 2021

Part 1: The Modernization Journey with Exafluence and MongoDB

>> Announcement: Some features mentioned below will be deprecated on Sep. 30, 2025. Learn more .

Welcome to the first in a series of conversations between Exafluence and MongoDB about how our partnership can use open source tools and the application of data, artificial intelligence/machine learning, and natural language processing to power your business's digital transformation. In this installment, MongoDB Senior Partner Solutions Architect Paresh Saraf and Director for WW Partner Presales Prasad Pillalamarri sit down with Exafluence CEO Ravikiran Dharmavaram and exf Insights Co-Founder Richard Robins to discuss how to start the journey toward resilient, agile, and quick-to-market applications.

From Prasad Pillalamarri: I first met Richard Robins, MD & Co-Founder of exf Insights at Exafluence, back in June 2016 at a MongoDB World event. Their approach to building data-driven applications fascinated me. Since then, Exafluence has grown by leaps and bounds in the system integration space and MongoDB has outperformed its peers in the database market. So Paresh and I decided to interview Richard to dive deep into their perspective on modernization with MongoDB.

Prasad & Paresh: We first met the Exafluence team in 2016. Since then, MongoDB has created the Atlas cloud data platform, which now supports multi-cloud clusters, and Exafluence has executed multiple mainframe and legacy modernization projects. Could you share your perspective on the growth and synergies of both companies from a modernization point of view?

Richard Robins: Paresh and Prasad, I'm delighted to share our views with you. We've always focused on what happens after you successfully offload read traffic from mainframes and legacy RDBMS to the cloud. That's digital transformation and legacy app modernization. Early on, Exafluence made a bet that if the development community embraces something, we should, too. That's how we locked in on MongoDB when we formed our company. Having earned our stripes in the legacy data world, we knew that getting clients to MongoDB would mean mining the often poorly documented IP contained in the legacy code. That code is often where long-retired subject matter expert (SME) knowledge resides. To capture it, we built tools to scan COBOL/DB2 and stored procedures to reverse engineer the current state. This helps us move clients to a modern, cloud-native application, and it's an effective way to merge, migrate, and retire the legacy data stores all of our clients contend with. Once we'd mined the IP with those tools, we needed to provide forward-engineered transformation rules to reach the new MongoDB Atlas endpoint. Using a metadata-driven approach, we built a rules catalog that includes a full audit trail and a REST API to keep data governance programs and catalogs up to date as an additional benefit of our modernization efforts. We've curated these tools as exf Insights , and we bring them to each modernization project. Essentially, we applied NLP, ML, and AI to data transformation to improve modernization analysts' efficiency, and added a low-to-no-code transformation rule builder, complete with version control and rollback capabilities. All this has resulted in our clients getting world-class, resilient capabilities at a lower cost in less time. We're delighted to say that our modernization projects have been successful by following simple tenets — to embrace what the development community embraces and to offer as much help as possible — embodied in the accelerator tools we've built.
That's why we are so confident we'll continue our rapid growth.

P&P: How do you think re-architecting legacy applications with MongoDB as the core data layer will add value to your business?

RR: We believe that MongoDB Atlas will continue to be developers' go-to document database, and that we'll see our business grow 200-300% over the next three years. With MongoDB Atlas and Realm we can provide clients with resilient, agile applications that scale, are easily upgraded, and can run on any cloud as well as the popular iOS and Android mobile devices. Digital transformation is key to remaining competitive and agile going forward. With MongoDB Atlas, we can give our clients the same capabilities we all take for granted in our mobile apps: they're resilient, easy to upgrade, usually real-time, scale via Kubernetes clusters, and can be rolled back quickly if necessary. Most importantly, they save our clients money and can be deployed automatically.

P&P: At a high level, how will Exafluence help customers take this journey?

RR: We're unusual as a services firm in that we spend 20% of gross revenue on R&D, so our platform and approach are proven. Thus, relatively small teams for our healthcare, financial services, and industrial 4.0 clients can leverage our approach, platform, and tools to deliver advanced analytical systems that combine structured and unstructured data across multiple domains. We built our exf Insights accelerator platform using MongoDB and designed it for interoperability, too. On projects we often encounter legacy ETL and messaging tools. To show how easy it is, we recently integrated exf Insights with SAP HANA and the SAP Data Intelligence platform. Further, we can publish JSON code blocks and provide Python code for integration into ETL platforms like Informatica and Talend. Our approach is to reverse engineer by mining IP from legacy data estates and then forward engineer the target data estate, using these steps and tools:

Reverse engineer:
- Extract stored procedures, business logic, and technical data from the legacy estate and load it into our platform.
- Use our AI/ML/NLP algorithms to analyze business transformation logic and metadata, with outliers identified for cleansing.
- Provide DB scans to assess legacy data quality, cleanse and correct outliers, and provide tools to compare DB-level data reconciliations.

Forward engineer. To produce a clean set of metadata and business transformation logic, baselined with version control, we:
- Extract, transform, and load metadata to the target state.
- Score metadata via NLP and ML to recommend matches to the analyst, who accepts, rejects, or overrides recommendations. Analysts can then add additional transformations, which are catalogued.
- Deploy and load cleansed data to the target-state platform so any transformations and gold copies may be built.
- Automate data governance via a REST API and code-block generation (Python/JSON) to provide enterprise catalogs with the latest transforms.

P&P: What are your keys to a successful transformation journey?

RR: Over the past several years we've identified these elements and observations:
- Subject matter experts and technologists must work together to provide new solutions.
- There's a shortage of skilled technologists able to write, deploy, and securely manage next-generation solutions. Using accelerators and transferring skills are vital to mitigating that shortage.
- Existing IP that's buried in legacy applications must be understood and mined in order for a modernization program to succeed.
- A data-driven approach that combines reverse and forward engineering speeds migration and also provides new data governance and data science catalog capabilities.
- The building, care, and feeding of new, open source-enabled applications is markedly different from the way monolithic legacy applications were built.
- The document model enables analytics and interoperability.
- Cybersecurity and data consumption patterns must be articulated and be part of the process, not afterthoughts.
- Even with aggressive transformation plans, new technology must co-exist with legacy applications for some time; progress works best if it's not a big bang.
- Success requires business and technology to learn new ways to provide, acquire, and build agile solutions.

P&P: Can you talk about solutions you have that will accelerate the modernization journey for customers?

RR: exf Insights helps our clients visualize what's possible with extensive, pre-built, modular solutions for healthcare, financial services, and industrial 4.0. They show the power of MongoDB Atlas and also the power of speed layers using Spark and Confluent Kafka. These solutions are readily adaptable to client requirements and reduce the risk and time required to provide secure, production-ready applications.

- Source data loading. Analyze and integrate raw structured and unstructured data, including support for reference and transactional data.
- Metadata scan. Match data using AI/NLP, scoring results and providing side-by-side comparison.
- Source alignment. Use ML to check underlying data and score results for analysts, and leverage that learning to accelerate future changes.
- Codeless transformation. Empower data SMEs to build the logic with a multiple-sources-to-target approach and transform rules that support code-value lookups and complex Boolean logic. Includes versioned gold copies of any data type (e.g., reference, transaction, client, product).
- Deployment. Deploy for scheduled or event-driven repeatability and dynamically populate Snowflake or other repositories. Generates code blocks that are usable in your estate or via REST API.

We used the same five-step workflow data scientists use when we enabled business analysts to accelerate the retirement of internal data stores and to build and deploy a COVID-19 self-checking app in three weeks, including Active Directory integration and downloadable apps. We will be offering a Realm COVID-19 screening app on web, Android, and iOS to the entire MongoDB Atlas community in addition to our own clients. The accelerator integrates key data governance tools, including exf Insights repository management of all sources and targets with versioned lineage; as-built transformation rules for internal and client implementations; and a business glossary integrated into metadata repositories.

P&P: Usually one of the key challenges for businesses is data being locked in silos.

RR: We couldn't agree more. Our data modernization projects routinely integrate with source transactional systems that were never built to work together. We provide scanning tools to understand disparate data as well as ways to ingest, align, and stitch it together. Using healthcare as an example, exf Insights provides a comprehensive analytical capability, able to integrate data from hospitals, claims, pharmaceutical companies, patients, and providers.
Some of this data is non-SQL, such as radiological images; for pharma companies we provide capabilities to support clinical research organizations (CROs) via a follow-the-molecule approach. Of course, we also have to work with and subscribe to Centers for Medicare & Medicaid Services (CMS) guidelines. Our data migration focuses on collecting the IP behind the data and making the source, logic, and any transformation rules available to our clients. In financial services, it's critical to understand sources and targets. No matter how data is accessed (federated or direct store), with Spark and Kafka we can talk to just about any data repository.

P&P: Once we discover the data to be migrated, we need to model it according to MongoDB's data model paradigm. That requires multiple transformations before the data is loaded into MongoDB. Can you explain more about how your accelerators help here?

RR: By understanding data consumption and then looking at existing data structures, we seek to simplify and then apply the capabilities of MongoDB's document model. It's not unlike what a data architect would do in the relational world, but with MongoDB Atlas it's easier. We ourselves use MongoDB for our exf Insights platform to align, transform, and make data ready for consumption in new applications. We're able to provide full rules lineage and an audit trail, and even support rollback. For the real-time speed layer we use Spark and Kafka as well. This data-driven modernization approach also turns data governance into an active consumer of the rules catalog, so exf Insights works well for regulated industries.

P&P: It's great that we have the data migrated now. Consider a scenario where it's a mainframe application with lots of COBOL code that has to be moved to a new programming language like Python, with a change in the data access layer to point to MongoDB. Do you have accelerators that can facilitate the application migration? If so, how?

RR: Yes, we have accelerators that understand COBOL syntax to create JSON and ultimately Java, which speeds modernization. We also found we had to reverse engineer stored procedures as part of our client engagements for Exadata migration.

P&P: Once we migrate the data from legacy databases to MongoDB, validation is the key step. As this is a heterogeneous migration, it can be challenging. How can Exafluence add value here?

RR: We've built custom accelerators that migrate data from the RDBMS world to MongoDB and offer data comparisons as clients go from development to testing to production, documenting all data transformations along the way.
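For readers who want a feel for what such a development-to-production comparison involves, here is a generic reconciliation sketch. It is not Exafluence's exf Insights tooling, just an assumed, simplified illustration: compare record counts and spot-check a business field between a legacy JDBC source and the migrated MongoDB collection, with all table, field, and connection names as placeholders.

```java
// Generic migration-validation sketch: count comparison plus a per-record field check
// between a legacy relational table and the migrated MongoDB collection.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Objects;

public class MigrationReconciler {
    public static void main(String[] args) throws Exception {
        try (Connection legacy = DriverManager.getConnection("jdbc:<legacy-url>");
             MongoClient mongo = MongoClients.create("mongodb+srv://<your-atlas-uri>")) {

            MongoCollection<Document> customers =
                    mongo.getDatabase("target").getCollection("customers");

            // 1. Compare row/document counts.
            long mongoCount = customers.countDocuments();
            long rdbmsCount;
            try (Statement st = legacy.createStatement();
                 ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM customer")) {
                rs.next();
                rdbmsCount = rs.getLong(1);
            }
            System.out.printf("counts: rdbms=%d mongo=%d%n", rdbmsCount, mongoCount);

            // 2. Spot-check a business field record by record.
            long mismatches = 0;
            try (Statement st = legacy.createStatement();
                 ResultSet rs = st.executeQuery("SELECT customer_id, email FROM customer")) {
                while (rs.next()) {
                    Document doc = customers
                            .find(Filters.eq("customerId", rs.getString("customer_id")))
                            .first();
                    if (doc == null || !Objects.equals(rs.getString("email"), doc.getString("email"))) {
                        mismatches++;
                    }
                }
            }
            System.out.println("field mismatches: " + mismatches);
        }
    }
}
```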
P&P: Now that we've talked about all the tools that can help in the modernization journey, can you tell us how you have already helped your customers achieve this?

RR: Certainly. We've already outlined how we've created solution starters for modernization, with sample solutions as accelerators. But that's not enough; our key tenet for successful modernization projects is pairing SMEs and developers. That's what enables our joint client and Exafluence teams to understand the business, key regulations, and technical standards. Our data-driven focus lets us understand the data regardless of industry vertical. We've successfully used exf Insights in financial services, healthcare, and industry 4.0. Whether it's understanding the nuances of financial instruments and data sources for reference and transactional data, medical device IoT sensors in healthcare, or shop floor IoT and PLC data for predictive analytics and digital twin modeling, a data-driven approach reduces modernization risks and costs.

Below are some of the possibilities this data-driven approach has delivered for our healthcare clients using MongoDB Atlas. By aggregating provider, membership, claims, pharma, and EHR clinical data, we offer robust reporting that:
- Transforms healthcare data from its raw form into actionable insights that improve member care quality, health outcomes, and satisfaction
- Provides FHIR support
- Surfaces trends and patterns in claims, membership, and provider data
- Lets users access, visualize, and analyze data from different sources
- Tracks provider performance and identifies operational inefficiencies

P&P: Thank you, Richard! Keep an eye out for upcoming conversations in our series with Exafluence, where we'll be talking about agility in infrastructure and data as well as interoperability.

MongoDB and Modernization

To learn more about MongoDB's overall Modernization strategy, read here .

December 9, 2020

Simplifying Data Science with Iguazio and MongoDB: Modernization with Machine Learning

For the most innovative, forward-thinking companies, "data" has become synonymous with "big data" — and "big data" has become synonymous with "machine learning and AI." The amount of data you have is raw knowledge. The ability to connect the dots into a cohesive picture that lets you see major projections, personalizations, security breaches, etc., in real time — that's wisdom. Or, as we like to call it, data science.

MongoDB Cloud is the leading cloud data platform for modern applications. Iguazio , initially inspired by the powerful Iguazu Falls in South America, is the leading data science platform built for production and real-time use cases. We're both disrupting and leading various industries through innovation and forward-thinking intelligence. It makes perfect sense for us to work together to create a powerful, data-driven solution.

Iguazio Data Science & MLOps platform optimizes data science for your use cases

Iguazio enables enterprises to develop, deploy, and manage their AI applications, drastically shortening the time required to create real business value with AI. Using Iguazio, organizations can build and run AI models at scale and in real time, deploy them anywhere (multi-cloud, on-prem, or edge), and bring to life their most ambitious AI-driven strategies. Enterprises spanning a wide range of verticals use Iguazio to solve the complexities of machine learning operations ( MLOps ) and accelerate the machine learning workflow by automating the following end-to-end processes:
- Data collection — ingested from any diverse source, whether structured, unstructured, raw, or real-time
- Data preparation — through exploration and manipulation at scale (go big!)
- Continuous model training — through acceleration and automation
- Rapid model and API deployment
- Monitoring and management of AI applications in production

As a serverless, cloud-native data science platform, Iguazio reduces the overhead and complexity of developing, deploying, and monitoring AI models, guarantees consistent and reproducible results, and allows mobilizing, scaling, and duplicating functions to multiple enforcement points.

MongoDB delivers unprecedented flexibility for real-time data science integration

With its scalable, adaptable data processing model, its ability to build rich data pipelines, and its capacity to scale out while doing both in parallel, MongoDB is a foundational persistence layer for data science. It allows you to use your data intelligently in complex analytics, drawing new conclusions and identifying actions to take. Data science and data analytics go hand in hand, fueled by big data. The MongoDB data platform handles data analytics by:
- Enabling scalability and distributed processing — processing data with a query framework that reduces data movement by knowing where the data is and optimizing in-place computation (see the sketch after this list)
- Accelerating insights — delivering real-time insight and actions
- Supporting a full data lifecycle — intelligent data tiering from ingestion to transactions to retirement
- Leveraging a rich ecosystem of tools and machine learning for data science
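As a small illustration of the in-place computation point above (an assumed example, not Iguazio-specific code), the aggregation below groups and summarizes transactions inside MongoDB so that only the computed summary, not the raw data, leaves the database. Collection and field names are placeholders.

```java
// Illustrative sketch: an aggregation pipeline runs inside MongoDB, so only the grouped
// summary per customer is returned to the application.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.model.Accumulators;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import org.bson.Document;

import java.util.List;

public class InPlaceAnalytics {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb+srv://<your-atlas-uri>")) {
            client.getDatabase("analytics").getCollection("transactions")
                  .aggregate(List.of(
                          Aggregates.match(Filters.eq("channel", "mobile")),
                          Aggregates.group("$customerId",
                                  Accumulators.sum("total", "$amount"),
                                  Accumulators.avg("avgAmount", "$amount"))))
                  .forEach(doc -> System.out.println(doc.toJson()));
        }
    }
}
```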
Here's a look at how Iguazio and MongoDB partner to synthesize a seamless production environment:

MongoDB and Iguazio: from research to production in weeks

Iguazio fuses with MongoDB to allow intelligent, complex data compilations that lead to real-world ML/AI results like streaming and analytics, IoT applications, conversational interfaces, and image recognition. Data science is opening opportunities for businesses in all areas, from financial services to retail, marketing, telco, and IoT, and those opportunities create demands on data that continue to grow. Iguazio swiftly reduces the development and deployment of data science projects from months to weeks, transforming how businesses, developers, and product owners use and imagine new use cases for their data. Together, MongoDB and Iguazio establish a joint hybrid/multi-cloud data science platform.

MongoDB's unique features create the perfect seeding ground for Iguazio's data science platform. They include:
- MongoDB's high-performing, highly ranked data platform experience
- No data duplication
- Optimization for real-time, an essential factor for data science
- An elastic, flexible model that adjusts to ever-changing load requirements
- Production that's ready in minutes

Meanwhile, Iguazio's powerful ML pipeline automation simplifies the complex data science layer by creating a production-ready environment with an end-to-end MLOps solution, including:
- A feature store for managing features (online and offline) that resides in MongoDB
- Data exploration and training at scale, using built-in distribution engines such as Dask and Horovod
- Real-time feature engineering using Iguazio's Nuclio-supported serverless functions
- Model management and model monitoring, including drift detection
- An open and integrated Python environment, including built-in libraries and Jupyter Notebook-as-a-Service

Data and data science in the real world

When we think of data, stagnant databases may come to mind. But data in action is live, quick, and moves in real time. Data science is no different — and it has quickly incorporated itself in every sector of virtually every industry:
- Fraud prevention — distinguishing legitimate from fraudulent behavior and learning to prevent new tactics over time
- Predictive maintenance — finding patterns to predict and prevent failures
- Real-time recommendation engines — processing consumer data for immediate feedback
- Process optimization — minimizing costs and improving processes and targets
- Remote monitoring — quickly detecting anomalies, threats, or failures
- Autonomous vehicles — continuously learning new processes and landscapes to optimize safety, performance, and maintenance
- Smart scheduling — increasing coordination among nearly infinite variables
- Smart mobility systems — using predictive optimization to maintain efficiency, safety, and accuracy
- IoT & IIoT — generating insights to identify patterns and predict behavior

Data science today

MongoDB enables a more intuitive process for data management and exploration by simplifying and enriching data. Iguazio helps turn data into smarter insights by simplifying organizations' modernization into machine learning, AI, and the ongoing spectrum of data science — and we've only just scratched the surface. To learn more about how, together, Iguazio and MongoDB can transform your data processes into intelligent data science, check out our joint webinar discussing multiple client use cases.

MongoDB and modernization

To learn more about MongoDB's overall modernization strategy for moving from legacy RDBMS to MongoDB Atlas, read here .

December 2, 2020