Building AI with MongoDB: How VISO TRUST is Transforming Cyber Risk Intelligence

Mat Keep and Elliott Gluck
September 5, 2023 | Updated: August 8, 2024
#genAI #Vector Search

NOTE: We have an updated VISO TRUST story, which goes into the latest and greatest with the VISO TRUST team. You can read the latest customer story here.

Since we announced the preview availability of MongoDB Atlas Vector Search back in June, we've seen rapid adoption from developers building a wide range of AI-enabled apps. Today we're going to talk to one of these customers.

VISO TRUST puts reliable, comprehensive, actionable vendor security information directly in the hands of decision-makers who need to make informed risk assessments. The company uses state-of-the-art models from Amazon Bedrock, augmented by vector search and retrieval from MongoDB Atlas.

We sat down with Pierce Lamb, Senior Software Engineer on the Data and Machine Learning team at VISO TRUST, to learn more.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Tell us a little bit about your company. What are you trying to accomplish, and how does that benefit your customers or society more broadly?

VISO TRUST is an AI-powered third-party cyber risk and trust platform that enables any company to access actionable vendor security information in minutes. VISO TRUST delivers the fast and accurate intelligence needed to make informed cybersecurity risk decisions at scale for companies at any maturity level.

Our commitment to innovation means we are constantly looking for ways to increase the business value we deliver to our customers. VISO TRUST ensures that complex business-to-business (B2B) transactions adequately protect the confidentiality, integrity, and availability of trusted information. VISO TRUST's mission is to become the largest global provider of cyber risk intelligence and the intermediary for business transactions. By using VISO TRUST, customers reduce their threat surface in B2B transactions with vendors, which improves their overall risk posture and lowers the likelihood of security incidents such as breaches and malicious injections.

Today VISO TRUST has many great enterprise customers, such as Instacart, Gusto, and Upwork, and they all report the same results: 90% less work, an 80% reduction in time to assess risk, and near 100% vendor adoption. Because VISO TRUST is the only approach that can deliver accurate results at scale, customers can, for the first time, gain complete visibility into their entire third-party populations and take control of their third-party risk.

Describe what your application does and what role AI plays in it

The VISO TRUST Platform uses patented, proprietary machine learning and a team of highly qualified third-party risk professionals to automate vendor security assessment at scale.

Simply put, VISO TRUST automates vendor due diligence and reduces third-party risk at scale, so security teams can stop chasing vendors, reading documents, and analyzing spreadsheets.

Figure 1: VISO TRUST is the only SaaS third-party cyber risk management platform that delivers the rapid security intelligence modern companies need to make critical risk decisions early in the procurement process. (Screenshot of the VISO TRUST dashboard showing risk metrics such as average residual risk and average control coverage.)

The VISO TRUST Platform engages third parties easily, saving everyone time and resources. In a 5-minute web-based session, third parties are prompted to upload relevant artifacts from their existing security program, and our supervised AI, which we call Artifact Intelligence, does the rest.

Security artifacts that enter VISO’s Artifact Intelligence pipeline interact with AI/ML in three primary ways. First, VISO deploys discriminator models that produce high-confidence predictions about features of the artifact. For example, one model performs artifact classification, another detects organizations inside the artifact, another predicts which pages are likely to contain security controls, and more. Our modules reference a comprehensive set of over 25 security frameworks and use document heuristics and natural language processing to analyze any written material and extract all relevant control information.
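The idea of several discriminator models each predicting one feature of an artifact, with only high-confidence predictions used directly, can be illustrated with a small sketch. This is not VISO TRUST's code. The `Discriminator` signature, the 0.9 threshold, the toy models, and the choice to route low-confidence predictions to human review are all hypothetical. In practice each model would be a fine-tuned classifier rather than a lambda.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: a discriminator maps artifact text to (label, confidence).
Discriminator = Callable[[str], Tuple[str, float]]


@dataclass
class Prediction:
    feature: str       # which feature of the artifact the model predicts
    label: str         # predicted value, e.g. "SOC 2 report"
    confidence: float  # model confidence in [0, 1]


def run_discriminators(
    text: str, models: Dict[str, Discriminator], threshold: float = 0.9
) -> Tuple[List[Prediction], List[Prediction]]:
    """Run every discriminator; split results into accepted and needs-human-review."""
    accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for feature, model in models.items():
        label, confidence = model(text)
        prediction = Prediction(feature, label, confidence)
        # Assumption: low-confidence predictions go to a human reviewer.
        (accepted if confidence >= threshold else needs_review).append(prediction)
    return accepted, needs_review


# Toy stand-ins for fine-tuned classifiers (hypothetical outputs, for illustration only).
toy_models: Dict[str, Discriminator] = {
    "artifact_type": lambda text: ("SOC 2 report", 0.97),
    "organization": lambda text: ("Acme Corp", 0.62),
}
accepted, needs_review = run_discriminators("...artifact text...", toy_models)
```

Keeping the threshold as a parameter lets each feature's precision/recall trade-off be tuned independently if needed.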

Second, text content is parsed out of each artifact in the form of sentences, paragraphs, headers, table rows, and more; these text blobs are embedded and stored in MongoDB Atlas, where they become part of our dense retrieval system. This system uses MongoDB features like Atlas Vector Search to supply ranked context to large language model (LLM) prompts, a technique known as retrieval-augmented generation (RAG).
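Here is a minimal sketch of the ingestion step: split artifact text into blobs, embed each one, and shape the results as documents that keep the vector next to its metadata. The simple regex parser, the field names (`artifact_id`, `kind`, `text`, `embedding`), and the toy embedder are assumptions for illustration. A real pipeline would call an embedding model and handle headers and table rows as well.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-in for a real embedding model.
Embedder = Callable[[str], List[float]]


def split_into_blobs(text: str) -> List[Dict[str, str]]:
    """Split raw artifact text into paragraph and sentence blobs (a simplified parser)."""
    blobs = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        blobs.append({"kind": "paragraph", "text": paragraph})
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if sentence:
                blobs.append({"kind": "sentence", "text": sentence})
    return blobs


def build_documents(artifact_id: str, text: str, embed: Embedder) -> List[dict]:
    """Pair each blob's embedding with its metadata, ready to insert into a collection."""
    return [
        {
            "artifact_id": artifact_id,
            "kind": blob["kind"],
            "text": blob["text"],
            "embedding": embed(blob["text"]),
        }
        for blob in split_into_blobs(text)
    ]


# Toy embedder for illustration only: [character count, space count].
toy_embed: Embedder = lambda s: [float(len(s)), float(s.count(" "))]
sample = "Data is encrypted at rest. Access is logged.\n\nMFA is required."
docs = build_documents("artifact-1", sample, toy_embed)
```

With the PyMongo driver, the resulting list could be written with `collection.insert_many(docs)`.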

Third, we use the RAG results to seed LLM prompts and chain their outputs together to produce extremely accurate factual information about the artifact in the pipeline. This gives customers instant intelligence that previously took weeks to produce.
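Seeding prompts with retrieved context and chaining their outputs can be sketched in a few lines. The prompt wording, the two-step chain (identify the organization, then ask a follow-up about it), and the `LLM` callable are hypothetical. The callable stands in for whatever hosted model call is used, for example a model invoked through Amazon Bedrock.

```python
from typing import Callable, List

# Hypothetical stand-in for a hosted LLM call.
LLM = Callable[[str], str]


def build_prompt(question: str, context: List[str], max_blobs: int = 5) -> str:
    """Seed a prompt with the top-ranked retrieved blobs, assumed sorted best-first."""
    numbered = "\n".join(f"[{i + 1}] {blob}" for i, blob in enumerate(context[:max_blobs]))
    return (
        "Answer using only the context below. Reply UNKNOWN if the answer is not stated.\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


def chained_extraction(context: List[str], llm: LLM) -> dict:
    """Chain two prompts: the first answer becomes part of the second question."""
    organization = llm(
        build_prompt("Which organization does this artifact describe?", context)
    ).strip()
    requires_mfa = llm(
        build_prompt(
            f"Does {organization} require multi-factor authentication? Answer yes or no.",
            context,
        )
    ).strip().lower()
    return {"organization": organization, "requires_mfa": requires_mfa}
```

Restricting the model to the supplied context and allowing an explicit UNKNOWN answer is one common way to keep chained outputs grounded in the retrieved text.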

VISO TRUST’s risk model analyzes your risk and delivers a complete assessment that provides everything you need to know to make qualified risk decisions about the relationship. In addition, the platform continuously monitors and reassesses third-party vendors to ensure compliance.

What specific AI/ML techniques, algorithms, or models are utilized in your application?

For our discriminator models, we survey state-of-the-art pre-trained models (typically narrowing the field to those available in Hugging Face's transformers package) and fine-tune them on our own dataset.

For our dense retrieval system, we use MongoDB Atlas Vector Search, which internally uses the Hierarchical Navigable Small World (HNSW) algorithm to retrieve the stored embeddings most similar to an embedded query. We also plan to re-rank these results.
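HNSW is an approximate nearest-neighbor index: it finds close matches without scoring every stored vector. The sketch below shows the exact, brute-force search that HNSW approximates, which is often kept as a baseline for measuring recall. It also includes a hypothetical second-stage `rerank` step of the kind mentioned as a future plan. The `scorer` argument is an assumption and stands in for something like a cross-encoder.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def exact_top_k(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Brute-force k-NN: score every stored vector, keep the k best as (index, score)."""
    best = heapq.nlargest(k, ((cosine(query, v), i) for i, v in enumerate(vectors)))
    return [(i, score) for score, i in best]


def rerank(
    candidates: List[Tuple[int, float]],
    texts: List[str],
    scorer: Callable[[str], float],
) -> List[Tuple[int, float]]:
    """Hypothetical second stage: reorder first-stage candidates by a costlier scorer."""
    return sorted(candidates, key=lambda c: scorer(texts[c[0]]), reverse=True)
```

Exact search costs O(n) similarity computations per query. HNSW trades a small loss in recall for much lower query cost on large collections, and re-ranking only ever touches the short candidate list.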

Today we use Amazon Bedrock exclusively for all of our generative AI needs and do not use any other subservicers. In the past we used OpenAI and Anthropic, and we tested Google's Vertex AI and Bard.

Can you describe other AI technologies used in your application stack?

Other frameworks we use include Hugging Face's transformers, evaluate, accelerate, and datasets libraries, as well as PyTorch, Weights & Biases (WandB), and Amazon SageMaker. We built our own library for ML experiments (fine-tuning) and our own library for workflow orchestration, and all of our prompt engineering is custom-built.

Why did you choose MongoDB as part of your application stack? Which MongoDB features are you using and where are you running MongoDB?

The VISO TRUST Platform relies on tools like MongoDB to meet specific objectives. MongoDB supports the platform's mechanism for engaging third parties efficiently, which uses both AI and human oversight to automate the assessment of security artifacts at scale.

The fundamental value proposition of MongoDB, a robust document database, is why we chose it in the first place. It was initially deployed as a storage and retrieval mechanism for all the factual information our Artifact Intelligence pipeline produces about artifacts. While it still performs this function today, it has now also become our “vector/metadata database.”

MongoDB quickly ranks large quantities of embedded text blobs for us, while Atlas provides all the ease of use of a cloud-ready database. We use both the Atlas Search index visualization and the query profiler visualization daily. Even the basic display of a few documents in a collection often saves time. Finally, when we recently backfilled embeddings across one of our MongoDB deployments, Atlas automatically provisioned more disk space for the large indexes without anyone on our team needing to intervene, which was incredibly helpful.

What are the benefits you've achieved by using MongoDB?

I would say two primary benefits of MongoDB and Atlas have greatly helped us. First, MongoDB was already where we stored metadata about artifacts in our system; with the introduction of Atlas Vector Search, we now have a comprehensive vector/metadata database, battle-tested over a decade, that solves our dense retrieval needs. There's no need to deploy, manage, and learn a new database. Our vectors and artifact metadata are stored right next to each other.
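Keeping vectors and metadata in the same documents means one query can combine similarity search with a metadata filter. The sketch below builds an Atlas Vector Search index definition and a `$vectorSearch` aggregation pipeline as plain Python dicts. The field names (`embedding`, `artifact_id`, `text`, `kind`), the index name `artifact_blobs_vector_index`, the cosine similarity choice, and the 20x candidate oversampling are illustrative assumptions, not VISO TRUST's configuration.

```python
from typing import List


def vector_index_definition(num_dimensions: int) -> dict:
    """Atlas Vector Search index: one vector field plus one filterable metadata field."""
    return {
        "fields": [
            {"type": "vector", "path": "embedding",
             "numDimensions": num_dimensions, "similarity": "cosine"},
            # Metadata must be indexed as a filter field to be used in $vectorSearch.filter.
            {"type": "filter", "path": "artifact_id"},
        ]
    }


def vector_search_pipeline(
    query_vector: List[float], artifact_id: str, limit: int = 5
) -> List[dict]:
    """ANN search restricted to one artifact's blobs, then project text and score."""
    return [
        {
            "$vectorSearch": {
                "index": "artifact_blobs_vector_index",  # hypothetical index name
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": limit * 20,  # oversample candidates to improve recall
                "limit": limit,
                "filter": {"artifact_id": {"$eq": artifact_id}},
            }
        },
        {"$project": {"_id": 0, "text": 1, "kind": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]
```

With the PyMongo driver, the pipeline would be run as `list(collection.aggregate(vector_search_pipeline(query, "artifact-1")))`. `numDimensions` must match the output size of whichever embedding model produced the stored vectors.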

Second, Atlas makes all the painful parts of database management easy. Creating indexes, provisioning capacity, alerting on slow queries, visualizing data, and much more are handled for us, which saves us time and lets us focus on more important things.

What are your future plans for new applications and how does MongoDB fit into them?

Retrieval-augmented generation is going to continue to be a first-class feature of our application. In this regard, the evolution of Atlas Vector Search and its ecosystem in MongoDB will be highly relevant to us. MongoDB has become the database our ML team uses, so as our ML footprint expands, our use of MongoDB will expand.

Getting started

Thanks so much to Pierce for sharing details on VISO TRUST’s AI-powered applications and experiences with MongoDB.

The best way to get started with Atlas Vector Search is to head over to the product page or our quick-start guide. There you will find tutorials, documentation, and whitepapers along with the ability to sign up for MongoDB Atlas. You’ll just be a few clicks away from spinning up your own vector search engine where you can experiment with the power of vector embeddings and RAG. We’d love to see what you build, and are eager for any feedback that will make the product even better in the future!
