Find Hidden Insights in Vector Databases: Semantic Clustering

Mai Nguyen and Scott Kurowski
August 19, 2024 | Updated: March 6, 2025
#genAI #Vector Search

Vector databases, a powerful class of databases designed to optimize the storage, processing, and retrieval of large-volume, multidimensional data, have become increasingly instrumental to generative AI (gen AI) applications, with Forrester predicting a 200% increase in the adoption of vector databases in 2024. But their power extends far beyond these applications. Semantic vector clustering, a technique within vector databases, can unlock hidden knowledge within your organization’s data, democratizing insights across teams. View the tutorial to get started.

Mining diverse data for hidden knowledge

Imagine your organization’s data as a library of diverse knowledge, a treasure trove of information waiting to be unearthed. Traditionally, uncovering valuable insights from data has relied on asking the right questions, which can be a challenge for developers, data scientists, and business leaders alike. They might spend vast amounts of time sifting through limited, siloed datasets, potentially missing hidden gems buried elsewhere in the organization's data. Simply put, without knowing the right questions to ask, these valuable insights often remain undiscovered, leading to missed opportunities or losses.

Enter vector databases and semantic vector clustering. A vector database is designed to store, index, and query vector embeddings: numerical representations of unstructured data such as text, images, and audio. Within a vector database, semantic vector clustering is a technique for organizing information by grouping vectors with similar meaning together. Text analysis, sentiment analysis, knowledge classification, and uncovering semantic connections between datasets are just a few examples of how semantic vector clustering empowers organizations to vastly improve data mining.
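To make "grouping vectors with similar meaning" concrete: embedding vectors are commonly compared with cosine similarity, where values near 1.0 mean the vectors point in nearly the same direction. The minimal sketch below assumes tiny hand-made 3-dimensional vectors as hypothetical stand-ins for real embeddings (which usually have hundreds or thousands of dimensions); the document names and vectors are illustrative only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; real ones would come from an embedding model.
docs = {
    "late delivery complaint": [0.9, 0.1, 0.0],
    "package arrived late": [0.8, 0.2, 0.1],
    "love the new app design": [0.1, 0.9, 0.2],
}


def most_similar(query: str) -> str:
    """Return the other document whose vector is closest in meaning to the query's."""
    q = docs[query]
    others = (name for name in docs if name != query)
    return max(others, key=lambda name: cosine_similarity(q, docs[name]))


print(most_similar("late delivery complaint"))
# → package arrived late
```

The two delivery-related texts score about 0.98 against each other, while the app-design text scores about 0.21, which is the kind of signal clustering builds on.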

Semantic vector clustering offers a multifaceted approach to organizational improvement. By analyzing text data, it can illuminate customer and employee sentiments, behaviors, and preferences, informing strategic decisions, enhancing customer service, and improving employee satisfaction. Furthermore, it revolutionizes knowledge management by categorizing information into easily accessible clusters, thereby boosting collaboration and efficiency. Finally, by bridging data silos and uncovering hidden relationships, semantic vector clustering facilitates informed decision-making and breaks down organizational barriers.

For example, a business can gain significant insights from the customer interaction data it routinely keeps, classifies, or summarizes. Those data points (text, numbers, images, videos, etc.) can be vectorized, and semantic vector clustering can then be applied to identify the most prominent customer patterns (the densest vector clusters) in those interactions, classifications, or summaries. From the identified patterns, the business can take further action or make more informed decisions that would not otherwise have been possible.
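The workflow in the example above (vectorize the interactions, cluster them, then look for the biggest clusters) can be sketched with plain k-means. This is a minimal, standard-library-only sketch under stated assumptions: the interaction texts and 2-D vectors are hypothetical stand-ins for real embeddings, k-means is one common clustering choice rather than the specific algorithm the tutorial uses, and cluster size (member count) serves as a simple proxy for the "densest" cluster.

```python
import math
from collections import Counter

Vector = list[float]


def mean(points: list[Vector]) -> Vector:
    """Component-wise average of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points: list[Vector], k: int, iterations: int = 20) -> list[int]:
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point.

    Centroids are seeded with the first k points so the sketch is
    deterministic; real code would typically use k-means++ seeding.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean(members)
    return labels


# Hypothetical customer interactions with toy 2-D "embeddings".
interactions = [
    ("where is my order", [0.90, 0.10]),
    ("billing question", [0.10, 0.90]),
    ("shipping is slow", [0.85, 0.15]),
    ("order still not here", [0.95, 0.05]),
    ("charged twice", [0.15, 0.85]),
    ("delivery delayed again", [0.88, 0.12]),
]

labels = kmeans([vec for _, vec in interactions], k=2)

# Rank clusters by member count; the largest is the most prominent pattern.
largest, count = Counter(labels).most_common(1)[0]
print(count, [text for (text, _), label in zip(interactions, labels) if label == largest])
```

Here the four delivery-related interactions land in one cluster and the two billing ones in another, so "delivery problems" surfaces as the dominant pattern without anyone asking about it in advance.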

The power of semantic vector clustering

So, how does semantic vector clustering achieve all this?

Discover semantic structures: Clustering groups similar vectors, produced by LLM embedding models, together. This allows themes to be retrieved quickly. Beyond clustering regular vectors (individual data points or concepts), clustering RAG vectors (summaries of themes and concepts) can provide better LLM context than basic semantic search.
Reduce data complexity via clustering: Data points are grouped based on overall similarity, effectively reducing the complexity of the data. This reveals patterns and summarizes key features, making it easier to grasp the bigger picture. Imagine organizing the library by theme or genre, making it easier to navigate vast amounts of information.
Semantic auto-aggregation: Here is the coolest part. Vector clustering can arrange groups of vectors into hierarchies, effectively "auto-aggregating" them semantically. This means that the data itself "figures out" these groups and "self-organizes." Think of it as a library whose sections organize themselves around thematic connections, without a pre-built set of questions, and whose automated catalog lets researchers find what they need quickly and easily. This allows you to identify patterns within vast, semantically diverse data across your organization.
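The hierarchical "auto-aggregation" described in the list above resembles agglomerative (bottom-up) hierarchical clustering, one standard technique for building such hierarchies; the source does not say which algorithm the tutorial uses, so treat this as an illustrative swap-in. The minimal sketch below assumes average linkage over Euclidean distance, and the topic names and 2-D vectors are hypothetical.

```python
import math
from itertools import combinations
from typing import Union

Vector = list[float]
Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def average_linkage(a: list[Vector], b: list[Vector]) -> float:
    """Mean pairwise Euclidean distance between members of two clusters."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(items: dict[str, Vector]) -> Tree:
    """Bottom-up (agglomerative) hierarchical clustering.

    Starts with one cluster per item and repeatedly merges the two closest
    clusters until one remains, returning the merge history as nested tuples.
    No categories are supplied up front; the hierarchy comes from the data.
    """
    # Each cluster is a (tree, member_vectors) pair; leaves are item names.
    clusters = [(name, [vec]) for name, vec in items.items()]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (tree_i, vecs_i), (tree_j, vecs_j) = clusters[i], clusters[j]
        # combinations() yields i < j, so deleting j first keeps index i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(((tree_i, tree_j), vecs_i + vecs_j))
    return clusters[0][0]


# Hypothetical support topics with toy 2-D "embeddings".
topics = {
    "refund request": [0.90, 0.10],
    "chargeback": [0.80, 0.20],
    "late package": [0.10, 0.90],
    "lost package": [0.25, 0.75],
}

print(agglomerate(topics))
# → (('refund request', 'chargeback'), ('late package', 'lost package'))
```

The result is a two-level hierarchy (a payments branch and a shipping branch) that no one defined in advance. Average linkage is only one option; single or complete linkage would also work but can shape the tree differently.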

Unlock hidden insights in your vector database

The semantic clustering of vector embeddings is a powerful tool for going beyond the surface of data and identifying meanings that would otherwise go undiscovered. By unlocking hidden relationships and patterns, you can extract valuable insights that drive better decision-making, enhance customer experiences, and improve overall business efficiency, all enabled through MongoDB’s secure, unified, and fully managed vector database capabilities.

Check out our tutorial to learn how to get started.

Head over to our quick-start guide to get started with Atlas Vector Search today.

Add vector search to your arsenal for more accurate and cost-efficient RAG applications by enrolling in the MongoDB and DeepLearning.AI course "Prompt Compression and Query Optimization" for free today.
