Analyze non-standard events drawn from autonomous driving sensor output using multimodal AI, hybrid search, and a conversational agent powered by AWS Bedrock and S3, and backed by MongoDB.
Use cases: Artificial Intelligence, Internet of Things
Industries: Manufacturing & Motion
Products: MongoDB Atlas, MongoDB Search, MongoDB Vector Search, MongoDB Voyage AI
Partners: Amazon Bedrock
Solution Overview
Autonomous driving systems generate enormous volumes of sensor data: high-resolution images, LiDAR sweeps, radar frames, and telemetry logs. Within this flood of information, the most valuable data points are often the rarest: unusual or unexpected driving scenarios known as edge or corner cases. These data points include animals on the road, flooded intersections, unusual construction zones, and other situations that autonomous systems rarely encounter during standard validation and road testing.
Finding these rare scenarios manually is slow and costly. Data scientists often spend significant time writing custom filters and scripts to locate specific events. Manual search methods are slow and may resolve only a handful of edge cases per year, while teams need to scale that number to thousands.
The Multimodal Event Explorer demonstrates one way to solve this challenge by combining MongoDB Atlas Search with Voyage AI embeddings and a conversational AI agent. The solution enables teams to:
Search across driving events using natural language descriptions.
Filter by environmental conditions, such as weather, season, and time of day.
Interact with a ReAct-based AI agent that reasons over the database in real time.
This solution allows engineers and data scientists to discover rare driving scenarios at fleet scale in seconds rather than weeks, accelerating model training cycles and improving the safety and reliability of autonomous driving systems.
Reference Architectures
The solution follows a tiered architecture with a clear separation between the web application, backend services, and application data platform.
Figure 1. High-level architecture of the Multimodal Event Explorer
Web Application
The frontend is built with Next.js and uses LeafyGreen UI components for a MongoDB-branded experience. It provides the following main interaction surfaces:
A search bar with metadata filter dropdowns.
A results grid displaying matched driving event images.
A chat panel for conversational queries through the AI agent.
Backend Services
A FastAPI-based Python backend orchestrates the core logic. It exposes:
A Hybrid Search API that combines vector and full-text search.
A Reranker API that applies Voyage AI reranking to refine results.
A ReAct agent that uses AWS Bedrock (Claude) with a tool discovery registry to reason over the database.
Data Platform: MongoDB Atlas
All data resides in a single MongoDB Atlas collection. Each document contains:
The event image (or a reference to it in S3).
A text description.
Environmental metadata fields, such as season, weather, time of day, and rarity score.
A 1024-dimensional vector embedding generated by Voyage AI's voyage-multimodal-3 model.
MongoDB Atlas provides the vector search index (with scalar quantization), a full-text search index, and aggregation pipelines, all queried through a unified API.
Voyage AI provides the embedding model (voyage-multimodal-3) for generating vector representations of images, metadata and queries, plus a reranker model (rerank-2) for improving result relevance.
External Services
AWS Bedrock hosts Claude models, which power the conversational AI agent for this solution. Users can access the agent via the chat panel located in the bottom-right corner of the interface.
Figure 2. Solution User Interface
The agent reasons about the user's question, decides which tool to call, executes it against the live MongoDB database, observes the result, and repeats the process until it can provide a final response.
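This reason-act-observe cycle can be sketched in a few lines of Python. The `call_model` stub and `TOOLS` registry below are illustrative stand-ins for the AWS Bedrock (Claude) client and the solution's tool discovery registry, not the actual implementation:

```python
# Minimal ReAct loop sketch. call_model and TOOLS are hypothetical
# stand-ins for the Bedrock (Claude) client and the tool registry.

def search_events(query: str) -> str:
    """Stub tool: a real version would run hybrid search in MongoDB."""
    return f"3 events matched '{query}'"

TOOLS = {"search_events": search_events}

def call_model(history: list) -> dict:
    """Stub model: a real version would call AWS Bedrock here."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "search_events", "input": "foggy night"}
    return {"final": "Found 3 foggy-night events."}

def react_loop(question: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = call_model(history)                       # reason
        if "final" in decision:
            return decision["final"]                         # answer ready
        result = TOOLS[decision["tool"]](decision["input"])  # act
        history.append({"role": "tool", "content": result})  # observe
    return "step limit reached"

print(react_loop("Show me foggy night scenes"))
```

The loop terminates either when the model emits a final answer or when the step budget is exhausted, which keeps a misbehaving agent from calling tools indefinitely.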
The agent uses these tools:
search_events: Executes hybrid vector and text search against the MongoDB collection.
get_stats: Uses a $facet aggregation that returns weather, season, and time-of-day distributions and rarity statistics across the entire collection.
compare_scenarios: Performs two parallel searches returned side by side.
The agent streams its execution trace to the UI in real time, so every tool call and result is visible as it happens.
compare_scenarios demonstrates a Human-in-the-Loop (HITL) pattern.
When Claude decides to call it, the backend pauses the stream and sends
an approval prompt to the UI before the tool executes. The user must
click Approve or Reject. If no response arrives within 60 seconds, the
tool is automatically skipped.
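The approval-with-timeout behavior can be sketched with asyncio. The `approval_queue` below is a hypothetical channel fed by the UI's Approve/Reject buttons; the actual backend may wire this differently:

```python
import asyncio

# Sketch of the HITL approval gate: wait up to 60 s for the user's
# decision, and skip the tool if no response arrives in time.
# approval_queue is a hypothetical channel fed by the UI buttons.

async def gated_tool_call(approval_queue: asyncio.Queue, timeout: float = 60.0) -> str:
    try:
        decision = await asyncio.wait_for(approval_queue.get(), timeout)
    except asyncio.TimeoutError:
        return "skipped (no response)"      # auto-skip after the timeout
    return "executed" if decision == "approve" else "rejected"

async def demo() -> str:
    q = asyncio.Queue()
    q.put_nowait("approve")                 # simulate the user clicking Approve
    return await gated_tool_call(q, timeout=1.0)

print(asyncio.run(demo()))
```

`asyncio.wait_for` gives the skip-on-timeout behavior for free: the pending `get()` is cancelled and the agent can continue without the tool result.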
Note: To trigger this flow, users can open the chat panel and click the suggested question labelled "Human in the loop demo", or type any question asking the agent to compare two driving scenarios, such as "Compare foggy vs clear weather driving scenarios".
Data Model Approach
The solution stores all event data in a single MongoDB collection, leveraging the flexible document model to co-locate multimodal information that would typically require costly joins across multiple tables in a relational system, resulting in faster queries and a simpler data model as event complexity grows. The following snippet illustrates the document model in practice:
{
  "event_id": "e_00601",
  "domain": "adas",
  "source_dataset": "autonomous-driving-dataset",
  "image_path": "adas/e_00601.jpg",
  "image_url": null,
  "image_embedding": [0.0412, -0.0183, 0.0097, "... 1021 more values ..."],
  "text_description": "A foggy night scene on a rural road with low visibility and no other vehicles in sight.",
  "metadata": {
    "season": "fall",
    "time_of_day": "night",
    "weather": "foggy",
    "environment": "rural",
    "rarity_score": 0.847,
    "source_index": 601
  },
  "embedding_metadata": {
    "model": "voyage-multimodal-3.5",
    "dimensions": 1024,
    "original_bytes": 4096,
    "quantized_bytes": 1024
  },
  "created_at": "2025-03-15T09:42:11.000Z",
  "updated_at": null
}
Document Structure
Each event document contains these key fields:
event_id: Unique identifier for the event; matches the image filename on disk or in S3.
domain: Category of the event (e.g., "adas" for autonomous driving).
source_dataset: The originating dataset.
image_path: Relative path to the image on the local filesystem.
image_url: S3 or CloudFront URL once images are migrated to cloud storage; null in local development.
image_embedding: 1024-dimensional float32 vector generated by Voyage AI (voyage-multimodal-3 or 3.5).
text_description: Natural language description of the scene, used for full-text Atlas Search.
metadata: Nested object containing:
season: spring, summer, fall, or winter.
time_of_day: dawn, day, dusk, or night.
weather: clear, cloudy, rainy, or foggy.
environment: driving environment (e.g., rural).
rarity_score: 0–1 indicator of how uncommon the scenario is.
embedding_metadata: Tracks the embedding model name, vector dimensions, and byte sizes for both the original float32 and quantized int8 representations.
created_at: UTC timestamp of when the document was ingested.
Why This Model Works
By co-locating the image reference, text description, metadata, and
vector embedding in a single document, the solution eliminates joins. A
hybrid search query can match on the vector embedding and the text
description simultaneously, apply pre-filters on metadata fields
(season, weather, time_of_day), and return complete results
in a single round-trip.
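As a sketch, the vector half of such a query might look like the following PyMongo-style stage. The index name and `numCandidates` value are illustrative assumptions; `query_vector` would come from Voyage AI, and field names follow the document model above:

```python
# Sketch of a $vectorSearch stage with metadata pre-filters.
# "vector_index" and numCandidates are illustrative assumptions.
query_vector = [0.0] * 1024  # placeholder for a voyage-multimodal-3 embedding

vector_stage = {
    "$vectorSearch": {
        "index": "vector_index",
        "path": "image_embedding",
        "queryVector": query_vector,
        "numCandidates": 200,
        "limit": 20,
        # Pre-filters narrow the candidate set before scoring.
        "filter": {
            "metadata.weather": "foggy",
            "metadata.time_of_day": "night",
        },
    }
}

# collection.aggregate([vector_stage]) would then return complete
# documents -- image reference, text, metadata -- in one round-trip.
```

Because everything lives in one document, no follow-up lookups are needed to assemble the result.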
The $facet aggregation allows the AI agent to compute distribution
statistics—such as weather breakdown, event counts by season,
time-of-day histograms, and rarity score statistics (average, min,
max)—across the entire collection in a single pipeline execution, with
no need for multiple queries.
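A sketch of such a $facet pipeline, expressed as a Python dict (the facet names are illustrative, but each branch mirrors a statistic described above):

```python
# Sketch of the $facet aggregation behind the agent's stats tool:
# every distribution is computed in a single pipeline execution.
stats_pipeline = [
    {
        "$facet": {
            "by_weather": [
                {"$group": {"_id": "$metadata.weather", "count": {"$sum": 1}}}
            ],
            "by_season": [
                {"$group": {"_id": "$metadata.season", "count": {"$sum": 1}}}
            ],
            "by_time_of_day": [
                {"$group": {"_id": "$metadata.time_of_day", "count": {"$sum": 1}}}
            ],
            "rarity": [
                {"$group": {
                    "_id": None,
                    "avg": {"$avg": "$metadata.rarity_score"},
                    "min": {"$min": "$metadata.rarity_score"},
                    "max": {"$max": "$metadata.rarity_score"},
                }}
            ],
        }
    }
]
```

Each facet runs on the same document stream, so the server scans the collection once rather than four times.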
Vector Index with Scalar Quantization
The vector search index applies scalar quantization at the index layer,
compressing the 1024-dimensional float32 embeddings from 4,096 bytes
down to 1,024 bytes per vector (int8). This indexing strategy reduces
the in-memory vector payload by 75% and retains approximately 90% recall
compared to full-fidelity search. The filter fields (domain, metadata.season, metadata.time_of_day, and metadata.weather) are declared directly in the same index definition.
This setup allows metadata pre-filtering to happen inside
$vectorSearch before any results are returned to the application.
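An index definition along these lines would combine the quantized vector field and the filter fields in one place. The similarity metric below is an assumption; the rest follows the fields described above:

```python
# Sketch of an Atlas Vector Search index definition with scalar
# quantization and co-declared filter fields. "cosine" similarity
# is an assumption; adjust to match the embedding model's guidance.
vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "image_embedding",
            "numDimensions": 1024,
            "similarity": "cosine",
            "quantization": "scalar",  # int8: 1,024 bytes per vector
        },
        {"type": "filter", "path": "domain"},
        {"type": "filter", "path": "metadata.season"},
        {"type": "filter", "path": "metadata.time_of_day"},
        {"type": "filter", "path": "metadata.weather"},
    ]
}
```

Declaring the filter paths in the same index is what allows $vectorSearch to apply them as pre-filters rather than post-filtering in the application.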
Build the Solution
The complete source code with a detailed README is available in the Industry Solutions public GitHub repository. To deploy the solution, follow these steps.
Prerequisites
Python 3.13
Node.js 18 or higher (LTS recommended)
uv for Python dependency management
A MongoDB Atlas cluster
A Voyage AI API key
AWS credentials with Bedrock access
Ingest the Dataset
The solution uses the MIST Autonomous Driving Dataset on HuggingFace (jongwonryu/MIST-autonomous-driving-dataset), but you can substitute your own dataset by following the setup instructions in the README. The ingestion pipeline performs the following steps:
Streams images from HuggingFace.
Applies diversity gating to ensure balanced coverage across weather, season, and time-of-day combinations.
Generates multimodal embeddings via Voyage AI (voyage-multimodal-3).
Inserts documents with Vector Search and Atlas Search indexes into MongoDB.
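The diversity gating step can be sketched as a simple per-bucket cap over (weather, season, time_of_day) combinations. The cap value and field names below are illustrative, not the pipeline's actual parameters:

```python
from collections import defaultdict

# Sketch of diversity gating: cap how many images each
# (weather, season, time_of_day) bucket may contribute so that no
# single condition dominates the sample. max_per_bucket is illustrative.

def diversity_gate(events, max_per_bucket=50):
    buckets = defaultdict(int)
    kept = []
    for ev in events:
        key = (ev["weather"], ev["season"], ev["time_of_day"])
        if buckets[key] < max_per_bucket:
            buckets[key] += 1
            kept.append(ev)
    return kept

# 100 identical clear/summer/day events: the gate keeps only 50 of them.
sample = [{"weather": "clear", "season": "summer", "time_of_day": "day"}] * 100
print(len(diversity_gate(sample, max_per_bucket=50)))  # 50
```

Without a gate like this, common conditions (clear daytime driving) would crowd out the rare scenarios the solution is built to surface.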
Run the pipeline from the backend/ directory:
uv run python services/ingestion_pipeline.py --sample-size 1000
This command streams approximately 1–2 GB and processes around 1,000 diverse images in 15–30 minutes. The default sample size is 500 if the flag is omitted.
Configure the Backend
Copy the example environment file and populate it with your credentials:
cp backend/.env.example backend/.env
Set MONGODB_URI, DATABASE_NAME, VOYAGE_API_KEY, and
optionally AWS_REGION / AWS_PROFILE.
Start the FastAPI server from the backend/ directory:
uv run uvicorn main:app --host 0.0.0.0 --port 8000
Configure the Frontend
Copy the frontend environment example to frontend/.env.local, install Node dependencies with npm install, and start the dev server with npm run dev. The frontend is accessible at http://localhost:3000.
cp frontend/EXAMPLE.env frontend/.env.local
cd frontend && npm install && npm run dev
Explore the Query Pipeline
Once the demo is live, submit a query like "night drive in overcast conditions" to see the pipeline in action. The pipeline executes in this order:
Pre-filter: If the user selected metadata filters (season, weather, time of day), these fields are applied as pre-filters inside $vectorSearch, narrowing the candidate set for search.
Hybrid Search with $rankFusion: The query is embedded using voyage-multimodal-3 and run simultaneously as a vector search against the scalar-quantized index and a full-text MongoDB Search; the results are merged with Reciprocal Rank Fusion in a single aggregation pipeline.
Reranker: The merged results are passed to Voyage AI rerank-2, which scores each candidate against the original query text for improved precision.
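As a sketch of the fusion math: Reciprocal Rank Fusion scores each result as the sum of 1/(k + rank) over the ranked lists it appears in. MongoDB's $rankFusion performs this merge server-side; the document IDs and k = 60 constant below are illustrative:

```python
# Sketch of Reciprocal Rank Fusion (RRF). Each result's fused score
# is the sum of 1 / (k + rank) across the ranked lists that contain it.
# k = 60 is the commonly used constant; the IDs below are illustrative.

def rrf(ranked_lists, k=60):
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["e_00601", "e_00342", "e_00017"]   # vector search ranking
text_hits = ["e_00601", "e_00988", "e_00342"]     # full-text ranking
print(rrf([vector_hits, text_hits]))
```

Documents ranked well by both searches (here e_00601 and e_00342) accumulate score from each list and rise above results that only one search found.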
Key Learnings
Accelerate edge case discovery with hybrid search: Combining vector search and full-text search through MongoDB's $rankFusion lets teams find rare driving scenarios using natural language queries. This approach also matches specific technical terms, fault codes, or sensor IDs that pure semantic search might miss.
Reduce infrastructure costs with scalar quantization: MongoDB Atlas Vector Search compresses 1024-dimensional float32 vectors to int8, achieving memory savings while preserving recall. For datasets with millions of embeddings, this compression directly translates to lower hardware requirements.
Simplify multimodal data management with the document model: Storing images, embeddings, text descriptions, and metadata in a single MongoDB document eliminates the synchronization overhead of maintaining separate databases for operational data, a search engine, and a vector store.
Empower data scientists with conversational access: The ReAct-based AI agent backed by AWS Bedrock enables natural-language interaction with fleet data. Instead of writing custom aggregation queries, teams can ask questions like "Compare foggy vs. clear weather driving scenarios" and receive structured analysis with tool execution traces.
Improve retrieval quality with Voyage AI reranking: Adding a reranking step after hybrid search meaningfully improves result precision. Voyage AI's rerank-2 model re-scores candidates against the original query, pushing the most contextually relevant results to the top.
Authors
Humza Akhtar, MongoDB