Fireworks AI and MongoDB: The Fastest AI Apps with the Best Models, Powered By Your Data

Mat Keep and Angela Lee
March 26, 2024 | Updated: March 6, 2025
#genAI

We’re happy to announce that Fireworks AI and MongoDB are now partnering to make innovating with generative AI faster, more efficient, and more secure. Fireworks AI was founded in late 2022 by industry veterans from Meta’s PyTorch team, where they focused on performance optimization, improving the developer experience, and running AI apps at scale.

This post is also available in: Deutsch, Français, Español, Português, Italiano, 한국어, 简体中文.

It’s this expertise that Fireworks AI brings to its production AI platform, curating and optimizing the industry's leading open models. Benchmarking by the company shows gen AI models running on Fireworks AI deliver up to 4x faster inference speeds than alternative platforms, with up to 8x higher throughput and scale.

Models are one part of the application stack. But for developers to unlock the power of gen AI, they also need to bring enterprise data to those models. That’s why Fireworks AI has partnered with MongoDB, addressing one of the toughest challenges to adopting AI. With MongoDB Atlas, developers can securely unify operational data, unstructured data, and vector embeddings to safely build consistent, correct, and differentiated AI applications and experiences.

Jointly, Fireworks AI and MongoDB provide a solution for developers who want to leverage highly curated and optimized open-source models, and combine these with their organization’s own proprietary data — and to do it all with unparalleled speed and security.

Lightning-fast models from Fireworks AI: Enabling speed, efficiency, and value

Developers can choose from many different models to build their gen AI-powered apps. Navigating the AI landscape to identify the most suitable models for specific tasks — and tuning them to achieve the best levels of price and performance — is complex and creates friction in building and running gen AI apps. This is one of the key pain points that Fireworks AI alleviates.

With its lightning-fast inference platform, Fireworks AI curates, optimizes, and deploys 40+ different AI models. These optimizations can simultaneously result in significant cost savings, reduced latency, and improved throughput. Their platform delivers this via:

Off-the-shelf models, optimized models, and add-ons: Fireworks AI provides a collection of top-quality text, embedding, and image foundation models. Developers can leverage these models or fine-tune and deploy their own, pairing them with their own proprietary data using MongoDB Atlas.
Fine-tuning capabilities: To further improve model accuracy and speed, Fireworks AI also offers a fine-tuning service using its CLI to ingest JSON-formatted objects from databases such as MongoDB Atlas.
Simple interfaces and APIs for development and production: The Fireworks AI playground allows developers to interact with models right in a browser. It can also be accessed programmatically via a convenient REST API. This is OpenAI API-compatible and thus interoperates with the broader LLM ecosystem.
Cookbook: A simple and easy-to-use cookbook provides a comprehensive set of ready-to-use recipes that can be adapted for various use cases, including fine-tuning, generation, and evaluation.

Fireworks AI and MongoDB: Setting the standard for AI with curated, optimized, and fast models

With Fireworks AI and MongoDB Atlas, apps run in isolated environments ensuring uptime and privacy, protected by sophisticated security controls that meet the toughest regulatory standards:

As one of the top open-source model API providers, Fireworks AI serves 66 billion tokens per day (and growing).
With Atlas, you run your apps on a proven platform that serves tens of thousands of customers, from high-growth startups to the largest enterprises and governments.

Together, the Fireworks AI and MongoDB joint solution enables:

Retrieval-augmented generation (RAG) or Q&A from a vast pool of documents: Ingest a large number of documents to produce summaries and structured data that can then power conversational AI.
Classification through semantic/similarity search: Classify and analyze concepts and emotions from sales calls, video conferences, and more to provide better intelligence and strategies. Or, organize and classify a product catalog using product images and text.
Images to structured data extraction: Extract meaning from images to produce structured data that can be processed and searched in a range of vision apps — from stock photos, to fashion, to object detection, to medical diagnostics.
Alert intelligence: Process large amounts of data in real-time to automatically detect and alert on instances of fraud, cybersecurity threats, and more.

**Figure 1:** The Fireworks tutorial showcases how to bring your own data to LLMs with retrieval-augmented generation (RAG) and MongoDB Atlas

Getting started with Fireworks AI and MongoDB Atlas

To help you get started, review the Optimizing RAG with MongoDB Atlas and Fireworks AI tutorial, which shows you how to build a movie recommendation app and involves:

MongoDB Atlas Database that indexes movies using embeddings. (Vector Store)
A system for document embedding generation. We'll use the Fireworks embedding API to create embeddings from text data. (Vectorisation)
MongoDB Atlas Vector Search responds to user queries by converting the query to an embedding, fetching the corresponding movies. (Retrieval Engine)
The Mixtral model uses the Fireworks inference API to generate the recommendations. You can also use Llama, Gemma, and other great OSS models if you like. (LLM)
Loading MongoDB Atlas Sample Mflix Dataset to generate embeddings (Dataset)

We can also help you design the best architecture for your organization’s needs. Feel free to connect with your account team or contact us here to schedule a collaborative session and explore how Fireworks AI and MongoDB can optimize your AI development process.

Head over to our quick-start guide to get started with Atlas Vector Search today.

← Previous

Fireworks AI y MongoDB: las aplicaciones de IA más rápidas con los mejores modelos, impulsadas por sus datos

Nos complace anunciar que Fireworks AI y MongoDB ahora son socios para hacer que la innovación con IA Generativa sea más rápida, más eficiente y más segura. Fireworks AI fue fundada a finales de 2022 por veteranos de la industria del equipo PyTorch de Meta, donde se centraron en la optimización del rendimiento, la mejora de la experiencia del desarrollador y la ejecución de aplicaciones de IA a escala. Es esta la experiencia que Fireworks AI aporta a su plataforma de IA de producción, seleccionando y optimizando los modelos abiertos líderes de la industria. Las pruebas comparativas realizadas por la empresa demuestran que los modelos de IA Generativa que se ejecutan en Fireworks AI ofrecen velocidades de inferencia hasta 4 veces superiores a las de las plataformas alternativas, con un rendimiento y una escala hasta 8 veces superiores. Los modelos son una parte de la pila de aplicaciones. Pero para que los desarrolladores desbloqueen el poder de la IA Generativa, también deben incorporar datos empresariales a esos modelos. Es por eso que Fireworks AI se ha asociado con MongoDB, abordando uno de los desafíos más difíciles para la adopción de la IA. Con MongoDB Atlas , los desarrolladores pueden unificar de forma segura datos operativos, datos no estructurados e incrustaciones de vectores para crear de forma segura aplicaciones y experiencias de IA consistentes, correctas y diferenciadas. Conjuntamente, Fireworks AI y MongoDB proporcionan una solución para los desarrolladores que desean aprovechar modelos de código abierto altamente seleccionados y optimizados, y combinarlos con los datos patentados de su organización, y hacerlo todo con una velocidad y seguridad inigualables. Modelos ultrarrápidos de Fireworks AI: velocidad, eficacia y valor añadido Con su plataforma de inferencia ultrarrápida, Fireworks AI selecciona, optimiza e implementa más de 40 modelos diferentes de IA. Estas optimizaciones pueden suponer al mismo tiempo un importante ahorro de costos, una reducción de la latencia y una mejora del rendimiento. Su plataforma ofrece esto a través de: Modelos estándar, modelos optimizados y complementos: Fireworks AI proporciona una collection de modelos de texto, incrustación y base de imágenes de máxima calidad . Los desarrolladores pueden aprovechar estos modelos o afinar e implementar los suyos propios, emparejándolos con sus propios datos patentados mediante MongoDB Atlas. Capacidades de ajuste fino : Para mejorar aún más la precisión y velocidad del modelo, Fireworks AI también ofrece un servicio de ajuste fino utilizando su CLI para ingerir objetos con formato JSON de bases de datos como MongoDB Atlas. Interfaces y API simples para desarrollo y producción: El patio de juegos de Fireworks AI permite a los desarrolladores interactuar con modelos directamente en un navegador. También se puede acceder mediante programación a través de una conveniente REST API. Esto es compatible con la API de OpenAI y, por lo tanto, interopera con el ecosistema LLM más amplio. Manual: una guía simple y fácil de usar proporciona un conjunto completo de recetas listas para usar que se pueden adaptar para varios casos de uso, incluido el ajuste, la generación y la evaluación. Fireworks AI y MongoDB: cómo establecer el estándar para la IA con modelos seleccionados, optimizados y rápidos Con Fireworks AI y MongoDB Atlas, las aplicaciones se ejecutan en entornos aislados que garantizan el tiempo de actividad y la privacidad, protegidos por controles de seguridad sofisticados que cumplen con los estándares regulatorios más estrictos: Como uno de los principales proveedores de API de modelos de código abierto, Fireworks AI sirve a 66 mil millones de tokens por día (y sigue creciendo). Con Atlas, ejecuta sus aplicaciones en una plataforma probada que atiende a decenas de miles de clientes, desde startups de alto crecimiento hasta las empresas y gobiernos más grandes. Juntos, la solución conjunta de Fireworks AI y MongoDB permiten: Generación aumentada de recuperación (RAG) o preguntas y respuestas a partir de un amplio conjunto de documentos: procese una gran cantidad de documentos para producir resúmenes y datos estructurados que luego puedan impulsar la IA conversacional. Clasificación mediante búsqueda semántica/similar: clasifique y analice conceptos y emociones de llamadas de ventas, videoconferencias y mucho más para proporcionar mejor inteligencia y estrategias. O bien, organice y clasifique un catálogo de productos utilizando imágenes de productos y texto. Imágenes para extracción de datos estructurados: extraiga significado de las imágenes para producir datos estructurados que puedan procesarse y buscarse en una variedad de aplicaciones de visión, desde fotos de stock, moda, detección de objetos, hasta diagnósticos médicos. Inteligencia de alertas: procese grandes cantidades de datos en tiempo real para detectar y alertar automáticamente sobre instancias de fraude, amenazas de ciberseguridad y más. Figura 1: el tutorial de Fireworks muestra cómo llevar sus propios datos a los LLM con generación aumentada de recuperación (RAG) y MongoDB Atlas Primeros pasos con Fireworks AI y MongoDB Atlas Para ayudarte a comenzar, revisa la Optimización RAG con el tutorial de MongoDB Atlas y Fireworks AI , que te muestra cómo crear una aplicación de recomendación de películas e involucra la base de datos de MongoDB Atlas que indexa películas utilizando incrustaciones. (Almacén de vectores) Un sistema para la generación de incrustación de documentos. Usaremos la API de incrustación de Fireworks para crear incrustaciones a partir de datos de texto. (Vectorización) MongoDB Atlas Vector Search responde a las consultas de los usuarios convirtiendo la consulta en una incrustación y obteniendo las películas correspondientes. (Motor de recuperación) El modelo Mixtral utiliza la API de inferencia de Fireworks para generar las recomendaciones. También puede usar Llama, Gemma y otros excelentes modelos de OSS si lo desea. (LLM) Cargar el conjunto de datos Mflix de muestra de MongoDB Atlas para generar incrustaciones (conjunto de datos) También podemos ayudarle a diseñar la mejor arquitectura para las necesidades de su organización. No dude en comunicarse con su equipo de cuentas o póngase en contacto con nosotros aquí para programar una sesión de colaboración y explorar cómo Fireworks AI y MongoDB pueden optimizar su proceso de desarrollo de IA.

March 26, 2024

Next →

Next-Generation Mobility Solutions with Agentic AI and MongoDB Atlas

Driven by advancements in vehicle connectivity, autonomous systems, and electrification, the automotive and mobility industry is currently undergoing a significant transformation. Vehicles today are sophisticated machines, computers on wheels, that generate massive amounts of data, driving demand for connected and electric vehicles. Automotive players are embracing artificial intelligence (AI), battery electrical vehicles (BEVs), and software-defined vehicles (SDVs) to maintain their competitive advantage. However, managing fleets of connected vehicles can be a challenge. As cars get more sophisticated and are increasingly integrated with internal and external systems, the volume of data they produce and receive greatly increases. This data needs to be stored, transferred, and consumed by various downstream applications to unlock new business opportunities. This will only grow: the global fleet management market is projected to reach $65.7 billion by 2030, growing at a rate of almost 10.8% annually. A 2024 study conducted by Webfleet showed that 32% of fleet managers believe AI and machine learning will significantly impact fleet operations in the coming years; optimizing route planning and improving driver safety are the two most commonly cited use cases. As fleet management software providers continue to invest in AI, the integration of agentic AI can significantly help with things like route optimization and driver safety enhancement. For example, AI agents can process real-time traffic updates and weather conditions to dynamically adjust routes, ensuring timely deliveries while advising drivers on their car condition. This proactive approach contrasts with traditional reactive methods, improving vehicle utilization and reducing operational and maintenance costs. But what are agents? In short, they are operational applications that attempt to achieve goals by observing the world and acting upon it using the data and tools the application has at its disposal. The term "agentic" denotes having agency, as AI agents can proactively take steps to achieve objectives without constant human oversight. For example, rather than just reporting an anomaly based on telemetry data analysis, an agent for a connected fleet could autonomously cross-check that anomaly against known issues, decide whether it's critical or not, and schedule a maintenance appointment all on its own. Why MongoDB for agentic AI Agentic AI applications are dynamic by nature as they require the ability to create a chain of thought, use external tools, and maintain context across their entire workflow. These applications generate and consume diverse data types, including structured and unstructured data. MongoDB’s flexible document model is uniquely suited to handle both structured and unstructured data as vectors. It allows all of an agent’s context, chain-of-thought, tools metadata, and short-term and long-term memory to be stored in a single database. This means that developers can spend more time on innovation and rapidly iterate on agent designs without being constrained by rigid schemas of a legacy relational database. Figure 1. Major components of an AI agent. Figure 1 shows the major components of an AI agent. The agent will first receive a task from a human or via an automated trigger, and will then use a large language model (LLM) to generate a chain of thought or follow a predetermined workflow. The agent will use various tools and models during its run and store/retrieve data from a memory provider like MongoDB Atlas . Tools: The agent utilizes tools to interact with the environment. This can contain API methods, database queries, vector search, RAG application, anything to support the model Models: can be a large language model (LLM), vision language model (VLM), or a simple supervised machine learning model. Models can be general purpose or specialized, and agents may use more than one. Data: An agent requires different types of data to function. MongoDB’s document model allows you to easily model all of this data in one single database. An agentic AI spans a wide range of functional tools and context. The underlying data structures evolve throughout the agentic workflow and as an agent uses different tools to complete a task. It also builds up memory over time. Let us list down the typical data types you will find in an agentic AI application. Data types: Agent profile: This contains the identity of the agent. It includes instructions, goals and constraints. Short-term memory: This holds temporary, contextual information—recent data inputs or ongoing interactions—that the agent uses in real-time. For example, short-term memory could store sensor data from the last few hours of vehicle activity. In certain agentic AI frameworks like Langgraph, short term memory is implemented through a checkpointer. The checkpointer stores intermediate states of the agent’s actions and/or reasoning. This memory allows the agent to seamlessly pause and resume operations. Long-term memory: This is where the agent stores accumulated knowledge over time. This may include patterns, trends, logs and historical recommendations and decisions. By storing each of these data types into rich, nested documents in MongoDB, AI developers can create a single-view representation of an agent’s state and behavior. This enables fast retrieval and simplifies development. In addition to the document model advantage, building agentic AI solutions for mobility requires a robust data infrastructure. MongoDB Atlas offers several key advantages that make it an ideal foundation for these AI-driven architectures. These include: Scalability and flexibility: Connected Car platforms like fleet management systems need to handle extreme data volumes and variety. MongoDB Atlas is proven to scale horizontally across cloud clusters, letting you ingest millions of telemetry events per minute and store terabytes of telemetry data with ease. For example, the German company ZF uses MongoDB to process 90,000 vehicle messages per minute (over 50 GB of data per day) from hundreds of thousands of connected cars. The flexibility of the document model accelerates development and ensures your data model stays aligned with the real-world entities it represents. Built-in vector search: AI agents require a robust set of tools to work with. One of the most widely used tools is vector search, which allows agents to perform semantic searches on unstructured data like driver logs, error codes descriptions, and repair manuals. MongoDB Atlas Vector Search allows you to store and index high-dimensional vectors alongside your documents and to perform semantic search over unstructured data. In practice, this means your AI embeddings live right next to the relevant vehicle telemetry and operational data in the database, simplifying architectures for use cases like the connected car incident advisor, in which a new issue can be matched against past issues before passing contextual information to the LLM. For more, check out this example of how an automotive OEM leverages vector search for audio based diagnostics with MongoDB Atlas Vector Search. Time series collections and real-time data processing: MongoDB Atlas is designed for real-time applications. It provides time series collections for connected car telemetry data storage, change streams, and triggers that can react to new data instantly. This is crucial for agentic AI feedback loops, where ongoing data ingestion and learning are happening continuously. Best-in-class embedding models with Voyage AI: In early 2025, MongoDB acquired Voyage AI , a leader in embedding and reranking models. Voyage AI embedding models are currently being integrated into MongoDB Atlas, which means developers will no longer need to manage external embedding APIs, standalone vector stores, or complex search pipelines. AI retrieval will be built into the database itself, making semantic search, vector retrieval, and ranking as seamless as traditional queries. This will reduce the time required for developing agentic AI applications. Agentic AI in action: Connected fleet incident advisor Figure 2 shows a list of use cases in the Mobility sector, sorted by various capabilities that an agent might demonstrate. AI agents excel at managing multi-step tasks via context management across tasks, they automate repetitive tasks better than Robotic process automation (RPA), and they demonstrate human-like reasoning by revisiting and revising past decisions. These capabilities enable a wide range of applications both during the manufacturing of a vehicle and while it's on the road, connected and sending telemetry. We will review a use case in detail below, and will see how it can be implemented using MongoDB Atlas, LangGraph, Open AI, and Voyage AI. Figure 2. Major use cases of agentic AI in the mobility and manufacturing sectors. First, the AI agent connects to traditional fleet management software and supports the fleet manager in diagnosing and advising the drivers. This is an example of a multi-step diagnostic workflow that gets triggered when a driver submits a complaint about the vehicle's performance (for example, increased fuel consumption). Figure 3 shows the sequence diagram of the agent. Upon receiving the driver complaint, it creates a chain of thought that follows a multi-step diagnostic workflow where the system ingests vehicle data such as engine codes and sensor readings, generates embeddings using the Voyage AI voyage-3-large embedding model, and performs a vector search using MongoDB Atlas to find similar past incidents. Once relevant cases are identified, those–along with selected telemetry data–are passed to OpenAI gpt-4o LLM to generate a final recommendation for the driver (for example, to pull off immediately or to keep driving and schedule regular maintenance). All data, including telemetry, past issues, session logs, agent profiles, and recommendations are stored in MongoDB Atlas, ensuring traceability and the ability to refine diagnostics over time. Additionally, MongoDB Atlas is used as a checkpointer by LangGraph, which defines the agent's workflow. Figure 3. Sequence diagram for a connected fleet advisor agentic workflow. Figure 4 shows the agent in action, from receiving an issue to generating a recommendation. So by leveraging MongoDB’s flexible data model and powerful Vector Search capabilities, we can agentic AI can transform fleet management through predictive maintenance and proactive decision-making. Figure 4. The connected fleet advisor AI agent in action. To set up the use case shown in this article, please visit our GitHub repository . And to learn more about MongoDB’s role in the automotive industry, please visit our manufacturing and automotive webpage . Want to learn more about why MongoDB is the best choice for supporting modern AI applications? Check out our on-demand webinar, “ Comparing PostgreSQL vs. MongoDB: Which is Better for AI Workloads? ” presented by MongoDB Field CTO, Rick Houlihan.

April 4, 2025