Beyond Vectors: Augment LLM Capabilities With MongoDB Aggregation Framework

Fabian Valle • 16 min read • Published Jun 20, 2024 • Updated Jun 20, 2024

AI MongoDB

In the field of investment management, having transaction data that are updated in real time is your most powerful ally. One bad decision can negatively impact your entire portfolio. If you know how to leverage transactional data, you can use it to discover actionable insights and make more strategic investment decisions. This article will explore how MongoDB's aggregation framework and GenAI work together to transform your data analysis workflow.

Large language models (LLMs) have significantly changed the way we interact with computers, providing capabilities such as drafting emails, writing poetry, and even engaging in human-like conversations. However, when it comes to dealing with complex data processing and mathematical calculations, LLMs have their limitations.

While LLMs excel at language, they generate text rather than execute calculations, so their arithmetic over large sets of numbers can be unreliable. That's where MongoDB's aggregation framework shines. It processes the documents in a collection through a multi-stage pipeline, and each stage can filter, transform, or calculate over the data before passing it on. This lets you sidestep the limitations of LLMs and gives you a reliable method for data analysis.

In this article, we’ll use the MongoDB aggregation framework and GenAI to overcome the limitations of classic RAG. We'll explore the MongoDB Atlas sample dataset — specifically, the sample_analytics database and the transactions collection. The sample_analytics database contains three collections for a typical financial services application: customers, accounts, and transactions.

For this example, we'll focus on transaction data, which offers a realistic dataset that allows users to hone their skills in data analysis, querying, and aggregation, particularly in the context of finance.

The source code is available at GitHub - mdb-agg-crewai

Before we start

To follow along, you'll need:

A MongoDB Atlas cluster: Create your free cluster and load the sample dataset.
An LLM resource: CrewAI supports various LLM connections, including local models (Ollama), APIs like Azure, and all LangChain LLM components for customizable AI solutions. Learn more about CrewAI LLM support.

Note: The source code in the example uses Azure OpenAI. To follow along, you’ll need a valid Azure OpenAI deployment.

sample_analytics.transactions

The sample_analytics database contains three collections (customers, accounts, transactions) for a typical financial services application. The transactions collection contains transaction details for users. Each document contains an account ID, a count of how many transactions are in this set, the start and end dates for transactions covered by this document, and a list of sub-documents. Each sub-document represents a single transaction and the related information for that transaction.

transaction_id: This is a unique identifier that distinctly marks each transaction.
account_id: This field establishes a connection between the transaction and its corresponding account.
date: This represents the precise date and time at which the transaction took place.
transaction_code: This indicates the nature of the transaction, such as a deposit, withdrawal, buy, or sell.
symbol: This field denotes the symbol of the stock or investment involved in the transaction.
amount: This reflects the quantity traded in the transaction (for example, the number of shares).
total: This captures the comprehensive transacted amount, inclusive of quantities, fees, and any additional charges associated with the transaction.
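To make the document shape concrete, here is a minimal sketch of one bucket document as a Python dictionary. All values are made up, and the top-level field names (transaction_count, bucket_start_date, bucket_end_date) are assumptions about the sample dataset, so check them against your own cluster. The helper symbols_in is hypothetical and only illustrates walking the sub-documents.

```python
from datetime import datetime

# Illustrative bucket document. Values are invented, and the top-level
# field names are assumptions about the sample dataset.
sample_doc = {
    "account_id": 123456,
    "transaction_count": 2,
    "bucket_start_date": datetime(2015, 1, 5),
    "bucket_end_date": datetime(2016, 3, 9),
    "transactions": [
        {
            "date": datetime(2015, 1, 5),
            "amount": 10,
            "transaction_code": "buy",
            "symbol": "abc",
            "total": "250.50",  # assumed to be stored as a string
        },
        {
            "date": datetime(2016, 3, 9),
            "amount": 10,
            "transaction_code": "sell",
            "symbol": "abc",
            "total": "300.75",
        },
    ],
}


def symbols_in(doc):
    """Return the distinct symbols traded in one bucket document, sorted."""
    return sorted({txn["symbol"] for txn in doc["transactions"]})


print(symbols_in(sample_doc))
```

Because each document holds many transactions in an array, any per-stock calculation first has to flatten that array, which is what the pipeline later in this article does.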

The task: uncover hidden opportunities

Picture this: You're running a company with a standard financial services application. Your objective? Spot hidden opportunities in the market by scrutinizing all transaction data and identifying the top three stocks based on net gain or loss. We can then research current events and market trends to uncover potential opportunities in the stocks that have historically shown the best net gain, according to our transaction data.

Net gain provides a clear picture of the profitability of an investment over a certain period. It's the difference between the total amount received from selling an investment (like stocks) and the total amount spent buying it.
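As a quick worked example of that definition, the sketch below computes net gain from two totals. The function name net_gain and the figures are illustrative only.

```python
def net_gain(total_sold: float, total_bought: float) -> float:
    """Net gain = total received from sales minus total spent on purchases."""
    return total_sold - total_bought


# Made-up figures: shares bought for 1,000 and later sold for 1,250.
print(net_gain(1250.0, 1000.0))  # 250.0

# Selling for less than the purchase cost gives a negative result (a net loss).
print(net_gain(800.0, 1000.0))  # -200.0
```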

Net gain matters for several reasons. First, it tells investors whether an investment made or lost money. Second, it lets them compare how different investments are performing. Third, it can be used to evaluate the effectiveness of an investment strategy: investments that consistently produce a negative net gain might be too risky and may need to be sold off. Finally, net gain can inform future investment decisions and help mitigate risk, for instance by identifying which stocks have historically shown the best net gain.

In a traditional SQL environment, calculating the net gain on transactional data would require multiple subqueries, temporary tables, and joins — a complex and potentially inefficient process, especially when dealing with large datasets. It could also be a resource-intensive task, demanding significant computational power and time.

Harnessing MongoDB's aggregation framework, combined with AI technologies like CrewAI and LLMs, can make the process more efficient. It not only streamlines the calculation but also offers deeper insights.

The solution: MongoDB's aggregation framework

The aggregation pipeline we will build calculates the total buy and sell values for each stock. Then, it calculates the net gain or loss by subtracting the total buy value from the total sell value. Next, the stocks are sorted by net gain or loss in descending order, with the highest net gains at the top.

If you’re new to MongoDB, I suggest you build this aggregation pipeline using the aggregation builder in Compass and then export it to Python. The aggregation pipeline builder in MongoDB Compass helps you create aggregation pipelines to process documents from a collection or view and return computed results.

Supercharge investment analysis with MongoDB and CrewAI

(image from LangChain Blog | CrewAI: The Future of AI Agent Teams)

The MongoDB aggregation pipeline gives us the data we need to analyze. When you can extract meaningful insights from raw data faster, you can make better investment decisions. CrewAI, combined with MongoDB Atlas, provides a unique approach that goes beyond basic number-crunching to deliver actionable insights.

For this example, we will create an Investment Researcher agent. This agent finds valuable data using tools like search engines. It's designed to identify financial trends, company news, and analyst insights. Learn more about creating agents using CrewAI.

Unlocking the power of AI collaboration: agents, tasks, and tools

Artificial intelligence (AI) is rapidly evolving, transforming how we work in the data-driven world we live in. CrewAI introduces a framework for collaborative AI that empowers teams to achieve more by leveraging specialized AI units and streamlined workflows.

At the core of CrewAI lie agents. These are not your typical AI assistants. Instead, they function as intelligent team members, each with a distinct role (e.g., researcher, writer, editor) and a well-defined goal. They can perform tasks, make decisions, and communicate with other agents.

But what truly sets CrewAI apart is the seamless collaboration between these agents. This is achieved through a system of tasks. Tasks act as the building blocks of CrewAI workflows, allowing you to define a sequence of actions that leverage the strengths of different agents.

CrewAI also provides a comprehensive arsenal of tools that empower these agents. These tools include web scraping, data analysis, and content generation. By equipping agents with the right tools, you can ensure they have everything they need to perform their tasks effectively.

In essence, CrewAI's powerful combination of agents, tasks, and tools empowers you to:

Automate repetitive tasks.
Streamline workflows.
Unlock the true potential of AI.

The code

In this section, we'll walk through the Python code used to perform financial analysis based on transaction data stored in MongoDB, using GenAI for data analysis. The Python version used during development was 3.10.10.

Here are the required packages to run the code. Make sure they are installed properly before continuing.

requirements.txt

pymongo==4.7.2
crewai==0.22.5
langchain==0.1.10
langchain-community
langchain-openai==0.0.5
duckduckgo-search==6.1.5

You can install all the packages by running pip install -r requirements.txt.

MongoDB setup

First, we set up a connection to MongoDB using PyMongo. This is where our transaction data is stored.

Important: We're including the connection string directly in the code for demonstration purposes only; this is not recommended for real-world applications. A more secure approach is to keep the connection string out of your source code, for example by loading it from an environment variable or a secrets manager.

Here's how to access your connection string from Atlas:

Log in to your MongoDB Atlas account and navigate to your cluster.
Click on "Connect" in the left-hand navigation menu.
Choose the driver you'll be using (e.g., Python) and its version.
You'll see a connection string provided. Copy this string for use in your application.

Once you have your connection string, you are ready to start.
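If you prefer not to hard-code the connection string, one option is to read it from an environment variable. The following is a minimal sketch, assuming a variable named MDB_URI; the helper get_mongodb_uri is hypothetical and not part of the article's source code.

```python
import os


def get_mongodb_uri(env=None, var_name="MDB_URI"):
    """Return the connection string from the environment, or raise if missing."""
    env = os.environ if env is None else env
    uri = env.get(var_name)
    if not uri:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return uri


# Demonstration with a stand-in mapping instead of the real environment.
fake_env = {"MDB_URI": "mongodb+srv://<user>:<password>@cluster0.example.mongodb.net/"}
print(get_mongodb_uri(fake_env))
```

In the script, the returned value would then be passed to pymongo.MongoClient in place of the literal string.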

file: investment_analysis.py

import os
import pymongo

MDB_URI = "mongodb+srv://<user>:<password>@cluster0.abc123.mongodb.net/"
client = pymongo.MongoClient(MDB_URI)
db = client["sample_analytics"]
collection = db["transactions"]

Azure OpenAI setup

Next, we set up our Azure OpenAI LLM resource. The code in the example uses Azure OpenAI. To follow along, you’ll need a valid Azure OpenAI deployment.

file: investment_analysis.py

from langchain_openai import AzureChatOpenAI

AZURE_OPENAI_ENDPOINT = "https://__DEMO__.openai.azure.com"
AZURE_OPENAI_API_KEY = "__AZURE_OPENAI_API_KEY__"
deployment_name = "gpt-4-32k"  # The name of your model deployment
default_llm = AzureChatOpenAI(
    openai_api_version=os.environ.get("AZURE_OPENAI_VERSION", "2023-07-01-preview"),
    azure_deployment=deployment_name,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_API_KEY
)

Web search API setup

For this example, we will be using the DuckDuckGo Search LangChain integration, a component that lets you search the web using DuckDuckGo.

file: investment_analysis.py

# Web Search Setup
from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchResults
duck_duck_go = DuckDuckGoSearchResults(backend="news")

# Search Tool - Web Search
@tool
def search_tool(query: str):
  """
  Perform online research on a particular stock.
  """
  return duck_duck_go.run(query)

DuckDuckGo was chosen for this example because it:

Requires no API key.
Is easy to use.
Provides snippets.
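The complete source code later in this article labels the raw search results with the query before returning them to the agent. The sketch below isolates that wrapping step. It uses a stand-in backend (fake_backend, a hypothetical function) so it runs offline; a real backend would call DuckDuckGo instead. The helper make_search_tool is also illustrative.

```python
from typing import Callable


def make_search_tool(run_search: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a search function so its output is labeled with the query."""
    def search(query: str) -> str:
        results = run_search(query)
        return "[recent news for: " + query + "]\n" + str(results)
    return search


# Stand-in backend so the sketch runs without network access.
def fake_backend(query: str) -> str:
    return "snippet: placeholder result for " + query


search = make_search_tool(fake_backend)
print(search("amzn stock"))
```

Labeling the results this way gives the LLM a clear marker of which query produced which snippets when it makes several searches in a row.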

CrewAI setup

We'll be using CrewAI to manage our agents and tasks. In this case, we have one agent — a researcher who is tasked with analyzing the data and providing insights. In CrewAI, tasks are the individual steps that make up a larger workflow.

Agents and tasks: working together as a crew

In CrewAI, a crew represents a collaborative group of agents working together to achieve a set of tasks. While our example is a single-agent crew for simplicity, you can create multi-agent crews for more complex workflows.

Tasks: These are the individual steps that make up your investment research workflow. Each task represents a specific action the agent needs to take to achieve the overall goal.
Agents: Think of these as the workers who execute the tasks. We'll have a dedicated Investment Researcher agent equipped with the necessary tools and knowledge to complete the assigned tasks.

Fine-tuning your Investment Researcher

CrewAI allows you to customize your agent's behavior through various parameters:

Role and goal (AGENT_ROLE & AGENT_GOAL): These define the agent's purpose. Here, we set the role to "Investment Researcher" with a goal of "identifying investment opportunities." This guides the agent toward relevant data sources and analysis methods (e.g., market trends, company news, analyst reports).
Backstory: Craft a backstory like "Expert stock researcher with decades of experience" to add context and potentially influence the agent's communication style and interpretation of information.
Tools: Equip your agent with tools (functions or classes) to complete its tasks. This could include a search tool for gathering information or an analysis tool for processing data.
Large language model (LLM): This is the AI engine powering the agent's tasks, like text processing and generation. Choosing a different LLM can significantly impact the agent's output based on the underlying LLM’s strengths and weaknesses.
Verbose (verbose): Setting verbose=True provides a more detailed log of the agent's thought process for debugging purposes.

By adjusting these parameters, you can tailor your investment research agent to focus on specific market sectors, prioritize information sources, and even influence its risk tolerance or investment style (through the backstory).

file: investment_analysis.py

# Research Agent Setup
from crewai import Crew, Process, Task, Agent
AGENT_ROLE = "Investment Researcher"
AGENT_GOAL = """
  Research stock market trends, company news, and analyst reports to identify potential investment opportunities.
"""
researcher = Agent(
  role=AGENT_ROLE,
  goal=AGENT_GOAL,
  verbose=True,
  llm=default_llm,
  backstory='Expert stock researcher with decades of experience.',
  tools=[search_tool]
)

task1 = Task(
  description="""
Using the following information:

[VERIFIED DATA]
{agg_data}

*note*
The data represents the net gain or loss of each stock symbol for each transaction type (buy/sell).
Net gain or loss is a crucial metric used to gauge the profitability or efficiency of an investment.
It's computed by subtracting the total buy value from the total sell value for each stock.
[END VERIFIED DATA]

[TASK]
- Generate a detailed financial report of the VERIFIED DATA.
- Research current events and trends, and provide actionable insights and recommendations.


[report criteria]
  - Use all available information to prepare this final financial report
  - Include a TLDR summary
  - Include 'Actionable Insights'
  - Include 'Strategic Recommendations'
  - Include an 'Other Observations' section
  - Include a 'Conclusion' section
  - IMPORTANT! You are a friendly and helpful financial expert. Always provide the best possible answer using the available information.
[end report criteria]
  """,
  agent=researcher,
  expected_output='concise markdown financial summary of the verified data and list of key points and insights from researching current events',
  tools=[search_tool],
)
# Crew Creation
tech_crew = Crew(
  agents=[researcher],
  tasks=[task1],
  process=Process.sequential
)

MongoDB aggregation pipeline

Next, we define our MongoDB aggregation pipeline. This pipeline is used to process our transaction data and calculate the net gain for each stock symbol.

file: investment_analysis.py

# MongoDB Aggregation Pipeline
pipeline = [
    {
        "$unwind": "$transactions"  # Deconstruct the transactions array into separate documents
    },
    {
        "$group": {  # Group documents by stock symbol
            "_id": "$transactions.symbol",  # Use symbol as the grouping key
            "buyValue": {  # Calculate total buy value
                "$sum": {
                    "$cond": [  # Conditional sum based on transaction type
                        {"$eq": ["$transactions.transaction_code", "buy"]},  # Check for "buy" transactions
                        {"$toDouble": "$transactions.total"},  # Convert total to double for sum
                        0  # Default value for non-buy transactions
                    ]
                }
            },
            "sellValue": {  # Calculate total sell value (similar to buyValue)
                "$sum": {
                    "$cond": [
                        {"$eq": ["$transactions.transaction_code", "sell"]},
                        {"$toDouble": "$transactions.total"},
                        0
                    ]
                }
            }
        }
    },
    {
        "$project": {  # Project desired fields (renaming and calculating net gain)
            "_id": 0,  # Exclude original _id field
            "symbol": "$_id",  # Rename _id to symbol for clarity
            "netGain": {"$subtract": ["$sellValue", "$buyValue"]}  # Calculate net gain
        }
    },
    {
        "$sort": {"netGain": -1}  # Sort results by net gain (descending)
    },
    {"$limit": 3}  # Limit results to top 3 stocks
]


results = list(collection.aggregate(pipeline))
client.close()

print("MongoDB Aggregation Pipeline Results:")
print(results)

Here's a breakdown of what the MongoDB pipeline does:

Unwinding transactions: Each document contains information about multiple stock purchases and sales. The pipeline uses the $unwind operator to unpack an array field named "transactions" within each document. Unwinding separates these transactions into individual documents, simplifying subsequent calculations.
Grouping by symbol: Next, the $group operator groups the unwound documents based on the value in the "transactions.symbol" field. This essentially combines all transactions for a specific stock (represented by the symbol) into a single group.
Calculating buy and sell values: Within each symbol group, the pipeline calculates two crucial values:
- buyValue: This uses the $sum accumulator along with a conditional statement ($cond). The $cond checks if the "transaction_code" within the "transactions" object is "buy." If it is, it converts the "total" field (the transaction amount) to a double using $toDouble and adds it to the running total for buyValue. If it's not a buy transaction, it contributes nothing (0) to the sum. This effectively calculates the total amount spent buying shares of that specific symbol.
- sellValue: Similar to buyValue, this calculates the total amount received by selling shares of the same symbol. It uses the same logic but checks for "transaction_code" equal to "sell" and sums those "total" values.
Projecting results: Now, the $project operator defines the final output format. It excludes the _id field from the output by setting it to 0 and exposes its value (the grouping key, which holds "transactions.symbol") under a clearer name, "symbol." Finally, it calculates the net gain or loss for each symbol using the $subtract operator, which subtracts the buyValue from the sellValue.
Sorting by net gain: The $sort operator organizes the results. It sorts the documents based on the "netGain" field in descending order (-1) so that the symbols with the highest net gain (most profitable) will appear first in the final output.
Limiting results: Lastly, the $limit operator limits the number of documents passed to the next stage in the pipeline. In this case, it's set to 3, meaning only the top three documents (stocks with the highest net gain) will be included in the final output.
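To make the stages above easier to follow, here is a rough plain-Python equivalent that runs on a small in-memory list. It is only a sketch for building intuition: the function top_net_gains and the sample documents are made up, and in practice the database performs this work server-side.

```python
from collections import defaultdict


def top_net_gains(docs, limit=3):
    """Mirror the pipeline stages in plain Python for small in-memory data."""
    totals = defaultdict(lambda: {"buy": 0.0, "sell": 0.0})
    for doc in docs:
        for txn in doc["transactions"]:           # $unwind
            group = totals[txn["symbol"]]         # $group by symbol
            code = txn["transaction_code"]
            if code in ("buy", "sell"):           # $cond inside $sum
                group[code] += float(txn["total"])  # $toDouble
    rows = [                                      # $project
        {"symbol": symbol, "netGain": t["sell"] - t["buy"]}
        for symbol, t in totals.items()
    ]
    rows.sort(key=lambda r: r["netGain"], reverse=True)  # $sort (descending)
    return rows[:limit]                           # $limit


# Made-up documents with totals stored as strings.
docs = [
    {"transactions": [
        {"symbol": "aaa", "transaction_code": "buy", "total": "100.0"},
        {"symbol": "aaa", "transaction_code": "sell", "total": "150.0"},
        {"symbol": "bbb", "transaction_code": "buy", "total": "200.0"},
    ]},
    {"transactions": [
        {"symbol": "bbb", "transaction_code": "sell", "total": "120.0"},
        {"symbol": "ccc", "transaction_code": "sell", "total": "30.0"},
    ]},
]

print(top_net_gains(docs, limit=2))
```

Here "aaa" nets 50, "ccc" nets 30, and "bbb" nets -80, so a limit of 2 keeps "aaa" and "ccc".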

Preliminary check: ensuring error-free execution

Before we initiate our automated agent workflow, we must ensure that the code executed so far is error-free.

The expected output should resemble the following:

MongoDB Aggregation Pipeline Results:
[{'netGain': 72769230.71428967, 'symbol': 'amzn'},
 {'netGain': 39912931.04990542, 'symbol': 'sap'},
 {'netGain': 25738882.292086124, 'symbol': 'aapl'}]
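If you would rather check the shape of the results programmatically than by eye, a small sanity check along these lines could help. The function check_results is a hypothetical helper, not part of the article's source code; the sample rows are the figures shown above.

```python
def check_results(results, expected_count=3):
    """Raise AssertionError if results do not look like the pipeline's output."""
    assert len(results) <= expected_count, "more rows than the $limit allows"
    for row in results:
        assert set(row) == {"symbol", "netGain"}, f"unexpected fields: {row}"
    gains = [row["netGain"] for row in results]
    assert gains == sorted(gains, reverse=True), "rows are not sorted by netGain"
    return True


sample = [
    {"netGain": 72769230.71428967, "symbol": "amzn"},
    {"netGain": 39912931.04990542, "symbol": "sap"},
    {"netGain": 25738882.292086124, "symbol": "aapl"},
]
print(check_results(sample))
```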

Initiating the agent task execution

We can now kick off our task execution. The researcher agent will utilize the data derived from our MongoDB aggregation, along with any other tools at its disposal, to analyze the data and offer insight.

file: investment_analysis.py

tech_crew.kickoff(inputs={'agg_data': str(results)})
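The inputs dictionary fills the {agg_data} placeholder in the task description. The sketch below illustrates the idea with Python's str.format on a shortened, made-up template and result; it is an approximation for intuition, not a description of how CrewAI performs the substitution internally.

```python
# Made-up stand-in for the aggregation output.
results = [{"netGain": 50.0, "symbol": "aaa"}]

# Shortened version of the task description, with the same placeholder name.
description_template = "[VERIFIED DATA]\n{agg_data}\n[END VERIFIED DATA]"

# str.format is used here only to illustrate the substitution.
prompt = description_template.format(agg_data=str(results))
print(prompt)
```

Passing str(results) means the agent sees the numbers computed by the database, so it does not have to do the arithmetic itself.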

Complete Source Code

file: investment_analysis.py

import os
import pymongo
import pprint

# MongoDB Setup
MDB_URI = "mongodb+srv://<user>:<password>@cluster0.abc123.mongodb.net/"
client = pymongo.MongoClient(MDB_URI)
db = client["sample_analytics"]
collection = db["transactions"]

# Azure OpenAI Setup
from langchain_openai import AzureChatOpenAI
AZURE_OPENAI_ENDPOINT = "https://__DEMO__.openai.azure.com"
AZURE_OPENAI_API_KEY = "__AZURE_OPENAI_API_KEY__"
deployment_name = "gpt-4-32k"  # The name of your model deployment
default_llm = AzureChatOpenAI(
    openai_api_version=os.environ.get("AZURE_OPENAI_VERSION", "2023-07-01-preview"),
    azure_deployment=deployment_name,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_API_KEY
)

# Web Search Setup
from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchResults
duck_duck_go = DuckDuckGoSearchResults(backend="news", max_results=10)

# Search Tool - Web Search
@tool
def search_tool(query: str):
  """
  Perform online research on a particular stock.
  Will return search results along with snippets of each result.
  """
  print("\n\nSearching DuckDuckGo for:", query)
  search_results = duck_duck_go.run(query)
  search_results_str = "[recent news for: " + query + "]\n" + str(search_results)
  return search_results_str


# Research Agent Setup
from crewai import Crew, Process, Task, Agent
AGENT_ROLE = "Investment Researcher"
AGENT_GOAL = """
  Research stock market trends, company news, and analyst reports to identify potential investment opportunities.
"""
researcher = Agent(
  role=AGENT_ROLE,
  goal=AGENT_GOAL,
  verbose=True,
  llm=default_llm,
  backstory='Expert stock researcher with decades of experience.',
  tools=[search_tool]
)

task1 = Task(
  description="""
Using the following information:

[VERIFIED DATA]
{agg_data}

*note*
The data represents the net gain or loss of each stock symbol for each transaction type (buy/sell).
Net gain or loss is a crucial metric used to gauge the profitability or efficiency of an investment.
It's computed by subtracting the total buy value from the total sell value for each stock.
[END VERIFIED DATA]

[TASK]
- Generate a detailed financial report of the VERIFIED DATA.
- Research current events and trends, and provide actionable insights and recommendations.


[report criteria]
  - Use all available information to prepare this final financial report
  - Include a TLDR summary
  - Include 'Actionable Insights'
  - Include 'Strategic Recommendations'
  - Include an 'Other Observations' section
  - Include a 'Conclusion' section
  - IMPORTANT! You are a friendly and helpful financial expert. Always provide the best possible answer using the available information.
[end report criteria]
  """,
  agent=researcher,
  expected_output='concise markdown financial summary of the verified data and list of key points and insights from researching current events',
  tools=[search_tool],
)
# Crew Creation
tech_crew = Crew(
  agents=[researcher],
  tasks=[task1],
  process=Process.sequential
)

# MongoDB Aggregation Pipeline
pipeline = [
    {
        "$unwind": "$transactions"  # Deconstruct the transactions array into separate documents
    },
    {
        "$group": {  # Group documents by stock symbol
            "_id": "$transactions.symbol",  # Use symbol as the grouping key
            "buyValue": {  # Calculate total buy value
                "$sum": {
                    "$cond": [  # Conditional sum based on transaction type
                        {"$eq": ["$transactions.transaction_code", "buy"]},  # Check for "buy" transactions
                        {"$toDouble": "$transactions.total"},  # Convert total to double for sum
                        0  # Default value for non-buy transactions
                    ]
                }
            },
            "sellValue": {  # Calculate total sell value (similar to buyValue)
                "$sum": {
                    "$cond": [
                        {"$eq": ["$transactions.transaction_code", "sell"]},
                        {"$toDouble": "$transactions.total"},
                        0
                    ]
                }
            }
        }
    },
    {
        "$project": {  # Project desired fields (renaming and calculating net gain)
            "_id": 0,  # Exclude original _id field
            "symbol": "$_id",  # Rename _id to symbol for clarity
            "netGain": {"$subtract": ["$sellValue", "$buyValue"]}  # Calculate net gain
        }
    },
    {
        "$sort": {"netGain": -1}  # Sort results by net gain (descending)
    },
    {"$limit": 3}  # Limit results to top 3 stocks
]
results = list(collection.aggregate(pipeline))
client.close()

# Print MongoDB Aggregation Pipeline Results
print("MongoDB Aggregation Pipeline Results:")

pprint.pprint(results)  # pprint "pretty-prints" arbitrary Python data structures

# Start the task execution
tech_crew.kickoff(inputs={'agg_data': str(results)})

Example OUTPUT

Thought:
The recent news for Apple indicates that the company's stock has reached a $3 trillion valuation, largely due to the hype surrounding the introduction of AI to iPhones. This could be a significant catalyst for Apple's future growth. Now, I have enough information to generate a detailed financial report of the verified data, including a TLDR summary, actionable insights, strategic recommendations, other observations, and a conclusion.

Final Answer:

**Financial Report**

**TLDR Summary**

Based on the verified data, the net gains for the three stocks are as follows:

1. Amazon (AMZN) - $72,769,230.71
2. SAP - $39,912,931.04
3. Apple (AAPL) - $25,738,882.29

Amazon has the highest net gain, followed by SAP and Apple.

**Actionable Insights**

- **Amazon (AMZN):** The company's stock is seen as a good buy due to its attractive valuation and significant dominance in the e-commerce market.
- **SAP:** The company is making a significant acquisition of WalkMe Ltd., which could potentially boost its value and market position.
- **Apple (AAPL):** The company's stock has reached a $3 trillion valuation, largely due to the hype surrounding the introduction of AI to iPhones. This could be a significant catalyst for Apple's future growth.

**Strategic Recommendations**

- **Amazon (AMZN):** Given its dominant position in e-commerce and attractive valuation, it might be a good idea to consider increasing investments in Amazon.
- **SAP:** Considering the potential value boost from the recent acquisition, investors might want to keep a close watch on SAP's performance and consider it for their portfolio.
- **Apple (AAPL):** With the hype around the introduction of AI to iPhones, Apple's stock could see significant growth. It might be a good time to invest or increase existing investments.

**Other Observations**

The companies have seen fluctuations in their stock prices but generally perform well. The current trends and developments indicate potential for further growth.

**Conclusion**

Given the net gains and recent developments, Amazon, SAP, and Apple seem to be promising investments. However, as with any investment decision, it's important to consider individual financial goals, risk tolerance, and market conditions. It's always recommended to conduct further research or consult with a financial advisor before making investment decisions.

This report provides a high-level overview of the current events and trends impacting these stocks, but the rapidly changing market environment necessitates regular monitoring and analysis of investment portfolios.

> Finished chain.

Limitations and considerations

MongoDB's aggregation framework and GenAI are powerful tools for analyzing data, but we must recognize a few potential limitations.

First, there’s a dependence on historical data. The past performance of an investment isn’t necessarily indicative of future results. This is especially the case in unpredictable markets.

Second, there’s a dependence on search result snippets. The snippets provided by DuckDuckGo may not always provide enough information. You would perhaps want to consider scraping the search result URL using something like Firecrawl, which can crawl and convert any website into clean markdown or structured data.
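As a rough illustration of going beyond snippets, the sketch below extracts the visible text from an HTML page using only Python's standard library. It is a simplified stand-in for a dedicated crawler such as Firecrawl, not an example of Firecrawl's API, and it works on an inline sample string rather than fetching a real URL. The names TextExtractor and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the page's visible text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Inline sample page so the sketch runs without network access.
page = "<html><head><style>p{}</style></head><body><h1>Title</h1><p>Body text.</p></body></html>"
print(html_to_text(page))
```

The extracted text could then be passed to the agent in place of, or alongside, the search snippets.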

Next, there’s always going to be uncertainty in predictions, no matter how capable these tools are.

And finally, we must consider that LLMs have their own limitations. They’re always evolving and continually improving. However, biases in training data or limitations in the model's architecture could lead to inaccurate or misleading insights.

It’s important to be aware of these limitations so you can ensure a more responsible and well-rounded approach to investment analysis.

Conclusion

In this article, we explored how MongoDB's aggregation framework, large language models, and CrewAI can be leveraged to transform investment analysis. The key to making smarter investment decisions is harnessing the power of your transaction data. MongoDB's aggregation framework provides the tools to efficiently calculate essential metrics like net gain, right within the data platform, with no additional code required at the application layer.

When combined with CrewAI's ability to automate research workflows, you gain a deeper understanding of the market, identify new opportunities, make smarter decisions, and boost your investment success.

The future: AI-powered investment analysis

The future of investment analysis belongs to those who embrace data and AI. By combining MongoDB's robust data platform with the insight-generating capabilities of AI tools like CrewAI, you gain the ability to:

Analyze trends faster than those relying on traditional methods.
Identify profitable patterns that others miss.
Make informed decisions backed by both raw data and contextual insights.
Automate tedious analysis, giving you more time for strategic thinking.

Don't just analyze the market — shape it. Explore MongoDB and AI today, and transform your investment decision-making process.

The source code is available at GitHub - mdb-agg-crewai.

Questions? Comments? Join us in the MongoDB Developer Community to continue the conversation.

Rate this tutorial

Tutorial

Coding With Mark: Abstracting Joins & Subsets in Python

Mar 19, 2024 | 11 min read

Tutorial

How to Use Custom Aggregation Expressions in MongoDB 4.4

Sep 23, 2022 | 11 min read

Tutorial

Currency Analysis with Time Series Collections #2 — Simple Moving Average and Exponential Moving Average Calculation

May 16, 2022 | 7 min read

Article

Building Remix Applications with the MongoDB Stack

Apr 02, 2024 | 4 min read


pymongo==4.7.2
crewai==0.22.5
langchain==0.1.10
langchain-community
langchain-openai==0.0.5
duckduckgo-search==6.1.5

import os
import pymongo

MDB_URI = "mongodb+srv://<user>:<password>@cluster0.abc123.mongodb.net/"
client = pymongo.MongoClient(MDB_URI)
db = client["sample_analytics"]
collection = db["transactions"]
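The setup above hardcodes the connection string. As a minimal sketch of one alternative, the snippet below reads it from an environment variable instead. The helper name `get_mongodb_uri` and the environment variable name `MDB_URI` are assumptions made for illustration; they are not part of the original code.

```python
import os


def get_mongodb_uri(env_var: str = "MDB_URI") -> str:
    """Return the connection string stored in the environment, or fail loudly.

    The variable name is a hypothetical choice; use whatever your deployment defines.
    """
    uri = os.environ.get(env_var)
    if not uri:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return uri


# Usage (assumes MDB_URI was exported in the shell beforehand):
# client = pymongo.MongoClient(get_mongodb_uri())
```

Keeping the credentials out of the source file means the script can be shared or committed without editing the connection string each time.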

from langchain_openai import AzureChatOpenAI

AZURE_OPENAI_ENDPOINT = "https://__DEMO__.openai.azure.com"
AZURE_OPENAI_API_KEY = "__AZURE_OPENAI_API_KEY__"
deployment_name = "gpt-4-32k"  # The name of your model deployment
default_llm = AzureChatOpenAI(
    openai_api_version=os.environ.get("AZURE_OPENAI_VERSION", "2023-07-01-preview"),
    azure_deployment=deployment_name,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_API_KEY
)

# Web Search Setup
from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchResults
duck_duck_go = DuckDuckGoSearchResults(backend="news")

# Search Tool - Web Search
@tool
def search_tool(query: str):
    """
    Perform online research on a particular stock.
    """
    return duck_duck_go.run(query)

# Research Agent Setup
from crewai import Crew, Process, Task, Agent
AGENT_ROLE = "Investment Researcher"
AGENT_GOAL = """
Research stock market trends, company news, and analyst reports to identify potential investment opportunities.
"""
researcher = Agent(
    role=AGENT_ROLE,
    goal=AGENT_GOAL,
    verbose=True,
    llm=default_llm,
    backstory='Expert stock researcher with decades of experience.',
    tools=[search_tool]
)

task1 = Task(
    description="""
Using the following information:

[VERIFIED DATA]
{agg_data}

note
The data represents the net gain or loss of each stock symbol for each transaction type (buy/sell).
Net gain or loss is a crucial metric used to gauge the profitability or efficiency of an investment.
It's computed by subtracting the total buy value from the total sell value for each stock.
[END VERIFIED DATA]

[TASK]
- Generate a detailed financial report of the VERIFIED DATA.
- Research current events and trends, and provide actionable insights and recommendations.


[report criteria]
- Use all available information to prepare this final financial report
- Include a TLDR summary
- Include 'Actionable Insights'
- Include 'Strategic Recommendations'
- Include a 'Other Observations' section
- Include a 'Conclusion' section
- IMPORTANT! You are a friendly and helpful financial expert. Always provide the best possible answer using the available information.
[end report criteria]
""",
    agent=researcher,
    expected_output='concise markdown financial summary of the verified data and list of key points and insights from researching current events',
    tools=[search_tool],
)
# Crew Creation
tech_crew = Crew(
    agents=[researcher],
    tasks=[task1],
    process=Process.sequential
)

# MongoDB Aggregation Pipeline
pipeline = [
    {
        "$unwind": "$transactions"  # Deconstruct the transactions array into separate documents
    },
    {
        "$group": {  # Group documents by stock symbol
            "_id": "$transactions.symbol",  # Use symbol as the grouping key
            "buyValue": {  # Calculate total buy value
                "$sum": {
                    "$cond": [  # Conditional sum based on transaction type
                        {"$eq": ["$transactions.transaction_code", "buy"]},  # Check for "buy" transactions
                        {"$toDouble": "$transactions.total"},  # Convert total to double for sum
                        0  # Default value for non-buy transactions
                    ]
                }
            },
            "sellValue": {  # Calculate total sell value (similar to buyValue)
                "$sum": {
                    "$cond": [
                        {"$eq": ["$transactions.transaction_code", "sell"]},
                        {"$toDouble": "$transactions.total"},
                        0
                    ]
                }
            }
        }
    },
    {
        "$project": {  # Project desired fields (renaming and calculating net gain)
            "_id": 0,  # Exclude original _id field
            "symbol": "$_id",  # Rename _id to symbol for clarity
            "netGain": {"$subtract": ["$sellValue", "$buyValue"]}  # Calculate net gain
        }
    },
    {
        "$sort": {"netGain": -1}  # Sort results by net gain (descending)
    },
    {"$limit": 3}  # Limit results to top 3 stocks
]


results = list(collection.aggregate(pipeline))
client.close()

print("MongoDB Aggregation Pipeline Results:")
print(results)

MongoDB Aggregation Pipeline Results:
[{'netGain': 72769230.71428967, 'symbol': 'amzn'},
 {'netGain': 39912931.04990542, 'symbol': 'sap'},
 {'netGain': 25738882.292086124, 'symbol': 'aapl'}]
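To make the pipeline's arithmetic easier to follow, here is a rough plain-Python equivalent of the same unwind, group, project, sort, and limit steps, run on a tiny in-memory sample. The documents in `sample_docs` and the function name `net_gain_by_symbol` are hypothetical and exist only for illustration; the real computation runs inside MongoDB, not in Python.

```python
from collections import defaultdict

# Hypothetical documents shaped like those in sample_analytics.transactions
sample_docs = [
    {"transactions": [
        {"symbol": "amzn", "transaction_code": "buy", "total": "100.50"},
        {"symbol": "amzn", "transaction_code": "sell", "total": "250.00"},
    ]},
    {"transactions": [
        {"symbol": "sap", "transaction_code": "buy", "total": "80.00"},
        {"symbol": "sap", "transaction_code": "sell", "total": "60.00"},
        {"symbol": "amzn", "transaction_code": "buy", "total": "49.50"},
    ]},
]


def net_gain_by_symbol(docs, limit=3):
    """Mirror the pipeline stages on a list of dicts."""
    totals = defaultdict(lambda: {"buy": 0.0, "sell": 0.0})
    for doc in docs:
        for tx in doc["transactions"]:            # $unwind
            bucket = totals[tx["symbol"]]         # $group by symbol
            code = tx["transaction_code"]
            if code in ("buy", "sell"):           # $cond on transaction type
                bucket[code] += float(tx["total"])  # $toDouble + $sum
    rows = [
        {"symbol": symbol, "netGain": t["sell"] - t["buy"]}  # $project
        for symbol, t in totals.items()
    ]
    rows.sort(key=lambda row: row["netGain"], reverse=True)  # $sort descending
    return rows[:limit]                                      # $limit


top = net_gain_by_symbol(sample_docs)
print(top)
```

For this sample, amzn has 250.00 in sells against 150.00 in buys, and sap has 60.00 in sells against 80.00 in buys, so amzn ranks first with a positive net gain and sap second with a negative one.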

import os
import pymongo
import pprint

# MongoDB Setup
MDB_URI = "mongodb+srv://<user>:<password>@cluster0.abc123.mongodb.net/"
client = pymongo.MongoClient(MDB_URI)
db = client["sample_analytics"]
collection = db["transactions"]

# Azure OpenAI Setup
from langchain_openai import AzureChatOpenAI
AZURE_OPENAI_ENDPOINT = "https://__DEMO__.openai.azure.com"
AZURE_OPENAI_API_KEY = "__AZURE_OPENAI_API_KEY__"
deployment_name = "gpt-4-32k"  # The name of your model deployment
default_llm = AzureChatOpenAI(
    openai_api_version=os.environ.get("AZURE_OPENAI_VERSION", "2023-07-01-preview"),
    azure_deployment=deployment_name,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_API_KEY
)

# Web Search Setup
from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchResults
duck_duck_go = DuckDuckGoSearchResults(backend="news", max_results=10)

# Search Tool - Web Search
@tool
def search_tool(query: str):
    """
    Perform online research on a particular stock.
    Will return search results along with snippets of each result.
    """
    print("\n\nSearching DuckDuckGo for:", query)
    search_results = duck_duck_go.run(query)
    search_results_str = "[recent news for: " + query + "]\n" + str(search_results)
    return search_results_str


# Research Agent Setup
from crewai import Crew, Process, Task, Agent
AGENT_ROLE = "Investment Researcher"
AGENT_GOAL = """
Research stock market trends, company news, and analyst reports to identify potential investment opportunities.
"""
researcher = Agent(
    role=AGENT_ROLE,
    goal=AGENT_GOAL,
    verbose=True,
    llm=default_llm,
    backstory='Expert stock researcher with decades of experience.',
    tools=[search_tool]
)

task1 = Task(
    description="""
Using the following information:

[VERIFIED DATA]
{agg_data}

note
The data represents the net gain or loss of each stock symbol for each transaction type (buy/sell).
Net gain or loss is a crucial metric used to gauge the profitability or efficiency of an investment.
It's computed by subtracting the total buy value from the total sell value for each stock.
[END VERIFIED DATA]

[TASK]
- Generate a detailed financial report of the VERIFIED DATA.
- Research current events and trends, and provide actionable insights and recommendations.


[report criteria]
- Use all available information to prepare this final financial report
- Include a TLDR summary
- Include 'Actionable Insights'
- Include 'Strategic Recommendations'
- Include a 'Other Observations' section
- Include a 'Conclusion' section
- IMPORTANT! You are a friendly and helpful financial expert. Always provide the best possible answer using the available information.
[end report criteria]
""",
    agent=researcher,
    expected_output='concise markdown financial summary of the verified data and list of key points and insights from researching current events',
    tools=[search_tool],
)
# Crew Creation
tech_crew = Crew(
    agents=[researcher],
    tasks=[task1],
    process=Process.sequential
)

# MongoDB Aggregation Pipeline
pipeline = [
    {
        "$unwind": "$transactions"  # Deconstruct the transactions array into separate documents
    },
    {
        "$group": {  # Group documents by stock symbol
            "_id": "$transactions.symbol",  # Use symbol as the grouping key
            "buyValue": {  # Calculate total buy value
                "$sum": {
                    "$cond": [  # Conditional sum based on transaction type
                        {"$eq": ["$transactions.transaction_code", "buy"]},  # Check for "buy" transactions
                        {"$toDouble": "$transactions.total"},  # Convert total to double for sum
                        0  # Default value for non-buy transactions
                    ]
                }
            },
            "sellValue": {  # Calculate total sell value (similar to buyValue)
                "$sum": {
                    "$cond": [
                        {"$eq": ["$transactions.transaction_code", "sell"]},
                        {"$toDouble": "$transactions.total"},
                        0
                    ]
                }
            }
        }
    },
    {
        "$project": {  # Project desired fields (renaming and calculating net gain)
            "_id": 0,  # Exclude original _id field
            "symbol": "$_id",  # Rename _id to symbol for clarity
            "netGain": {"$subtract": ["$sellValue", "$buyValue"]}  # Calculate net gain
        }
    },
    {
        "$sort": {"netGain": -1}  # Sort results by net gain (descending)
    },
    {"$limit": 3}  # Limit results to top 3 stocks
]
results = list(collection.aggregate(pipeline))
client.close()

# Print MongoDB Aggregation Pipeline Results
print("MongoDB Aggregation Pipeline Results:")

pprint.pprint(results)  # pprint is used to “pretty-print” arbitrary Python data structures

# Start the task execution
tech_crew.kickoff(inputs={'agg_data': str(results)})
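The script above passes `str(results)` as `agg_data`, so the prompt receives a raw Python list. As an optional sketch, the hypothetical helper below renders the same rows as one readable line per symbol, with rounding done in code rather than left to the model. The name `format_agg_data` is an assumption for illustration, and whether this formatting changes the agent's report has not been tested here.

```python
def format_agg_data(rows):
    """Render aggregation rows as one 'SYMBOL: net gain' line each (hypothetical helper)."""
    return "\n".join(
        f"{row['symbol'].upper()}: net gain {row['netGain']:,.2f}" for row in rows
    )


# The three rows returned by the aggregation pipeline in this article
rows = [
    {"netGain": 72769230.71428967, "symbol": "amzn"},
    {"netGain": 39912931.04990542, "symbol": "sap"},
    {"netGain": 25738882.292086124, "symbol": "aapl"},
]

agg_text = format_agg_data(rows)
print(agg_text)
```

The resulting string could then be passed in place of `str(results)`, for example `tech_crew.kickoff(inputs={'agg_data': agg_text})`. Note that the two-decimal rounding discards precision that the raw results keep.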

Thought:
The recent news for Apple indicates that the company's stock has reached a $3 trillion valuation, largely due to the hype surrounding the introduction of AI to iPhones. This could be a significant catalyst for Apple's future growth. Now, I have enough information to generate a detailed financial report of the verified data, including a TLDR summary, actionable insights, strategic recommendations, other observations, and a conclusion.

Final Answer:

Financial Report

TLDR Summary

Based on the verified data, the net gains for the three stocks are as follows:

1. Amazon (AMZN) - $72,769,230.71
2. SAP - $39,912,931.04
3. Apple (AAPL) - $25,738,882.29

Amazon has the highest net gain, followed by SAP and Apple.

Actionable Insights

- Amazon (AMZN): The company's stock is seen as a good buy due to its attractive valuation and significant dominance in the e-commerce market.
- SAP: The company is making a significant acquisition of WalkMe Ltd., which could potentially boost its value and market position.
- Apple (AAPL): The company's stock has reached a $3 trillion valuation, largely due to the hype surrounding the introduction of AI to iPhones. This could be a significant catalyst for Apple's future growth.

Strategic Recommendations

- Amazon (AMZN): Given its dominant position in e-commerce and attractive valuation, it might be a good idea to consider increasing investments in Amazon.
- SAP: Considering the potential value boost from the recent acquisition, investors might want to keep a close watch on SAP's performance and consider it for their portfolio.
- Apple (AAPL): With the hype around the introduction of AI to iPhones, Apple's stock could see significant growth. It might be a good time to invest or increase existing investments.

Other Observations

The companies have seen fluctuations in their stock prices but generally perform well. The current trends and developments indicate potential for further growth.

Conclusion

Given the net gains and recent developments, Amazon, SAP, and Apple seem to be promising investments. However, as with any investment decision, it's important to consider individual financial goals, risk tolerance, and market conditions. It's always recommended to conduct further research or consult with a financial advisor before making investment decisions.

This report provides a high-level overview of the current events and trends impacting these stocks, but the rapidly changing market environment necessitates regular monitoring and analysis of investment portfolios.

> Finished chain.