IoT and MongoDB: Powering Time Series Analysis of Household Power Consumption
IoT (Internet of Things) systems are increasingly becoming a part of our daily lives, offering smart solutions for homes and businesses.
This article explores a practical case study on household power consumption, showcasing how MongoDB's time series collections can be leveraged to efficiently store, manage, and analyze data generated by IoT devices.
Time series collections in MongoDB effectively store time series data — a sequence of data points analyzed to observe changes over time.
Time series collections provide the following benefits:
- Reduced complexity for working with time series data
- Improved query efficiency
- Reduced disk usage
- Reduced I/O for read operations
- Increased WiredTiger cache usage
Generally, time series data is composed of the following elements (see the example document after this list):
- The timestamp of each data point
- Metadata (also known as the source), which is a label or tag that uniquely identifies a series and rarely changes
- Measurements (also known as metrics or values), representing the data points tracked at increments in time — generally key-value pairs that change over time
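For example, a single reading in our power consumption scenario could be modeled as a document like the one below. This is an illustrative sketch only; the “metadata” field and the household label are hypothetical, and the schema we build later omits metadata since the dataset covers a single household.

```javascript
// An illustrative time series document: a timestamp, a metadata label
// identifying the source (hypothetical here), and the measurements.
{
  timestamp: ISODate('2007-01-17T10:23:00Z'),
  metadata: { household: 'house_1' }, // hypothetical source label
  voltage: 234.84,                    // measurements tracked over time
  global_active_power: 4.216
}
```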
This case study analyzes a dataset of household electric power consumption containing over two million data points, sampled at a one-minute rate over almost four years.
The dataset includes the following fields (a sample raw line is shown after this list):
- date: Date in format dd/mm/yyyy
- time: Time in format hh:mm:ss
- global_active_power: Household global minute-averaged active power (in kilowatt)
- global_reactive_power: Household global minute-averaged reactive power (in kilowatt)
- voltage: Minute-averaged voltage (in volt)
- global_intensity: Household global minute-averaged current intensity (in ampere)
- sub_metering_1: Energy sub-metering No. 1 (in watt-hour of active energy); corresponds to the kitchen, containing mainly a dishwasher, an oven, and a microwave (hot plates are not electric but gas-powered)
- sub_metering_2: Energy sub-metering No. 2 (in watt-hour of active energy); corresponds to the laundry room, containing a washing machine, a tumble drier, a refrigerator, and a light
- sub_metering_3: Energy sub-metering No. 3 (in watt-hour of active energy); corresponds to an electric water heater and an air conditioner
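To make the parsing steps later in this article concrete, here is roughly what the raw semicolon-separated file looks like. The sample row is illustrative, and missing measurements appear as “?” in the real file.

```
Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3
16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000
```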
To define and model our time series collection, we will use the Mongoose library. Mongoose, an Object Data Modeling (ODM) library for MongoDB, is widely used in the Node.js ecosystem for its ability to provide a straightforward way to model our application data.
The schema will include:
- timestamp: A combination of the “date” and “time” fields from the dataset.
- global_active_power: A numerical representation from the dataset.
- global_reactive_power: A numerical representation from the dataset.
- voltage: A numerical representation from the dataset.
- global_intensity: A numerical representation from the dataset.
- sub_metering_1: A numerical representation from the dataset.
- sub_metering_2: A numerical representation from the dataset.
- sub_metering_3: A numerical representation from the dataset.
To configure the collection as a time series collection, an additional “timeseries” configuration with “timeField” and “granularity” properties is necessary. The “timeField” will use our schema’s “timestamp” property, and “granularity” will be set to “minutes” to match the dataset's sampling rate.
Additionally, an index on the “timestamp” field will be created to enhance query performance — note that you can query a time series collection the same way you query a standard MongoDB collection.
The resulting schema is structured as follows:
```javascript
const { Schema, model } = require('mongoose');

const powerConsumptionSchema = new Schema(
  {
    timestamp: { type: Date, index: true },
    global_active_power: { type: Number },
    global_reactive_power: { type: Number },
    voltage: { type: Number },
    global_intensity: { type: Number },
    sub_metering_1: { type: Number },
    sub_metering_2: { type: Number },
    sub_metering_3: { type: Number },
  },
  {
    timeseries: {
      timeField: 'timestamp',
      granularity: 'minutes',
    },
  }
);

const PowerConsumptions = model('PowerConsumptions', powerConsumptionSchema);

module.exports = PowerConsumptions;
```
For further details on creating time series collections, refer to MongoDB's official time series documentation.
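Mongoose passes the “timeseries” options through to MongoDB when it creates the collection, so you could achieve the same result directly in mongosh. The sketch below assumes the collection name “powerconsumptions”, which is what Mongoose derives from the model name by default.

```javascript
// Equivalent creation in mongosh; 'powerconsumptions' is Mongoose's
// default collection name for the PowerConsumptions model (assumed here)
db.createCollection('powerconsumptions', {
  timeseries: {
    timeField: 'timestamp',
    granularity: 'minutes',
  },
});
```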
The dataset is provided as a .txt file, which is not directly usable with MongoDB. To import this data into our MongoDB database, we need to preprocess it so that it aligns with our database schema design.
This can be accomplished by performing the following steps:
- Connect to MongoDB.
- Load data from the .txt file.
- Normalize the data and split the content into lines.
- Parse the lines into structured objects.
- Transform the data to match our MongoDB schema model.
- Filter out invalid data.
- Insert the final data into MongoDB in chunks.
Here is the Node.js script that automates these steps:
```javascript
// Load environment variables from .env file
require('dotenv').config();

// Import required modules
const fs = require('fs');
const mongoose = require('mongoose');
const PowerConsumptions = require('./models/power-consumption');

// Connect to MongoDB and process the data file
const processData = async () => {
  try {
    // Connect to MongoDB using the connection string from environment variables
    await mongoose.connect(process.env.MONGODB_CONNECTION_STRING);

    // Define the file path for the data source
    const filePath = 'Household_Power_Consumption.txt';

    // Read the data file
    const rawFileContent = fs.readFileSync(filePath, 'utf8');

    // Normalize line endings and split the content into lines
    const lines = rawFileContent.replace(/\r\n/g, '\n').replace(/\r/g, '\n').trim().split('\n');

    // Extract column headers
    const headers = lines[0].split(';').map((header) => header.trim());

    // Parse the lines into structured objects
    const parsedRecords = lines.slice(1).map((line) => {
      const values = line.split(';').map((value) => value.trim());
      return headers.reduce((object, header, index) => {
        object[header] = values[index];
        return object;
      }, {});
    });

    // Transform and prepare data for insertion; the Date object is built
    // in the local time zone from the dd/mm/yyyy and hh:mm:ss fields
    const transformedRecords = parsedRecords.map((item) => {
      const [day, month, year] = item.Date.split('/').map((num) => parseInt(num, 10));
      const [hour, minute, second] = item.Time.split(':').map((num) => parseInt(num, 10));
      const dateObject = new Date(year, month - 1, day, hour, minute, second);

      return {
        timestamp: dateObject,
        global_active_power: parseFloat(item.Global_active_power),
        global_reactive_power: parseFloat(item.Global_reactive_power),
        voltage: parseFloat(item.Voltage),
        global_intensity: parseFloat(item.Global_intensity),
        sub_metering_1: parseFloat(item.Sub_metering_1),
        sub_metering_2: parseFloat(item.Sub_metering_2),
        sub_metering_3: parseFloat(item.Sub_metering_3),
      };
    });

    // Filter out rows with invalid dates or missing measurements
    // (missing values in the dataset parse to NaN)
    const finalData = transformedRecords.filter(
      (item) =>
        !isNaN(item.timestamp.getTime()) &&
        !isNaN(item.global_active_power) &&
        !isNaN(item.global_reactive_power) &&
        !isNaN(item.voltage) &&
        !isNaN(item.global_intensity) &&
        !isNaN(item.sub_metering_1) &&
        !isNaN(item.sub_metering_2) &&
        !isNaN(item.sub_metering_3)
    );

    // Insert the final data into the database in chunks of 1000
    const chunkSize = 1000;
    for (let i = 0; i < finalData.length; i += chunkSize) {
      const chunk = finalData.slice(i, i + chunkSize);
      await PowerConsumptions.insertMany(chunk);
    }

    console.log('Data processing and insertion completed.');
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    // Close the connection so the Node.js process can exit cleanly
    await mongoose.disconnect();
  }
};

// Call the processData function
processData();
```
Before you run the script, make sure your environment variables are set up correctly. To do this, create a file named “.env” in the root folder and add a line for “MONGODB_CONNECTION_STRING”, your MongoDB connection string.
The content of the .env file should look like this:
```
MONGODB_CONNECTION_STRING='mongodb+srv://{{username}}:{{password}}@{{your_cluster_url}}/{{your_database}}?retryWrites=true&w=majority'
```
For more details on constructing your connection string, refer to the official MongoDB documentation.
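Before running the full import, you may want to verify that the connection string works. A minimal sanity check, assuming the .env file above, could look like this:

```javascript
// Minimal connectivity check: connect, report success, and disconnect
require('dotenv').config();
const mongoose = require('mongoose');

(async () => {
  try {
    await mongoose.connect(process.env.MONGODB_CONNECTION_STRING);
    console.log('Successfully connected to MongoDB.');
  } catch (error) {
    console.error('Connection failed:', error);
  } finally {
    await mongoose.disconnect();
  }
})();
```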
Once the data has been inserted into our MongoDB time series collection, MongoDB Atlas Charts can be used to effortlessly connect to and visualize the data.
To connect and use MongoDB Atlas Charts, we should:
- Establish a connection to the time series collection as a data source.
- Associate the desired fields with the appropriate X and Y axes.
- Implement filters as necessary to refine the data displayed.
- Explore the visualizations provided by Atlas Charts to gain insights.
In the above example, we visualized the power consumption from various sources within a single day. The visualization revealed distinct usage patterns: Kitchen equipment was primarily used in the morning and evening, laundry room equipment was active around noon, and the water heater and air conditioner showed continuous use from morning to evening.
For the displayed visualization, we used a query to filter the data for a specific date:
```javascript
{ timestamp: { $gt: ISODate('2007-01-17T00:00:00.000-00:00'), $lt: ISODate('2007-01-18T00:00:00.000-00:00') } }
```
If you want to change what is shown in the charts, you can apply different filters or aggregation pipelines to the data, tailoring the results according to your needs.
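For instance, a pipeline like the one below could roll the minute-level readings up into hourly averages for the same day. This is a sketch for mongosh; the collection name “powerconsumptions” is Mongoose's default for our model and an assumption here, and $dateTrunc requires MongoDB 5.0 or later (which time series collections require anyway).

```javascript
// Hourly averages for 2007-01-17, computed from the minute-level readings
db.powerconsumptions.aggregate([
  {
    $match: {
      timestamp: {
        $gte: ISODate('2007-01-17T00:00:00.000-00:00'),
        $lt: ISODate('2007-01-18T00:00:00.000-00:00'),
      },
    },
  },
  {
    $group: {
      _id: { $dateTrunc: { date: '$timestamp', unit: 'hour' } },
      avg_active_power: { $avg: '$global_active_power' },
      avg_sub_metering_1: { $avg: '$sub_metering_1' },
      avg_sub_metering_2: { $avg: '$sub_metering_2' },
      avg_sub_metering_3: { $avg: '$sub_metering_3' },
    },
  },
  { $sort: { _id: 1 } },
]);
```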
This article demonstrates the powerful capabilities of MongoDB when integrated with IoT systems. By leveraging MongoDB's time series collections, we can efficiently store, manage, and analyze the large volumes of time series data generated by IoT devices.
The case study on household power consumption not only showcases the practical applications of IoT in our daily lives but also highlights how MongoDB can help us get a deeper understanding of IoT data sets.
Through visualization with MongoDB Atlas Charts, we have gained significant insights into power consumption patterns. This not only helps in making informed decisions but also opens the door for significant improvements in energy efficiency and cost savings.
Having explored MongoDB's capabilities for handling IoT data and visualizing it with Atlas Charts, I hope you feel inspired to take on your own data projects. I invite you to join the MongoDB Community Forums to share your experiences, ask questions, and collaborate with fellow enthusiasts. Whether you are seeking advice, sharing your latest project, or exploring innovative uses of MongoDB, the community is a great place to continue the conversation.