Jun 2024

Hello everyone,

I’m currently developing a database with pymongo to store wind information for every latitude and longitude on Earth at a resolution of 0.1 degrees, across 75 different altitudes, for each month of the year and each hour of the day. Each data point consists of two int64 values, representing the U (east-west) and V (north-south) wind components.

The database will encompass:

  • Latitude: -90.0 to 90.0
  • Longitude: -180.0 to 180.0
  • Altitudes: 15,000 to 30,000, in 75 levels
  • Months: January to December
  • Hours: 00 to 23
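The dimensions above can be multiplied out to sanity-check the data volume. This is a sketch assuming one U and one V value per grid point, altitude, month and hour, with -180.0 and 180.0 counted as the same meridian; under those assumptions the product is about 280 billion values, or about 2.2 TB as raw int64, before any BSON, GeoJSON or index overhead.

```python
# Grid coordinates in tenths of a degree so the arithmetic stays exact.
LAT_MIN, LAT_MAX = -900, 900      # -90.0 to 90.0 degrees
LON_MIN, LON_MAX = -1800, 1800    # -180.0 to 180.0 degrees

# Both latitude endpoints (the poles) are distinct points.
n_lat = LAT_MAX - LAT_MIN + 1     # 1801
# Assumption: -180.0 and 180.0 are the same meridian, so it is counted once.
n_lon = LON_MAX - LON_MIN         # 3600
n_alt = 75
n_month = 12
n_hour = 24
n_component = 2                   # U and V

total_values = n_lat * n_lon * n_alt * n_month * n_hour * n_component
raw_bytes = total_values * 8      # one int64 is 8 bytes

print(f"{total_values:,} values")
print(f"{raw_bytes / 1e12:.2f} TB of raw int64 data")
```

That total is roughly an order of magnitude below the 2.7 trillion figure quoted further down, so one of the two estimates may be counting something the list does not show.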

For efficient querying, I’ve structured the indexes as follows:

  • Month (1-12)
  • Hour (0-23)
  • Wind Direction (U or V)
  • Longitude, split into 10 sections
  • Latitude, split into 10 sections
  • Altitude
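A minimal sketch of how these six index components could map onto a MongoDB compound index, assuming each one is stored as a plain field and queried by equality. The field names, the index name `wind_lookup`, and the helper functions are hypothetical; the post does not show the real ones. The section helpers split latitude into ten 18-degree bands and longitude into ten 36-degree bands.

```python
# Hypothetical field names: the post lists the index components but not the
# actual field names used in the documents. 1 == pymongo.ASCENDING.
INDEX_KEYS = [
    ("month", 1),        # 1-12
    ("hour", 1),         # 0-23
    ("component", 1),    # "U" or "V"
    ("lon_section", 1),  # 0-9
    ("lat_section", 1),  # 0-9
    ("altitude", 1),     # one of the 75 levels
]

N_SECTIONS = 10


def section_index(value: float, lo: float, hi: float, n: int = N_SECTIONS) -> int:
    """Map a coordinate in [lo, hi] to one of n equal-width sections, 0..n-1."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    width = (hi - lo) / n
    # The upper edge (latitude 90.0 or longitude 180.0) goes into the last section.
    return min(int((value - lo) // width), n - 1)


def lat_section(lat: float) -> int:
    return section_index(lat, -90.0, 90.0)    # 18-degree bands


def lon_section(lon: float) -> int:
    return section_index(lon, -180.0, 180.0)  # 36-degree bands


def ensure_wind_index(collection) -> str:
    """Create the compound index; `collection` is expected to be a pymongo Collection."""
    return collection.create_index(INDEX_KEYS, name="wind_lookup")
```

A lookup for a single point would then compute its two section numbers with `lat_section` and `lon_section` and pass an equality filter on all six fields to `collection.find_one(...)`, so the query is served entirely by this one index.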

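Each combination of these index values maps to one document with an array called ‘geopoints’, which holds the GeoJSON objects for that combination. Because latitude and longitude are each split into 10 sections, each array covers an 18 x 36 degree tile and holds approximately 180x360 points.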
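The layout described above can be sketched as one document per index combination whose `geopoints` array holds one GeoJSON object per grid point. This sketch assumes each element is a GeoJSON `Feature` carrying its int64 value in `properties`; the field names, `tile_document`, and the `value_at` callback are hypothetical stand-ins for the real schema and data source. It confirms that a 10 x 10 split of a 0.1-degree grid gives 180 x 360 = 64,800 elements per array.

```python
from collections.abc import Callable
from typing import Any

N_SECTIONS = 10
LAT_ROWS = 1800 // N_SECTIONS   # 180 rows of 0.1 degree per latitude section
LON_COLS = 3600 // N_SECTIONS   # 360 columns of 0.1 degree per longitude section


def tile_points(lat_sec: int, lon_sec: int,
                value_at: Callable[[float, float], int]) -> list[dict[str, Any]]:
    """Build the GeoJSON Features for one 18 x 36 degree tile (180 x 360 points)."""
    # Work in tenths of a degree so grid coordinates are exact integers.
    # Tiles are half-open, so latitude 90.0 and longitude 180.0 are not produced.
    lat0 = -900 + lat_sec * LAT_ROWS
    lon0 = -1800 + lon_sec * LON_COLS
    points = []
    for i in range(LAT_ROWS):
        for j in range(LON_COLS):
            lat, lon = (lat0 + i) / 10, (lon0 + j) / 10
            points.append({
                "type": "Feature",
                # GeoJSON coordinate order is [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"value": value_at(lat, lon)},
            })
    return points


def tile_document(month: int, hour: int, component: str, lat_sec: int,
                  lon_sec: int, altitude: int,
                  value_at: Callable[[float, float], int]) -> dict[str, Any]:
    """One document per index combination, with its points in `geopoints`."""
    return {
        "month": month,
        "hour": hour,
        "component": component,
        "lon_section": lon_sec,
        "lat_section": lat_sec,
        "altitude": altitude,
        "geopoints": tile_points(lat_sec, lon_sec, value_at),
    }


# Placeholder values; real data would come from the wind source.
doc = tile_document(1, 0, "U", 0, 0, 15000, lambda lat, lon: 0)
print(len(doc["geopoints"]))  # 64800
```

With pymongo installed, `len(bson.encode(doc))` reports the encoded size of such a document, which can be checked against MongoDB's 16 MB maximum BSON document size.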

Given the scale of the data (roughly 2.7 trillion elements), I’m encountering significant efficiency issues. I would greatly appreciate any suggestions or insights on how to optimize this setup. Are there more effective ways to structure or store this vast amount of data?

Thank you for your help!

Hey, thank you for the reply. Here is what the schema looks like. The empty array is meant to hold up to 180x360 elements, each of which is a GeoJSON object.

[Screenshot of the document schema]