$hoppingWindow
Definition
The $hoppingWindow stage specifies a hopping window for aggregation of data.
Atlas Stream Processing windows are stateful, can be recovered if interrupted,
and have mechanisms for processing late-arriving data. You must apply
all other aggregation queries to your streaming data within this
window stage.
Syntax

A $hoppingWindow pipeline stage has the following prototype form:

```json
{
  "$hoppingWindow": {
    "interval": {
      "size": <int>,
      "unit": "<unit-of-time>"
    },
    "hopSize": {
      "size": <int>,
      "unit": "<unit-of-time>"
    },
    "pipeline": [
      <aggregation-stage-array>
    ],
    "offset": {
      "offsetFromUtc": <int>,
      "unit": "<unit-of-time>"
    },
    "idleTimeout": {
      "size": <int>,
      "unit": "<unit-of-time>"
    },
    "allowedLateness": {
      "size": <int>,
      "unit": "<unit-of-time>"
    }
  }
}
```

The $hoppingWindow stage takes a document with the following fields:
Field | Type | Necessity | Description
---|---|---|---
`interval` | document | Required | Document specifying the interval of a hopping window as a combination of a `size` and a `unit` of time, where `size` is a positive integer and `unit` is one of `"ms"`, `"second"`, `"minute"`, `"hour"`, or `"day"`. For example, a `size` of `100` and a `unit` of `"second"` keeps each window open for 100 seconds.
`hopSize` | document | Required | Document that specifies the length of the hop between window start times as a combination of a `size` and a `unit` of time, where `size` is a positive integer and `unit` is one of `"ms"`, `"second"`, `"minute"`, `"hour"`, or `"day"`. For example, a `size` of `20` and a `unit` of `"second"` starts a new window every 20 seconds.
`pipeline` | array | Required | Nested aggregation pipeline evaluated against the messages within the window.
`offset` | document | Optional | Document specifying a time offset for window boundaries relative to UTC. The document is a combination of the size field `offsetFromUtc`, which must be a positive integer, and a `unit` field, which is one of `"ms"`, `"second"`, `"minute"`, or `"hour"`. For example, an `offsetFromUtc` of `8` and a `unit` of `"hour"` shifts window boundaries eight hours ahead of UTC. If omitted, window boundaries align with UTC.
`idleTimeout` | document | Optional | Document specifying how long to wait before closing windows if `$source` is idle, as a combination of a `size` and a `unit` of time. If you set `idleTimeout`, Atlas Stream Processing closes an open window only after `$source` has been idle for longer than the greater of the remaining window time and the `idleTimeout` time. For example, consider a 12:00 pm to 1:00 pm window with an `idleTimeout` of 2 hours: if the last event arrives at 12:02 pm and `$source` then goes idle, the remaining window time is 58 minutes, and the window closes at 2:02 pm, after 2 hours of idleness.
`allowedLateness` | document | Optional | Document that specifies how long to keep windows generated from the source open to accept late-arriving data after processing documents for the window end time. If omitted, defaults to 3 seconds.
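To make the hopping behavior concrete: because each window stays open for `interval` and a new window starts every `hopSize`, a single event is counted in `interval / hopSize` overlapping windows (when the interval is a multiple of the hop size). The following standalone JavaScript sketch is illustrative only, not part of Atlas Stream Processing; it assumes window boundaries aligned to the Unix epoch with no `offset`, and windows that include their start time and exclude their end time.

```javascript
// Illustrative helper (not an Atlas Stream Processing API): lists the start
// times, in milliseconds since the Unix epoch, of every hopping window that
// contains a given event time. Windows span [start, start + interval).
function windowStartsContaining(eventTimeMs, intervalMs, hopMs) {
  const starts = [];
  // The earliest qualifying start is the first hop boundary strictly greater
  // than eventTime - interval; walk forward one hop at a time from there.
  let s = Math.ceil((eventTimeMs - intervalMs + 1) / hopMs) * hopMs;
  for (; s <= eventTimeMs; s += hopMs) {
    starts.push(s);
  }
  return starts;
}

// With an interval of 100 seconds and a hopSize of 20 seconds, every event
// falls into 100 / 20 = 5 overlapping windows:
windowStartsContaining(130000, 100000, 20000)
// => [ 40000, 60000, 80000, 100000, 120000 ]
```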
Behavior
Atlas Stream Processing supports only one window stage per pipeline.
When you apply the $group stage to your window stage, a single group key has a limit of 100 megabytes of RAM.
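The practical consequence is that the state a window keeps per group key must stay bounded. The following sketch contrasts two window pipelines; the field names `position.coordinates` and `pressure.value` are assumptions drawn from the weather example below, not requirements:

```javascript
// Bounded per-key state: $avg maintains only a running sum and count for
// each group key, so memory use stays small regardless of stream volume.
{ $group: {
    _id: "$position.coordinates",
    averagePressure: { $avg: "$pressure.value" }
} }

// Unbounded per-key state: $push retains every matching document in the
// group key's state, which can approach the 100 MB per-key limit on
// high-volume streams.
{ $group: {
    _id: "$position.coordinates",
    allReadings: { $push: "$$ROOT" }
} }
```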
Support for certain aggregation stages might be limited or unavailable within windows. To learn more, see Supported Aggregation Pipeline Stages.
In the event of a service interruption, you can resume the internal pipeline of a window from its state at the point of interruption. To learn more, see Checkpoints.
Examples
A streaming data source generates detailed weather reports from various locations, conformant to the schema of the Sample Weather Dataset. The following aggregation has three stages:
- The `$source` stage establishes a connection with the Apache Kafka broker collecting these reports in a topic named `my_weatherdata`, exposing each record as it is ingested to the subsequent aggregation stages.
- The `$hoppingWindow` stage defines overlapping windows of time that are 100 seconds in duration and begin every 20 seconds. Each window executes an internal `pipeline`, which finds the average `liquidPrecipitation.depth`, as defined in the `sample_weatherdata` documents streamed from the Apache Kafka broker, for the duration of a given window. The `pipeline` then outputs a single document with an `_id` equivalent to the start timestamp of the window it represents and the `averagePrecipitation` for that window.
- The `$merge` stage writes the output to an Atlas collection named `stream` in the `sample_weatherstream` database. If no such database or collection exist, Atlas creates them.
```javascript
pipeline = [
  {
    $source: {
      "connectionName": "streamsExampleConnectionToKafka",
      "topic": "my_weatherdata"
    }
  },
  {
    $hoppingWindow: {
      "interval": {
        "size": 100,
        "unit": "second"
      },
      "hopSize": {
        "size": 20,
        "unit": "second"
      },
      "pipeline": [
        {
          $group: {
            // The resulting document's _id is the $hoppingWindow's start timestamp
            _id: "$_stream_meta.window.start",
            averagePrecipitation: { $avg: "$liquidPrecipitation.depth" }
          }
        }
      ]
    }
  },
  {
    $merge: {
      "into": {
        "connectionName": "streamsExampleConnectionToAtlas",
        "db": "sample_weatherstream",
        "coll": "stream"
      }
    }
  }
]
```
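If you want to try the pipeline, one approach is to run it from mongosh. The following is a sketch that assumes mongosh is connected to your stream processing instance, the two connection names exist in your connection registry, and `weatherHoppingExample` is a processor name of your choosing:

```javascript
// Run the pipeline as an ephemeral stream processor that lasts only as long
// as the shell session.
sp.process(pipeline)

// Or register it as a named stream processor, then start it.
sp.createStreamProcessor("weatherHoppingExample", pipeline)
sp.weatherHoppingExample.start()
```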
To view the documents in the resulting `sample_weatherstream.stream` collection, connect to your Atlas cluster and run the following command:
```javascript
db.getSiblingDB("sample_weatherstream").stream.find()
```
```javascript
{
  _id: ISODate('2024-08-28T19:30:20.000Z'),
  _stream_meta: {
    source: { type: 'kafka' },
    window: {
      start: ISODate('2024-08-28T19:30:20.000Z'),
      end: ISODate('2024-08-28T19:32:00.000Z')
    }
  },
  averagePrecipitation: 2264.3973214285716
},
{
  _id: ISODate('2024-08-28T19:30:40.000Z'),
  _stream_meta: {
    source: { type: 'kafka' },
    window: {
      start: ISODate('2024-08-28T19:30:40.000Z'),
      end: ISODate('2024-08-28T19:32:20.000Z')
    }
  },
  averagePrecipitation: 2285.7061611374406
},
{
  _id: ISODate('2024-08-28T19:31:00.000Z'),
  _stream_meta: {
    source: { type: 'kafka' },
    window: {
      start: ISODate('2024-08-28T19:31:00.000Z'),
      end: ISODate('2024-08-28T19:32:40.000Z')
    }
  },
  averagePrecipitation: 2357.6940154440153
},
{
  _id: ISODate('2024-08-28T19:31:20.000Z'),
  _stream_meta: {
    source: { type: 'kafka' },
    window: {
      start: ISODate('2024-08-28T19:31:20.000Z'),
      end: ISODate('2024-08-28T19:33:00.000Z')
    }
  },
  averagePrecipitation: 2378.374061433447
}
```
Note
The preceding is a representative example. Streaming data are not static, and each user sees distinct documents.