Tuning the Monitor Schedule and Analysis Window

Helpful notes for choosing the monitor schedule interval and monitoring time range

When planning a monitor, consider both the monitor schedule interval and the monitor time range. Together they determine whether your monitors have the data that they need to query for the conditions that you want alerts for, and whether alerts are raised in a reasonable timeframe for investigation and resolution.

Monitors can run only one query at a time. Be sure to construct queries that will run in less time than the schedule interval of the monitor. If a monitor is scheduled to run using a 1-minute interval, for example, the monitor query should be tuned to run in less than 45 seconds. If a monitor query is still running when the next interval is due, the new query is skipped for that interval and the monitor will attempt the query at the next interval.
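
The skip behavior described above can be illustrated with a small simulation. This is a simplified model, not the monitor scheduler's actual implementation: `simulate_runs` and its parameters are hypothetical names, and the sketch assumes that a query finishing exactly when the next interval is due does not block that run.

```python
def simulate_runs(interval_s, query_duration_s, total_s):
    """Model a monitor that skips a scheduled run while a query is in flight.

    Returns two lists of scheduled times (seconds from start):
    the runs that started and the runs that were skipped.
    """
    started, skipped = [], []
    busy_until = 0  # time at which the in-flight query finishes
    for scheduled in range(0, total_s, interval_s):
        if scheduled < busy_until:
            skipped.append(scheduled)  # previous query is still running
        else:
            started.append(scheduled)
            busy_until = scheduled + query_duration_s
    return started, skipped


# A 45-second query on a 1-minute schedule never causes a skip.
print(simulate_runs(60, 45, 300))  # ([0, 60, 120, 180, 240], [])
# A 90-second query causes every other run to be skipped.
print(simulate_runs(60, 90, 300))  # ([0, 120, 240], [60, 180])
```

In this model, a query that overruns its interval effectively halves how often the monitor evaluates its condition, which is why the query should finish comfortably inside the interval.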

Monitors operate only on the indexed data that is available at the time that the monitor query runs. For example, a monitor that is scheduled to run every 10 minutes, and that has a 10-minute monitoring time range, will query the available data that has timestamps in that 10-minute time range. For a variety of reasons, it is possible that some indexed data for that time range is added after the monitor runs. This gap could result in missed alerts depending on the monitor's configuration.

In this example, when the monitor runs at 9:10, it queries the available data with timestamps from 9:00 to 9:10 and does not detect an alert condition. At 9:11, some newly indexed data with a 9:09 timestamp becomes available, and that data would trigger an alert if the monitor queried it. Because the data wasn't available in time for the 9:10 run, and the 9:20 run queries only the 9:10 to 9:20 time range, the newly added 9:09 data and its alert condition are missed.
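
The gap in this example can be reproduced with a short sketch. It is only a model of the scenario, under stated assumptions: the field names `timestamp` and `indexed_at`, the helper names `at` and `hits_for_run`, and the date are all hypothetical, and the window bounds are treated as exclusive.

```python
from datetime import datetime, timedelta


def at(hour, minute):
    """Build a time on an arbitrary example date."""
    return datetime(2024, 1, 1, hour, minute)


def hits_for_run(events, run_time, time_range):
    """Return names of events a run can see: the timestamp falls inside
    the monitor time range and the event was indexed by the run time."""
    window_start = run_time - time_range
    return [
        e["name"]
        for e in events
        if window_start < e["timestamp"] < run_time
        and e["indexed_at"] <= run_time
    ]


# Data stamped 9:09 that only becomes available at 9:11.
events = [{"name": "late-event", "timestamp": at(9, 9), "indexed_at": at(9, 11)}]

for run in (at(9, 10), at(9, 20)):
    print(run.strftime("%H:%M"), hits_for_run(events, run, timedelta(minutes=10)))
```

Both runs report no hits: the 9:10 run is too early to see the event, and the 9:20 run looks at a time range that no longer contains it.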

As one way to prevent missed alerts, you can overlap the monitor time ranges so that each run re-examines part of the previous run's time range and catches data that might have arrived after the previous run. Still assuming a monitor with a 10-minute schedule, you could define an overlapping monitor time range of 15 minutes as follows:

{
  "range": {
    "timestamp": {
      "gt": "now-15m",
      "lt": "now"
    }
  }
}

In this example, the monitor run at 9:10 queries data in the 8:55 to 9:10 range but does not see the data that arrives at 9:11 with a 9:09 timestamp. The 9:20 run queries the 9:05 to 9:20 range, which includes the new 9:09 data, and raises an alert. This is the recommended way to set up time ranges for monitors if there is some risk that your monitors could run before all the data is available for the monitor time range. Note that overlapping time ranges could cause duplicate alerts if the same condition is detected by consecutive runs (e.g., by both the 9:10 and 9:20 runs).
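
If duplicate alerts from overlapping runs are a concern, one possible mitigation is to remember which documents have already produced an alert. The following is a minimal sketch of that idea and not a built-in monitor feature: the function `new_alerts`, the `id` field, and the sample hits are hypothetical, and it assumes that whatever consumes the monitor results can keep state between runs.

```python
def new_alerts(hits, already_alerted):
    """Return only the hits that have not produced an alert before,
    and record their IDs so later runs ignore them."""
    fresh = [hit for hit in hits if hit["id"] not in already_alerted]
    already_alerted.update(hit["id"] for hit in fresh)
    return fresh


already_alerted = set()

# Hypothetical results of two consecutive runs with a 15-minute time range.
run_0910 = [{"id": "doc-1", "timestamp": "09:07"}]
run_0920 = [
    {"id": "doc-1", "timestamp": "09:07"},  # seen again because of the overlap
    {"id": "doc-2", "timestamp": "09:09"},  # late data caught by the overlap
]

print(new_alerts(run_0910, already_alerted))
print(new_alerts(run_0920, already_alerted))
```

The second call returns only `doc-2`, so the overlap still catches the late data without alerting twice on `doc-1`. A longer-running version would also need to discard IDs that are older than the monitor time range.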

As an alternative to the overlapping time ranges, you could offset the monitor time range to configure a monitor that queries a slightly older time range, increasing the chance that all data is available for that query. In this example, the monitor uses a 10-minute time range, and will query the data for the range of 15-to-5 minutes before the monitor runs:

{
  "range": {
    "timestamp": {
      "gt": "now-15m",
      "lt": "now-5m"
    }
  }
}

In this example, when the monitor runs at 9:10, it queries data in the 8:55 to 9:05 time range and ignores any data in the offset range of 9:05 to 9:10. The 9:20 run queries data in the 9:05 to 9:15 range and raises an alert for the 9:09 data. Note that the offset time range method can delay alerts: any available data within the offset time range (e.g., from 9:15 to 9:20) will not be queried, and alerts will not be raised, until the next scheduled run at 9:30.
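
To estimate the alert delay that an offset introduces, you can work out which scheduled run is the first to include a given timestamp. The sketch below assumes the 10-minute schedule and the 15-to-5-minute range from the example; `first_covering_run`, `at`, and the date are hypothetical names chosen for illustration, and the bounds are exclusive to mirror `gt` and `lt`.

```python
from datetime import datetime, timedelta


def at(hour, minute):
    """Build a time on an arbitrary example date."""
    return datetime(2024, 1, 1, hour, minute)


def first_covering_run(event_ts, first_run, interval, lookback, offset, max_runs=100):
    """Return the first scheduled run whose time range
    (run - lookback, run - offset) contains event_ts, or None."""
    run = first_run
    for _ in range(max_runs):
        if run - lookback < event_ts < run - offset:
            return run
        run += interval
    return None


schedule = dict(
    first_run=at(9, 10),
    interval=timedelta(minutes=10),
    lookback=timedelta(minutes=15),  # "gt": "now-15m"
    offset=timedelta(minutes=5),     # "lt": "now-5m"
)

for ts in (at(9, 9), at(9, 16)):
    run = first_covering_run(ts, **schedule)
    print(ts.strftime("%H:%M"), "is first queried at", run.strftime("%H:%M"))
```

Data stamped 9:09 is first queried by the 9:20 run, and data stamped 9:16 waits until the 9:30 run, so in this configuration the delay between a timestamp and its first query can approach the schedule interval plus the offset.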