Alert Monitors

Use monitors for automatic notifications of conditions in your log and event data.

A monitor is a definition of an important condition or behavior that you want to watch for in your log and event files indexed by ChaosSearch. You can define a condition using a query DSL definition or a visual graph.

A monitor requires at least one associated trigger to define the action to take when an alert is raised. When partnered with a trigger and a destination, the monitor can be enabled to watch for the defined condition and to raise an alert that sends a notification.

Creating a Monitor

To create a monitor:

  1. In Search Analytics > Alerting, click the Monitors tab.

The Monitors page opens with a list of the currently defined monitors. The page includes a summary of information such as whether monitors are enabled or disabled, their last update, and information about related alerts and errors. From this page you can perform actions such as acknowledging alerts and managing monitor definitions.

  2. Click Create monitor. The Create Monitor window opens.

  3. In the Monitor name field, type a name for the monitor.
  4. In the Monitor defining method field, select how you want to define the monitor:
  • Select Visual editor to create a monitor that watches for when a value is above or below a threshold for a period of time.
  • Select Extraction query editor to use Elasticsearch query DSL to specify the conditions to watch for.
  5. In the Schedule field, configure how frequently you want the monitor to run to check for the condition. You can select a number and a time unit (minutes, hours, or days).
  6. In the Data source section, in the Index field, select the ChaosSearch Refinery view to use with the monitor.
  7. The information to specify for a visual query and an extraction query method is different; see the following sections Using an Extraction Query or Using a Visual Editor for details about each method.
  8. Specify one or more triggers for the monitor so that it can be enabled. See Define One or More Triggers for a Monitor later in this topic.
  9. Click Create to save the new monitor.

Review the following sections for more details about the monitor options.

Using an Extraction Query

If you select the Extraction query editor method to define a monitor, the window updates to show the fields that you must define for the monitor.

  1. In the Data Source > Index field, select the Refinery view to use with the monitor. The window displays a Query section with two columns. The left column, Define extraction query, defaults to a full match_all query DSL operation (a minimal sketch of this default appears after these steps). The right column, Extraction query response, is initially empty.

  2. Click Run to populate the column on the right. The response in the right column shows sample values from the selected view that you can reference as you build the extraction query in the left column.

  3. Specify your extraction query in query DSL in the left column to define the monitoring condition. Note that the query currently supports the query{} section syntax; the aggs{} syntax is not yet supported. For more information about the query syntax, see Elasticsearch API Support.
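
For reference, the default match_all operation in the left column looks similar to the following minimal sketch (the exact default query shown in your window might differ slightly):

{
    "query": {
        "match_all": {}
    }
}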

As an example, the following query DSL searches for a domain value in the log and event files from the last 15 minutes:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "match_phrase": {
                        "domain": {
                            "query": "test.domain.com"
                        }
                    }
                },
                {
                    "range": {
                        "@timestamp": {
                            "gte": "now-15m"                        }
                    }
                }
            ]
        }
    }
}

As another simple sketch, the following query DSL matches a specific field value, reusing the example domain field from above; adapt the field name and value to your own view:
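
{
    "query": {
        "match": {
            "domain": "test.domain.com"
        }
    }
}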

Using a Visual Editor

If you select the Visual editor method for a monitor definition, the monitor window updates to show new fields for the visualization.

  1. In the Data Source > Index field, select the Refinery view to use with the monitor. The window displays new fields for the visualization details.
  2. In the Time field, select the time column that you want to use for the X axis date histogram. Click the field to select one of the timeval fields in the view.
  3. In the Query pane, the default metric is a count of documents (in ChaosSearch, that means a hits/records count). Click Add metric to add a specific metric from the view to visualize instead of the hits count.
  4. In the Time range for the last field, specify the monitoring time range. For example, to check the last 1 hour of the log and event files related to the view that you selected, select 1 from the number drop-down, and hour(s) from the units drop-down list. The available units are minutes, hours, or days.
  5. In the Data filter field, you can specify one optional filter rule to apply to the monitoring query.
  6. In the Group by field, you can specify one optional group by rule to group the results of the monitor query.
  7. Click Preview query and performance. The window updates to display a graph area, and populates the data for the graph within the time range you specified. The chart information also shows the monitor duration, runtime, and hits for the monitor query.

If a visual graph does not appear, there might not be data for your selected time period. Try adjusting the time range to a value where data is expected, such as the last hour or the last 12 hours. Note that when testing alerts against static data, the time range could be many days in the past. As a best practice, configure alerts using live index views that will have new indexed data to evaluate.

🚧

Monitor Permissions and Users

When you create and save a monitor, the monitor definition is updated with the information for the ChaosSearch groups associated with your user account. Use caution when reviewing monitor definitions, because saving a monitor as a different user could break the monitor. If RBAC group assignments change, or if permissions assigned to the RBAC groups used for a monitor change, the monitor might not work after those updates.

See the troubleshooting section for more information.

Tuning the Monitor Schedule and Analysis Window

While planning a monitor, consider both the schedule interval and the monitoring time range to ensure that the monitor can query the data it needs for the conditions that you want alerts for, and that alerts are raised quickly enough for investigation and resolution.

For example, a monitor that is scheduled to run every 10 minutes, and that has a 10-minute monitoring time range, will query the available data with timestamps in that 10-minute time range. For a variety of reasons, it is possible that some indexed data for that time range might not be available for querying until after the monitor runs. This gap could result in missed alerts depending on the monitor's configuration.

In this example, when the monitor runs at 9:10, it queries the available data with timestamps from 9:00 to 9:10, and that data does not raise an alert. At 9:11, some newly indexed data with a 9:09 timestamp becomes available, and that data would trigger an alert if the monitor queried it. Because the data wasn't available in time for the 9:10 run, and the 9:20 run queries the time range from 9:10 to 9:20, the 9:09 data is never queried and its alert is missed.

As one way to prevent missed alerts, you can overlap the monitor window time range so that the next monitor examines some of the previous monitor's time range to catch data that might have arrived after the last monitor ran. Still assuming a monitor with a 10-minute schedule, you could define an overlapping monitor time range of 15 minutes as follows:

{
  "range": {
    "timestamp": {
      "gt": "now-15m",
      "lt": "now"
    }
  }
}

In this example, the monitor that runs at 9:10 queries data in the 8:55 to 9:10 range but would not see the data that arrives at 9:11 with a 9:09 timestamp. The 9:20 monitor queries the 9:05 to 9:20 range, which includes the new 9:09 data, and now raises an alert. This is the recommended way to set up time ranges for monitors if there is some risk that your monitors could run before all the data is available for the monitor time range. Note that overlapping time ranges could cause duplicate alerts if the same condition is detected by consecutive runs (i.e., both 9:10 and 9:20).

As an alternative to the overlapping time ranges, you could offset the monitor time range to configure a monitor that queries a slightly older time range, increasing the chance that all data is available for that query. In this example, the monitor uses a 10-minute time range, and will query the data for the range of 15-to-5 minutes before the monitor runs:

{
  "range": {
    "timestamp": {
      "gt": "now-15m",
      "lt": "now-5m"
    }
  }
}

In this example, when the monitor runs at 9:10, it queries data in the 8:55 to 9:05 time range, and ignores any data in the offset range of 9:05 to 9:10. The monitor run at 9:20 queries data in the 9:05 to 9:15 range, and raises an alert for the 9:09 data. Note that the offset time range method can delay alerts; any available data within the offset range (for example, from 9:15 to 9:20) is not queried, and alerts are not raised, until the next scheduled monitor run at 9:30.

Define One or More Triggers for a Monitor

A monitor will not run unless at least one trigger is defined for it.

To define a trigger:

  1. When you are creating or editing a monitor, scroll to the bottom of the window to the Triggers section.

  2. Click Add trigger to open a panel with the fields for a new trigger.

  3. In the Trigger name field, type a name for the trigger.
  4. In the Severity level field, specify an alert severity level from 1 (Highest) to 5 (Lowest).
  5. In the Trigger condition field, specify the condition that causes the trigger to fire. The default trigger rule is that the monitor query must return at least one hit/result; a minimal example condition appears after these steps. Click Info for more information about the scripting variables.
  6. In the Actions panel, define one or more actions to take when a trigger condition is met.
    • In the Action name field, type a name for the action.
    • In the Destinations list, select a destination to which the alert is sent.
    • In the Message subject field, type a clear message that will be sent in the alert.
    • In the Message field, update the content as needed to provide helpful information to the alert system user about the monitoring condition and problem.
    • Click Preview message to display a sample message based on the information that you just specified.
    • Under Throttling, you can select Enable action throttling if you want to limit the number of notifications you receive within a given time frame. Another field appears where you can specify the throttle timeframe of 1 to 1440 minutes.
      If a monitor checks a trigger condition every minute, you could receive one notification per minute. If you set action throttling to 60 minutes, you receive no more than one notification per hour, even if the trigger condition is met dozens of times during that hour.
  7. You can add another action if desired, for up to 10 actions on the monitor. You can also define additional triggers; for example, you might have one trigger that sends a low-priority alert when data crosses an early-warning threshold, and another trigger that sends a higher-priority alert when data crosses a more severe threshold.
  8. When you finish specifying the trigger(s) for the monitor, create or update the monitor.
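
For reference, trigger conditions are written as script expressions over the monitor query results. The following is a minimal sketch, assuming the Open Distro-style ctx result variables described by the Info link; it fires when the monitor query returns at least one hit:

// Fires when the monitor query returns one or more hits (the default-style rule).
// Adjust the threshold (for example, > 10) to require more hits before alerting.
ctx.results[0].hits.total.value > 0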