As you use ChaosSearch and its Search Analytics interface to search and visualize the data in your log and event files, you typically see conditions such as a peak or a drop in a value, an error message, or a behavior that service administrators might want to investigate.
Alerting is a Search Analytics feature that automates the detection of these important conditions and "pushes" an alert to the associated application or service managers or troubleshooting personnel. You can send notifications to commonly used tools such as a Slack channel, or to a team's monitoring tool through a custom webhook interface. When you apply alerting monitors to Live Index groups, you can configure the system to watch for specific conditions and send an alert when it detects them.
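If you route notifications to your own tool through a custom webhook, that tool needs an HTTP endpoint that accepts them. The following is a minimal sketch using only the Python standard library. It assumes the notification arrives as an HTTP POST, and because it does not assume any particular message format, it accepts either a JSON object or plain text. The port and the `parse_alert`, `AlertWebhookHandler`, and `run` names are hypothetical and are not part of ChaosSearch.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_alert(body: bytes) -> dict:
    """Decode a notification body: JSON objects pass through, anything else is wrapped."""
    text = body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: treat the whole body as a plain-text message.
        return {"message": text}
    if not isinstance(payload, dict):
        # Valid JSON but not an object (e.g. a list or a string).
        return {"message": payload}
    return payload


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver for alert notifications sent by a webhook destination."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        alert = parse_alert(self.rfile.read(length))
        # Hand the alert to your own tooling here (ticketing, paging, dashboards).
        print("received alert:", alert)
        self.send_response(200)
        self.end_headers()


def run(port: int = 8080) -> None:
    """Start listening; the hypothetical port 8080 is only an example."""
    HTTPServer(("0.0.0.0", port), AlertWebhookHandler).serve_forever()
```

Call `run()` to start the listener, then point the custom webhook destination at that host and port. The sketch has no TLS or authentication, which a production endpoint would need.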
You use the ChaosSearch Search Analytics > Alerts page to review a list of all alerts for detected conditions, and to manage the monitors, triggers, and destinations that configure the rules. You can also acknowledge an alert to show other ChaosSearch users that someone is investigating the alert.
The alerting process requires you to configure and manage the following resources:
- Monitors, which define a condition or behavior that you want to watch for and be notified about. You can define a monitor with an extraction query or a visual graph.
- Triggers, which specify a threshold, how frequently to run the check, and the destination for the notification. A monitor must have at least one trigger to be enabled and can have up to 10 triggers, for example to define special conditions that each have their own priority or destination. When a monitor's condition is detected, an alert enters the Active state and a notification is sent.
- Destinations, which define where an alert message is sent when a trigger fires. You can send messages to a Slack channel, to Amazon Chime, or to a designated application through a custom webhook.
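The relationship between these three resources can be sketched as a small data model. The following Python sketch is hypothetical: the class names, fields, and the "value above threshold" firing rule are illustrative assumptions, not the ChaosSearch API. It encodes only the rules described above: a monitor needs between 1 and 10 triggers to be enabled, each trigger has a threshold, a check frequency, and a destination, and a detected condition produces an alert in the Active state that a user can acknowledge.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# Hypothetical model for illustration; not the ChaosSearch API.
MAX_TRIGGERS = 10  # per-monitor trigger limit described above


class AlertState(Enum):
    ACTIVE = "Active"
    ACKNOWLEDGED = "Acknowledged"


@dataclass
class Destination:
    name: str
    kind: str  # hypothetical labels: "slack", "chime", or "webhook"
    url: str


@dataclass
class Trigger:
    name: str
    threshold: float
    interval_minutes: int  # how often the check runs
    destination: Destination

    def fires(self, value: float) -> bool:
        # Assumed rule: fire when the monitored value exceeds the threshold.
        return value > self.threshold


@dataclass
class Alert:
    monitor: str
    trigger: str
    destination: str
    value: float
    state: AlertState = AlertState.ACTIVE

    def acknowledge(self) -> None:
        # Shows other users that someone is investigating this alert.
        self.state = AlertState.ACKNOWLEDGED


@dataclass
class Monitor:
    name: str
    query: str  # the extraction query that produces the watched value
    triggers: List[Trigger] = field(default_factory=list)

    def can_enable(self) -> bool:
        # At least one trigger is required, and at most MAX_TRIGGERS are allowed.
        return 1 <= len(self.triggers) <= MAX_TRIGGERS

    def evaluate(self, value: float) -> List[Alert]:
        if not self.can_enable():
            raise ValueError(f"{self.name}: needs 1 to {MAX_TRIGGERS} triggers")
        # One Active alert per trigger whose condition is met.
        return [
            Alert(self.name, t.name, t.destination.name, value)
            for t in self.triggers
            if t.fires(value)
        ]


slack = Destination("oncall", "slack", "https://hooks.example.com/oncall")  # hypothetical URL
monitor = Monitor("checkout-errors", "status:500")  # hypothetical query
print(monitor.can_enable())  # False: no triggers yet
monitor.triggers.append(Trigger("error spike", threshold=100, interval_minutes=5, destination=slack))
alerts = monitor.evaluate(250)
print(alerts[0].state.value)  # Active
alerts[0].acknowledge()
print(alerts[0].state.value)  # Acknowledged
```

Keeping the destination on each trigger, rather than on the monitor, mirrors the description above, where a single monitor can route different conditions to different destinations.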
Alerts are a useful tool, but it is easy to over-configure them and check for too many conditions. As part of your alert planning, be sure to consider:
- The actionable problem conditions and the frequency to check for them
- The correct destination to notify appropriate personnel
- The message content and severity that help recipients prioritize and respond
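Check frequency directly limits how many notifications a destination can receive. As a rough worst case, if every check fired a notification, a check that runs every 5 minutes could produce 1440 / 5 = 288 notifications per day. The hypothetical sketch below applies that arithmetic to a draft plan and totals the result per destination. The plan entries, field names, severity scale, and minute-based intervals are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class PlanEntry:
    """One row of a hypothetical alerting plan."""
    condition: str
    interval_minutes: int
    destination: str
    severity: int  # assumed scale: 1 is the most urgent


def max_checks_per_day(plan):
    """Worst-case notifications per destination per day, if every check fired."""
    totals = defaultdict(int)
    for entry in plan:
        totals[entry.destination] += MINUTES_PER_DAY // entry.interval_minutes
    return dict(totals)


plan = [
    PlanEntry("5xx error rate above 5%", 5, "#oncall-slack", 1),
    PlanEntry("login failure spike", 15, "#oncall-slack", 2),
    PlanEntry("nightly batch job missing", 60, "ops-webhook", 3),
]
print(max_checks_per_day(plan))  # {'#oncall-slack': 384, 'ops-webhook': 24}
```

In this draft, the Slack channel could receive up to 384 notifications a day, a signal that its conditions or check intervals may need tightening before the plan goes live.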
A well-structured alerting plan can help teams to respond more quickly to conditions, use their troubleshooting resources more efficiently, and improve end-user experience with faster detection and time-to-restoration.
Avoid the Alert Firehose Effect
As a good practice, start with a small set of very specific alert conditions, then refine and expand them over time as you learn which behaviors and conditions your log and event data can reveal. Tune alerts and destinations so that messages reach the users who benefit most from them, cover the most meaningful conditions, and give a helpful summary of each condition.
For assistance with your alert configuration, contact your ChaosSearch Customer Success representative.