Elasticsearch API Support

ChaosSearch includes support for the Elasticsearch API.

This topic provides an overview of Elasticsearch API support, which offers another interface to ChaosSearch in addition to the OpenSearch Dashboards UI. The API includes query support for search and for some metric and bucket aggregations.
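As a rough sketch of calling the API from code, the following Python builds (but does not send) a search request using only the standard library. The base URL, the `/{view}/_search` path, the view name, and the `build_search_request` helper are hypothetical placeholders; check your ChaosSearch deployment for the actual endpoint and authentication, which this sketch leaves out.

```python
import json
import urllib.request

# Hypothetical endpoint: substitute your deployment's actual Elasticsearch API URL.
BASE_URL = "https://example.com/elastic"


def build_search_request(view: str, query_body: dict) -> urllib.request.Request:
    """Build a POST request for an Elasticsearch-style _search call (not sent)."""
    url = f"{BASE_URL}/{view}/_search"
    data = json.dumps(query_body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: a match_all query against a hypothetical view named "my-view".
req = build_search_request("my-view", {"query": {"match_all": {}}})
# Sending it (authentication not shown) would look like: urllib.request.urlopen(req)
```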


Important Search and Filter Notes

In ChaosSearch, search terms and phrases are almost always case-sensitive, unless the Refinery view is configured as a case-insensitive view. (Case-insensitive views are not recommended for performance reasons.)

ChaosSearch also applies some Elasticsearch API options as default settings for performance. For example, in match_phrase queries you must bracket the search phrase with wildcards (asterisks) to find it anywhere in a target field; otherwise, the search matches only the records whose field value is exactly that search phrase or term.

The multi-search operation using _msearch (v7.10) lets you bundle multiple search requests into a single request. The searches run in parallel, so you can receive a response more quickly compared to sending one request per search. Each search runs independently, so the failure of one does not affect the others.
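The following Python sketch shows the newline-delimited (NDJSON) body that the Elasticsearch _msearch API expects: for each search, a header line naming the target followed by a body line, with a newline after every line including the last. The view names are hypothetical, and this page does not say which header options ChaosSearch accepts beyond the target. The resulting string would be sent as a POST to the _msearch endpoint with a Content-Type of application/x-ndjson.

```python
import json


def build_msearch_body(searches: list[tuple[str, dict]]) -> str:
    """Serialize (view, query_body) pairs into the newline-delimited _msearch format."""
    lines = []
    for view, body in searches:
        lines.append(json.dumps({"index": view}))  # header line: which view to search
        lines.append(json.dumps(body))             # body line: the search itself
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


body = build_msearch_body([
    ("view-a", {"query": {"match_all": {}}}),
    ("view-b", {"query": {"exists": {"field": "fieldName"}}}),
])
```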

Search Query Clauses

The following sections provide an overview of the supported Elasticsearch Query DSL that can be used with ChaosSearch. The supported syntax is a subset of the available API, and these sections offer some notes and considerations for use. (More information about this syntax is available in the Elasticsearch API documentation; be sure to avoid any syntax not listed below.) You can also pass the query clauses into the Search Analytics > Discover UI with the Add filter dialog.

Match All Query

The following basic match_all query returns all results in the Refinery view:

{
  "query": {
    "match_all": {
    }
  }
}

Bool

The bool query combines multiple search queries with boolean logic. You can use boolean logic between queries to either narrow or broaden your search results. Bool supports must, must_not, should, and filter clauses.

{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user.id" : "user" }
      },
      "filter": {
        "term" : { "tags" : "tagval" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tags" : "env1" } },
        { "term" : { "tags" : "deployed" } }
      ]
    }
  }
}

Clauses can be provided as a single object or as an array of objects. The following example uses a single must_not clause:

{
  "query": {
    "bool": {
      "must_not": {
        "term": {
          "Records.eventName": "DescribeVpcs"
        }
      }
    }
  }
}
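To make the clause semantics concrete, here is a minimal Python sketch that evaluates a bool query against one in-memory record. It follows Elasticsearch conventions (should clauses are optional when must or filter is present, and a term on an array field matches any element), which is an assumption about ChaosSearch rather than a description of its engine. It supports only term and range clauses on flat dotted field names and ignores scoring; the `as_list`, `matches_clause`, and `matches_bool` helpers are hypothetical.

```python
from typing import Any


def as_list(clauses: Any) -> list:
    """Clauses may be given as a single object or as an array of objects."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def matches_clause(record: dict, clause: dict) -> bool:
    """Evaluate one term or range clause against a flat record (sketch only)."""
    if "term" in clause:
        (field, value), = clause["term"].items()
        stored = record.get(field)
        # A term clause on an array field matches if any element equals the value.
        return value in stored if isinstance(stored, list) else stored == value
    if "range" in clause:
        (field, bounds), = clause["range"].items()
        v = record.get(field)
        if v is None:
            return False
        checks = {
            "gt": lambda b: v > b,
            "gte": lambda b: v >= b,
            "lt": lambda b: v < b,
            "lte": lambda b: v <= b,
        }
        return all(checks[op](b) for op, b in bounds.items() if op in checks)
    raise ValueError(f"unsupported clause in this sketch: {clause}")


def matches_bool(record: dict, bool_query: dict) -> bool:
    must = as_list(bool_query.get("must"))
    filters = as_list(bool_query.get("filter"))
    must_not = as_list(bool_query.get("must_not"))
    should = as_list(bool_query.get("should"))
    # Elasticsearch convention: should is optional when must or filter is present;
    # otherwise at least one should clause has to match.
    default_msm = 0 if (must or filters) else (1 if should else 0)
    msm = bool_query.get("minimum_should_match", default_msm)
    return (
        all(matches_clause(record, c) for c in must + filters)
        and not any(matches_clause(record, c) for c in must_not)
        and sum(matches_clause(record, c) for c in should) >= msm
    )
```

With the first example above, a record whose age is 30 matches even though neither should clause does, because must and filter are present.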

MatchPhrase

Use match_phrase with a search string to perform a full-text search against the given fieldName. Note that in Elastic environments, a match_phrase query by default matches records that have the specified phrase anywhere inside the search field. With ChaosSearch, you must explicitly bracket search phrases or terms with wildcards (asterisks) to search for that term anywhere in a field. Otherwise, the search matches only the records whose field value is exactly the search phrase or term.

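If you specify only a trailing asterisk, the matching records begin with the specified string value in the target field (for example, "112*" matches values that begin with 112). If you specify only a leading asterisk, the matching records end with the specified string value.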

{
  "query": {
    "match_phrase": {
      "fieldName": "*search phrase*"
    }
  }
}
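As an approximation of the wildcard behavior described above, the following Python sketch uses the standard library's `fnmatchcase`, which is case-sensitive and must match the whole value, so a pattern without asterisks behaves as an exact match. This is only a model for illustration: `phrase_matches` is a hypothetical helper, and fnmatch also treats `?`, `[`, and `]` as special characters, which ChaosSearch may not.

```python
from fnmatch import fnmatchcase


def phrase_matches(field_value: str, pattern: str) -> bool:
    """Model of the wildcard rules above: whole-value, case-sensitive matching."""
    return fnmatchcase(field_value, pattern)


value = "an error with search phrase inside"
print(phrase_matches(value, "search phrase"))    # False: only an exact value matches
print(phrase_matches(value, "*search phrase*"))  # True: phrase anywhere in the field
print(phrase_matches("1124", "112*"))            # True: value begins with 112
print(phrase_matches("41124", "*124"))           # True: value ends with 124
```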

match_phrase accepts number values and IP addresses. For example:

To return records where o_orderkey is exactly 1124:
{
  "query": {
    "match_phrase": {
      "o_orderkey": "1124"
    }
  }
}

or to return records where the o_orderkey field is a value that begins with 112:

{
  "query": {
    "match_phrase": {
      "o_orderkey": "112*"
    }
  }
}

or to return records where an ipaddress field has a specific address:

{
  "query": {
    "match_phrase": {
      "ipaddress": "10.85.77.211"
    }
  }
}

match_phrase can be used to return records with fields that are null.

{
  "query": {
    "match_phrase": {
      "Records.requestParameters": null
    }
  }
}

Match

In ChaosSearch, the match query behaves like the match_phrase query. See the match_phrase notes on wildcard usage.

{
  "query": {
    "match": {
      "Records.requestParameters": "*template*"
    }
  }
}

MultiMatch

The multi_match query performs a match across one or more fields specified in a comma-separated array. For the query string, see the match_phrase notes on wildcard usage and implications with ChaosSearch.

{
  "query": {
    "multi_match": {
      "fields": [ "field1", "field2" ],
      "query": "*string*"
    }
  }
}

For example:

{
  "query": {
    "multi_match": {
      "fields": [
        "Records.eventName",
        "Records.requestParameters"
      ],
      "query": "*filterSet*"
    }
  }
}

Nested

The nested query wraps another query to search nested fields. The nested query searches nested field objects as if they were indexed as separate documents. If an object matches the search, the nested query returns the root parent document.

{
  "query": {
    "nested": {
      "path": "obj1",
      "query": {
        "bool": {
          "must": [
            { "match": { "obj1.name": "blue" } },
            { "range": { "obj1.count": { "gt": 5 } } }
          ]
        }
      },
      "score_mode": "avg"
    }
  }
}
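The key property of a nested query is that all inner conditions must hold for the same nested object. The Python sketch below illustrates this with a hypothetical record in which no single obj1 entry is both "blue" and has a count above 5, contrasted with a flattened evaluation that mixes conditions across objects. For brevity the inner objects use short keys ("name" rather than "obj1.name"); `nested_match` is a hypothetical helper, not ChaosSearch code.

```python
def nested_match(record: dict, path: str, predicate) -> bool:
    """Return True if any single object under `path` satisfies `predicate`.

    Each nested object is evaluated as if it were a separate document;
    the parent record counts as a match if any one object matches.
    """
    objs = record.get(path, [])
    if isinstance(objs, dict):
        objs = [objs]
    return any(predicate(obj) for obj in objs)


record = {"obj1": [{"name": "blue", "count": 2}, {"name": "red", "count": 9}]}

# Both conditions must hold for the SAME nested object, so this is False.
both_in_one = nested_match(
    record, "obj1", lambda o: o.get("name") == "blue" and o.get("count", 0) > 5
)

# A flattened evaluation mixes conditions across objects, so this is True.
flattened = (any(o.get("name") == "blue" for o in record["obj1"])
             and any(o.get("count", 0) > 5 for o in record["obj1"]))
```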

For example:

{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "Records.requestParameters",
            "query": {
              "bool": {
                "should": [
                  {
                    "match": {
                      "Records.requestParameters.includeShadowTrails": false
                    }
                  }
                ],
                "minimum_should_match": 1
              }
            },
            "score_mode": "none"
          }
        },
        {
          "range": {
            "Records.eventTime": {
              "gte": "2018-05-25T04:04:41.010Z",
              "lte": "2019-08-13T03:54:11.594Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ]
    }
  }
}

Range

Use the range query to search for a range of values in a field. There are two forms, one with greater than/less than ranges, and one with from/to ranges.

The following form takes gt or gte (greater than, or greater than or equal to) and lt or lte (less than, or less than or equal to) as parameters, here finding customer keys from 10 to 15, inclusive.

{
  "query": {
    "range": {
      "o_custkey": {
        "gte": 10,
        "lte": 15
      }
    }
  }
}

The following form takes from and to as the value parameters, with include_lower and include_upper to specify inclusive or exclusive bounds.

{
  "query": {
    "range": {
      "timeField": {
        "from": "2020-Aug-21",
        "include_lower": true,
        "to": "2020-Sep-1"
      }
    }
  }
}
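The two forms express the same thing, and the following Python sketch rewrites the from/to form into the gt/gte/lt/lte form. It assumes the Elasticsearch defaults, where include_lower and include_upper are both true when omitted; this page does not state ChaosSearch's defaults, so setting both flags explicitly is the safer choice. `normalize_range` is a hypothetical helper.

```python
def normalize_range(params: dict) -> dict:
    """Rewrite from/to/include_lower/include_upper into gt/gte/lt/lte form.

    Assumes Elasticsearch defaults: include_lower and include_upper are
    both true when omitted. Other keys (such as format) are kept as-is.
    """
    out = {k: v for k, v in params.items()
           if k not in ("from", "to", "include_lower", "include_upper")}
    if params.get("from") is not None:
        key = "gte" if params.get("include_lower", True) else "gt"
        out[key] = params["from"]
    if params.get("to") is not None:
        key = "lte" if params.get("include_upper", True) else "lt"
        out[key] = params["to"]
    return out
```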

Exists

Use the exists query to search for records that contain the specified field.

{
  "query": {
    "exists": {
      "field": "fieldName"
    }
  }
}

QueryString

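The query_string query returns documents that match a provided query string. The string uses the Lucene query string syntax (for example, field:value terms combined with AND and OR) rather than Query DSL JSON.

{
  "query": {
    "query_string": {
      "query": "field1:A AND field2:B"
    }
  }
}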

Another example searches the field named in default_field for records that match either of the OR'ed groups of terms in parentheses:
{
  "query": {
    "query_string": {
      "query": "(first string) OR (second set of terms)",
      "default_field": "field_name"
    }
  }
}

Regular Expression

The regexp query performs a regular expression (regex) search. The field_name is the name of the view column that you want to search, and regular_expression value follows the regular expression syntax.

{
  "query": {
    "regexp": {
      "field_name": {
        "value": "regular_expression",
        "case_insensitive": false
      }
    }
  }
}
    
For example:
{
  "query": {
    "regexp": {
      "Records.awsRegion": {
        "case_insensitive": false,
        "value": "ap-south.*"
      }
    }
  }
}
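Elasticsearch regexp patterns are anchored, meaning they must match the entire field value; assuming ChaosSearch behaves the same way, that would explain why the example uses "ap-south.*" rather than "ap-south". The Python sketch below approximates this with `re.fullmatch`. Python's regex syntax is not identical to Lucene's, so treat `regexp_matches` (a hypothetical helper) as an illustration of anchoring and case sensitivity only.

```python
import re


def regexp_matches(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    """Approximate an anchored regexp query: the pattern must match the whole value."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.fullmatch(pattern, value, flags) is not None
```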

Fuzzy Search

The fuzzy query matches anything within the given Levenshtein edit distance of the input search query. (There is limited support for this syntax.)

{
  "query": {
    "fuzzy": {
      "fieldName": {
        "value": "string"
      }
    }
  }
}

The input string depose, for instance, would match values such as depose, depos, deposit, deposits, and similar Levenshtein matches.

For example:
{
  "query": {
    "fuzzy": {
      "o_comment": {
        "value": "depose"
      }
    }
  }
}
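The Levenshtein edit distance mentioned above counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following Python sketch computes it with the classic dynamic-programming approach; the maximum distance that ChaosSearch allows for a fuzzy match is not stated on this page, so the sketch only computes distances.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters are equal)
            ))
        prev = curr
    return prev[-1]


for candidate in ["depose", "depos", "deposit", "deposits"]:
    print(candidate, levenshtein("depose", candidate))
```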

Bucket Aggregations

Bucket aggregations create "buckets" of results, where each bucket is associated with a criterion based on the aggregation type. The bucket aggregations determine the buckets and return the number of results (or documents) in each bucket. The following sections list the supported bucket aggregations and some notes for behaviors with ChaosSearch.

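In the following basic aggs structure, aggregationName is a name you choose for the aggregation, and aggregationType is replaced by a specific bucket aggregation type (such as terms or histogram) with its own parameters:

{
  "aggs": {
    "aggregationName": {
      "aggregationType": {
        "field": "fieldName",
        ...
      }
    }
  }
}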

Terms

A terms aggregation enables you to specify the top or bottom n elements of a given field to display, ordered by count or a custom metric up to a size limit. Note that the sum_other_doc_count option is not supported.

The query accepts a sort order. The default is a descending sort.

{
  "aggs": {
    "aggregationName": {
      "terms": {
        "field": "fieldName",
        "size": 5,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

Another sort example, this time ascending based on field values:

{
  "aggs": {
    "aggregationName": {
      "terms": {
        "field": "fieldName",
        "size": 20,
        "order": {
          "_term": "asc"
        }
...

An example of a terms aggregation query follows:

{
  "aggs": {
    "priority": {
      "terms": {
        "field": "o_orderpriority",
        "order": {
          "_count": "desc"
        },
        "size": 5
      }
    }
...
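As a sketch of what a terms aggregation computes, the following Python counts values and returns up to size buckets, ordered by count (descending by default) or by the value itself. Breaking count ties by key ascending follows the Elasticsearch convention and is an assumption for ChaosSearch. Consistent with the note above, the sketch does not report sum_other_doc_count. `terms_agg` and the sample priority values are hypothetical.

```python
from collections import Counter


def terms_agg(values: list, size: int = 10, order: str = "_count",
              direction: str = "desc") -> list[dict]:
    """Return up to `size` buckets of {"key", "doc_count"} for the given values.

    order="_count" sorts by document count (ties broken by key ascending);
    order="_term" sorts by the value itself.
    """
    counts = Counter(values)
    reverse = direction == "desc"
    if order == "_count":
        # Negate the count for descending order so ties stay key-ascending.
        sign = -1 if reverse else 1
        items = sorted(counts.items(), key=lambda kv: (sign * kv[1], kv[0]))
    elif order == "_term":
        items = sorted(counts.items(), key=lambda kv: kv[0], reverse=reverse)
    else:
        raise ValueError(f"unsupported order in this sketch: {order}")
    return [{"key": k, "doc_count": c} for k, c in items[:size]]


priorities = ["1-URGENT", "2-HIGH", "1-URGENT", "3-MEDIUM", "2-HIGH", "1-URGENT"]
print(terms_agg(priorities, size=2))
```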

Date Histogram

A date_histogram is built from a numeric field and organized by date. You can specify a time frame for the intervals in seconds, minutes, hours, days, weeks, months, or years. You can also specify a custom interval frame. Custom interval time units are s for seconds, m for minutes, h for hours, d for days, w for weeks, and y for years. Different units support different levels of precision, down to one second.

A sample of the date_histogram structure follows:

{
  "aggs": {
    "aggregationName": {
      "date_histogram": {
        "field": "fieldName",
        "interval": "3w"
      }
    }
  }
}
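The following Python sketch shows how an interval string such as "3w" could be turned into a bucket width and how a timestamp is floored to the start of its bucket. It assumes fixed-length intervals aligned to the Unix epoch in UTC and leaves out months and years, whose lengths vary with the calendar. Under this assumption weekly buckets start on Thursdays (the weekday of 1970-01-01), whereas Elasticsearch calendar weeks start on Monday; ChaosSearch's actual alignment is not documented here. `interval_seconds` and `bucket_start` are hypothetical helpers.

```python
from datetime import datetime, timezone

# Fixed lengths in seconds for the units listed above (months and years are
# calendar-dependent, so this sketch leaves them out).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 7 * 86400}


def interval_seconds(interval: str) -> int:
    """Parse an interval such as "3w" or "15m" into seconds."""
    amount, unit = int(interval[:-1]), interval[-1]
    return amount * UNIT_SECONDS[unit]


def bucket_start(ts: datetime, interval: str) -> datetime:
    """Floor a timestamp to the start of its fixed-length, epoch-aligned UTC bucket."""
    step = interval_seconds(interval)
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % step, tz=timezone.utc)
```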

Optionally, the query accepts extended_bounds. Buckets within these bounds are always returned (with empty/default values), even if no documents fall into them.

{
  "aggs": {
    "aggregationName": {
      "date_histogram": {
        "field": "fieldName",
        "interval": "3w",
        "extended_bounds": {
          "min": "2020",
          "max": "2021"
        }
      }
    }
  }
}

Histogram

A standard histogram is built from a numeric field. Specify an integer interval for this field.

{
  "aggs": {
    "aggregationName": {
      "histogram": {
        "field": "fieldName",
        "interval": 100
      }
    }
  }
}
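In Elasticsearch, a histogram bucket key is computed as floor(value / interval) * interval (with the default offset of 0), and this Python sketch assumes ChaosSearch buckets the same way. It counts values per bucket; unlike Elasticsearch, it does not emit empty buckets between the lowest and highest keys. `histogram` is a hypothetical helper.

```python
import math
from collections import Counter


def histogram(values: list[float], interval: float) -> dict[float, int]:
    """Count values per bucket, where each key is floor(value / interval) * interval."""
    counts = Counter(math.floor(v / interval) * interval for v in values)
    return dict(sorted(counts.items()))


print(histogram([5, 99, 100, 101, 250], 100))
```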

Range

The range bucket aggregation lets you define a set of ranges, where each range represents a separate bucket. During aggregation, the value extracted from each document (matching index record) is checked against each bucket range, and the record is placed into each matching range. Each bucket includes records whose value is greater than or equal to the range's from value and less than its to value.

{
 "size": 0,
 "aggs":{
   "value_ranges": {
     "range": {
       "field": "value",
       "keyed": true,
       "ranges": [
         {"to": 20.0},
         {"from": 20.0, "to": 40.0},
         {"from": 40.0, "to": 60.0},
         {"from": 60.0, "to": 80.0},
         {"from": 80.0, "to": 100.0}
       ]
     }
   }
 }
}

When no sub-aggregations are defined, as in this example, the results of the bucket aggregation show a count of the records (documents) within each bucket:

{
  "value_ranges": {
    "buckets": {
      "*-20.0": {
        "doc_count": 4,
        "to": 20.0
      },
      "20.0-40.0": {
        "doc_count": 2,
        "from": 20.0,
        "to": 40.0
      },
      "40.0-60.0": {
        "doc_count": 1,
        "from": 40.0,
        "to": 60.0
      },
      "60.0-80.0": {
        "doc_count": 1,
        "from": 60.0,
        "to": 80.0
      },
      "80.0-100.0": {
        "doc_count": 1,
        "from": 80.0,
        "to": 100.0
      }
    }
  }
}

As with other bucket aggregations, you can perform sub-aggregations within the ranges. Sub-aggregations let you further refine and separate ranges by different criteria, so that you can apply metrics at various levels of the aggregation to create your report. A sample range aggregation with sub-aggregations follows:

{
 "size": 0,
 "aggs":{
   "value_ranges": {
     "range": {
       "field": "value",
       "keyed": true,
       "ranges": [
         {"key": "b", "from": 20.0, "to": 60.0},
         {"key": "c", "from": 40.0, "to": 80.0}
       ]
     },
     "aggs": {
       "summation_of_values": {
         "sum": {"field": "value"}
       }
     }
   }
 }
}

A sample response with the sub-aggregations follows:

{
   "value_ranges": {
     "buckets": {
       "b": {
         "doc_count": 3,
         "from": 20.0,
         "summation_of_values": {
           "value": 110
         },
         "to": 60.0
       },
       "c": {
         "doc_count": 2,
         "from": 40.0,
         "summation_of_values": {
           "value": 113
         },
         "to": 80.0
       }
     }
   }
}

Ranges typically do not overlap, but they can. When they do, each record is counted in every range it falls into:

Query: 
{
 "size": 0,
 "aggs":{
   "value_ranges": {
     "range": {
       "field": "value",
       "keyed": true,
       "ranges": [
         {"key": "b", "from": 20.0, "to": 60.0},
         {"key": "c", "from": 40.0, "to": 80.0}
       ]
     }
   }
 }
}

Response: 
{
  "value_ranges": {
    "buckets": {
      "b": {
        "doc_count": 3,
        "from": 20.0,
        "to": 60.0
      },
      "c": {
        "doc_count": 2,
        "from": 40.0,
        "to": 80.0
      }
    }
  }
}
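The following Python sketch mimics the keyed range aggregation described in this section: from is inclusive, to is exclusive, a value is counted in every range it falls into (so overlapping ranges can share records), and unnamed buckets get keys such as "*-20.0". The sample values are made up, and `range_agg` is a hypothetical helper rather than ChaosSearch code.

```python
def range_agg(values: list[float], ranges: list[dict]) -> dict[str, dict]:
    """Keyed range buckets: `from` is inclusive, `to` is exclusive.

    A value is counted in every range it falls into, so overlapping
    ranges can share records.
    """
    buckets = {}
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        # Use the explicit key if given, otherwise build one like "*-20.0".
        key = r.get("key", f"{'*' if lo is None else lo}-{'*' if hi is None else hi}")
        count = sum(1 for v in values
                    if (lo is None or v >= lo) and (hi is None or v < hi))
        bucket = {"doc_count": count}
        if lo is not None:
            bucket["from"] = lo
        if hi is not None:
            bucket["to"] = hi
        buckets[key] = bucket
    return buckets


values = [5, 19.9, 20, 39, 45, 55, 70, 99.5]
print(range_agg(values, [{"key": "b", "from": 20.0, "to": 60.0},
                         {"key": "c", "from": 40.0, "to": 80.0}]))
```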

Date Range

Use the date_range aggregation to specify ranges of values for a date field. The date_range aggregation is conceptually the same as the range aggregation, except that it lets you use date math. For example, you can get all documents from the last 10 days. To make the dates in the response more readable, specify a format with the format parameter.
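As a hedged example of the "last 10 days" case, the following Python builds a date_range request body using Elasticsearch date-math syntax, where "now-10d/d" means ten days ago rounded down to the start of that day. The aggregation name, the timeField field, and the yyyy-MM-dd format are illustrative assumptions, and this page does not list which date-math expressions ChaosSearch supports.

```python
import json

# Hypothetical request body: documents from the last 10 days, using Elasticsearch
# date math ("now-10d/d" rounds down to the start of that day).
body = {
    "size": 0,
    "aggs": {
        "last_10_days": {
            "date_range": {
                "field": "timeField",
                "format": "yyyy-MM-dd",
                "ranges": [{"from": "now-10d/d", "to": "now"}],
            }
        }
    },
}
print(json.dumps(body, indent=2))
```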

IP Range

The ip_range aggregation creates buckets based on IPv4 ranges (multiple ranges, each formatted as {from, to}).

{
  "aggs": {
    "aggregationName": {
      "ip_range": {
         "field": "fieldName",
         "ranges": [
           {
             "from": "0.0.0.0",
             "to": "127.255.255.255"
           }
         ]
       }
    }
  }
}
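The following Python sketch checks whether an IPv4 address falls in a {from, to} range using the standard `ipaddress` module, which compares addresses numerically rather than as strings. It assumes the same convention as the range aggregation (from inclusive, to exclusive), which this page does not confirm for ip_range; under that assumption, 127.255.255.255 itself would fall outside the example range. `ip_in_range` is a hypothetical helper.

```python
import ipaddress
from typing import Optional


def ip_in_range(ip: str, start: Optional[str] = None, end: Optional[str] = None) -> bool:
    """Check an IPv4 address against a {from, to} range (from inclusive, to exclusive)."""
    addr = ipaddress.IPv4Address(ip)
    if start is not None and addr < ipaddress.IPv4Address(start):
        return False
    if end is not None and addr >= ipaddress.IPv4Address(end):
        return False
    return True


print(ip_in_range("10.85.77.211", "0.0.0.0", "127.255.255.255"))
```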

Metric Aggregations

Metric aggregations perform simple calculations such as finding the minimum, maximum, and average values of a field. There are two types of metric aggregations: single-value metric aggregations and multi-value metric aggregations.

  • Single-value metric aggregations return a single metric, for example sum, min, max, avg, cardinality, and count.
  • Multi-value metric aggregations return more than one metric, for example stats, extended_stats, and matrix_stats. These are not supported in this API, with the exception of percentiles (described below).

The sum, min, max, count, and avg metrics are single-value metric aggregations that return the sum, minimum, maximum, count, and average values of a field, respectively.
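For reference, this Python sketch computes the five single-value metrics over one field of some in-memory records. Skipping records where the field is missing or null mirrors how Elasticsearch handles missing values and is an assumption for ChaosSearch; the o_totalprice field and `single_value_metrics` helper are hypothetical.

```python
from typing import Optional


def single_value_metrics(records: list[dict], field: str) -> dict[str, Optional[float]]:
    """Compute sum, min, max, count, and avg over one numeric field.

    Records where the field is missing or null are skipped.
    """
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return {"sum": 0, "min": None, "max": None, "count": 0, "avg": None}
    return {
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "count": len(values),
        "avg": sum(values) / len(values),
    }


rows = [{"o_totalprice": 10}, {"o_totalprice": 30}, {"o_totalprice": None}, {}]
print(single_value_metrics(rows, "o_totalprice"))
```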

Average

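The avg aggregation returns the average value of a numeric field. Metric aggregations can be nested inside a bucket aggregation with an aggs block, as in the following example, which returns the average of numericFieldName for each terms bucket:

{
  "aggs": {
    "aggregationName": {
      "terms": {
        "field": "fieldName",
        "size": 20
      },
      "aggs": {
        "metricAggregation": {
          "avg": {
            "field": "numericFieldName"
          }
        }
      }
    }
  }
}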

Metric aggregations can be top-level, as in the following example:

{
  "aggs": {
    "metricAggregation": {
      "avg": {
        "field": "fieldName"
      }
    }
  }
}

Maximum

The max aggregation returns the maximum value of a numeric field.

{
  "aggs": {
    "metricAggregation": {
      "max": {
        "field": "fieldName"
      }
    }
  }
}

Minimum

The min aggregation returns the minimum value of a numeric field.

{
  "aggs": {
    "metricAggregation": {
      "min": {
        "field": "fieldName"
      }
    }
  }
}

Sum

The sum aggregation returns the total value of a numeric field.

{
  "aggs": {
    "metricAggregation": {
      "sum": {
        "field": "fieldName"
      }
    }
  }
}

Count

The count metric returns a raw count of the elements in the selected field.

{
  "aggs": {
    "metricAggregation": {
      "count": {
        "field": "fieldName"
      }
    }
  }
}

Cardinality

The cardinality metric is a single-value metric aggregation that provides an approximate count of the number of unique or distinct values of a field. The metric uses the HyperLogLog estimation algorithm.

{
  "aggs": {
    "metricAggregation": {
      "cardinality": {
        "field": "fieldName"
      }
    }
  }
}
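To show why cardinality is approximate, the following Python is a minimal HyperLogLog sketch, the estimation technique named above; it is not ChaosSearch's implementation. Each value is hashed, the top p bits pick a register, and the register keeps the largest position of the leftmost 1-bit seen in the remaining bits; the estimate combines all registers, with a linear-counting correction for small counts. The choice of p=12 and SHA-1 hashing is arbitrary for this illustration. Adding duplicates never changes the registers, which is why the result counts distinct values.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch for approximate distinct counts."""

    def __init__(self, p: int = 12):
        self.p = p                     # number of index bits
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, value) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(str(value).encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                       # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(10000):
    hll.add(f"user-{i}")
print(round(hll.estimate()))  # close to, but not exactly, 10000
```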

Percentiles

The percentiles metric aggregation calculates one or more percentiles over the numeric values found in the aggregated indexed records of the view. These values can be extracted from specific numeric or histogram fields.

By default, the percentiles metric returns the values for a set of percentiles: [1, 5, 25, 50, 75, 95, 99]. You can instead specify one or more percentile values to calculate, each within the range of 0 to 100, for example:

"percents":[50,60,70]

A percentile is the value below which a given percentage of observed values fall. For example, the 95th percentile is the value that is greater than 95% of the observed values, so the 95th percentile of a response_time metric indicates that 95% of the response_time values were below that number. Percentiles are often used to find exceptional values (outliers) for analysis.


Percentiles are approximate values.

It is important to note that percentile calculations in ChaosSearch, like those in Elasticsearch, are approximate. The percentiles aggregation uses the TDigest algorithm; see Ted Dunning's Computing Accurate Quantiles using T-Digests for a description and some noteworthy behaviors.

For small datasets, or for very high percentiles (such as the 99th), the percentile value is often very close to or exactly the correct value.

For very large datasets, the calculation is more approximate because the algorithm balances accuracy against memory requirements and query performance. The error rate is usually around 2% for large datasets.
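As a baseline for comparison with the approximate TDigest results, the following Python computes an exact percentile using one common convention (linear interpolation between the closest ranks, as in NumPy's default). Other exact conventions exist and give slightly different values, so differences between this sketch and a ChaosSearch response are expected. `exact_percentile` is a hypothetical helper.

```python
import math


def exact_percentile(values: list[float], pct: float) -> float:
    """Exact percentile using linear interpolation between closest ranks."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100   # fractional position in the sorted list
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


data = list(range(1, 101))
print({p: exact_percentile(data, p) for p in [1, 5, 25, 50, 75, 95, 99]})
```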

A sample percentiles query follows. Because the query does not specify particular percentile values, the response shows the values at the default set of percentiles:

{
 "aggs": {
   "1": {
     "percentiles": {
       "field": "first"
     }
   }
 }
}

A sample response follows:

{
 "1": {
   "values": {
      "1.0": 2.0,
      "5.0": 6.0,
      "25.0": 25.5,
      "50.0": 50.5,
      "75.0": 75.5,
      "95.0": 96.0,
      "99.0": 100.0
   }
 }
}

If you query for a specific set of percentages:

{
 "aggs": {
   "1": {
     "percentiles": {
       "field": "first",
       "percents": [50, 60, 70]
     }
   }
 }
}

A sample response follows:

{
 "1": {
   "values": {
       "50.0": 50.5,
       "60.0": 60.5,
       "70.0": 70.5
   }
 }
}