Elasticsearch API Support

ChaosSearch includes support for the Elasticsearch API.

This topic provides an overview of Elasticsearch API support, which offers a programmatic interface in addition to the Kibana UI. The supported APIs cover search queries and some metric and bucket aggregations.

Search Query Clauses

The multi-search operation, using the _msearch endpoint (v7.10), lets you bundle multiple search requests into a single request. The searches run in parallel, so you can receive a response more quickly than by sending one request per search. Each search runs independently, so the failure of one does not affect the others.

The basic match_all query follows, which returns all results in the Refinery view:

{
  "query": {
    "match_all": {
    }
  }
}
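
As a rough illustration of bundling searches, the following Python sketch assembles a newline-delimited _msearch body in the format used by the Elasticsearch 7.10 multi-search API: one header line followed by one body line per search. The helper name build_msearch_body and the view name my-view are hypothetical, and sending the payload to a ChaosSearch endpoint is intentionally left out.

```python
import json

def build_msearch_body(searches):
    """Serialize (header, body) pairs into a newline-delimited _msearch payload."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # header line, for example the target index
        lines.append(json.dumps(body))    # the search request itself
    # The multi-search format expects a newline after the last line as well.
    return "\n".join(lines) + "\n"

# "my-view" is a placeholder name, not taken from the documentation.
payload = build_msearch_body([
    ({"index": "my-view"}, {"query": {"match_all": {}}}),
    ({"index": "my-view"}, {"size": 0}),
])
print(payload)
```

Each pair in the list becomes two lines of the payload, so two searches produce four lines.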

Return No Hits

You can add a size of 0 to the match_all query, which can be useful for aggregation queries.

{
  "size": 0
}

Bool

The bool query combines multiple search queries with boolean logic. You can use boolean logic between queries to either narrow or broaden your search results. Bool supports must, must_not, should, and filter clauses.

{
  "bool": {
    "must": [{
      "match_all": {
      }
    }],
    "must_not": [],
    "should": {
      "match_all": {
      }
    },
    "filter": []
  }
}

Fields can be provided as a single element or an array.
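
A minimal Python sketch of assembling a bool query follows. It accepts each clause either as a single element or as an array and normalizes both to arrays. The helper names as_list and bool_query are hypothetical and not part of any ChaosSearch client.

```python
def as_list(clause):
    """Wrap a single clause in a list; pass lists through unchanged."""
    if clause is None:
        return []
    return clause if isinstance(clause, list) else [clause]

def bool_query(must=None, must_not=None, should=None, filters=None):
    """Build a bool query, accepting each clause as one element or an array."""
    return {
        "bool": {
            "must": as_list(must),
            "must_not": as_list(must_not),
            "should": as_list(should),
            "filter": as_list(filters),
        }
    }

query = bool_query(
    must={"match_all": {}},      # single element
    should=[{"match_all": {}}],  # array
)
print(query)
```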

MatchPhrase

Use match_phrase with a search string to perform a full text search against the given field.

{ 
  "match_phrase": {
    "field": "fieldName",
    "query": "search string"
  }
}

match_phrase accepts number values.

{
  "match_phrase": {
    "field": "fieldName",
    "query": 123.456
  }
}

match_phrase accepts null values.

{
  "match_phrase": {
    "field": "fieldName",
    "query": null
  }
}

match_phrase and other basic query clauses also accept a nested form.

{
  "match_phrase": {
    "fieldName": {
      "query": "search string"
    }
  }
}
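
To make the relationship between the two forms concrete, the following Python sketch rewrites the flat form as the nested form. It assumes that the two forms are interchangeable and that any additional options move under the field name together with query. The function name to_nested_form is hypothetical.

```python
def to_nested_form(clause_name, flat):
    """Convert {"field": name, "query": value} into {name: {"query": value}}."""
    # Everything except "field" is carried over under the field name.
    options = {key: value for key, value in flat.items() if key != "field"}
    return {clause_name: {flat["field"]: options}}

flat = {"field": "fieldName", "query": "search string"}
nested = to_nested_form("match_phrase", flat)
print(nested)
```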

Match

The match query performs a match on the field and query search string:

{
  "match": {
    "field": "fieldName",
    "query": "search string"
  }
}

match can filter on IP ranges given in CIDR form (for example, 10.0.0.0/12).

{
  "match": {
    "field": "fieldName",
    "query": "CIDR-Range-Format"
  }
}

MultiMatch

The multi_match query performs a match across many fields:

{
  "multi_match": {
    "fields": ["field1", "field2"],
    "query": "search string"
  }
}

multi_match also accepts just one field.

{
  "multi_match": {
    "fields": "field1",
    "query": "search string"
  }
}

Term

Use the term query to search for an exact term in a field.

{
  "term": {
    "field": "fieldName",
    "query": "keyword"
  }
}

Range

Use the range query to search for a range of values in a field. There are two forms, one with greater than/less than ranges, and one with from/to ranges.

The following form takes greater than gt[e] and less than lt[e] as parameters.

{
  "range": {
    "timeField": {
      "gte": "2020-Aug-21",
      "lt": "2020-Sep-1" 
    }
  }
}

The following form takes to and from as the value parameters, with include_upper and include_lower to specify inclusive or exclusive bounds.

{
  "range": {
    "timeField": {
      "from": "2020-Aug-21",
      "include_lower": true,
      "to": "2020-Sep-1"
    }
  }
}
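
The two forms express the same kind of bounds. The Python sketch below rewrites the from/to form as gt/gte/lt/lte comparisons. It assumes that a bound is inclusive when include_lower or include_upper is omitted, which matches Elasticsearch behavior but is not stated in this topic. The function name is hypothetical.

```python
def from_to_as_comparisons(bounds):
    """Rewrite from/to bounds as gt/gte/lt/lte comparison bounds."""
    result = {}
    if "from" in bounds:
        # Assumption: the lower bound is inclusive unless include_lower is false.
        key = "gte" if bounds.get("include_lower", True) else "gt"
        result[key] = bounds["from"]
    if "to" in bounds:
        # Assumption: the upper bound is inclusive unless include_upper is false.
        key = "lte" if bounds.get("include_upper", True) else "lt"
        result[key] = bounds["to"]
    return result

converted = from_to_as_comparisons({
    "from": "2020-Aug-21",
    "include_lower": True,
    "to": "2020-Sep-1",
    "include_upper": False,
})
print(converted)
```

With include_upper set to false, the result is the same gte/lt pair used in the first range example.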

Exists

Use the exists query to search for the existence of a field in each row.

{
  "exists": {
    "field": "fieldName"
  }
}

QueryString

The query_string query returns documents based on a provided query string, using a parser with a strict syntax. It accepts the Lucene query syntax.

{
  "query_string": {
    "query": "field1:A or field2:B"
  }
}

Regular Expression

The regexp query performs a regular expression (regex) search.

{
  "regexp": {
    "field": "fieldName",
    "query": "[a-z]+"
  }
}

Fuzzy Search

The fuzzy query matches anything within the given Levenshtein edit distance.

{
  "fuzzy": {
    "field": "fieldName",
    "query": "typo",
    "fuzziness": 1
  }
}
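
For readers unfamiliar with edit distance, the following Python sketch computes the plain Levenshtein distance (insertions, deletions, and substitutions) and checks it against a fuzziness value. It is an illustration only: how ChaosSearch computes the distance internally, including whether transpositions count as a single edit, is not covered here. Both function names are hypothetical.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]

def within_fuzziness(term, candidate, fuzziness=1):
    """True if candidate is within the given edit distance of term."""
    return levenshtein(term, candidate) <= fuzziness

print(within_fuzziness("typo", "type"))   # one substitution
print(within_fuzziness("typo", "tempo"))  # two edits, outside fuzziness 1
```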

Geo Bounding Box

The geo_bounding_box query filters results based on a point location using a bounding box.

{
  "geo_bounding_box": {
    "DestLocation": {
      "top_left": {
        "lat": Latitude,
        "lon": Longitude
      },
      "bottom_right": {
        "lat": Latitude,
        "lon": Longitude
      }
    }
  }
}
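
The filter amounts to a containment test on latitude and longitude. The Python sketch below shows that test under the simplifying assumption that the box does not cross the antimeridian. The function name and the coordinates are hypothetical.

```python
def in_bounding_box(point, top_left, bottom_right):
    """Return True if point lies inside the box spanned by the two corners."""
    # Latitude decreases from the top edge to the bottom edge.
    lat_ok = bottom_right["lat"] <= point["lat"] <= top_left["lat"]
    # Longitude increases from the left edge to the right edge
    # (assumption: the box does not cross the antimeridian).
    lon_ok = top_left["lon"] <= point["lon"] <= bottom_right["lon"]
    return lat_ok and lon_ok

top_left = {"lat": 40.73, "lon": -74.1}
bottom_right = {"lat": 40.01, "lon": -71.12}

print(in_bounding_box({"lat": 40.5, "lon": -73.0}, top_left, bottom_right))
print(in_bounding_box({"lat": 41.0, "lon": -73.0}, top_left, bottom_right))
```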

Geo Polygon

The geo_polygon query matches on results that fall within a geographic polygon of points (that is, a geographic region represented as a polygon with the specification of all longitude and latitude vertex points).

{
  "geo_polygon": {
    "ignore_unmapped": true,
    "DestLocation": {
      "points": [
        {
          "lat": Latitude,
          "lon": Longitude
        },
        {
          "lat": Latitude,
          "lon": Longitude
        }, ...
      ]
    }
  }
}
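
One common way to decide whether a point falls within a polygon is the ray-casting test, sketched below in Python. It treats latitude and longitude as planar coordinates, which is a simplification, and it is not necessarily the method ChaosSearch uses. The function name and the sample square are hypothetical.

```python
def in_polygon(point, vertices):
    """Ray-casting test: count the polygon edges crossed by a ray heading east."""
    lat, lon = point["lat"], point["lon"]
    inside = False
    for i in range(len(vertices)):
        a, b = vertices[i], vertices[(i + 1) % len(vertices)]
        # Only edges that straddle the point's latitude can be crossed.
        if (a["lat"] > lat) != (b["lat"] > lat):
            # Longitude at which this edge crosses the point's latitude.
            crossing = a["lon"] + (lat - a["lat"]) * (b["lon"] - a["lon"]) / (b["lat"] - a["lat"])
            if lon < crossing:
                inside = not inside  # each crossing toggles inside/outside
    return inside

square = [
    {"lat": 0, "lon": 0},
    {"lat": 0, "lon": 10},
    {"lat": 10, "lon": 10},
    {"lat": 10, "lon": 0},
]
print(in_polygon({"lat": 5, "lon": 5}, square))
print(in_polygon({"lat": 5, "lon": 15}, square))
```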

Bucket Aggregations

With these APIs, users can group the records in the view into buckets based on the values of specified fields.

Terms

A terms aggregation enables you to specify the top or bottom n elements of a given field to display, ordered by count or a custom metric up to a size limit.

{
  "aggregationName": {
    "terms": {
      "field": "fieldName",
      "size": 20
    }
  }
}

The query accepts a sort order. The default is a descending sort.

{
  "aggregationName": {
    "terms": {
      "field": "fieldName",
      "size": 20,
      "order": {
        "_count": "desc"
      }
    }
  }
}

The following example sorts in ascending order based on field values:

{
  "aggregationName": {
    "terms": {
      "field": "fieldName",
      "size": 20,
      "order": {
        "_term": "asc"
      }
    }
  }
}
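
To show what the size and order parameters do, the following Python sketch emulates a terms aggregation over an in-memory list. The tie-breaking rule, the function name, and the sample values are assumptions made for the illustration.

```python
from collections import Counter

def terms_buckets(values, size, order=("_count", "desc")):
    """Return up to `size` buckets of distinct values, sorted per `order`."""
    counts = Counter(values)
    key, direction = order
    descending = direction == "desc"
    if key == "_count":
        # Assumption: ties on count are broken alphabetically by term.
        sign = -1 if descending else 1
        ranked = sorted(counts.items(), key=lambda item: (sign * item[1], item[0]))
    else:
        # "_term": sort by the field value itself.
        ranked = sorted(counts.items(), key=lambda item: item[0], reverse=descending)
    return [{"key": term, "doc_count": count} for term, count in ranked[:size]]

methods = ["GET", "POST", "GET", "DELETE", "GET", "POST"]
top = terms_buckets(methods, size=2)
alphabetical = terms_buckets(methods, size=2, order=("_term", "asc"))
print(top)
print(alphabetical)
```

The default order keeps the two most frequent values, while the _term order keeps the two values that sort first alphabetically, regardless of their counts.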

Date Histogram

A date_histogram is built from a numeric field and organized by date. You can specify a time frame for the intervals in seconds, minutes, hours, days, weeks, months, or years. You can also specify a custom interval frame. Custom interval time units are s for seconds, m for minutes, h for hours, d for days, w for weeks, and y for years. Different units support different levels of precision, down to one second.

{
  "aggregationName": {
    "date_histogram": {
      "field": "fieldName",
      "interval": "3w"
    }
  }
}

The aggregation optionally accepts extended_bounds. Buckets within these bounds are always present in the response (with empty or default values), even if no documents fall into them.

{
  "aggregationName": {
    "date_histogram": {
      "field": "fieldName",
      "interval": "3w",
      "extended_bounds": {
        "min": "2020",
        "max": "2021"
      }
    }
  }
}

Histogram

A standard histogram is built from a numeric field. Specify an integer interval for this field.

{
  "aggregationName": {
    "histogram": {
      "field": "fieldName",
      "interval": 100
    }
  }
}
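
In Elasticsearch, each value is assigned to the bucket whose key is the value rounded down to the nearest multiple of the interval. The Python sketch below applies that rule to a hypothetical list of values, assuming that ChaosSearch follows the same rule and that no offset is used. The function names are hypothetical.

```python
import math
from collections import Counter

def histogram_bucket_key(value, interval):
    """Round value down to the nearest multiple of interval."""
    return math.floor(value / interval) * interval

def histogram(values, interval):
    """Count values per bucket key, sorted by key."""
    counts = Counter(histogram_bucket_key(v, interval) for v in values)
    return dict(sorted(counts.items()))

buckets = histogram([5, 99, 100, 250, 299], interval=100)
print(buckets)
```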

Range

The range bucket aggregation enables the user to define a set of ranges, where each range represents a separate bucket. During the aggregation process, the values extracted from each document (matching index record) are checked against each bucket range and the records are placed into the matching range. Each aggregation bucket includes records that equal or exceed the from value and excludes records that equal or exceed the to value for that range.

{
 "size": 0,
 "aggs":{
   "value_ranges": {
     "range": {
       "field": "value",
       "keyed": true,
       "ranges": [
         {"to": 20.0},
         {"from": 20.0, "to": 40.0},
         {"from": 40.0, "to": 60.0},
         {"from": 60.0, "to": 80.0},
         {"from": 80.0, "to": 100.0}
       ]
     }
   }
 }
}

As in this example, if there are no sub-aggregations defined, the results of the bucket aggregation show a count of the records/documents within each bucket:

{
  "value_ranges": {
    "buckets": {
      "*-20.0": {
        "doc_count": 4,
        "to": 20.0
      },
      "20.0-40.0": {
        "doc_count": 2,
        "from": 20.0,
        "to": 40.0
      },
      "40.0-60.0": {
        "doc_count": 1,
        "from": 40.0,
        "to": 60.0
      },
      "60.0-80.0": {
        "doc_count": 1,
        "from": 60.0,
        "to": 80.0
      },
      "80.0-100.0": {
        "doc_count": 1,
        "from": 80.0,
        "to": 100.0
      }
    }
  }
}
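
The inclusive from and exclusive to rule can be sketched in Python as follows. The sample values are hypothetical and were chosen only so that the counts match the response above. The function name range_buckets is also hypothetical.

```python
def range_buckets(values, ranges):
    """Count values per range: from is inclusive, to is exclusive."""
    buckets = {}
    for spec in ranges:
        low, high = spec.get("from"), spec.get("to")
        # Use the explicit key if given, otherwise a "from-to" label with * for open ends.
        key = spec.get("key") or f"{'*' if low is None else low}-{'*' if high is None else high}"
        buckets[key] = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
    return buckets

values = [5, 10, 12, 19, 20, 39.9, 40, 60, 80]  # hypothetical data
counts = range_buckets(values, [
    {"to": 20.0},
    {"from": 20.0, "to": 40.0},
    {"from": 40.0, "to": 60.0},
    {"from": 60.0, "to": 80.0},
    {"from": 80.0, "to": 100.0},
])
print(counts)
```

Because each range is evaluated independently, a value such as 40 is counted once per range that contains it, so ranges that overlap can both include the same record.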

As with other bucket aggregations, you can perform sub-aggregations within the ranges. Sub-aggregations let you further refine and separate ranges by different criteria, so that you can apply metrics at various levels in the aggregation to create your report. A sample range aggregation with sub-aggregations follows:

{
 "size": 0,
 "aggs":{
   "value_ranges": {
     "range": {
       "field": "value",
       "keyed": true,
       "ranges": [
         {"key": "b", "from": 20.0, "to": 60.0},
         {"key": "c", "from": 40.0, "to": 80.0}
       ]
     },
     "aggs": {
       "summation_of_values": {
         "sum": {"field": "value"}
       }
     }
   }
 }
}

A sample response with the sub-aggregations follows:

{
   "value_ranges": {
     "buckets": {
       "b": {
         "doc_count": 3,
         "from": 20.0,
         "summation_of_values": {
           "value": 110
         },
         "to": 60.0
       },
       "c": {
         "doc_count": 2,
         "from": 40.0,
         "summation_of_values": {
           "value": 113
         },
         "to": 80.0
       }
     }
   }
}

Ranges are typically discrete from each other, but they can overlap, and the counts will reflect the records that fall into each assigned range:

Query: 
{
 "size": 0,
 "aggs":{
   "value_ranges": {
     "range": {
       "field": "value",
       "keyed": true,
       "ranges": [
         {"key": "b", "from": 20.0, "to": 60.0},
         {"key": "c", "from": 40.0, "to": 80.0}
       ]
     }
   }
 }
}

Response: 
{
  "value_ranges": {
    "buckets": {
      "b": {
        "doc_count": 3,
        "from": 20.0,
        "to": 60.0
      },
      "c": {
        "doc_count": 2,
        "from": 40.0,
        "to": 80.0
      }
    }
  }
}

Date Range

Use the date_range aggregation to specify ranges of values for a numeric field. The date_range aggregation is conceptually the same as the range aggregation, except that it lets you perform date math. For example, you can get all documents from the last 10 days. To make the dates more readable, specify a date format with the format parameter.
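
This topic does not include a sample date_range request, so the following Python sketch builds one using the Elasticsearch 7.x date_range syntax with date math. Treat it as an assumption about the request shape rather than a confirmed ChaosSearch example. The aggregation name recent, the field name timeField, and the date format are hypothetical.

```python
import json

def last_n_days_date_range(field, days, date_format="yyyy-MM-dd"):
    """Build a date_range aggregation covering the last `days` days."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "recent": {
                "date_range": {
                    "field": field,
                    "format": date_format,
                    # Date math: from the start of the day `days` days ago until now.
                    "ranges": [{"from": f"now-{days}d/d", "to": "now"}],
                }
            }
        },
    }

request = last_n_days_date_range("timeField", 10)
print(json.dumps(request, indent=2))
```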

IP Range

The ip_range aggregation creates buckets based on IPv4 ranges. You can specify multiple ranges, each formatted as {from, to} or as {mask} using CIDR notation (for example, 46.0.0.0/2).

{
  "aggregationName": {
    "ip_range": {
       "field": "fieldName",
       "ranges": [
         {
           "from": "0.0.0.0",
           "to": "127.255.255.255"
         },
         {
           "mask": "CIDR-Range-Format"
         }
       ]
     }
  }
}
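
To see which addresses a CIDR mask covers, the following Python sketch uses the standard ipaddress module to expand a mask into its first and last address and to test membership. The function name and the sample addresses are hypothetical. The sketch does not show how ChaosSearch treats the to bound of a range.

```python
import ipaddress

def mask_to_bounds(mask):
    """Return the first and last IPv4 address covered by a CIDR mask."""
    # strict=False tolerates masks whose host bits are set.
    network = ipaddress.ip_network(mask, strict=False)
    return {"first": str(network.network_address), "last": str(network.broadcast_address)}

def ip_in_mask(ip, mask):
    """True if the address falls inside the CIDR range."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(mask, strict=False)

print(mask_to_bounds("10.0.0.0/12"))
print(ip_in_mask("10.15.0.1", "10.0.0.0/12"))
print(ip_in_mask("10.16.0.1", "10.0.0.0/12"))
```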

GeoHash

The geohash_grid aggregation displays points based on geohash coordinates.

{
  "aggregationName": {
    "geohash_grid": {
      "field": "fieldName",
      "precision": [1-12]
    }
  }
}

Metric Aggregations

Metric aggregations perform simple calculations such as finding the minimum, maximum, and average values of a field. There are two types of metric aggregations: single-value metric aggregations and multi-value metric aggregations.

  • Single-value metric aggregations return a single metric. Examples include sum, min, max, avg, cardinality, and count.
  • Multi-value metric aggregations return more than one metric. Examples include stats, extended_stats, matrix_stats, and others. These aggregations are not supported in this API.

The sum, min, max, count, and avg metrics are single-value metric aggregations that return the sum, minimum, maximum, count, and average values of a field, respectively.

Average

The avg aggregation returns the average value of a numeric field. Metric aggregations can be nested as in the following example:

{
  "aggregationName": {
    "terms": {
      "field": "fieldName",
      "size": 20
    },
    "metricAggregation": {
      "avg": "fieldName"
    }
  }
}

Metric aggregations can be top-level, as in the following example:

{
  "metricAggregation": {
    "avg": {
      "field": "fieldName"
    }
  }
}

Maximum

The max aggregation returns the maximum value of a numeric field.

{
  "metricAggregation": {
    "max": {
      "field": "fieldName"
    }
  }
}

Minimum

The min aggregation returns the minimum value of a numeric field.

{
  "metricAggregation": {
    "min": {
      "field": "fieldName"
    }
  }
}

Sum

The sum aggregation returns the total value of a numeric field.

{
  "metricAggregation": {
    "sum": {
      "field": "fieldName"
    }
  }
}

Count

The count metric returns a raw count of the elements in the selected field.

{
  "metricAggregation": {
    "count": {
      "field": "fieldName"
    }
  }
}

Cardinality

The cardinality metric is a single-value metric aggregation that provides an approximate count of the number of unique or distinct values of a field. The metric uses the HyperLogLog estimation algorithm.

{
  "metricAggregation": {
    "cardinality": {
      "field": "fieldName"
    }
  }
}

Percentiles

The percentiles metric aggregation calculates one or more percentiles over the numeric values found in the aggregated indexed records of the view. These values can be extracted from specific numeric or histogram fields.

By default, the percentiles metric returns the values for a range of percentiles: [1, 5, 25, 50, 75, 95, 99]. You can instead specify one or more percentile values to calculate within the range of 0 to 100, for example:

"percents":[50,60,70]

A percentile is the value below which a given percentage of observed values fall. For example, the 95th percentile is the value that is greater than 95% of the observed values, so the 95th percentile of a response_time metric indicates that 95% of the response_time values were below that number. Percentiles are often used to find exceptional values (outliers) for analysis.
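
The definition can be illustrated with an exact nearest-rank calculation in Python. This is not the TDigest algorithm, so results from ChaosSearch are approximate and may differ from exact values like these. The function name and the response_time values are hypothetical.

```python
import math

def nearest_rank_percentile(values, percent):
    """Return the smallest value with at least `percent`% of values at or below it."""
    ordered = sorted(values)
    # Multiply before dividing to keep the rank calculation exact for integers.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds, including one outlier.
response_times = [120, 95, 310, 150, 101, 99, 2050, 130, 110, 105]

print(nearest_rank_percentile(response_times, 50))
print(nearest_rank_percentile(response_times, 90))
print(nearest_rank_percentile(response_times, 95))
```

In this sample the median stays near typical values while the 95th percentile lands on the outlier, which is why high percentiles are useful for spotting exceptional values.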

Percentiles are approximate values.

Percentile calculations in ChaosSearch, as in Elasticsearch, produce approximate values. The percentiles aggregation uses the TDigest algorithm (see Computing Accurate Quantiles using T-Digests by Ted Dunning for a description and some noteworthy behaviors).

For small datasets, or for very high percentiles (such as 99%), the percentile value is often very close to or exactly the correct percentile value.

For very large datasets, the calculated value is more approximate because the algorithm balances accuracy against the memory requirements and performance of the query. The error rate is usually around 2% for large datasets.

A sample percentiles query follows. Because the query does not specify any percentile values, the response shows the values at the default set of percentiles:

{
 "aggs": {
   "1": {
     "percentiles": {
       "field": "first"
     }
   }
 }
}

A sample response follows:

{
 "1": {
   "values": {
      "1.0": 2.0,
      "5.0": 6.0,
      "25.0": 25.5,
      "50.0": 50.5,
      "75.0": 75.5,
      "95.0": 96.0,
      "99.0": 100.0
   }
 }
}

If you query for a specific set of percentages:

{
 "aggs": {
   "1": {
     "percentiles": {
       "field": "first",
       "percents": [50, 60, 70]
     }
   }
 }
}

A sample response follows:

{
 "1": {
   "values": {
       "50.0": 50.5,
       "60.0": 60.5,
       "70.0": 70.5
   }
 }
}
