Elasticsearch API Support
ChaosSearch includes support for the Elasticsearch API.
This topic provides an overview of the Elasticsearch API support as another interface in addition to the OpenSearch Dashboards UI. APIs include query support for search and some metric and bucket aggregations.
Important Search and Filter Notes
In ChaosSearch, search terms and phrases are almost always case-sensitive, unless the Refinery view is configured as a case-insensitive view. (Case-insensitive views are not recommended for performance reasons.)
ChaosSearch also leverages some Elasticsearch API options as default settings for performance (for example, wildcard brackets are required to find the search phrase anywhere in a target field in match_phrase queries). Otherwise, the search matches only the records with that exact search phrase or term as the field value.
The multi-search operation using _msearch (v7.10) lets you bundle multiple search requests into a single request. The searches run in parallel, so you can receive a response more quickly compared to sending one request per search. Each search runs independently, so the failure of one does not affect the others.
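As a minimal sketch of the request format, the following Python builds an _msearch body in the newline-delimited JSON layout used by the Elasticsearch multi-search API: one header line followed by one body line per search, ending with a newline. The view name my-view and the helper build_msearch_body are hypothetical placeholders. The snippet only builds the payload and does not send it.

```python
import json

def build_msearch_body(searches):
    """Serialize (header, body) pairs into newline-delimited JSON."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The multi-search format expects the payload to end with a newline.
    return "\n".join(lines) + "\n"

# Two independent searches against a hypothetical view named "my-view".
searches = [
    ({"index": "my-view"}, {"query": {"match_all": {}}}),
    ({"index": "my-view"}, {"query": {"exists": {"field": "fieldName"}}}),
]
msearch_body = build_msearch_body(searches)
print(msearch_body)
```

Each header and body pair is evaluated as its own search, which is why one failing entry does not affect the others.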
Search Query Clauses
The following sections provide an overview of the supported Elasticsearch Query DSL that can be used with ChaosSearch. The supported syntax is a subset of the available API, and these sections offer some notes and considerations for use. (More information about the supported syntax is also available in the Elasticsearch API documentation, but be sure to avoid any syntax not listed below.) The query clauses can also be passed into the Search Analytics > Discover UI with the Add filter dialog.
Match All Query
The basic match_all query, which returns all results in the Refinery view, follows:
{
"query": {
"match_all": {
}
}
}
Bool
The bool query combines multiple search queries with boolean logic. You can use boolean logic between queries to either narrow or broaden your search results. Bool supports must, must_not, should, and filter clauses.
{
"query": {
"bool" : {
"must" : {
"term" : { "user.id" : "user" }
},
"filter": {
"term" : { "tags" : "tagval" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
"should" : [
{ "term" : { "tags" : "env1" } },
{ "term" : { "tags" : "deployed" } }
]
}
}
}
The contents of each clause can be provided as a single element or as an array. An example with a single element follows:
{
"query": {
"bool": {
"must_not": {
"term": {
"Records.eventName": "DescribeVpcs"
}
}
}
}
}
MatchPhrase
Use match_phrase with a search string to perform a full text search against the given fieldName. Note that in Elastic environments, a match_phrase query defaults to matching records that have the specified phrase anywhere inside the search field. With ChaosSearch, you must explicitly bracket search phrases or terms with wildcards/asterisks to search for that term somewhere in a field. Otherwise, the search matches only the records with the exact search phrase or term in the field.
If you specify only an ending asterisk (for example, 112*), the matching records begin with the specified string value in the target field. If you specify only a starting asterisk, the matching records end with the specified string value.
{
"query": {
"match_phrase": {
"fieldName": "*search phrase*"
}
}
}
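The wildcard rules above can be sketched as a small Python helper that builds the request body. The function name match_phrase_query and its mode argument are hypothetical conveniences for illustration, not part of any ChaosSearch client.

```python
import json

def match_phrase_query(field, phrase, mode="exact"):
    """Build a match_phrase body; mode controls where the asterisks go."""
    patterns = {
        "exact": "{}",       # field value equals the phrase
        "contains": "*{}*",  # phrase anywhere in the field
        "prefix": "{}*",     # field value begins with the phrase
        "suffix": "*{}",     # field value ends with the phrase
    }
    if mode not in patterns:
        raise ValueError(f"unknown mode: {mode}")
    return {"query": {"match_phrase": {field: patterns[mode].format(phrase)}}}

print(json.dumps(match_phrase_query("fieldName", "search phrase", "contains"), indent=2))
```

Keeping the wildcard placement in one place makes it harder to send an unbracketed phrase by accident when a substring match was intended.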
match_phrase accepts number values and IP addresses. For example:
To return records where o_orderkey is exactly 1124:
{
"query": {
"match_phrase": {
"o_orderkey": "1124"
}
}
}
or to return records where the o_orderkey field is a value that begins with 112:
{
"query": {
"match_phrase": {
"o_orderkey": "112*"
}
}
}
or to return records where an ipaddress field has a specific address:
{
"query": {
"match_phrase": {
"ipaddress": "10.85.77.211"
}
}
}
match_phrase can be used to return records with fields that are null.
{
"query": {
"match_phrase": {
"Records.requestParameters": null
}
}
}
Match
In ChaosSearch, the match query behaves similarly to the match_phrase query. See the notes for match_phrase.
{
"query": {
"match": {
"Records.requestParameters": "*template*"
}
}
}
MultiMatch
The multi_match query performs a match across one or more fields specified in a comma-separated array. For the query string, see the match_phrase notes on wildcard usage and implications with ChaosSearch.
{
"query": {
"multi_match": {
"fields": [ "field1", "field2" ],
"query": "*string*"
}
}
}
For example:
{
"query": {
"multi_match": {
"fields": [
"Records.eventName",
"Records.requestParameters"
],
"query": "*filterSet*"
}
}
}
Nested
The nested query wraps another query to search nested fields. The nested query searches nested field objects as if they were indexed as separate documents. If an object matches the search, the nested query returns the root parent document.
{
"query": {
"nested": {
"path": "obj1",
"query": {
"bool": {
"must": [
{ "match": { "obj1.name": "blue" } },
{ "range": { "obj1.count": { "gt": 5 } } }
]
}
},
"score_mode": "avg"
}
}
}
For example:
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "Records.requestParameters",
"query": {
"bool": {
"should": [
{
"match": {
"Records.requestParameters.includeShadowTrails": false
}
}
],
"minimum_should_match": 1
}
},
"score_mode": "none"
}
},
{
"range": {
"Records.eventTime": {
"gte": "2018-05-25T04:04:41.010Z",
"lte": "2019-08-13T03:54:11.594Z",
"format": "strict_date_optional_time"
}
}
}
]
}
}
}
Range
Use the range query to search for a range of values in a field. There are two forms, one with greater than/less than ranges, and one with from/to ranges.
The following form takes greater than gt[e] and less than lt[e] as parameters to find a range of customer keys from 10 to 15, inclusive.
{
"query": {
"range": {
"o_custkey": {
"gte": 10,
"lte": 15
}
}
}
}
The following form takes to and from as the value parameters, with include_upper and include_lower to specify inclusive or exclusive bounds.
{
"query": {
"range": {
"timeField": {
"from": "2020-Aug-21",
"include_lower": true,
"to": "2020-Sep-1"
}
}
}
}
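To relate the two forms, here is a minimal Python sketch that rewrites gt/gte/lt/lte bounds into from/to bounds with include_lower and include_upper flags. It assumes the two forms are equivalent in the way Elasticsearch treats them (gte corresponds to an inclusive lower bound, gt to an exclusive one). The helper name to_from_to_form is hypothetical.

```python
def to_from_to_form(bounds):
    """Rewrite gt/gte/lt/lte bounds as from/to with include_* flags."""
    result = {}
    for key, value in bounds.items():
        if key in ("gt", "gte"):
            result["from"] = value
            result["include_lower"] = key == "gte"
        elif key in ("lt", "lte"):
            result["to"] = value
            result["include_upper"] = key == "lte"
        else:
            result[key] = value  # pass through options such as "format"
    return result

# The customer-key range from the first form, expressed in the second form.
range_query = {"query": {"range": {"o_custkey": to_from_to_form({"gte": 10, "lte": 15})}}}
print(range_query)
```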
Exists
Use the exists query to search for records that contain the specified field.
{
"query": {
"exists": {
"field": "fieldName"
}
}
}
QueryString
The query_string query returns documents based on a provided query string, which is parsed using query string syntax (for example, field:value terms combined with AND and OR).
{
"query": {
"query_string": {
"query": "field1:A AND field2:B"
}
}
}
or another example looking for fields with any of the following OR'ed terms in parentheses:
{
"query": {
"query_string": {
"query": "(first string) OR (second set of terms)",
"default_field": "field_name"
}
}
}
Regular Expression
The regexp query performs a regular expression (regex) search. The field_name is the name of the view column that you want to search, and the regular_expression value follows the regular expression syntax.
{
"query":{
"regexp": {
"field_name": {
"value": "regular_expression",
"case_insensitive": false
}
}
}
}
For example:
{
"query": {
"regexp": {
"Records.awsRegion": {
"case_insensitive": false,
"value": "ap-south.*"
}
}
}
}
Fuzzy Search
The fuzzy query matches anything within the given Levenshtein edit distance of the input search query. (There is limited support for this syntax.)
{
"query": {
"fuzzy": {
"fieldName": {
"value": "string"
}
}
}
}
As an example, the input string depose would match on values like depose and depos, deposit, deposits, and similar Levenshtein matches.
For example:
{
"query": {
"fuzzy": {
"o_comment": {
"value": "depose"
}
}
}
}
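For readers unfamiliar with Levenshtein edit distance, the following Python sketch computes it with the standard dynamic-programming recurrence and applies it to the values mentioned above. It only illustrates the distance measure; it makes no claim about how ChaosSearch evaluates fuzzy matches internally or which distance limit it applies.

```python
def levenshtein(a, b):
    """Return the number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]

for candidate in ["depose", "depos", "deposit", "deposits"]:
    print(candidate, levenshtein("depose", candidate))
```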
Bucket Aggregations
Bucket aggregations create "buckets" of results, where each bucket is associated with a criterion based on the aggregation type. The bucket aggregations determine the buckets and return the number of results (or documents) in each bucket. The following sections list the supported bucket aggregations and some notes for behaviors with ChaosSearch.
A basic aggs structure follows. The bucket aggregation types would replace the aggregationName syntax for their specific definitions.
{
"aggs": {
"aggregationName": {
"field": "fieldName",
...
}
}
}
Terms
A terms
aggregation enables you to specify the top or bottom n elements of a given field to display, ordered by count or a custom metric up to a size limit. Note that the sum_other_doc_count
option is not supported.
The query accepts a sort order. The default is a descending sort.
{
"aggs": {
"aggregationName": {
"terms": {
"field": "fieldName",
"size": 5,
"order": {
"_count": "desc"
}
}
}
}
}
Another sort example, this time ascending based on field values:
{
"aggs": {
"aggregationName": {
"terms": {
"field": "fieldName",
"size": 20,
"order": {
"_term": "asc"
}
...
An example of a terms aggregation query follows:
{
"aggs": {
"priority": {
"terms": {
"field": "o_orderpriority",
"order": {
"_count": "desc"
},
"size": 5
}
}
...
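To make the top-n behavior concrete, the following Python sketch reproduces locally what a terms aggregation ordered by _count computes: it counts each distinct value and keeps the first size entries. The sample priority values and the helper name top_terms are made up for illustration, and the output shape is only loosely modeled on a bucket list.

```python
from collections import Counter

def top_terms(values, size=5, descending=True):
    """Count distinct values and return the top (or bottom) `size` buckets."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=descending)
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]

# Hypothetical values of an o_orderpriority-style field.
priorities = ["1-URGENT", "2-HIGH", "1-URGENT", "3-MEDIUM", "2-HIGH", "1-URGENT"]
print(top_terms(priorities, size=2))
```

Passing descending=False returns the least frequent values first, which corresponds to an ascending _count order.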
Date Histogram
A date_histogram is built from a numeric field and organized by date. You can specify a time frame for the intervals in seconds, minutes, hours, days, weeks, months, or years. You can also specify a custom interval frame. Custom interval time units are s for seconds, m for minutes, h for hours, d for days, w for weeks, and y for years. Different units support different levels of precision, down to one second.
A sample of the date_histogram structure follows:
{
"aggs": {
"aggregationName": {
"date_histogram": {
"field": "fieldName",
"interval": "3w"
}
}
}
}
Optionally, the query accepts extended bounds. Buckets within these bounds are always present in the response (with empty/default values), even if no documents fall into them.
{
"aggregationName": {
"date_histogram": {
"field": "fieldName",
"interval": "3w",
"extended_bounds": {
"min": "2020",
"max": "2021"
}
}
}
}
Histogram
A standard histogram is built from a numeric field. Specify an integer interval for this field.
{
"aggs": {
"aggregationName": {
"histogram": {
"field": "fieldName",
"interval": 100
}
}
}
}
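As a rough illustration of how an interval maps values to buckets, the following Python sketch rounds each value down to the nearest multiple of the interval, which is the rule Elasticsearch documents for histogram bucket keys (with no offset). The sample values and the helper name histogram_buckets are hypothetical, and empty buckets between keys are not filled in here.

```python
import math
from collections import Counter

def histogram_buckets(values, interval):
    """Group values into buckets keyed by floor(value / interval) * interval."""
    keys = (math.floor(value / interval) * interval for value in values)
    counts = Counter(keys)
    return [{"key": key, "doc_count": counts[key]} for key in sorted(counts)]

print(histogram_buckets([5, 99, 100, 250, 299], 100))
```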
Range
The range bucket aggregation enables the user to define a set of ranges, where each range represents a separate bucket. During the aggregation process, the values extracted from each document (matching index record) are checked against each bucket range and the records are placed into the relevant/matching range. Each aggregation bucket includes records that equal or exceed the from value and excludes records that equal or exceed the to value for each range.
{
"size": 0,
"aggs":{
"value_ranges": {
"range": {
"field": "value",
"keyed": true,
"ranges": [
{"to": 20.0},
{"from": 20.0, "to": 40.0},
{"from": 40.0, "to": 60.0},
{"from": 60.0, "to": 80.0},
{"from": 80.0, "to": 100.0}
]
}
}
}
}
As in this example, if there are no sub-aggregations defined, the results of the bucket aggregation show a count of the records/documents within each bucket:
{
"value_ranges": {
"buckets": {
"*-20.0": {
"doc_count": 4,
"to": 20.0
},
"20.0-40.0": {
"doc_count": 2,
"from": 20.0,
"to": 40.0
},
"40.0-60.0": {
"doc_count": 1,
"from": 40.0,
"to": 60.0
},
"60.0-80.0": {
"doc_count": 1,
"from": 60.0,
"to": 80.0
},
"80.0-100.0": {
"doc_count": 1,
"from": 80.0,
"to": 100.0
}
}
}
}
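The inclusive from and exclusive to rule can be checked with a short Python sketch that counts sample values per range. The sample values and the helper names are made up for illustration; the key format only imitates the keyed response shown above.

```python
def label(bound):
    """Render a missing bound as '*', as in keys like '*-20.0'."""
    return "*" if bound is None else str(bound)

def range_buckets(values, ranges):
    """Count values per range: from is inclusive, to is exclusive."""
    buckets = {}
    for spec in ranges:
        low, high = spec.get("from"), spec.get("to")
        key = spec.get("key") or f"{label(low)}-{label(high)}"
        count = sum(
            1 for value in values
            if (low is None or value >= low) and (high is None or value < high)
        )
        buckets[key] = {"doc_count": count}
    return buckets

sample_values = [5.0, 20.0, 39.9, 40.0, 60.0]
ranges = [{"to": 20.0}, {"from": 20.0, "to": 40.0}, {"from": 40.0, "to": 60.0}]
print(range_buckets(sample_values, ranges))
```

Note that 20.0 lands in the 20.0-40.0 bucket rather than the *-20.0 bucket, and 60.0 falls outside every range in this sample.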
As with other bucket aggregations, you can perform sub-aggregations within the ranges. Sub-aggregations provide the ability to further refine and separate ranges by different criteria, so that you could apply metrics at various levels in the aggregation to create your report. A sample range aggregation with sub-aggregations follows:
{
"size": 0,
"aggs":{
"value_ranges": {
"range": {
"field": "value",
"keyed": true,
"ranges": [
{"key": "b", "from": 20.0, "to": 60.0},
{"key": "c", "from": 40.0, "to": 80.0}
]
},
"aggs": {
"summation_of_values": {
"sum": {"field": "value"}
}
}
}
}
}
A sample response with the sub-aggregations follows:
{
"value_ranges": {
"buckets": {
"b": {
"doc_count": 3,
"from": 20.0,
"summation_of_values": {
"value": 110
},
"to": 60.0
},
"c": {
"doc_count": 2,
"from": 40.0,
"summation_of_values": {
"value": 113
},
"to": 80.0
}
}
}
}
Ranges are typically discrete from each other, but they can overlap, and the counts will reflect the records that fall into each assigned range:
Query:
{
"size": 0,
"aggs":{
"value_ranges": {
"range": {
"field": "value",
"keyed": true,
"ranges": [
{"key": "b", "from": 20.0, "to": 60.0},
{"key": "c", "from": 40.0, "to": 80.0}
]
}
}
}
}
Response:
{
"value_ranges": {
"buckets": {
"b": {
"doc_count": 3,
"from": 20.0,
"to": 60.0
},
"c": {
"doc_count": 2,
"from": 40.0,
"to": 80.0
}
}
}
}
Date Range
Use the date_range aggregation to specify ranges of values for a date field. The date_range aggregation is conceptually the same as the range aggregation, except that it lets you perform date math. For example, you can get all documents from the last 10 days. To make the date more readable, include the format with a format parameter.
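As a sketch of the "last 10 days" case, the following Python builds a date_range request body using the Elasticsearch date_range syntax and date-math expressions. The aggregation name last_10_days, the field name timeField, and the chosen format are assumptions for illustration; verify which options your ChaosSearch environment accepts before relying on them.

```python
import json

# Hypothetical request body: one bucket covering the last 10 days.
date_range_request = {
    "size": 0,
    "aggs": {
        "last_10_days": {
            "date_range": {
                "field": "timeField",
                "format": "yyyy-MM-dd",  # makes the bucket dates more readable
                "ranges": [
                    {"from": "now-10d/d", "to": "now"}
                ],
            }
        }
    },
}
print(json.dumps(date_range_request, indent=2))
```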
IP Range
The ip_range aggregation creates buckets based on IPv4 ranges (multiple ranges, each formatted as {from, to}).
{
"aggs": {
"aggregationName": {
"ip_range": {
"field": "fieldName",
"ranges": [
{
"from": "0.0.0.0",
"to": "127.255.255.255"
}
]
}
}
}
}
Metric Aggregations
Metric aggregations perform simple calculations such as finding the minimum, maximum, and average values of a field. There are two types of metric aggregations: single-value metric aggregations and multi-value metric aggregations.
- Single-value metric aggregations return a single metric. Examples include sum, min, max, avg, cardinality, and count.
- Multi-value metric aggregations return more than one metric. Examples include stats, extended_stats, matrix_stats, and others; these are not supported in this API.
The sum, min, max, count, and avg metrics are single-value metric aggregations that return the sum, minimum, maximum, count, and average values of a field, respectively.
Average
The avg aggregation returns the average value of a numeric field. Metric aggregations can be nested as in the following example.
{
"aggs": {
"aggregationName": {
"terms": {
"field": "fieldName",
"size": 20
},
"aggs": {
"metricAggregation": {
"avg": {
"field": "fieldName"
}
}
}
}
}
}
Metric aggregations can be top-level, as in the following example:
{
"aggs": {
"metricAggregation": {
"avg": {
"field": "fieldName"
}
}
}
}
Maximum
The max aggregation returns the maximum value of a numeric field.
{
"aggs": {
"metricAggregation": {
"max": {
"field": "fieldName"
}
}
}
}
Minimum
The min aggregation returns the minimum value of a numeric field.
{
"aggs": {
"metricAggregation": {
"min": {
"field": "fieldName"
}
}
}
}
Sum
The sum aggregation returns the total value of a numeric field.
{
"aggs": {
"metricAggregation": {
"sum": {
"field": "fieldName"
}
}
}
}
Count
The count metric returns a raw count of the elements in the selected field.
{
"aggs": {
"metricAggregation": {
"count": {
"field": "fieldName"
}
}
}
}
Cardinality
The cardinality metric is a single-value metric aggregation that provides an approximate count of the number of unique or distinct values of a field. The metric uses the HyperLogLog estimation algorithm.
{
"aggs": {
"metricAggregation": {
"cardinality": {
"field": "fieldName"
}
}
}
}
Percentiles
The percentiles metric aggregation calculates one or more percentiles over the numeric values found in the aggregated indexed records of the view. These values can be extracted from specific numeric or histogram fields.
By default, the percentiles metric returns the values for a range of percentiles: [1, 5, 25, 50, 75, 95, 99]. You can specify one or more percentile values to calculate within the range of 0 to 100, for example:
"percents":[50,60,70]
A percentile is the value below which a certain percentage of the observed values fall. For example, the 95th percentile of a response_time metric indicates that 95% of the response_time values were below that number. Percentiles are often a tool for finding exceptional values (outliers) for analysis.
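To ground the definition, here is a minimal Python sketch that computes an exact percentile with the nearest-rank method over a small made-up dataset. This is not the TDigest approximation that the percentiles aggregation uses, so results from a real query can differ slightly from these exact values. The helper name percentile_nearest_rank and the sample data are hypothetical.

```python
import math

def percentile_nearest_rank(values, percent):
    """Return the smallest value with at least `percent`% of values at or below it."""
    if not values or not 0 < percent <= 100:
        raise ValueError("need a non-empty list and 0 < percent <= 100")
    ordered = sorted(values)
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[rank - 1]

response_times = list(range(1, 101))  # made-up sample: 1, 2, ..., 100
for percent in (50, 95, 99):
    print(percent, percentile_nearest_rank(response_times, percent))
```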
Percentiles are approximate values.
It is important to note that percentile calculations in ChaosSearch, like Elasticsearch, are approximate values. The percentiles aggregation uses the TDigest algorithm (see Computing Accurate Quantiles using T-Digests by Ted Dunning for a description and some noteworthy behaviors). For small datasets, or for very high percentiles (such as 99%), the percentile value is often very close to or exactly the correct percentile value.
For very large datasets at scale, the value calculation is more approximate because the algorithm balances accuracy against the memory requirements and performance of the query. The error rate is usually around 2% for large datasets.
A sample percentiles query follows. Note that because the query does not specify any particular percentile values, the response shows the values at the default set of percentiles:
{
"aggs": {
"1": {
"percentiles": {
"field": "first"
}
}
}
}
A sample response follows:
{
"1": {
"values": {
"1.0": 2.0,
"5.0": 6.0,
"25.0": 25.5,
"50.0": 50.5,
"75.0": 75.5,
"95.0": 96.0,
"99.0": 100.0
}
}
}
If you query for a specific set of percentages:
{
"aggs": {
"1": {
"percentiles": {
"field": "first",
"percents": [50, 60, 70]
}
}
}
}
A sample response follows:
{
"1": {
"values": {
"50.0": 50.5,
"60.0": 60.5,
"70.0": 70.5
}
}
}