Bulk Query Export

Run an Elasticsearch/Discover search in the background and export an unlimited, complete result set to S3 for later review.

When you run a search in Search Analytics > Discover, the result set has a configurable limit of 500 records, which manages the browser memory needed to display and navigate the results. If you share the result set by exporting to PDF or CSV, only those 500 records are exported.

Using the optional bulk query export feature, you can export the complete result set of a Discover/Elasticsearch query to an AWS S3 bucket location and bypass the cap imposed by the browser configuration limit. Bulk export uses a batch job to incrementally collect and write the entire result set to one or more files stored in a designated cloud storage (S3) bucket.

There are two ways to run a bulk query export from the ChaosSearch console: from the Bulk Export page, or from the Share option of Discover.

Bulk exports on data at scale could take a long time to complete. While an export is in progress, you can review its status in the Bulk Export UI. After the query and export have completed, you can review and download the results files from your S3 bucket.

👍

Bulk Export is an optional feature that must be enabled for use.

Contact ChaosSearch Customer Success for more information about enabling the bulk export feature and considerations for using it on a regular basis to export large result sets.

The following topics describe how to use the Bulk Export UI to run exports and check their status, and the RBAC permissions needed for bulk export.

📘

Make sure to plan for the proper configuration of bucket access policies on the customer/user account.

The ChaosSearch role must be permitted to write into the designated customer-provided S3 bucket export location. See the RBAC Configuration section for sample policies or contact ChaosSearch for assistance with the permissions.

Run a Bulk Query Export from the UI

To run an export from the Bulk Export UI:

  1. In the console, click the user icon in the top-right corner to display the pop-up menu, then click Bulk Export.
  2. The Bulk Export page appears.
  3. Click Create Export to open the Bulk Export window.
  4. In the query pane, paste or type the Elasticsearch query that you want to run. Search query clauses, metric aggregations (except for percentiles aggregations), and bucket aggregations are supported. A simple example follows:
{
  "index": "sample-elb-view",
  "body": {
    "query": {
      "bool": {
        "must": [
          {
            "match": {
              "elb_status_code": 404
            }
          },
          {
            "range": {
              "timestamp": {
                "gte": "2023-08-10T00:00:00",
                "lte": "2023-08-10T23:59:59"
              }
            }
          }
        ]
      }
    }
  }
}
  5. Specify a name for the export task. The default name is a concatenation of the user account and timestamp for the export.
  6. Select a format for the exported file(s). You can choose JSON (default) or CSV.
  7. Specify a maximum file size for the exported files. The default is 100 MB. Bulk export runs a distributed set of parallel queries for performance; the exported files could vary in size but will not exceed the maximum size.
  8. Select a compression method for the exported files. You can choose GZIP (default) or NONE.
  9. Specify the cloud storage bucket destination for the exported files. For example, my-export-bucket/cloudfront-data.
  10. Click Export.
    The export process runs in the background to gather the results and export them to the target storage location. The export name is added to the left menu area. Select the export to open the Export Details pane and watch its status.
    NOTE: If the export process runs and returns to the export definition window, check the syntax of your search query for possible errors.
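
Because a syntax error or an unsupported aggregation sends you back to the export definition window, it can help to assemble and check the query before pasting it into the query pane. The following is a minimal Python sketch, not part of ChaosSearch: the helper names build_export_query and uses_percentiles are hypothetical, and the view and field names are taken from the example query in this procedure.

```python
import json
from datetime import datetime, timedelta


def build_export_query(index, status_code, start, end):
    """Return a query dict in the same shape as the example in this procedure."""
    return {
        "index": index,
        "body": {
            "query": {
                "bool": {
                    "must": [
                        {"match": {"elb_status_code": status_code}},
                        {"range": {"timestamp": {
                            "gte": start.isoformat(timespec="seconds"),
                            "lte": end.isoformat(timespec="seconds"),
                        }}},
                    ]
                }
            }
        },
    }


def uses_percentiles(node):
    """Report whether any key named "percentiles" appears in the query.

    This is a conservative check: it also flags an aggregation or field
    that merely happens to be named "percentiles".
    """
    if isinstance(node, dict):
        return any(key == "percentiles" or uses_percentiles(value)
                   for key, value in node.items())
    if isinstance(node, list):
        return any(uses_percentiles(item) for item in node)
    return False


# Build a one-day query and confirm it avoids percentiles aggregations.
day = datetime(2023, 8, 10)
query = build_export_query("sample-elb-view", 404,
                           day, day + timedelta(days=1, seconds=-1))
if uses_percentiles(query):
    raise ValueError("percentiles aggregations are not supported by bulk export")
print(json.dumps(query, indent=2))
```

The printed JSON can be pasted into the query pane as-is. Serializing with json.dumps also guards against the quoting and bracket mistakes that are easy to make when editing a query by hand.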

🚧

Use caution if your query result set could have many millions or billions of rows.

In this early access version, the bulk query export has been tested with search result sets that have millions and tens of millions of rows. As a good practice, keep the result set row counts below those levels.

Large exports could take many minutes, and possibly hours, to complete depending on the scope of the query. Use the Results pane to navigate to the storage buckets and see a list of exported files after they are available.

Important Notes

As you use bulk query export, be sure to plan for and understand the following operational behaviors.

  • Export queries should have as narrow a time range as possible. For example, avoid very wide time ranges such as many weeks, months, or years when a narrower range would return the desired results. In some cases, a bulk export over a wide time span could require more worker resources than are configured. Bulk export could then display a Capacity reached limit message, and it continues to retry the export operations that might have failed due to limited resources.
  • For queries with a time range greater than 3 hours, the query used for export is "sliced" into multiple queries that run in parallel to process the export and results. In the Export Details pane, you could see the query ID change as the export runs, and a higher than expected number of segments scanned when compared to the same query run by a Discover export.
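
To make the idea of slicing concrete, here is a minimal Python sketch that splits one time range into consecutive windows. It is an illustration only: the actual slicing strategy is internal to ChaosSearch and is not described here, the function name slice_time_range is hypothetical, and the 3-hour window size is an assumption borrowed from the threshold mentioned above.

```python
from datetime import datetime, timedelta


def slice_time_range(start, end, window=timedelta(hours=3)):
    """Split [start, end) into consecutive windows no longer than `window`."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    slices = []
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)  # the last slice may be shorter
        slices.append((cursor, upper))
        cursor = upper
    return slices


# An 8-hour range yields two full 3-hour slices and one 2-hour remainder.
slices = slice_time_range(datetime(2023, 8, 10, 0), datetime(2023, 8, 10, 8))
for lower, upper in slices:
    print(lower.isoformat(), "->", upper.isoformat())
```

Under these assumptions, each slice would run as its own query, which is consistent with the changing query ID in the Export Details pane. It also shows why a range of months or years multiplies the amount of work.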

📘

Percentiles Aggregations are not yet supported.

Bulk export does not yet support percentiles aggregations in the export query.

Bulk Query Export and Workers

The bulk query export uses the same pool of query workers as other ChaosSearch queries. For an occasional bulk query export, there is a small worker impact for the duration of the export. If you plan to use bulk query export on a regular basis, contact your ChaosSearch Customer Success representative to inquire about dedicated workers for the export workloads.

RBAC Configuration for Bulk Query Export

To access the Bulk Export console page, a user's group RBAC permissions must include the ui:export Action.

To submit, list, cancel, and obtain status of bulk export jobs in the Bulk Export page, the following group permissions are also required:

  • chaos:query:export:submit
  • chaos:query:export:status
  • chaos:query:export:cancel
  • chaos:query:export:list
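
When you review a group before assigning users to it, a quick check of its policy document against the actions listed above can catch omissions. The following is a minimal Python sketch, assuming the policy is a list of statements in the same format as the samples later in this section. The function name missing_export_actions is hypothetical, and the check is deliberately simple: it matches explicit action names only and ignores wildcards, Resources, and Condition blocks.

```python
REQUIRED_EXPORT_ACTIONS = {
    "ui:export",
    "chaos:query:export:submit",
    "chaos:query:export:status",
    "chaos:query:export:cancel",
    "chaos:query:export:list",
}


def missing_export_actions(policy):
    """Return the required actions that no Allow statement lists explicitly."""
    granted = set()
    for statement in policy:
        if statement.get("Effect") == "Allow":
            granted.update(statement.get("Actions", []))
    return REQUIRED_EXPORT_ACTIONS - granted


# A submit-only policy is missing the page access and job management actions.
submit_only = [
    {
        "Actions": ["chaos:query:export:submit"],
        "Effect": "Allow",
        "Resources": ["*"],
        "Version": "1.0",
    }
]
print(sorted(missing_export_actions(submit_only)))
```

An empty result means that every action named in this section appears in at least one Allow statement. It does not prove that the statement's resources or conditions apply to a given user.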

To submit a bulk export query from the Share option of Discover, the user must also have the following permissions:

  • Permission to query the view being searched
  • elastic:* for that view

Sample "Bulk Export Admin" Group

The following permissions allow users to submit export jobs for views that they have access to, and to view and manage all export jobs. (Other common actions and permissions are omitted from the examples below.)

[
  {
    "Actions":
    [
      "chaos:query:export:submit",
      "chaos:query:export:status",
      "chaos:query:export:cancel",
      "chaos:query:export:list",
      "ui:export"
    ],
    "Effect": "Allow",
    "Resources":
    [
      "*"
    ],
    "Version": "1.0"
  }
]

Sample "Bulk Export User" Group

The following permissions grant users access to submit export jobs for views that they have access to, into any path under s3://my-target-bucket-uuid/export/*.

[
  {
    "Actions":
    [
      "chaos:query:export:submit"
    ],
    "Effect": "Allow",
    "Resources":
    [
      "arn:aws:s3:::my-target-bucket-uuid/export/*"
    ],
    "Version": "1.0"
  }
]
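
If you maintain several groups that differ only in the target bucket or path, generating the policy can avoid copy-and-paste errors in the resource ARN. The following is a minimal Python sketch that reproduces the sample above for a given bucket and prefix. The function name export_user_policy and its parameters are hypothetical, and the statement layout simply mirrors the sample.

```python
import json


def export_user_policy(bucket, prefix="export"):
    """Build a submit-only policy scoped to one bucket path, mirroring the sample above."""
    prefix = prefix.strip("/")
    # With an empty prefix, the policy covers every path in the bucket.
    path = f"{bucket}/{prefix}/*" if prefix else f"{bucket}/*"
    return [
        {
            "Actions": ["chaos:query:export:submit"],
            "Effect": "Allow",
            "Resources": [f"arn:aws:s3:::{path}"],
            "Version": "1.0",
        }
    ]


print(json.dumps(export_user_policy("my-target-bucket-uuid"), indent=2))
```

Called with the default prefix, the sketch produces the same resource ARN as the sample. Passing a different prefix narrows or moves the location where the group can write exports.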

Sample "Bulk Export Status" Group

The following group grants users access to view/manage export jobs submitted by test@mycompany.com.

[
  {
    "Actions":
    [
      "chaos:query:export:status",
      "chaos:query:export:cancel",
      "chaos:query:export:list",
      "ui:export"
    ],
    "Condition":
    {
      "Conditions":
      [
        {
          "Equals":
          {
            "chaos:owner/crn": "test@mycompany.com"
          }
        }
      ]
    },
    "Effect": "Allow",
    "Resources":
    [
      "*"
    ],
    "Version": "1.0"
  }
]