Bulk Query Export

Run an Elasticsearch/Discover search in the background and export an unlimited, complete result set to S3 for later review.

You can use bulk query export to submit batch jobs that run a Search Analytics/Elasticsearch Discover query and incrementally write the complete result set to a series of files in a designated S3 bucket.

When you run a search in Search Analytics > Discover, the results show the total number of matching results (hits). However, the result set that is displayed in the browser, and that can be exported to PDF or CSV, is limited to a default of 500 rows to manage the memory needed to process it.

The bulk export captures the complete result set, without imposing the browser row ceiling. You can review the progress of the export in the Bulk Export UI, and after the query and export have completed, you can review and download the result files from your S3 bucket.

📘

Make sure to plan for the proper configuration of access policies on the customer/user account.

The ChaosSearch role must be permitted to write into the customer-provided S3 bucket export location. See the RBAC Configuration section for sample policies or contact ChaosSearch for assistance with the permissions.

Run a Bulk Query Export

You can use the Bulk Query Export API endpoints to search, export, and obtain the status of the export. You can also run and view exports from the ChaosSearch console.

To run an export from the Bulk Export area:

  1. In the console, click the user icon in the top-right corner to display the pop-up menu, then click Bulk Export.
  2. The Bulk Export page appears.
  3. Click Create Export to open the Bulk Export window.
  4. In the query pane, paste or type the Elasticsearch query that you want to run to gather all of the exported results. For example:
{
  "index": "sample-elb-view",
  "body": {
    "query": {
      "bool": {
        "must": [
          {
            "match": {
              "elb_status_code": 404
            }
          },
          {
            "range": {
              "timestamp": {
                "gte": "2003-08-10T00:00:00",
                "lte": "2023-08-10T23:59:59"
              }
            }
          }
        ]
      }
    }
  }
}
  5. Enter a name for the export task. The name is used in the file naming to help identify the exports. If you omit a name, the export uses the unique ID string assigned to it.
  6. Specify a file size limit for the exported files. The default is 100 MB. Bulk export creates as many files of this maximum size as needed in the destination folder.
  7. Select a format for the exported files. You can choose JSON (default) or CSV.
  8. Select a compression method for the exported files. You can choose GZIP (default) or NONE.
  9. Type the cloud storage bucket destination for the exported files. For example, my-export-bucket/cloudfront-data.
  10. Click Export.
    The export process runs in the background to gather the results and export them to the target storage location. In the left menu, a unique ID appears for the export operation. Select the export and use the Export Details pane to watch its status.
    NOTE: If the export process runs and returns to the export definition window, check the syntax of your search query.
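
Because a syntax error in the query sends you back to the export definition window, it can help to generate the query text programmatically rather than typing it by hand. The following is a minimal Python sketch that builds a query shaped like the example in the steps above and confirms that it serializes to valid JSON. The function name `build_export_query` and its parameters are hypothetical helpers for illustration and are not part of ChaosSearch.

```python
import json
from datetime import datetime


def build_export_query(index, status_code, start, end):
    """Return a query dict shaped like the example export query.

    Hypothetical helper: the field names (elb_status_code, timestamp)
    are taken from the sample query and may differ in your view.
    """
    if start > end:
        raise ValueError("start must not be later than end")
    fmt = "%Y-%m-%dT%H:%M:%S"
    return {
        "index": index,
        "body": {
            "query": {
                "bool": {
                    "must": [
                        {"match": {"elb_status_code": status_code}},
                        {
                            "range": {
                                "timestamp": {
                                    "gte": start.strftime(fmt),
                                    "lte": end.strftime(fmt),
                                }
                            }
                        },
                    ]
                }
            }
        },
    }


query = build_export_query(
    "sample-elb-view",
    404,
    datetime(2023, 8, 10, 0, 0, 0),
    datetime(2023, 8, 10, 23, 59, 59),
)
query_text = json.dumps(query, indent=2)
json.loads(query_text)  # raises an error if the text is not valid JSON
print(query_text)       # paste this text into the query pane
```

This check only confirms that the text is well-formed JSON. It does not verify that the index or field names exist in your view.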

🚧

Use caution if your query result set could have many millions or billions of rows.

In this early access version, the bulk query export has been tested with search result sets that have millions and tens of millions of rows. As a good practice, keep the result set row counts below those levels.

Large exports could take many minutes, or even hours, to complete depending on the scope of the query. Use the Results pane to navigate to the storage buckets and see a list of exported files after they are available.
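
To get a rough sense of how many files a large export might leave in the destination folder, you can divide the expected output size by the file size limit. The sketch below assumes that 1 MB means 1024 × 1024 bytes and that you already have an estimate of the total output size. Both are assumptions, and the function name `estimate_file_count` is hypothetical. How compression interacts with the size limit is not covered here.

```python
def estimate_file_count(total_bytes, file_size_limit_mb=100):
    """Estimate the number of export files for a given output size.

    Assumption: 1 MB is treated as 1024 * 1024 bytes. The actual
    export may count sizes differently.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must not be negative")
    if file_size_limit_mb <= 0:
        raise ValueError("file_size_limit_mb must be positive")
    limit_bytes = file_size_limit_mb * 1024 * 1024
    # Ceiling division using integers only, to avoid float rounding.
    return -(-total_bytes // limit_bytes)


# Example: an estimated 250 MB of output with the default 100 MB limit.
print(estimate_file_count(250 * 1024 * 1024))
```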

Important Notes

As you use bulk query export, be sure to plan for and understand the following operational behaviors.

  • Export queries should have as narrow a time range as possible. For example, avoid using very wide time ranges when the desired results range can be narrower. In some cases, a bulk export over a wide time span could require more worker resources than are configured. The export could then fail to complete and display a Capacity reached limit message.
  • For queries with a time range greater than 3 hours, the export query is "sliced" into multiple queries that run in parallel to process the export and results. In the Export Details pane, you might see the query ID change as the export runs, and a higher than expected number of scanned segments compared to the same query run in Discover.
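
To illustrate the idea of slicing, the following Python sketch splits one time range into consecutive sub-ranges. It is a conceptual illustration only: the slicing that ChaosSearch performs is internal and may differ. The 3-hour slice width used here is an assumption borrowed from the threshold above, and `slice_time_range` is a hypothetical name.

```python
from datetime import datetime, timedelta


def slice_time_range(start, end, slice_hours=3):
    """Split [start, end] into consecutive slices of at most slice_hours.

    Conceptual sketch only; the slice width is an assumption.
    """
    if start > end:
        raise ValueError("start must not be later than end")
    if slice_hours <= 0:
        raise ValueError("slice_hours must be positive")
    step = timedelta(hours=slice_hours)
    slices = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # the last slice may be shorter
        slices.append((cursor, upper))
        cursor = upper
    return slices


# A 10-hour range yields three full slices and one shorter final slice.
for lower, upper in slice_time_range(
    datetime(2023, 8, 10, 0, 0), datetime(2023, 8, 10, 10, 0)
):
    print(lower.isoformat(), "to", upper.isoformat())
```

A wider time range produces more slices, which is one way to picture why a wide export can need more worker resources than a narrow one.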

📘

Aggregations are not yet supported.

Bulk export does not yet support aggregations in the export query.
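
Before you submit a query, a quick check for aggregation clauses can help. The sketch below looks for the standard Elasticsearch `aggs` and `aggregations` keys at the top level of the query body. It assumes that the query uses the `index` and `body` layout shown in the example query, and `uses_aggregations` is a hypothetical helper name.

```python
def uses_aggregations(export_query):
    """Return True if the query body has a top-level aggregation clause.

    Elasticsearch accepts either "aggs" or "aggregations" as the key.
    """
    body = export_query.get("body", {})
    return "aggs" in body or "aggregations" in body


plain_query = {
    "index": "sample-elb-view",
    "body": {"query": {"match": {"elb_status_code": 404}}},
}
agg_query = {
    "index": "sample-elb-view",
    "body": {
        "query": {"match_all": {}},
        "aggs": {"by_status": {"terms": {"field": "elb_status_code"}}},
    },
}

print(uses_aggregations(plain_query))  # False
print(uses_aggregations(agg_query))    # True
```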

Bulk Query Export and Workers

The bulk query export uses the same pool of query workers as other ChaosSearch queries. For an occasional bulk query export, there is a small worker impact for the duration of the export. If you plan to use bulk query export on a regular basis, contact your ChaosSearch Customer Success representative to inquire about configuring dedicated workers for the export workloads.

RBAC Configuration for Bulk Query Export

To submit a bulk export query, you must have the following permissions:

  • Permission to query the view that the export targets
  • elastic:* for that view

Sample "Bulk Export Admin" Group

The following permissions grant users access to submit export jobs for views that they can access, and to view and manage all export jobs.

[
  {
    "Actions":
    [
      "chaos:query:export:submit",
      "chaos:query:export:status",
      "chaos:query:export:cancel",
      "chaos:query:export:list"
    ],
    "Effect": "Allow",
    "Resources":
    [
      "*"
    ],
    "Version": "1.0"
  }
]

Sample "Bulk Export User" Group

The following permissions grant users access to submit export jobs for views that they have access to, into any path under s3://my-target-bucket-uuid/export/*.

[
  {
    "Actions":
    [
      "chaos:query:export:submit"
    ],
    "Effect": "Allow",
    "Resources":
    [
      "arn:aws:s3:::my-target-bucket-uuid/export/*"
    ],
    "Version": "1.0"
  }
]
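
If you maintain several export locations, you could generate this policy document rather than editing it by hand. The following Python sketch reproduces the sample structure above for a given bucket and path prefix. The function name `bulk_export_user_policy` and its parameters are hypothetical, and the output is only a convenience for producing the JSON text.

```python
import json


def bulk_export_user_policy(bucket, prefix="export"):
    """Build a policy shaped like the sample "Bulk Export User" group.

    Hypothetical helper: it only mirrors the sample structure and does
    not validate that the bucket or prefix exists.
    """
    if not bucket:
        raise ValueError("bucket must not be empty")
    resource = f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*"
    return [
        {
            "Actions": ["chaos:query:export:submit"],
            "Effect": "Allow",
            "Resources": [resource],
            "Version": "1.0",
        }
    ]


policy = bulk_export_user_policy("my-target-bucket-uuid")
print(json.dumps(policy, indent=2))
```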

Sample "Bulk Export Status" Group

The following group grants users access to view/manage export jobs submitted by [email protected].

[
  {
    "Actions":
    [
      "chaos:query:export:status",
      "chaos:query:export:cancel",
      "chaos:query:export:list"
    ],
    "Condition":
    {
      "Conditions":
      [
        {
          "Equals":
          {
            "chaos:owner/crn": "[email protected]"
          }
        }
      ]
    },
    "Effect": "Allow",
    "Resources":
    [
      "*"
    ],
    "Version": "1.0"
  }
]