Bulk Query Export
Run an Elasticsearch/Discover search in the background and export an unlimited, complete result set to S3 for later review.
You can use bulk query export to submit batch jobs that run a Search Analytics/Elasticsearch Discover query and incrementally write the complete result set to a series of files in a designated S3 bucket.
When you run a query in Search Analytics > Discover, the results show the total number of matching documents (hits), but the result set that is shown in the browser, and that can be exported to PDF or CSV, is limited to 500 rows by default to manage the memory needed to process it.
The bulk export captures the complete result set without the browser row ceiling. You can follow the progress of the export in the Bulk Export UI, and after the query and export have completed, you can review and download the result files from your S3 bucket.
Make sure to plan for the proper configuration of access policies on the customer/user account.
The ChaosSearch role must be permitted to write into the customer-provided S3 bucket export location. See the RBAC Configuration section for sample policies or contact ChaosSearch for assistance with the permissions.
Run a Bulk Query Export
You can use the Bulk Query Export API endpoints to search, export, and obtain the status of the export. You can also run and view exports from the ChaosSearch console:
- From a Search Analytics > Discover window (see Bulk Query Export in Search Analytics)
- From the Bulk Export UI area, described in this topic
To run an export from the Bulk Export area:
- In the console, click the user icon in the top-right corner to display the pop-up menu, then click Bulk Export.
- The Bulk Export page appears.
- Click Create Export to open the Bulk Export window.
- In the query pane, paste or type the Elasticsearch query that you want to run to gather all of the exported results. Search query clauses, metric aggregations (except for percentiles aggregations), and bucket aggregations are supported. A simple example follows:
{
  "index": "sample-elb-view",
  "body": {
    "query": {
      "bool": {
        "must": [
          {
            "match": {
              "elb_status_code": 404
            }
          },
          {
            "range": {
              "timestamp": {
                "gte": "2003-08-10T00:00:00",
                "lte": "2023-08-10T23:59:59"
              }
            }
          }
        ]
      }
    }
  }
}
- Enter a name for the export task. The name is used in the file naming to help identify the exports. If you omit a name, the export uses the unique ID string assigned to it.
- Select a format for the exported file(s). You can choose JSON (default) or CSV.
- Specify a file size limit for the exported files. The default is 100 MB. Bulk export creates as many files of this maximum size as needed in the destination folder.
- Select a compression method for the exported files. You can choose GZIP (default) or NONE.
- Specify the cloud storage bucket destination for the exported files. For example, my-export-bucket/cloudfront-data.
- Click Export.
The export process runs in the background to gather the results and export them to the target storage location. In the left menu, a unique ID appears for the export operation. Select the export and use the Export Details pane to watch its status.
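As a minimal sketch, the query document from the example above could be assembled programmatically before it is pasted into the query pane. The helper names `build_export_query` and `uses_percentiles` are hypothetical and not part of any ChaosSearch tooling. The percentiles check is only a rough guard: it flags any object key named `percentiles`, so it would also flag a field that happens to have that name.

```python
import json
from datetime import datetime


def build_export_query(index, status_code, start, end, time_field="timestamp"):
    """Build a query document shaped like the sample in this topic (hypothetical helper)."""
    if end < start:
        raise ValueError("end must not precede start")
    return {
        "index": index,
        "body": {
            "query": {
                "bool": {
                    "must": [
                        {"match": {"elb_status_code": status_code}},
                        {
                            "range": {
                                time_field: {
                                    "gte": start.isoformat(timespec="seconds"),
                                    "lte": end.isoformat(timespec="seconds"),
                                }
                            }
                        },
                    ]
                }
            }
        },
    }


def uses_percentiles(node):
    """Return True if any nested object contains a key named "percentiles"."""
    if isinstance(node, dict):
        return any(
            key == "percentiles" or uses_percentiles(value)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return any(uses_percentiles(item) for item in node)
    return False


# One day of 404 responses, keeping the time range narrow.
query = build_export_query(
    "sample-elb-view",
    404,
    datetime(2023, 8, 10, 0, 0, 0),
    datetime(2023, 8, 10, 23, 59, 59),
)
if uses_percentiles(query):
    raise SystemExit("percentiles aggregations are not supported by bulk export")
print(json.dumps(query, indent=2))
```

Generating the document this way keeps the time bounds explicit and makes it easy to catch an unsupported aggregation before the export is submitted.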
NOTE: If the export process runs and returns to the export definition window, check the syntax of your search query.
Use caution if your query result set could have many millions or billions of rows.
In this early access version, the bulk query export has been tested with search result sets that have millions and tens of millions of rows. As a good practice, keep the result set row counts below those levels.
Large exports could take many minutes, or even hours, to complete depending on the scope of the query. Use the Results pane to navigate to the storage buckets and see a list of exported files after they are available.
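To get a rough idea of how many files to expect in the destination folder, the file size limit can be turned into a simple ceiling division. This is only an estimate under stated assumptions: `estimate_file_count` is a hypothetical helper, 1 MB is taken as 1024 * 1024 bytes, and the sketch does not model whether the limit applies before or after compression, which this topic does not specify.

```python
def estimate_file_count(total_bytes, file_size_limit_mb=100):
    """Estimate the number of export files for a result set of total_bytes.

    Assumes 1 MB = 1024 * 1024 bytes and that every file except the last
    is filled up to the limit; both are assumptions, not documented behavior.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must not be negative")
    if file_size_limit_mb <= 0:
        raise ValueError("file_size_limit_mb must be positive")
    limit_bytes = file_size_limit_mb * 1024 * 1024
    # Integer ceiling division avoids floating-point rounding surprises.
    return (total_bytes + limit_bytes - 1) // limit_bytes


# A result set of 2560 MB with the default 100 MB limit.
print(estimate_file_count(2560 * 1024 * 1024))
```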
Important Notes
As you use bulk query export, be sure to plan for and understand the following operational behaviors.
- Export queries should have as narrow a time range as possible. For example, avoid very wide time ranges when a narrower range returns the desired results. In some cases, a bulk export over a wide time span could require more worker resources than are configured; the export could then fail to complete and display a Capacity reached limit message.
- For queries with a time range greater than 3 hours, the query used for export is "sliced" into multiple queries that run in parallel to process the export and results. In the Export Details pane, you might see the query ID change as the export runs, and a higher than expected number of segments scanned compared to the same query run in Discover.
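The idea of slicing a wide time range into narrower sub-ranges can be illustrated with a short sketch. This is not the service's implementation: the actual slice width and boundaries are not documented here, and both `slice_time_range` and the 3-hour default width are assumptions chosen for illustration. The same approach could also be used on the client side to plan several narrower exports instead of one very wide one.

```python
from datetime import datetime, timedelta


def slice_time_range(start, end, max_span=timedelta(hours=3)):
    """Split [start, end] into consecutive sub-ranges no longer than max_span."""
    if end <= start:
        raise ValueError("end must be later than start")
    if max_span <= timedelta(0):
        raise ValueError("max_span must be positive")
    slices = []
    cursor = start
    while cursor < end:
        upper = min(cursor + max_span, end)
        slices.append((cursor, upper))
        cursor = upper
    return slices


# A 10-hour range split into slices of at most 3 hours.
for lower, upper in slice_time_range(
    datetime(2023, 8, 10, 0, 0, 0), datetime(2023, 8, 10, 10, 0, 0)
):
    print(lower.isoformat(), "to", upper.isoformat())
```

Adjacent slices share a boundary timestamp in this sketch, so a query built from them would need an exclusive bound on one side to avoid counting a document twice.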
Percentiles Aggregations are not yet supported.
Bulk export does not yet support percentiles aggregations in the export query.
Bulk Query Export and Workers
The bulk query export uses the same pool of query workers as other ChaosSearch queries. For an occasional bulk query export, there is a small worker impact for the duration of the export. If you plan to use bulk query export on a regular basis, contact your ChaosSearch Customer Success representative to inquire about configuring dedicated workers for the export workloads.
RBAC Configuration for Bulk Query Export
To submit a bulk export query, you must have the following permissions:
- Permission to query the target view (elastic:* for that view)
Sample "Bulk Export Admin" Group
The following permissions allow users to submit export jobs for views that they have access to, and to view and manage all export jobs.
[
  {
    "Actions": [
      "chaos:query:export:submit",
      "chaos:query:export:status",
      "chaos:query:export:cancel",
      "chaos:query:export:list"
    ],
    "Effect": "Allow",
    "Resources": [
      "*"
    ],
    "Version": "1.0"
  }
]
Sample "Bulk Export User" Group
The following permissions grant users access to submit export jobs for views that they have access to, into any path under s3://my-target-bucket-uuid/export/*.
[
  {
    "Actions": [
      "chaos:query:export:submit"
    ],
    "Effect": "Allow",
    "Resources": [
      "arn:aws:s3:::my-target-bucket-uuid/export/*"
    ],
    "Version": "1.0"
  }
]
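If several groups need the same policy shape for different destinations, the "Bulk Export User" sample could be generated instead of edited by hand. The following is a sketch only: `bulk_export_user_policy` is a hypothetical helper that reproduces the JSON structure of the sample above, and how the resulting document is attached to a group is outside its scope.

```python
import json


def bulk_export_user_policy(bucket, prefix=""):
    """Return a policy document shaped like the "Bulk Export User" sample.

    The structure mirrors the sample policy; nothing here calls a real API.
    """
    if not bucket:
        raise ValueError("bucket must not be empty")
    prefix = prefix.strip("/")
    path = f"{bucket}/{prefix}/*" if prefix else f"{bucket}/*"
    return [
        {
            "Actions": ["chaos:query:export:submit"],
            "Effect": "Allow",
            "Resources": [f"arn:aws:s3:::{path}"],
            "Version": "1.0",
        }
    ]


# Reproduces the sample: exports allowed under my-target-bucket-uuid/export/.
print(json.dumps(bulk_export_user_policy("my-target-bucket-uuid", "export"), indent=2))
```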
Sample "Bulk Export Status" Group
The following group grants users access to view/manage export jobs submitted by [email protected].
[
  {
    "Actions": [
      "chaos:query:export:status",
      "chaos:query:export:cancel",
      "chaos:query:export:list"
    ],
    "Condition": {
      "Conditions": [
        {
          "Equals": {
            "chaos:owner/crn": "[email protected]"
          }
        }
      ]
    },
    "Effect": "Allow",
    "Resources": [
      "*"
    ],
    "Version": "1.0"
  }
]