Recommendations for AWS CloudTrail and Similar Records Arrays

Review recommendations for indexing files that contain a large Records-style array.

Some applications create JSON files with unique structures. For example, AWS CloudTrail log files usually begin with a Records property that is a very large array in which each element represents one API event. The shape and complexity of the elements in the CloudTrail Records property can vary based on the types of applications the site uses.

Horizontal or Vertical Expansion

Using horizontal expansion on a Records attribute might not take full advantage of flattening and ChaosSearch indexing, because it creates one row with thousands, possibly millions, of columns. In this case, ChaosSearch whitelist processing can apply vertical expansion to the Records attribute while all other attributes in the JSON file use horizontal expansion. A sample JSON whitelist definition with the vertical_selection_policy override follows:

{"vertical_selection_policy":[{"includes":["Records"],"type":"whitelist"}]}
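To illustrate the difference between the two expansions, the following Python sketch flattens a tiny CloudTrail-style document both ways. It is only a conceptual model, not the ChaosSearch implementation; the function names and the dotted column naming are assumptions made for this example.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    row = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            row.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            row.update(flatten(value, f"{prefix}.{index}" if prefix else str(index)))
    else:
        row[prefix] = obj
    return row


def expand_horizontal(doc):
    """One row for the whole document: every array element adds new columns."""
    return [flatten(doc)]


def expand_vertical(doc, attribute="Records"):
    """One row per element of the chosen array; other attributes are flattened
    horizontally and repeated on every row."""
    shared = flatten({k: v for k, v in doc.items() if k != attribute})
    rows = []
    for element in doc.get(attribute, []):
        row = dict(shared)
        row.update(flatten(element, attribute))
        rows.append(row)
    return rows


# Hypothetical two-event sample; real CloudTrail files hold many more events.
sample = {
    "Records": [
        {"eventName": "GetObject", "awsRegion": "us-east-1"},
        {"eventName": "PutObject", "awsRegion": "us-west-2"},
    ]
}

horizontal = expand_horizontal(sample)
vertical = expand_vertical(sample)
print(len(horizontal), "row(s),", len(horizontal[0]), "columns")
print(len(vertical), "row(s),", len(vertical[0]), "columns each")
```

With horizontal expansion the column count grows with the number of events, while vertical expansion keeps a stable set of columns and adds one row per event.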

This type of Records array design has been observed in JSON files for applications like AWS CloudTrail, Kubernetes services, and some application logs. Carefully review the format of the JSON files that you plan to index, and work with your ChaosSearch Customer Success Representative for help with planning for complex, nested JSON files and with choosing flattening options for optimal storage and filtering.

Recommendations and Settings

When creating object groups to index CloudTrail files, review the following recommendations for organizing the cloud storage files to index. These recommendations can serve as a starting point for your environment.

Object Group Settings

CloudTrail uses the following file name format for the log file objects that it delivers to an Amazon S3 bucket. The FileNameFormat component at the end of the name is json.gz, which is a JSON text file in compressed gzip format.

AccountID_CloudTrail_RegionName_YYYYMMDDTHHmmZ_UniqueString.FileNameFormat
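As a quick way to check which parts of a file name map to which component, the following Python sketch parses a name in this format. The regular expression, the function name, and the sample file name are assumptions for illustration; they are not part of ChaosSearch or of an AWS SDK.

```python
import re
from datetime import datetime, timezone

# Assumed pattern for the documented format:
# AccountID_CloudTrail_RegionName_YYYYMMDDTHHmmZ_UniqueString.FileNameFormat
CLOUDTRAIL_NAME = re.compile(
    r"^(?P<account_id>\d+)_CloudTrail_(?P<region>[a-z0-9-]+)_"
    r"(?P<timestamp>\d{8}T\d{4}Z)_(?P<unique>[^.]+)\.(?P<file_format>.+)$"
)


def parse_cloudtrail_name(name):
    """Return the parts of a CloudTrail log file name, or None if it does not match."""
    match = CLOUDTRAIL_NAME.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    # The timestamp in the file name is in UTC (the trailing Z).
    parts["timestamp"] = datetime.strptime(
        parts["timestamp"], "%Y%m%dT%H%MZ"
    ).replace(tzinfo=timezone.utc)
    return parts


# Sample name with a placeholder account ID.
example = "111122223333_CloudTrail_us-east-2_20150801T0210Z_Mu0KsOhtH1ar15ZZ.json.gz"
parts = parse_cloudtrail_name(example)
print(parts["account_id"], parts["region"], parts["timestamp"].isoformat(), parts["file_format"])
```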

When planning the object group Prefix value, some common directory structures use an account_ID/AWSLogs organization to store the log files. Review your environment to see if that Prefix value or an alternative would work for an object group.

The following string is a typical object group Regex Filter setting for CloudTrail files:

.*\/CloudTrail\/.*
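Before applying the filter, it can help to try the expression against a few object keys from your bucket. The following Python sketch does that with the standard re module. It assumes that the filter must match the whole object key, which might differ from how ChaosSearch evaluates it, and the sample keys are hypothetical.

```python
import re

# The Regex Filter value shown above.
REGEX_FILTER = re.compile(r".*\/CloudTrail\/.*")


def passes_filter(key):
    """Return True when the whole object key matches the filter expression."""
    return REGEX_FILTER.fullmatch(key) is not None


# Hypothetical object keys with a placeholder account ID.
keys = [
    "AWSLogs/111122223333/CloudTrail/us-east-2/2015/08/01/"
    "111122223333_CloudTrail_us-east-2_20150801T0210Z_Mu0KsOhtH1ar15ZZ.json.gz",
    "AWSLogs/111122223333/CloudTrail-Digest/us-east-2/2015/08/01/digest.json.gz",
    "AWSLogs/111122223333/elasticloadbalancing/us-east-2/2015/08/01/elb.log.gz",
]

for key in keys:
    print(passes_filter(key), key)
```

Because the expression requires a slash on both sides of CloudTrail, keys under a CloudTrail-Digest folder are not selected.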

Partitioning Regular Expression

If you want to use partitioning to create separate indexes by AWS account ID or other characteristics, you can use an Object Filter Partition By key value similar to the following:

account_ID/AWSLogs/(\d+?)/.*

This partitioning format uses a Records.recipientAccountId field. To work as a partition key, the field needs a Schema Filter Column Override that sets its data type to string.
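To see what the capture group in the Partition By expression extracts, the following Python sketch applies it to sample object keys. It assumes that the expression is matched from the start of the key and that the first capture group becomes the partition value; the literal account_ID prefix is kept exactly as in the example above and would be replaced by the real prefix in your bucket.

```python
import re

# The Partition By expression shown above; the capture group holds the account ID.
PARTITION_BY = re.compile(r"account_ID/AWSLogs/(\d+?)/.*")


def partition_value(key):
    """Return the captured account ID for an object key, or None if the key does not match."""
    match = PARTITION_BY.match(key)
    return match.group(1) if match else None


# Hypothetical object keys with placeholder account IDs.
keys = [
    "account_ID/AWSLogs/111122223333/CloudTrail/us-east-2/2015/08/01/file.json.gz",
    "account_ID/AWSLogs/444455556666/CloudTrail/us-west-2/2015/08/01/file.json.gz",
    "account_ID/other/111122223333/file.json.gz",
]

for key in keys:
    print(partition_value(key), key)
```

Although the quantifier in (\d+?) is lazy, the slash that follows forces the group to capture the complete run of digits.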

Within the Schema Filter UI, it can be helpful to exclude columns that are not useful for analytics reporting. A common schema filter exclusion list follows. You can save this content as a JSON file, edit as needed if you require other exclusion rules, and then upload it to the Schema Filter window during the object group creation.

{"array_selection_policy":[{"excludes":["Records.requestParameters.HistoricalMetrics","Records.requestParameters.CurrentMetrics","Records.requestParameters.certificate.hb","Records.requestParameters.certificateChain.hb","Records.responseElements.MetricResults"],"type":"blacklist"}]}
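If you maintain the exclusion list in a script rather than editing the JSON by hand, a sketch like the following can generate the content. The function name and the extra column in the usage line are hypothetical; only the JSON structure and the five base columns come from the list above.

```python
import json

# Columns from the exclusion list shown above.
BASE_EXCLUDES = [
    "Records.requestParameters.HistoricalMetrics",
    "Records.requestParameters.CurrentMetrics",
    "Records.requestParameters.certificate.hb",
    "Records.requestParameters.certificateChain.hb",
    "Records.responseElements.MetricResults",
]


def build_exclusion_filter(extra_columns=()):
    """Return the schema filter structure with any extra columns appended once."""
    excludes = list(BASE_EXCLUDES)
    for column in extra_columns:
        if column not in excludes:
            excludes.append(column)
    return {"array_selection_policy": [{"excludes": excludes, "type": "blacklist"}]}


# Hypothetical extra column, used only to show how to extend the list.
schema_filter = build_exclusion_filter(["Records.requestParameters.policyDocument"])
print(json.dumps(schema_filter, separators=(",", ":")))
```

Redirect the printed output to a file to get a JSON document that you can upload in the Schema Filter window.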

In the Refinery area, when you create a view for this CloudTrail object group, use the Schema Transformation window to transform the Records.recipientAccountId field and set it as Treated as Partition Key.

