Recommendations for AWS CloudTrail and Similar Records Arrays
Review the recommendations for indexing files that contain a large Records-style array.
Some applications create JSON files with unique structures. For example, AWS CloudTrail log files usually begin with a Records property that is a very large array where each element represents one API event. The shape and complexity of the CloudTrail Records property can vary based on the types of applications the site uses.
Horizontal or Vertical Expansion
Using horizontal expansion on a Records attribute might not take best advantage of flattening and ChaosSearch indexing because it creates one row with thousands, possibly millions, of columns. For this case, ChaosSearch whitelist processing can apply vertical expansion to the Records attribute while all other attributes in the JSON file use horizontal expansion. A sample JSON whitelist definition with the vertical_selection_policy override follows:
{"vertical_selection_policy":[{"includes":["Records"],"type":"whitelist"}]}
This type of Records array design has been observed in JSON files for applications like AWS CloudTrail, Kubernetes services, and some app logs. Carefully review the format of the JSON files that you plan to index, and work with your ChaosSearch Customer Success Representative for assistance with planning for complex, nested JSON files and choosing flattening options for optimal storage and filtering.
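The difference between the two expansion modes can be sketched in a few lines of Python. This is only an illustration of the row and column shapes described above, not the ChaosSearch implementation; the helper names (flatten, expand_horizontal, expand_vertical), the dotted column naming, and the sample events are assumptions made for the example.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one dict of dotted column names."""
    columns = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            columns.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            columns.update(flatten(value, f"{prefix}.{index}" if prefix else str(index)))
    else:
        columns[prefix] = obj
    return columns


def expand_horizontal(doc):
    """One row for the whole file; every array element adds more columns."""
    return [flatten(doc)]


def expand_vertical(doc, array_key="Records"):
    """One row per array element; other top-level attributes repeat on each row."""
    shared = flatten({k: v for k, v in doc.items() if k != array_key})
    rows = []
    for element in doc.get(array_key, []):
        row = dict(shared)
        row.update(flatten(element, array_key))
        rows.append(row)
    return rows


# Hypothetical, heavily trimmed CloudTrail-like file with three events.
sample = {
    "Records": [
        {"eventName": "GetObject", "recipientAccountId": "111122223333"},
        {"eventName": "PutObject", "recipientAccountId": "111122223333"},
        {"eventName": "ListBuckets", "recipientAccountId": "444455556666"},
    ]
}

horizontal_rows = expand_horizontal(sample)
vertical_rows = expand_vertical(sample)
print(len(horizontal_rows), len(horizontal_rows[0]))  # 1 row, 6 columns
print(len(vertical_rows), len(vertical_rows[0]))      # 3 rows, 2 columns each
```

In the horizontal case the column count grows with every event in the file, while in the vertical case it stays fixed and only the row count grows.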
Recommendations and Settings
When creating object groups to index CloudTrail files, review the following recommendations for organizing the cloud storage files to index. These recommendations can serve as a starting point for your environment.
Object Group Settings
CloudTrail uses the following file name format for the log file objects that it delivers to an Amazon S3 bucket. The format of the file is json.gz.
AccountID_CloudTrail_RegionName_YYYYMMDDTHHmmZ_UniqueString.FileNameFormat
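As a rough illustration of the file name format above, the following Python sketch splits a log file name into its parts. The regular expression, the function name parse_cloudtrail_name, and the sample file name are assumptions made for this example; adjust the pattern if your file names differ.

```python
import re
from datetime import datetime, timezone

# Assumed pattern for AccountID_CloudTrail_RegionName_YYYYMMDDTHHmmZ_UniqueString.FileNameFormat
CLOUDTRAIL_NAME = re.compile(
    r"^(?P<account_id>\d+)_CloudTrail_(?P<region>[a-z0-9-]+)_"
    r"(?P<timestamp>\d{8}T\d{4}Z)_(?P<unique>[^.]+)\.(?P<format>.+)$"
)


def parse_cloudtrail_name(file_name):
    """Return the named parts of a CloudTrail log file name, or None if it does not match."""
    match = CLOUDTRAIL_NAME.match(file_name)
    if match is None:
        return None
    parts = match.groupdict()
    # The trailing Z marks the timestamp as UTC.
    parts["timestamp"] = datetime.strptime(
        parts["timestamp"], "%Y%m%dT%H%MZ"
    ).replace(tzinfo=timezone.utc)
    return parts


# Illustrative file name only.
parsed = parse_cloudtrail_name(
    "111122223333_CloudTrail_us-east-2_20150801T0210Z_Mu0KsOhtH1ar15ZZ.json.gz"
)
print(parsed["account_id"], parsed["region"], parsed["format"])
```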
When planning the object group Prefix value, some common directory structures use an account_ID/AWSLogs organization to store the log files. Review your environment to see if that Prefix value or an alternative would work for an object group.
The following string is a typical object group Regex Filter setting for CloudTrail files:
.*\/CloudTrail\/.*
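Before saving the object group, it can help to check the filter against a few object keys from the bucket. The sketch below does that locally with Python's re module, which may not behave identically to the regex engine that ChaosSearch uses; the sample keys and the matches_filter helper are hypothetical.

```python
import re

# The Regex Filter shown above, compiled with Python's re module.
REGEX_FILTER = re.compile(r".*\/CloudTrail\/.*")


def matches_filter(object_key):
    """Return True when the whole object key matches the Regex Filter."""
    return REGEX_FILTER.fullmatch(object_key) is not None


# Hypothetical object keys for illustration.
sample_keys = [
    "AWSLogs/111122223333/CloudTrail/us-east-2/2015/08/01/"
    "111122223333_CloudTrail_us-east-2_20150801T0210Z_Mu0KsOhtH1ar15ZZ.json.gz",
    "AWSLogs/111122223333/CloudTrail-Digest/us-east-2/2015/08/01/digest.json.gz",
    "AWSLogs/111122223333/elasticloadbalancing/us-east-2/2015/08/01/elb.log.gz",
]

selected = [key for key in sample_keys if matches_filter(key)]
for key in selected:
    print(key)
```

Only the first key is selected, because the filter requires a path segment named exactly CloudTrail; keys under CloudTrail-Digest or other services are skipped.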
Partitioning Regular Expression
If you want to use partitioning to create separate indexes by AWS account ID or other characteristics, you can use an Object Filter Partition By key value similar to the following:
account_ID/AWSLogs/(\d+?)/.*
This partitioning format uses a Records.recipientAccountId field. To work as a partition key, the field needs a Schema Filter Column Override that sets its data type to string.
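To see which value the Partition By expression would capture, the following sketch applies the same capture group to an object key in Python. It assumes that account_ID in the expression is a placeholder for the leading directory of the object group Prefix; the prefix my-trail-logs, the sample keys, and the partition_value helper are hypothetical.

```python
import re


def partition_value(object_key, prefix):
    """Return the account ID captured from the object key, or None if there is no match.

    `prefix` stands in for the account_ID placeholder in the Partition By expression.
    """
    pattern = re.compile(rf"{re.escape(prefix)}/AWSLogs/(\d+?)/.*")
    match = pattern.match(object_key)
    return match.group(1) if match else None


# Hypothetical object keys for illustration.
key_a = "my-trail-logs/AWSLogs/111122223333/CloudTrail/us-east-2/2015/08/01/file.json.gz"
key_b = "my-trail-logs/AWSLogs/444455556666/CloudTrail/us-west-2/2015/08/01/file.json.gz"

print(partition_value(key_a, "my-trail-logs"))  # 111122223333
print(partition_value(key_b, "my-trail-logs"))  # 444455556666
```

The captured value is a string of digits, which is consistent with overriding the Records.recipientAccountId data type to string.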
Within the Schema Filter UI, it can be helpful to exclude columns that are not useful for analytics reporting. A common schema filter exclusion list follows. You can save this content as a JSON file, edit as needed if you require other exclusion rules, and then upload it to the Schema Filter window during the object group creation.
{"array_selection_policy":[{"excludes":["Records.requestParameters.HistoricalMetrics","Records.requestParameters.CurrentMetrics","Records.requestParameters.certificate.hb","Records.requestParameters.certificateChain.hb","Records.responseElements.MetricResults"],"type":"blacklist"}]}
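If you maintain the exclusion list in a script rather than editing the JSON by hand, a minimal Python sketch such as the following can generate the same content. The build_exclusion_filter helper is hypothetical; the structure and column names are copied from the example above.

```python
import json


def build_exclusion_filter(columns):
    """Build a schema filter document that excludes the given columns."""
    return {"array_selection_policy": [{"excludes": list(columns), "type": "blacklist"}]}


excluded_columns = [
    "Records.requestParameters.HistoricalMetrics",
    "Records.requestParameters.CurrentMetrics",
    "Records.requestParameters.certificate.hb",
    "Records.requestParameters.certificateChain.hb",
    "Records.responseElements.MetricResults",
]

filter_doc = build_exclusion_filter(excluded_columns)
# Compact separators reproduce the single-line layout of the example above.
filter_json = json.dumps(filter_doc, separators=(",", ":"))
print(filter_json)
```

The resulting string can be written to a JSON file and uploaded in the Schema Filter window; add or remove entries in excluded_columns to change the exclusion rules.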
In the Refinery area, when creating a view for this CloudTrail object group, make sure to use the Schema Transformation window to transform the Records.recipientAccountId field and set it as Treated as Partition Key.