Indexing AWS CloudTrail JSON Files

Suggestions and good practices for indexing and querying CloudTrail JSON log files

AWS CloudTrail is an AWS service that helps users with operational and risk auditing, governance, and compliance of their AWS account and its related actions. Actions taken by a user, role, or AWS service are recorded as events in CloudTrail. For users who create a "trail", the events are output as JSON log files to an Amazon S3 bucket for the account.

JSON Flex and its options for indexing, selecting, and Refinery view transforms are well suited for indexing and searching CloudTrail JSON log files.

About CloudTrail Event Files

AWS CloudTrail log files usually begin with a top-level Records array that contains one or more events. The layout, fields, and complexity of each record can vary, based on the log file setup and the applications or services that the site uses.

A sample CloudTrail event log has the following structure. The example shows that an IAM user named Alice used the AWS CLI to call the Amazon EC2 StartInstances action by using the ec2-start-instances command for instance i-ebeaf9e2.

{"Records": [{
    "eventVersion": "1.0",
    "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/Alice",
        "accessKeyId": "EXAMPLE_KEY_ID",
        "accountId": "123456789012",
        "userName": "Alice"
    },
    "eventTime": "2014-03-06T21:22:54Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "StartInstances",
    "awsRegion": "us-east-2",
    "sourceIPAddress": "205.251.233.176",
    "userAgent": "ec2-api-tools 1.6.12.2",
    "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}},
    "responseElements": {"instancesSet": {"items": [{
        "instanceId": "i-ebeaf9e2",
        "currentState": {
            "code": 0,
            "name": "pending"
        },
        "previousState": {
            "code": 80,
            "name": "stopped"
        }
    }]}}
}]}
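As a rough illustration of how the nesting in the sample maps to the description above, the following Python sketch walks the Records array and pulls out who performed which action on which instances. The function name summarize_event and the trimmed sample_log string are hypothetical and are not part of CloudTrail or JSON Flex. The sketch also assumes the StartInstances request shape shown in the sample; other event types use different requestParameters.

```python
import json

# Trimmed copy of the sample event above (hypothetical test data).
sample_log = """
{"Records": [{
    "userIdentity": {"type": "IAMUser", "userName": "Alice"},
    "eventTime": "2014-03-06T21:22:54Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "StartInstances",
    "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}}
}]}
"""

def summarize_event(record):
    """Return a flat summary of one CloudTrail record (illustrative helper)."""
    identity = record.get("userIdentity") or {}
    # requestParameters may be missing or null for some events, so fall back to {}.
    params = record.get("requestParameters") or {}
    items = params.get("instancesSet", {}).get("items", [])
    return {
        "user": identity.get("userName"),
        "action": record.get("eventName"),
        "source": record.get("eventSource"),
        "time": record.get("eventTime"),
        "instances": [item.get("instanceId") for item in items],
    }

summaries = [summarize_event(r) for r in json.loads(sample_log).get("Records", [])]
for summary in summaries:
    print(summary)
```

Using .get with defaults keeps the sketch tolerant of records that omit a field, which matters because record contents differ between services.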


About the CloudTrail Records Contents

For an overview of the record format and fields, see the Amazon Web Services online help for the CloudTrail record contents.

More information on CloudTrail log files and the CloudTrail service is available in the AWS documentation topic Working with CloudTrail log files.

As shown in the sample, the Records array is usually at the top level and contains varying levels of nested arrays and objects. The contents of the Records array can differ for various types of applications, services, or configuration settings, but there are some common characteristics found in many CloudTrail files. It can be helpful to review the shape and content of the CloudTrail logs in use at your site.
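One way to review the shape of your own logs before choosing indexing options is to list every field path that occurs in a file and count how often it appears. The Python sketch below is a minimal example under stated assumptions: the helper names field_paths and summarize_records are hypothetical, the "[]" marker for array elements is only a notation chosen here, and the input is assumed to be an uncompressed log file already read into a string.

```python
import json
from collections import Counter

def field_paths(value, prefix=""):
    """Yield a dotted path for every leaf value; array elements are marked with []."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from field_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for child in value:
            yield from field_paths(child, prefix + "[]")
    else:
        yield prefix

def summarize_records(log_text):
    """Count how often each field path occurs across the top-level Records array."""
    document = json.loads(log_text)
    counts = Counter()
    for record in document.get("Records", []):
        counts.update(field_paths(record))
    return counts

# Hypothetical two-event file; only the first event carries request parameters.
example = json.dumps({"Records": [
    {"eventName": "StartInstances",
     "userIdentity": {"userName": "Alice"},
     "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}}},
    {"eventName": "DescribeInstances",
     "userIdentity": {"userName": "Alice"}},
]})

path_counts = summarize_records(example)
for path, count in sorted(path_counts.items()):
    print(f"{count:3d}  {path}")
```

Paths that appear in only some records, or that contain several levels of "[]", are the ones most likely to need attention when you decide how nested arrays and objects should be indexed.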

The following topics describe some recommended JSON Flex techniques that can improve indexing, querying, and visualization for CloudTrail logs.