JSON Log Flattening Example

A simple example of how a JSON log flattens with horizontal and vertical expansion

In a simplified view, the CloudTrail log used in this example has a JSON format similar to the following. For brevity, this log shows the structure of 2 of the 5 sample events, and some ID fields have compact values:

{
   "Records":[
      {
         "eventVersion":"1.05",
         "userIdentity":{
            "type":"Root",
            "principalId":"EXAMPLEPRINC",
            "Arn":"arn:aws:iam::EXAMPLE:root",
            "accountId":"ACCOUNT",
            "accessKeyId":"KEYID",
            "sessionContext":{
               "attributes":{
                  "mfaAuthenticated":"true",
                  "creationDate":"2018-08-13T11:40:45Z"
               }
            }
         },
         "eventTime":"2018-08-13T16:40:21Z",
         "eventSource":"cloudtrail.amazonaws.com",
         "eventName":"DescribeTrails",
         "awsRegion":"ap-northeast-1",
         "sourceIPAddress":"10.1.207.221",
         "userAgent":"console.amazonaws.com",
         "requestParameters":{
            "trailNameList":[
               
            ],
            "includeShadowTrails":false
         },
         "responseElements":null,
         "requestID":"b4c3f3d6EXAMPLE",
         "eventID":"EVENT",
         "eventType":"AwsApiCall",
         "recipientAccountId":"ACCOUNT"
      },
      {
         "eventVersion":"1.05",
         "userIdentity":{
            "type":"Root",
            "principal Id":"EXAMPLEPRINC",
            "arn":"arn:aws:iam::EXAMPLE:root",
            "accountId":"ACCOUNT",
            "accessKeyId":"KEYID",
            "sessionContext":{
               "attributes":{
                  "mfaAuthenticated":"true",
                  "creationDate":"2018-08-13T11:40:45Z"
               }
            }
         },
         "eventTime":"2018-08-13T16:40:22Z",
         "eventSource":"cloudtrail.amazonaws.com",
         "eventName":"GetTrailStatus",
         "awsRegion":"ap-northeast-1",
         "sourceIPAddress":"10.1.207.221",
         "userAgent":"console.amazonaws.com",
         "requestParameters":{
            "name":"arn:aws:cloudtrail:us-east-1:EXAMPLE:trail/cheslock-cloudtrail"
         },
         "responseElements":null,
         "requestID":"b4c3f3d6EXAMPLE",
         "eventID":"EVENT",
         "eventType":"AwsApiCall",
         "recipientAccountId":"ACCOUNT"
      },
 …

When this sample JSON log is horizontally expanded with an unlimited array flattening depth, the Records property and its constituent properties and nested arrays compact to 1 row with 104 columns.

0

1

2

3

4

5

...

102

103

Each column represents a JSON file property and its value. For each log event (5 in this sample file), the flattened columns include an ID (0 to 4) to associate each column within its event. When JSON files have nested properties and arrays of properties, the column name reflects the nesting structure after the event ID. For example, some of the column names appear as follows:

Records.0.eventVersion
Records.0.userIdentity.type
Records.0.userIdentity.principalId
Records.0.userIdentity.arn
Records.0.userIdentity.sessionContext.attributes.mfaAuthenticated
Records.0.userIdentity.sessionContext.attributes.creationDate
Records.0.eventTime
Records.0.eventSource
Records.0.eventName
…
Records.1.eventVersion
Records.1.userIdentity.type
Records.1.userIdentity.principalId
Records.1.userIdentity.arn
Records.1.userIdentity.sessionContext.attributes.mfaAuthenticated
Records.1.userIdentity.sessionContext.attributes.creationDate
Records.1.eventTime
Records.1.eventSource
Records.1.eventName
…

When the sample JSON log is vertically expanded with an unlimited array flattening depth, the JSON properties, nested properties, and arrays are flattened to one row for each event with its properties as a column in each row. In the sample CloudTrail log, there are 5 events, and so 5 rows, each with 24 columns.

0

...

21

21

23

1

...

21

22

23

2

...

21

22

23

3

...

21

22

23

4

...

21

22

23

The columns for each row show the main array (Records) and the properties (including any nesting levels), as in the following examples. In a vertical case, the column names do not include an ID to associate the property columns since they are already associated within the same row:

Records.awsRegion
Records.eventID
Records.eventName
Records.eventSource
Records.eventTime
Records.eventType
Records.eventVersion
Records.recipientAccountId
Records.requestID
Records.requestParameters.includeShadowTrails
Records.requestParameters.name
Records.sourceIPAddress
Records.userAgent
Records.userIdentity.accessKeyId
Records.userIdentity.accountId
Records.userIdentity.arn
Records.userIdentity.principalId
Records.userIdentity.sessionContext.attributes.creationDate
Records.userIdentity.sessionContext.attributes.mfaAuthenticated
Records.userIdentity.type

👍

Visible Columns

The indexed data tables also include internal columns for metadata like row IDs for example. The internal columns are omitted for this example.

Effects on Storage Space

For complex, nested JSON files, the index for a file that is expanded horizontally is usually a fraction of the size of the index for the same file when it is expanded vertically. The format and size of the original JSON file has an impact on the storage size differences. In some cases such as very small JSON files, or for some types of JSON structure, the storage difference for horizontal and vertical expansion might be negligible. In other cases, the differences could be significant.

It is important to test with sample files that are representative of the data shape and size that you expect to see in the live environment to review the benefits of the various expansion and flattening depth options.


Did this page help you?