JSON Log Flattening Example

A simple example of how a JSON log flattens with horizontal and vertical expansion

In a simplified view, the CloudTrail log used in this example has a JSON format similar to the following. For brevity, this log shows the structure of 2 of the 5 sample events, and some ID fields have compact values:

{
   "Records":[
      {
         "eventVersion":"1.05",
         "userIdentity":{
            "type":"Root",
            "principalId":"EXAMPLEPRINC",
            "Arn":"arn:aws:iam::EXAMPLE:root",
            "accountId":"ACCOUNT",
            "accessKeyId":"KEYID",
            "sessionContext":{
               "attributes":{
                  "mfaAuthenticated":"true",
                  "creationDate":"2018-08-13T11:40:45Z"
               }
            }
         },
         "eventTime":"2018-08-13T16:40:21Z",
         "eventSource":"cloudtrail.amazonaws.com",
         "eventName":"DescribeTrails",
         "awsRegion":"ap-northeast-1",
         "sourceIPAddress":"10.1.207.221",
         "userAgent":"console.amazonaws.com",
         "requestParameters":{
            "trailNameList":[
               
            ],
            "includeShadowTrails":false
         },
         "responseElements":null,
         "requestID":"b4c3f3d6EXAMPLE",
         "eventID":"EVENT",
         "eventType":"AwsApiCall",
         "recipientAccountId":"ACCOUNT"
      },
      {
         "eventVersion":"1.05",
         "userIdentity":{
            "type":"Root",
            "principal Id":"EXAMPLEPRINC",
            "arn":"arn:aws:iam::EXAMPLE:root",
            "accountId":"ACCOUNT",
            "accessKeyId":"KEYID",
            "sessionContext":{
               "attributes":{
                  "mfaAuthenticated":"true",
                  "creationDate":"2018-08-13T11:40:45Z"
               }
            }
         },
         "eventTime":"2018-08-13T16:40:22Z",
         "eventSource":"cloudtrail.amazonaws.com",
         "eventName":"GetTrailStatus",
         "awsRegion":"ap-northeast-1",
         "sourceIPAddress":"10.1.207.221",
         "userAgent":"console.amazonaws.com",
         "requestParameters":{
            "name":"arn:aws:cloudtrail:us-east-1:EXAMPLE:trail/cheslock-cloudtrail"
         },
         "responseElements":null,
         "requestID":"b4c3f3d6EXAMPLE",
         "eventID":"EVENT",
         "eventType":"AwsApiCall",
         "recipientAccountId":"ACCOUNT"
      },
 …

When this sample JSON log is horizontally expanded with an unlimited array flattening depth, the Records property and its constituent properties and nested arrays compact to 1 row with 104 columns.

0

1

2

3

4

5

...

102

103

Each column represents a JSON file property and its value. For each log event (5 in this sample file), the flattened columns include an ID (0 to 4) to associate each column within its event. When JSON files have nested objects and arrays, the column name reflects the nesting structure after the event ID. For example, some of the column names appear as follows:

Records.0.eventVersion
Records.0.userIdentity.type
Records.0.userIdentity.principalId
Records.0.userIdentity.arn
Records.0.userIdentity.sessionContext.attributes.mfaAuthenticated
Records.0.userIdentity.sessionContext.attributes.creationDate
Records.0.eventTime
Records.0.eventSource
Records.0.eventName
…
Records.1.eventVersion
Records.1.userIdentity.type
Records.1.userIdentity.principalId
Records.1.userIdentity.arn
Records.1.userIdentity.sessionContext.attributes.mfaAuthenticated
Records.1.userIdentity.sessionContext.attributes.creationDate
Records.1.eventTime
Records.1.eventSource
Records.1.eventName
…

A disadvantage of the horizontal flattening is that nested properties are indexed as their own field. That is, Records.0.eventVersion is a field with only that one value. When translated to the Refinery view, there will be many Record.x.eventVersion columns, which means that a user cannot select a general eventVersion column to see or filter from a list of all available version values.

When the sample JSON log is vertically expanded with an unlimited array flattening depth, the JSON properties, nested properties, and arrays are flattened to one row for each event with its properties as a column in each row. In the sample CloudTrail log, there are 5 events, and so 5 rows, each with 24 columns.

0

...

21

21

23

1

...

21

22

23

2

...

21

22

23

3

...

21

22

23

4

...

21

22

23

The columns for each row show the main array (Records) and the properties (including any nesting levels), as in the following examples. In a vertical case, the column names do not include an ID to associate the property columns since they are already associated within the same row:

Records.awsRegion
Records.eventID
Records.eventName
Records.eventSource
Records.eventTime
Records.eventType
Records.eventVersion
Records.recipientAccountId
Records.requestID
Records.requestParameters.includeShadowTrails
Records.requestParameters.name
Records.sourceIPAddress
Records.userAgent
Records.userIdentity.accessKeyId
Records.userIdentity.accountId
Records.userIdentity.arn
Records.userIdentity.principalId
Records.userIdentity.sessionContext.attributes.creationDate
Records.userIdentity.sessionContext.attributes.mfaAuthenticated
Records.userIdentity.type

To continue the eventVersion example: With vertical flattening, there is one eventVersion column that contains all the version values, and it could be used for filtering and analytics.

👍

Visible Columns

The indexed data tables also include internal columns for metadata like row IDs for example. The internal columns are omitted for this example.

Effects on Storage Space

For complex, nested JSON files, the index for a file that is expanded horizontally is usually a fraction of the size of the index for the same file when it is expanded vertically. The format and size of the original JSON file has an impact on the storage size differences. In some cases such as very small JSON files, or for some types of JSON structure, the storage difference for horizontal and vertical expansion might be negligible. In other cases, the differences could be significant.

It is important to test with sample files that are representative of the data shape and size that you expect to see in the live environment to review the benefits of the various expansion and flattening depth options.

With JSON files and nested properties and arrays, it is often the case that one solution size will not fit all, nor even most of the indexing needs. ChaosSearch has some additional solutions that can help to tune the indexing and the analysis design to your specific indexing needs.


Did this page help you?