When you create an object group in ChaosSearch Storage and select one or more JSON files to index, the Content Preview window for the object group displays information about the files and options for indexing them. A sample Content Preview window follows:
As shown in the highlighted summary row:
- The Format of the selected sample file is JSON.
- Compression shows whether the JSON file is uncompressed (NONE) or whether GZIP or SNAPPY compression is detected.
- The Array Flatten Depth controls how deeply into nested arrays the constituent properties are flattened, using the designated Expansion setting of Horizontal or Vertical. If you choose NONE or a flattening depth less than UNLIMITED, each array nesting level that is not flattened horizontally or vertically is stored in its raw JSON format, as a string field of JSON properties and values.
  - The default, UNLIMITED, expands all nested arrays into their constituent attributes and values. This is typically the better option for horizontal expansion.
  - A depth of NONE stores all nested arrays as JSON text strings (sometimes called JSON blobs). Like any string value, a JSON string supports full text search queries. This can be helpful when the attributes of nested JSON arrays are meaningful only for text searches.
  - A number from 1 to 10 expands arrays at that level and all shallower levels horizontally or vertically (per the Expansion setting), making them available as filters and columns for analytics; any more deeply nested arrays are saved as JSON strings. For example, a value of 2 expands arrays at levels 1 and 2, but stores and indexes each array at level 3 or below as a JSON string field.
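The following Python sketch illustrates how a flatten depth could decide which arrays are expanded and which are kept as JSON strings. It models horizontal expansion only and rests on several assumptions made for illustration, not taken from ChaosSearch: the function name `flatten_with_depth` is hypothetical, `max_depth=None` stands for UNLIMITED and `max_depth=0` for NONE, each array on the path from the root adds one nesting level, and columns get positional names such as `orders[0].id`.

```python
import json
from typing import Any, Dict, Optional


def flatten_with_depth(value: Any, max_depth: Optional[int], prefix: str = "",
                       level: int = 0, out: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Horizontally flatten one JSON record, honoring a flatten depth.

    Hypothetical model: max_depth=None stands for UNLIMITED, max_depth=0 for NONE,
    and each array on the path from the root adds one nesting level.
    """
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            flatten_with_depth(child, max_depth, name, level, out)
    elif isinstance(value, list):
        level += 1  # entering an array moves one nesting level deeper
        if max_depth is not None and level > max_depth:
            # Too deep to expand: keep the whole array as a raw JSON string field.
            out[prefix] = json.dumps(value)
        else:
            for i, child in enumerate(value):
                # Horizontal expansion: each member becomes positional columns.
                flatten_with_depth(child, max_depth, f"{prefix}[{i}]", level, out)
    else:
        out[prefix] = value  # a scalar leaf becomes one column
    return out


record = {"user": "ana", "orders": [{"id": 1, "items": [{"sku": "A"}, {"sku": "B"}]}]}
for depth in (None, 1, 0):
    print(depth, flatten_with_depth(record, depth))
```

With this record, a depth of 1 expands `orders` but stores `orders[0].items` as the string `[{"sku": "A"}, {"sku": "B"}]`, while a depth of 0 stores the entire `orders` array as a single JSON string column.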
Expansion and array flattening can trigger a JSON permutation explosion (with a corresponding storage impact) that creates too many indexed fields (columns) for complex JSON files, especially files with deeply nested properties. Object groups have column limits to protect against index schemas so large that they cannot be queried in a reasonable time. If the indexed schema reaches the system's configured column limit, indexing for the object group stops and reports a service limit error.
As a best practice, carefully review JSON source file structures, the properties and data of analytics interest, and how users might plan to query or visualize that data. The valuable information could be a subset of the content of the JSON source files. With a map of the important business data columns, you can take advantage of the ChaosSearch JSON indexing and transformation features to create more compact indexes that are faster to augment when new files arrive, and more efficient to store, query, and use for analytics.
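To see why expanding long or varied arrays can run into a column limit, the hypothetical sketch below collects the column names that unlimited horizontal expansion would produce across a sample of records and stops once the schema grows past a limit. The `column_limit` parameter stands in for the system's configured limit, whose actual value is not given here, and `ColumnLimitExceeded` is an invented stand-in for the service limit error; the positional column naming is also an assumption.

```python
from typing import Any, Iterable, Set


class ColumnLimitExceeded(Exception):
    """Hypothetical stand-in for the service limit error."""


def column_names(value: Any, prefix: str = "") -> Set[str]:
    """Column names that unlimited horizontal expansion would create for one value."""
    if isinstance(value, dict):
        items = [(f"{prefix}.{k}" if prefix else k, v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(value)]
    else:
        return {prefix}  # a scalar leaf is one column
    names: Set[str] = set()
    for name, child in items:
        names |= column_names(child, name)
    return names


def check_schema(records: Iterable[Any], column_limit: int) -> int:
    """Grow the schema record by record and stop once it exceeds column_limit."""
    schema: Set[str] = set()
    for record in records:
        schema |= column_names(record)
        if len(schema) > column_limit:
            raise ColumnLimitExceeded(
                f"schema has {len(schema)} columns; limit is {column_limit}")
    return len(schema)


wide_record = {"events": [{"a": i, "b": i, "c": i} for i in range(100)]}
print(len(column_names(wide_record)))  # 300 columns from a single record
```

A single record with a 100-member array of three-field objects already yields 300 columns, which is why mapping the columns of real interest before indexing, and excluding the rest, keeps the schema small.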
- Expansion is the flattening method, either Vertical or Horizontal.
  - Vertical expansion is a flattening that typically benefits analytics by creating an index record for each JSON array member and nested array member (depending on the flatten depth). The resulting index supports more flexibility for analytics at attribute levels, but at the expense of greater indexing resources and storage for the increased number of flattened records. This is typically the better option for JSON files that do not have very complex nesting.
  - Horizontal expansion is a flattening that typically benefits storage space by creating one record with many columns for each corresponding JSON record. This is typically the better option for JSON files that use complex nested arrays.
- Schema Filter opens a window with features for virtually transforming the data type of a column, or for including (whitelisting) JSON attributes in the index or excluding (blacklisting) them from it. For a JSON file, the whitelist feature can help define special expansion rules for one or more attributes/columns when all the others use the opposite expansion. See Recommendations for AWS CloudTrail and Similar Records Arrays for an example.
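The contrast between the two Expansion settings, and the effect of excluding an attribute, can be sketched in Python. This is a simplified model under stated assumptions, not the ChaosSearch implementation: flatten depth is unlimited, horizontal expansion uses positional column names, vertical expansion combines sibling arrays as a cartesian product (one way to produce the permutation explosion the text warns about), and the hypothetical `exclude_attributes` helper blacklists top-level attributes only.

```python
import itertools
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def expand_horizontal(value: Any, prefix: str = "") -> Row:
    """Horizontal: one wide row per record; array members become positional columns."""
    row: Row = {}
    if isinstance(value, dict):
        for key, child in value.items():
            row.update(expand_horizontal(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            row.update(expand_horizontal(child, f"{prefix}[{i}]"))
    else:
        row[prefix] = value
    return row


def expand_vertical(value: Any, prefix: str = "") -> List[Row]:
    """Vertical: one narrow row per array member; sibling arrays multiply the rows."""
    if isinstance(value, dict):
        parts = [expand_vertical(child, f"{prefix}.{key}" if prefix else key)
                 for key, child in value.items()]
        rows: List[Row] = []
        # Assumed behavior: combine sibling expansions as a cartesian product.
        for combo in itertools.product(*parts):
            merged: Row = {}
            for part in combo:
                merged.update(part)
            rows.append(merged)
        return rows
    if isinstance(value, list):
        rows = [row for child in value for row in expand_vertical(child, prefix)]
        return rows or [{}]  # an empty array adds no columns but keeps the record
    return [{prefix: value}]


def exclude_attributes(record: Dict[str, Any], blacklist: Iterable[str]) -> Dict[str, Any]:
    """Drop blacklisted top-level attributes before expansion (simplified Schema Filter)."""
    skip = set(blacklist)
    return {key: val for key, val in record.items() if key not in skip}


record = {"id": 7, "tags": ["a", "b"], "hosts": ["h1", "h2", "h3"]}
print(expand_horizontal(record))
print(len(expand_vertical(record)))
print(len(expand_vertical(exclude_attributes(record, ["hosts"]))))
```

For this record, horizontal expansion yields one row with six columns, vertical expansion yields six rows with three columns each, and excluding `hosts` reduces the vertical result to two rows.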
Some Recommendations for Object Groups
- If you are indexing a JSON file that is relatively flat (that is, with no complex arrays or nested arrays), vertical expansion typically offers the best balance of index flattening and analytics.
- If you are indexing a JSON file that has objects with nested arrays, horizontal expansion is usually the better option to take advantage of storage efficiency.
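The two recommendations can be expressed as a rough rule of thumb. In the hypothetical sketch below, a sample whose records contain at most one level of arrays is treated as relatively flat (VERTICAL), and any array nested inside another array pushes the suggestion to HORIZONTAL. The threshold and the function names are assumptions for illustration; reviewing the JSON structure and the data of analytics interest remains the better guide.

```python
from typing import Any, Iterable


def array_nesting(value: Any) -> int:
    """Deepest level of arrays nested inside arrays (0 means no arrays at all)."""
    if isinstance(value, dict):
        return max((array_nesting(child) for child in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((array_nesting(child) for child in value), default=0)
    return 0


def suggest_expansion(sample: Iterable[Any]) -> str:
    """Hypothetical rule of thumb: flat records -> VERTICAL, nested arrays -> HORIZONTAL."""
    deepest = max((array_nesting(record) for record in sample), default=0)
    return "VERTICAL" if deepest <= 1 else "HORIZONTAL"


print(suggest_expansion([{"user": "ana", "tags": ["a", "b"]}]))      # VERTICAL
print(suggest_expansion([{"orders": [{"items": [{"sku": "A"}]}]}]))  # HORIZONTAL
```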