JSON Log and Event File Challenges
JSON log and event files are very valuable sources of information, but indexing and searching them efficiently presents challenges.
JavaScript Object Notation (JSON) is a powerful format for describing data. Many applications and platforms use JSON-structured data to capture information about their events and activities, so an organization's JSON files could hold a wealth of valuable business insights.
The JSON Flattening Challenge
One major challenge is efficiently indexing JSON files that contain one or more nested JSON arrays, especially very complex ones. To be indexed, JSON arrays and their properties must be converted into a two-dimensional representation, like a relational table. This is referred to as flattening the JSON structure.
Indexing flattens the JSON arrays, with their attributes and values, and stores them in a format that resembles a table of rows and columns. For files with complex JSON arrays, the indexed data could expand into a very large number of rows and fields, and holding the flattened data could require a large amount of storage. This is often referred to as the JSON permutation explosion.
Horizontal and Vertical Expansion Options
There are two types of expansion methodologies used for flattening JSON arrays:
- Vertical expansion flattens arrays into separate rows of values.
- Horizontal expansion flattens arrays into one row with columns of values.
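As a minimal Python sketch of the two methodologies applied to a single array field, the following code produces several rows for vertical expansion and a single row with one column per element for horizontal expansion. The function names and the tags.0-style column naming are hypothetical illustrations, not the format any particular indexing tool uses.

```python
from typing import Any


def expand_vertical(record: dict[str, Any], array_field: str) -> list[dict[str, Any]]:
    """Vertical expansion: one row per array element; the other fields are copied into each row."""
    base = {k: v for k, v in record.items() if k != array_field}
    items = record.get(array_field) or [None]  # an empty array still yields one row
    return [{**base, array_field: item} for item in items]


def expand_horizontal(record: dict[str, Any], array_field: str) -> dict[str, Any]:
    """Horizontal expansion: a single row; each array element gets its own column."""
    row = {k: v for k, v in record.items() if k != array_field}
    for i, item in enumerate(record.get(array_field) or []):
        row[f"{array_field}.{i}"] = item  # hypothetical column naming scheme
    return row


event = {"host": "web-1", "tags": ["success", "info"]}
print(expand_vertical(event, "tags"))
# [{'host': 'web-1', 'tags': 'success'}, {'host': 'web-1', 'tags': 'info'}]
print(expand_horizontal(event, "tags"))
# {'host': 'web-1', 'tags.0': 'success', 'tags.1': 'info'}
```

Vertical expansion keeps the column set fixed but multiplies rows; horizontal expansion keeps one row but adds a column for every element position.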
The following image shows how expansion flattens arrays into rows and columns:

Array Depth Expansion
JSON arrays can contain nested fields at several depth levels. A JSON array might have a structure similar to the following skeleton, with arrays and properties nested inside other arrays:
"array1": [
  {
    "field": "value",
    "array2": [
      {
        "array3": [
          {
            "array4": [
              {
                "arrayx": [
                  {
                    ...
The nested array depth can contribute significantly to the permutation explosion when these JSON files are flattened. Tests with JSON log files from some common application services show that one highly nested JSON array could flatten vertically to millions of indexed rows, or horizontally to one row with millions of columns. If the expansion tool limits the permutation by converting deeply nested arrays and properties into a contiguous string of native JSON, some of those columns could also be very wide.
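To make the effect of depth concrete, the following Python sketch counts how many rows a full vertical expansion would produce and how many columns a full horizontal expansion would produce for one record. It assumes one common set of flattening rules, not necessarily those of any specific tool: each array element yields its own rows, sibling arrays in the same object combine as a cross product, and every scalar leaf becomes one horizontal column. The nested() helper is hypothetical and builds a skeleton like the one above.

```python
from typing import Any


def vertical_row_count(value: Any) -> int:
    """Rows produced by fully expanding every array vertically (assumed rules)."""
    if isinstance(value, dict):
        count = 1
        for child in value.values():
            count *= vertical_row_count(child)  # sibling arrays multiply
        return count
    if isinstance(value, list):
        # Each element contributes its own rows; an empty array still yields one row.
        return max(1, sum(vertical_row_count(item) for item in value))
    return 1  # a scalar never adds rows


def horizontal_column_count(value: Any) -> int:
    """Columns produced by giving every scalar leaf its own column (assumed rules)."""
    if isinstance(value, dict):
        return sum(horizontal_column_count(child) for child in value.values())
    if isinstance(value, list):
        return sum(horizontal_column_count(item) for item in value)
    return 1


def nested(depth: int, width: int) -> dict[str, Any]:
    """Hypothetical skeleton: each level holds `width` objects that each nest the next level."""
    if depth == 0:
        return {"field": "value"}
    return {"field": "value", f"array{depth}": [nested(depth - 1, width) for _ in range(width)]}


doc = nested(depth=4, width=10)
print(vertical_row_count(doc))       # 10000 rows from a single record
print(horizontal_column_count(doc))  # 11111 columns in one row

# Two sibling arrays of 1,000 values each cross-multiply under vertical expansion.
siblings = {"a": list(range(1000)), "b": list(range(1000))}
print(vertical_row_count(siblings))  # 1000000
```

In this model, each additional nesting level multiplies the vertical row count by the array width, which is why depth matters so much.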
Converting deeply nested arrays into a string blob of the native JSON properties does help to minimize the expansion of rows and columns. However, for many analysis tools, JSON strings are of limited use for analytics: they can be searched like other string values, but the properties embedded in them are usually not accessible for finer-grained operations such as column filters.
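The following Python sketch illustrates the string-blob approach under assumed rules: it flattens objects and arrays into dotted column names, but anything nested at or beyond a hypothetical max_depth setting is kept as one compact JSON string. The function name, the max_depth parameter, and the dotted naming are illustrative only.

```python
import json
from typing import Any


def flatten_with_depth_limit(value: Any, max_depth: int, prefix: str = "", depth: int = 0) -> dict[str, Any]:
    """Flatten nested JSON into dotted columns; containers at depth >= max_depth become JSON strings."""
    if isinstance(value, (dict, list)) and depth >= max_depth:
        # Too deep: keep the whole subtree as a compact JSON string in one column.
        return {prefix: json.dumps(value, separators=(",", ":"))}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = ((str(i), v) for i, v in enumerate(value))
    else:
        return {prefix: value}  # scalar leaf becomes its own column
    columns: dict[str, Any] = {}
    for key, child in items:
        name = f"{prefix}.{key}" if prefix else key
        columns.update(flatten_with_depth_limit(child, max_depth, name, depth + 1))
    # Keep empty containers as strings instead of silently dropping them.
    return columns or {prefix: json.dumps(value, separators=(",", ":"))}


event = {"user": {"id": 7, "roles": [{"name": "admin", "scopes": ["read", "write"]}]}}
row = flatten_with_depth_limit(event, max_depth=2)
print(row)
# {'user.id': 7, 'user.roles': '[{"name":"admin","scopes":["read","write"]}]'}
```

With max_depth=2, the roles array survives only as a string: a substring search for "admin" still matches, but there is no user.roles.0.name column to filter on. Raising max_depth restores those columns at the cost of more expansion.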
Vertical Expansion, Arrays, and Numeric Fields
If you use vertical expansion, watch for JSON structures that contain arrays, and for the effect of the expansion on numeric fields in the data. For example, consider vertical expansion of a JSON structure with a format like this:
{
  "bytes": 123,
  "name": "string",
  "tags": [
    "success",
    "info"
  ]
}
Vertical expansion of the tags array creates two rows, duplicating some fields in order to flatten the array:
123 string success
123 string info
For the bytes and name fields, the expanded rows can affect aggregations such as the count() or sum() functions. For example, because two rows have the same bytes value, a sum of the bytes column returns a double-counted value of 246, not 123. Likewise, if the name value is used for count aggregations, the extra row doubles the count. To avoid this row-expansion impact on these types of fields, you could flatten the tags array to a JSON string so that no extra rows are created, or, if tags is not important for analysis, you could exclude the array from indexing to avoid the multiple-row expansion it causes.
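The arithmetic above can be reproduced with a short Python sketch that compares the three options for the example record: vertical expansion of tags, flattening tags to a JSON string, and excluding tags from indexing. The helper functions are hypothetical stand-ins for indexing behavior, not a real product API.

```python
import json
from typing import Any

record: dict[str, Any] = {"bytes": 123, "name": "string", "tags": ["success", "info"]}


def expand_vertical(rec: dict[str, Any], field: str) -> list[dict[str, Any]]:
    """One row per array element; bytes and name are duplicated in each row."""
    base = {k: v for k, v in rec.items() if k != field}
    return [{**base, field: item} for item in rec[field] or [None]]


def stringify_array(rec: dict[str, Any], field: str) -> list[dict[str, Any]]:
    """Keep a single row; the array is stored as one JSON string value."""
    return [{**rec, field: json.dumps(rec[field])}]


def exclude_field(rec: dict[str, Any], field: str) -> list[dict[str, Any]]:
    """Keep a single row; the array is not indexed at all."""
    return [{k: v for k, v in rec.items() if k != field}]


for label, rows in [
    ("vertical", expand_vertical(record, "tags")),
    ("json string", stringify_array(record, "tags")),
    ("excluded", exclude_field(record, "tags")),
]:
    total = sum(row["bytes"] for row in rows)
    print(f"{label}: rows={len(rows)} sum(bytes)={total}")
# vertical: rows=2 sum(bytes)=246
# json string: rows=1 sum(bytes)=123
# excluded: rows=1 sum(bytes)=123
```

Both alternatives keep sum(bytes) and the row count correct; the trade-off is whether individual tags remain searchable as a string or are dropped entirely.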
Avoiding the Planning, Time, and Cost Hurdles
JSON files are very valuable sources of information, but the permutation explosion, the storage requirements, and the options for managing nested arrays can present daunting challenges. Administrators could spend significant time and energy studying the JSON files and their content, and trying alternatives such as re-pipelining source files to weed out unnecessary or problematic content, only to start over again if the resulting subset is still too complex or if some changes dropped valuable data needed for analysis.
When JSON files are in the indexing mix, ChaosSearch offers a much easier way to simplify the indexing and analysis of complex JSON files: JSON Flex®.