Column Selection

Review this topic to learn how to include, exclude, and customize selection policy rules for the content of the source files during object group definition and indexing.

ChaosSearch offers administrators several policy and processing options to control how to index their source file data. The following table lists the selection policy types, and more information is available after the table.

Whitelist/Blacklist Options

Options that control which fields in the source files to include or exclude from the object group and the generated indexed data for those source files.

JSON Array Options

For JSON log and event files that have nested arrays, options to vertically expand an array of interest for indexing and analytics, or to specify an array and its properties that you want to store as a JSON string.

JSON Nested Object Options

For JSON log and event files with nested objects, options to [select one or more nested objects and their properties](doc:column-selection#json-nested-object-options) that you want to index as a JSON string.

📘

NOTE:

The Column Selection user interface requires you to create and import a JSON file that contains the settings for the field selections and processing rules. That JSON file must be stored in a file location that you can access and import from the browser.

Applying a Column Selection Filter

To specify the column selection filter for an object group:

  1. In the object group content preview window, click Schema Filter in the top right corner.
  2. In the Column Overrides window, go to the Column Selection File Upload area at the bottom. This is where you can upload a column selection JSON file that has one or more of the selection policies defined.
  3. In the Column Selection File Upload area, click the Drop JSON files here upload link to open a file browser, and select the JSON file with the column filters. Alternatively, you can drag and drop the JSON file into the area.
  4. Click Submit to add the filtering instructions to the object group.

📘

The Submit button appears to hang if the JSON file has errors.

When you create the JSON file for the column selection policies, verify that it is well-formed JSON before you upload it. It can be helpful to validate the file using a third-party or public JSON validator. If you click Submit and the window appears to hang, the problem is usually a formatting error in the JSON file. Fix any format errors and re-submit the valid JSON file.
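As one way to run that check locally before uploading, the following minimal sketch uses Python's standard json module to report where a syntax error occurs. The helper name check_policy_text and the sample strings are hypothetical. The check covers JSON syntax only, not whether ChaosSearch accepts the policy keys.

```python
import json


def check_policy_text(text):
    """Return (True, None) if text parses as JSON, else (False, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the position where parsing failed.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


# Hypothetical samples: the second has a trailing comma, which JSON forbids.
good = '{"type": "blacklist", "excludes": ["field_name", "field_name2"]}'
bad = '{"type": "blacklist", "excludes": ["field_name", ]}'

print(check_policy_text(good))
print(check_policy_text(bad))
```

To check a file on disk, read its contents (for example with open(path, encoding="utf-8").read()) and pass the text to the same function.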

When the object group is created, ChaosSearch applies the selection rules that you have specified to the resulting fields in the index schema. You must index the object group to see the effect of your policies on the resulting fields. It can be helpful to index a small set of files and to review the Group Content tab information to confirm that the policies are taking effect as expected.

Field Inclusion and Exclusion Policies

As a default behavior, ChaosSearch indexes all of the fields in source log and event files that match the object group content filters. Sometimes log and event files contain fields that are not meaningful for analytics or queries. Rather than index information that is not valuable, it is a good practice to exclude any unnecessary fields.

👍

While it is a common practice to remove undesired content in the log creation phase (before files are written to cloud storage), that content cleanup is not always possible or easy. Changing the log content can be complex, require iterations, and could become a recurring task. Fields deemed unnecessary for one form of analysis might be useful for another, adding duplication and rework to get the right content into the logs.

The ChaosSearch inclusion and exclusion indexing policies can help to filter the log and event source data to only what is needed, reducing or eliminating the need for costly rework of the source files. The resulting ChaosSearch indexed data for the group is more tailored and compact, leading to better query performance and a better experience for end users.

The column selection whitelist and blacklist policies define the rules for including or excluding fields that exist within the source log files.

If most of the source file fields are useful for indexing and analysis, you can use a blacklist policy to exclude specific fields from the indexed data. The blacklist can contain one or more fields, and the index process will skip (not index) those fields. Similarly, if the source log and event files have only a subset of fields that are valuable for queries, you can use a whitelist to index only those fields. All other fields are omitted from the indexed data and do not appear as columns in the Refinery views for that object group.

A sample blacklist definition contains statements similar to the following. In this example, the referenced fields (field_name and field_name2) of the source files will not be included in the indexed data as queryable columns. All other fields will be included/indexed.

{
  "type": "blacklist",
  "excludes": [ "field_name", "field_name2", ...  ]
}

A whitelist definition for specifying a list of fields to include/index contains statements similar to the following. Only the referenced fields in the whitelist (field_name and field_name2) will be defined in the object group and indexed data. All other fields in the source log and event files will be omitted from the object group.

{
  "type": "whitelist",
  "includes": [ "field_name", "field_name2", … ]
}

There is no limit to the number of fields that you can include or exclude.
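The effect of the two policy types can be modeled locally with a short sketch. This is an illustration of the behavior described above for flat, top-level fields, not the ChaosSearch implementation. The helper apply_policy and the sample record are hypothetical.

```python
def apply_policy(record, policy):
    """Return the fields of record that would remain under the given policy."""
    if policy["type"] == "blacklist":
        excluded = set(policy["excludes"])
        return {k: v for k, v in record.items() if k not in excluded}
    if policy["type"] == "whitelist":
        included = set(policy["includes"])
        return {k: v for k, v in record.items() if k in included}
    raise ValueError(f"unsupported policy type: {policy['type']!r}")


# Hypothetical log record with one field that is not useful for analytics.
record = {"host": "web-1", "status": 200, "debug_blob": "..."}

blacklist = {"type": "blacklist", "excludes": ["debug_blob"]}
whitelist = {"type": "whitelist", "includes": ["status"]}

print(apply_policy(record, blacklist))
print(apply_policy(record, whitelist))
```

A blacklist suits records where most fields are useful. A whitelist suits records where only a few are.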

For JSON files, make sure that you capture the full and correct field name, which might require a property.prop1[.prop2...] pathname. It can be helpful to create a test object group from a small sample file and use the resulting fields to confirm the correct field names.
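As a starting point for finding candidate pathnames, the following sketch lists the dotted paths present in a sample JSON record. The helper field_paths and the sample record are hypothetical, and the dotted naming for properties inside arrays is an assumption. The field names that ChaosSearch generates can differ, so confirm them against a test object group.

```python
import json


def field_paths(value, prefix=""):
    """Return the sorted, de-duplicated dotted paths of all leaf values."""
    if isinstance(value, dict) and value:
        paths = []
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            paths.extend(field_paths(child, name))
        return sorted(set(paths))
    if isinstance(value, list) and value:
        paths = []
        for item in value:
            # Array items share the path of the array itself (assumption).
            paths.extend(field_paths(item, prefix))
        return sorted(set(paths))
    return [prefix]


# Hypothetical sample record.
sample = json.loads(
    '{"Records": [{"eventName": "PutObject",'
    ' "requestParameters": {"bucketName": "b"}}],'
    ' "plan": {"jobs": 3}}'
)

for path in field_paths(sample):
    print(path)
```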

JSON Array Options

For JSON files with nested arrays, you use the JSON Flex options to specify how to index the arrays and their nested properties. For example, you can use horizontal or vertical expansion, and you can specify an array level (that is, how many nested levels) of properties to index as separate fields. See JSON Flex Advantage for more information about JSON files and the available ChaosSearch optimizations for JSON file indexing.

For JSON files that have nested arrays and that are indexed using horizontal flattening, you can use the vertical_selection_policy to specify one or more array fields to expand vertically so that you can use them for analytics and filtering controls within searches and visualizations.

A sample JSON vertical_selection_policy definition follows, where an array called app-status will be vertically expanded during indexing:

{"vertical_selection_policy":[{"includes":["app-status"],"type":"whitelist"}]}
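Conceptually, vertical expansion can be pictured as producing one row per array element. The sketch below models that idea only. The helper expand_vertically and the sample record are hypothetical, and the columns that ChaosSearch actually produces may differ.

```python
def expand_vertically(record, array_field):
    """Return one row per element of record[array_field]."""
    items = record.get(array_field)
    if not isinstance(items, list) or not items:
        # Nothing to expand: keep the record as a single row.
        return [dict(record)]
    rows = []
    for item in items:
        row = {k: v for k, v in record.items() if k != array_field}
        row[array_field] = item
        rows.append(row)
    return rows


# Hypothetical record with an app-status array.
record = {"host": "web-1", "app-status": ["ok", "degraded"]}

for row in expand_vertically(record, "app-status"):
    print(row)
```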

For JSON files that are indexed vertically to make nested properties fully indexed and available for analytics and filter controls, you can use array_selection_policy to specify one or more arrays to ignore for vertical expansion; that is, the specified array and its entire contents will be stored as a JSON string in the indexed field. The JSON string content is searchable for analytics and querying.

A sample array_selection_policy follows, where the Records.requestParameters array will be indexed as a JSON string, not as individual fields and rows.

{"array_selection_policy":[{"excludes":["Records.requestParameters"],"type":"blacklist"}]}
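To picture what storing an array as a JSON string means, the following sketch replaces the value at a dotted path with its serialized JSON text. It is a local illustration, not the ChaosSearch indexing logic. The helper store_as_json_string and the simplified sample record are hypothetical.

```python
import json


def store_as_json_string(record, dotted_path):
    """Return a copy of record with the value at dotted_path serialized."""
    keys = dotted_path.split(".")
    result = json.loads(json.dumps(record))  # deep copy of JSON-compatible data
    parent = result
    for key in keys[:-1]:
        parent = parent[key]
    # Compact separators keep the stored string free of extra whitespace.
    parent[keys[-1]] = json.dumps(parent[keys[-1]], separators=(",", ":"))
    return result


# Hypothetical, simplified record.
record = {
    "Records": {
        "eventName": "PutObject",
        "requestParameters": [{"key": "a"}, {"key": "b"}],
    }
}

stored = store_as_json_string(record, "Records.requestParameters")
print(stored["Records"]["requestParameters"])
```

The array ends up as a single string value, which matches the text-searchable form described above.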

JSON Nested Object Options

Nested objects are indexed as separate properties, which could result in a large number of separate fields in the indexed data. Some nested objects and properties might be better suited and more efficient to store as JSON strings versus separate fields.

The field_selection_policy allows you to specify one or more JSON objects. The contents of each specified object, including any layers of nested properties below it, will be indexed as one contiguous JSON string in a field. The JSON string content will be text searchable like any string field, with the benefit of a reduced number of indexed data fields. With field selection, you can also use the regex type to specify the fields of interest with regular expressions for very granular control.

Sample field selection policy definitions follow. The first example uses regex patterns to specify the fields to treat as JSON strings. The second example selects fields that can be specified without the need for regular expression patterns, so that example uses the "excludes": operation with "type": "blacklist" to achieve the same behavior.

{
  "field_selection_policy": [
    {
      "include": false,
      "patterns": [
        "^field[.]prop[.][A-Fa-f0-9]{64}.*",
        "^property[.][A-Fa-f0-9]{64}.*"
      ],
      "type": "regex"
    }
  ]
}

{
  "field_selection_policy": [
    {
      "excludes": [
        "plan.jobs",
        "unusable_job_outputs"
      ],
      "type": "blacklist"
    }
  ]
}
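Before uploading a regex policy, it can help to try the patterns against a list of known field names. The sketch below does that with Python's re module. The helper matching_fields and the sample field names are hypothetical, and the regular expression dialect that ChaosSearch uses may differ from Python's, so treat the result as a rough check.

```python
import re

# Patterns copied from the regex example above.
PATTERNS = [
    r"^field[.]prop[.][A-Fa-f0-9]{64}.*",
    r"^property[.][A-Fa-f0-9]{64}.*",
]


def matching_fields(field_names, patterns=PATTERNS):
    """Return the field names matched by at least one pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [n for n in field_names if any(c.search(n) for c in compiled)]


digest = "a1" * 32  # hypothetical 64-character hexadecimal identifier
names = [
    f"field.prop.{digest}.size",
    f"property.{digest}",
    "field.prop.short",
    "other.field",
]

for name in matching_fields(names):
    print(name)
```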

👍

Materialize with JSON

The JSON fields and properties that are converted to JSON string blobs normally cannot be used for analytics as filters or metrics for aggregations. However, ChaosSearch offers a Refinery view technique to materialize a property inside a JSON blob for analytics use.

When you create the Refinery view, you can use the Materialize with JSON transform to specify an embedded JSON property by its JSON path, and the property will become available for analytics and filtering as a post-analytics, schema-on-read, materialized column.
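The idea of pulling one property out of a stored JSON string can be sketched locally as follows. This only illustrates the concept. The helper extract_property, the dotted path syntax, and the sample blob are assumptions, and the JSON path syntax that the Materialize with JSON transform accepts is not shown here.

```python
import json


def extract_property(json_string, dotted_path):
    """Return the value at dotted_path inside json_string, or None."""
    value = json.loads(json_string)
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical JSON string as it might be stored in an indexed field.
blob = '{"jobs": {"build": {"status": "passed", "duration": 42}}}'

print(extract_property(blob, "jobs.build.status"))
print(extract_property(blob, "jobs.deploy.status"))
```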

