Column Selection

Review this topic to learn how to include or exclude certain columns of the source files during the indexing process.

ChaosSearch has the unique ability to index every column of a cloud-storage object during the indexing process. For some use cases, however, indexing all of the columns is not needed for analysis. Data analysts can use the ChaosSearch object group definition to choose which columns to include, or which columns to exclude, before indexing starts.


NOTE:

Column selection requires a JSON file that defines either a blacklist (columns to exclude) or a whitelist (columns to include). You must have a copy of that JSON file in a location that you can access from your browser.
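If you prefer to generate the JSON file rather than write it by hand, a short script is enough. The following Python sketch writes a filter using the `type`, `excludes`, and `includes` keys shown in the Blacklist and Whitelist Examples in this topic. The helper name `write_column_filter`, the file name `column_filter.json`, and the sample column names are hypothetical placeholders.

```python
import json
from pathlib import Path


def write_column_filter(path, filter_type, columns):
    """Write a column-selection file (hypothetical helper).

    filter_type is "blacklist" (columns listed under "excludes")
    or "whitelist" (columns listed under "includes"); any other
    value raises KeyError.
    """
    key = {"blacklist": "excludes", "whitelist": "includes"}[filter_type]
    spec = {"type": filter_type, key: list(columns)}
    Path(path).write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return spec


# Hypothetical column names; replace them with columns from your own source files.
write_column_filter("column_filter.json", "blacklist", ["debug_payload", "raw_headers"])
```

The resulting file can then be uploaded in the Column Overrides window.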

Applying the Column Filters

After you select the cloud-storage bucket and begin creating the object group (where you filter the objects to include), you can specify the columns that you want to index.

To specify the column inclusion/exclusion filter:

  1. In the object group content preview window, click Schema Filter in the top-right corner. The Column Overrides window opens.
  2. In the Column Selection File Upload area, click the Drop JSON files here upload link to open a file browser, and select the JSON file with the column filters. Alternatively, drag and drop the JSON file into the area.
  3. Click Submit to add the filtering instructions to the object group.
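Before uploading, it can help to check the file locally so that a typo does not surface only after submission. The following Python sketch checks the shape implied by the examples in this topic: a JSON object whose `type` is `blacklist` or `whitelist`, with a matching `excludes` or `includes` list of column names. This shape is inferred from those examples and is not an official ChaosSearch schema; the function name `check_column_filter` is hypothetical. Note that the `...` placeholders in the examples stand for more column names and are not themselves valid JSON, so they must be replaced before upload.

```python
import json


def check_column_filter(text):
    """Return a list of problems found in a column-selection JSON document.

    An empty list means the document matches the shape assumed here.
    """
    try:
        spec = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(spec, dict):
        return ["top level must be a JSON object"]
    filter_type = spec.get("type")
    if filter_type not in ("blacklist", "whitelist"):
        return ['"type" must be "blacklist" or "whitelist"']
    # Each filter type pairs with its own list key.
    key = "excludes" if filter_type == "blacklist" else "includes"
    columns = spec.get(key)
    if not isinstance(columns, list) or not all(isinstance(c, str) for c in columns):
        return [f'"{key}" must be a list of column-name strings']
    return []
```

For example, passing the contents of a well-formed filter file to `check_column_filter` returns an empty list, while a file with a misspelled `type` returns one message describing the problem.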

When the object group is created, the Group Content page shows the updated Indexed Structure that reflects the column filtering changes from your JSON file.

Blacklist and Whitelist Examples

If some columns are not expected to be used in queries, you can use a blacklist to omit those columns from the index. Similarly, if only a subset of a file's columns is valuable for queries, you can use a whitelist to index only the columns you want and ignore all others.

A blacklist definition contains statements similar to the following. The referenced columns do not appear in the Group Content list of columns to index; all other columns are included and indexed.

{
  "type": "blacklist",
  "excludes": [ "column_name", "column_name2", ...  ]
}
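The blacklist behavior described above can be illustrated in a few lines of Python. This is a minimal sketch of the documented semantics (every column except those listed under `excludes`), not ChaosSearch's implementation; the function name and sample column names are hypothetical.

```python
def indexed_columns_blacklist(source_columns, spec):
    """Illustrate blacklist semantics: keep every column not in "excludes"."""
    excluded = set(spec["excludes"])
    # Preserve the original column order of the source file.
    return [col for col in source_columns if col not in excluded]


spec = {"type": "blacklist", "excludes": ["column_name", "column_name2"]}
print(indexed_columns_blacklist(["timestamp", "column_name", "status", "column_name2"], spec))
# prints ['timestamp', 'status']
```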

A whitelist definition, which specifies the columns to include and index, contains statements similar to the following. Only the referenced columns appear in the Group Content list of columns to index; all other columns are not included or indexed.

{
  "type": "whitelist",
  "includes": [ "column_name", "column_name2", … ]
}
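The whitelist behavior can be sketched the same way. This minimal Python illustration keeps only the columns listed under `includes`; it is not ChaosSearch's implementation, and the function name and sample column names are hypothetical. Keeping the source file's column order and silently ignoring listed names that are absent from the source are choices made for this sketch only.

```python
def indexed_columns_whitelist(source_columns, spec):
    """Illustrate whitelist semantics: keep only columns listed in "includes"."""
    included = set(spec["includes"])
    # Preserve the original column order of the source file.
    return [col for col in source_columns if col in included]


spec = {"type": "whitelist", "includes": ["column_name", "column_name2"]}
print(indexed_columns_whitelist(["timestamp", "column_name", "status", "column_name2"], spec))
# prints ['column_name', 'column_name2']
```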

There is no limit to the number of fields that you can include or exclude.
