Indexing Your Data

Index your cloud storage objects to create the ChaosSearch data that you use to visualize and analyze their content.

ChaosSearch indexing is a deep data analysis of your cloud object storage content, based on the rules defined in an object group. After you create an object group, start indexing to create the Chaos Index® daily intervals and indexed data that provide a lossless representation of your cloud storage files.

Start Indexing

To start the indexing process:

  1. In the Storage area, select the object group that you want to start indexing.
  2. Click Start Indexing in the top right. (The button is labeled Restart Indexing if some daily intervals already exist.)

Chaos Index starts a deep analysis of the associated content, generating a proprietary data representation and inferring the data structure (schema). When the analysis finishes, one or more daily intervals appear in the Intervals tab.

View Index Information

Once the indexing analysis is complete, the system displays a comprehensive report of your data. The Properties page is updated with a summary of the indexing Status and a Data Types pie chart that summarizes the fields that were discovered and created for the indexed data.

The Indexed Structure area summarizes the field information with a list of field names and data types.
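To give a feel for what a field-name and data-type summary looks like, the following minimal Python sketch derives one from a few JSON log lines. It is an illustration only: the sample records, the infer_fields helper, and the use of Python type names are assumptions made for this example, and it does not represent how Chaos Index infers a schema.

```python
import json
from collections import defaultdict


def infer_fields(json_lines):
    """Map each top-level field name to the sorted list of Python type
    names observed for it across all records."""
    fields = defaultdict(set)
    for line in json_lines:
        record = json.loads(line)
        for name, value in record.items():
            fields[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in fields.items()}


# Hypothetical sample records, for illustration only.
sample = [
    '{"host": "web-1", "status": 200, "latency_ms": 12.5}',
    '{"host": "web-2", "status": 404, "cached": true}',
]
summary = infer_fields(sample)
for name, types in summary.items():
    print(name, types)
```

Fields that appear in only some records (such as cached in this sample) still show up in the summary, which mirrors the idea of a field list built from everything discovered in the data.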

Indexing Options

Live Indexing

The Live indexing option causes ChaosSearch to index the existing content in the cloud storage bucket that matches the object group file filtering criteria, and then to watch for and index any new matching files after they are written to the storage bucket. Live indexing is the typical option for most environments where the cloud storage bucket receives new objects to index on a regular cadence.

Live indexing requires the configuration of an Amazon Web Services (AWS) SQS integration or Google Pub/Sub integration (depending on the source cloud storage service) to notify the indexing service when new objects are available. For more information, see Live Indexing - Amazon SQS or Live Indexing - Google PubSub.
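For AWS sources, new-object notifications typically begin with an S3 event notification that posts a message to an SQS queue whenever an object is created. The following Python sketch builds such a notification configuration in the shape accepted by the boto3 put_bucket_notification_configuration call. The queue ARN, key prefix, and helper names are hypothetical placeholders, and the queue policy and exact settings that ChaosSearch expects are covered in Live Indexing - Amazon SQS, not in this sketch.

```python
def build_queue_notification(queue_arn, prefix=""):
    """Build an S3 notification configuration that sends object-created
    events to an SQS queue. All values passed in are placeholders."""
    queue_config = {
        "QueueArn": queue_arn,
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Optional: only notify for keys under this prefix.
        queue_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [queue_config]}


def apply_notification(bucket, config):
    """Apply the configuration with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )


# Hypothetical values, for illustration only.
config = build_queue_notification(
    "arn:aws:sqs:us-east-1:123456789012:example-live-index-queue",
    prefix="logs/",
)
```

Keeping the builder separate from apply_notification lets you review the configuration before anything is changed on the bucket. Note that applying a notification configuration replaces the bucket's existing one, so merge in any notifications that are already defined.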

When you define the object group for the files that you want to index, ChaosSearch allows you to specify how to index your data. You first specify the file patterns and define any needed object filters, schema filters, and column transformations; you then reach the Create Object Group window.

To configure Live Indexing, select the Live Indexing option and paste in the ARN created for the AWS SQS service, or the GCP Pub/Sub Project ID.
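A common slip when filling in this field is pasting the SQS queue URL instead of the queue ARN. The short Python sketch below checks that a value has the general shape of an SQS ARN (arn:partition:sqs:region:account-id:queue-name) before you paste it. The pattern is a loose shape check written for this example, not a ChaosSearch validation rule, and the sample ARN and URL are made up.

```python
import re

# arn:<partition>:sqs:<region>:<12-digit account id>:<queue name>
SQS_ARN_PATTERN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):sqs:"  # partition and service
    r"[a-z0-9-]+:"                        # region, e.g. us-east-1
    r"\d{12}:"                            # AWS account ID
    r"[A-Za-z0-9_-]+(\.fifo)?$"           # queue name
)


def looks_like_sqs_arn(value):
    """Return True if value has the general shape of an SQS queue ARN."""
    return SQS_ARN_PATTERN.match(value.strip()) is not None


# Hypothetical values, for illustration only.
arn = "arn:aws:sqs:us-east-1:123456789012:example-live-index-queue"
url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-live-index-queue"
print(looks_like_sqs_arn(arn))
print(looks_like_sqs_arn(url))
```

The check confirms only the format. It cannot tell whether the queue exists or whether its permissions allow the integration to work.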

If you do not select Live Indexing, the indexing style defaults to static indexing. Static indexing causes the system to index the matching objects defined by the object group only when you click Start Indexing. It assumes that the bucket contents are relatively static. If new objects are added to the bucket, you can click Start Indexing again to run a new static index for any new files added since the last time the index service ran.
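The difference between the two styles can be sketched with a toy model in Python. The ToyIndexer class, its method names, and the in-memory list standing in for a bucket are all invented for this illustration; the sketch shows only the behavior described above (static indexing picks up new files when you start indexing again, live indexing reacts to notifications) and says nothing about how the ChaosSearch service is implemented.

```python
class ToyIndexer:
    """Toy model only; not how ChaosSearch is implemented."""

    def __init__(self, bucket, live=False):
        self.bucket = bucket  # shared list standing in for a storage bucket
        self.live = live
        self.indexed = set()

    def start_indexing(self):
        """Index every object not yet indexed; return the newly indexed keys."""
        new_keys = [key for key in self.bucket if key not in self.indexed]
        self.indexed.update(new_keys)
        return new_keys

    def on_object_created(self, key):
        """Stand-in for a queue notification; ignored in static mode."""
        if self.live and key not in self.indexed:
            self.indexed.add(key)
            return True
        return False


bucket = ["a.log", "b.log"]
static = ToyIndexer(bucket, live=False)
live = ToyIndexer(bucket, live=True)
static.start_indexing()
live.start_indexing()

# A new object arrives after the initial indexing run.
bucket.append("c.log")
static.on_object_created("c.log")  # no effect in static mode
live.on_object_created("c.log")    # indexed right away in live mode

# Static mode picks up c.log only on the next manual run.
caught_up = static.start_indexing()
```

In this model, the live indexer already contains c.log after the notification, while the static indexer reports c.log as newly indexed only when start_indexing is called again.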

Within the Create Object Group window, you can choose a Retention Policy for Index Lifecycle Management.