Step 2. Define Object Groups

Create object groups to organize and associate similar objects (files) in your cloud storage buckets for ChaosSearch indexing.

Cloud storage and data lakes often contain a massive variety of objects and files. An object group acts as a virtual filter on your cloud-storage buckets that selects the specific objects that you want to index using ChaosSearch. Each object group can focus on specific files, and you define rules for how to index that content. ChaosSearch then uses those object group rules to index your data as you want.


Object Group Planning: Keeping Analysis Simple

As a good practice, start slowly: create an object group to filter and index some of your object files. Review the object group content information to examine the structure of the index, and to see if the field names for the data in the source files are intuitive. With ChaosSearch, it is easy to include or exclude fields, change field names, set the data types of fields, and control the content of the indexed data. This reduces the churn that is often needed to recreate source content when the fields are not correct, complete, or useful for your data analysis requirements.

Evaluating and cleaning the indexed data works most smoothly when you can work with small representative samples to develop templates and rules for the types of content at your site. You can tune the indexing controls with a small sample of files before you apply it to a wider set of storage files (and a larger resulting index that takes more time to create).

It is very helpful to survey the contents of your cloud storage buckets before you start creating object groups. Good object group design is driven by an inventory of the storage file types, any visibility or security concerns for the files (such as who needs access to the indexed data for the files), and factors like how frequently new related files arrive. Start with a set of storage objects that contain valuable data for visualization by your analysts, and define an object group to index that data. That process often results in tuning and changes, and leads to Refinery view planning and design and possible materialization (that is, post-processing transformations of the indexed data to create even more refined columns for analysis and visualization).

Plan some cleanup tasks to remove any early object groups (and indexes) that are no longer useful. Too many object groups, especially if they overlap or are stale and unused, could be confusing for the analytics users who create Refinery views, and less efficient for the site.

Select a bucket, review its contents for file format types and the content previews, and then click Create Object Group to begin. The Object Group Preview window appears.


See Creating Object Groups for the detailed instructions on how to create and manage object groups. The primary goal of the group preview window is to use the available options for file name prefixes, regular expression strings, and/or object filter controls to pinpoint the file(s) that you want to include in the object group for indexing.
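To make the prefix and regular-expression filtering concrete, here is a minimal Python sketch of how such a filter narrows a bucket listing. It is only an illustration of the idea, not how ChaosSearch implements the preview; the function name select_objects and the sample object keys are hypothetical.

```python
import re


def select_objects(keys, prefix="", pattern=None):
    """Return the keys that start with `prefix` and, if given, match `pattern`."""
    regex = re.compile(pattern) if pattern else None
    selected = []
    for key in keys:
        # A prefix filter narrows the listing to one "folder" of the bucket.
        if not key.startswith(prefix):
            continue
        # A regular expression then pinpoints files within that prefix.
        if regex is not None and not regex.search(key):
            continue
        selected.append(key)
    return selected


# Hypothetical bucket listing, used only for illustration.
keys = [
    "logs/app/2023-01-01.log.gz",
    "logs/app/2023-01-02.log.gz",
    "logs/elb/2023-01-01.csv",
    "exports/users.json",
]

matched = select_objects(keys, prefix="logs/", pattern=r"\.log\.gz$")
print(matched)
```

Testing a prefix and pattern against a small listing like this mirrors the advice above: confirm that the filter selects only the intended files before indexing a larger set.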

After you have identified the files for the group, click Next to display a Content Preview window.

Content Preview Window

The content preview summarizes the format of the included files (such as log, JSON, CSV, or unknown) and the compression types (such as none, GZIP, or snappy). ChaosSearch can provide a content preview even if the data files are compressed. This allows you to stay in the window while constructing regular expressions to parse the fields for indexing.

Depending on the format of the files you selected, there might be options for delimiter values for CSV files or flattening options for JSON files. Click Schema Filter to override the data type for one or more fields (to change the type to string, number, period, or time value). You can create a JSON file of field selection and processing policies to more tightly control the fields that will be indexed with inclusion/exclusion rules, JSON file processing options, and similar controls.
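The following Python sketch illustrates, in general terms, what JSON flattening, field exclusion, and type overrides do to a record. It is not the ChaosSearch schema filter or its JSON policy format, which are defined in the product documentation; the functions flatten and apply_overrides and the sample record are hypothetical.

```python
def flatten(record, parent="", sep="."):
    """Flatten nested dictionaries into a single level with dotted field names."""
    flat = {}
    for name, value in record.items():
        path = f"{parent}{sep}{name}" if parent else name
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


def apply_overrides(flat, overrides, exclude=()):
    """Drop excluded fields and convert the remaining values with per-field casters."""
    cleaned = {}
    for name, value in flat.items():
        if name in exclude:
            continue
        caster = overrides.get(name)
        cleaned[name] = caster(value) if caster else value
    return cleaned


# Hypothetical source record: numbers arrive as strings, plus a field nobody needs.
record = {
    "status": "200",
    "request": {"path": "/home", "bytes": "512"},
    "debug": {"trace": "abc"},
}

flat = flatten(record)
cleaned = apply_overrides(
    flat,
    overrides={"status": int, "request.bytes": int},
    exclude={"debug.trace"},
)
print(cleaned)
```

In this sketch the nested request object becomes the fields request.path and request.bytes, the string values are converted to numbers, and the debug.trace field is left out of the result.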


Object Group Indexing Controls

After you specify the field content and controls, click Create Object Group. The final step for an object group is to name it, and to specify the indexing options.


For each object group, you can choose whether you want ChaosSearch to run an on-demand index once (called a static index), or whether to use live indexing to look for and index new matching files as they are written to your cloud storage locations. With live indexing, the storage system sends ChaosSearch a notification via AWS SQS or Google Pub/Sub messaging (based on the configured account) to report when new files are available for indexing.
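As an illustration of what such a notification carries, the sketch below parses an Amazon S3 event notification body (the JSON that S3 delivers to an SQS queue) and extracts the bucket and key of each newly created object. How ChaosSearch consumes these messages internally is not described here; the function name new_objects and the sample message are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def new_objects(message_body):
    """Return (bucket, key) pairs for ObjectCreated records in an S3 event message."""
    event = json.loads(message_body)
    found = []
    for record in event.get("Records", []):
        # Only newly written objects are candidates for indexing.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        found.append((s3["bucket"]["name"], key))
    return found


# Hypothetical message body with one created and one removed object.
sample = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "logs/app/daily+report.log.gz"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "logs/app/old.log.gz"},
            },
        },
    ]
})

print(new_objects(sample))
```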


Index Intervals

At this time, ChaosSearch supports only a daily interval for the index files that it creates.

For each object group, you can control how long to keep the daily index files before they age out of the system. You can configure the index lifecycle to keep the index files for an object group for as long as your users need them for visualization and analytics. The default is 14 days, but you could set a shorter or longer (even unlimited) duration to keep the files.
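The retention rule can be pictured with a short Python sketch that decides which daily indexes have aged out for a given retention period. It assumes that an index expires once it is more than retention_days old and that None means unlimited retention; the exact boundary behavior in ChaosSearch may differ, and the function name expired_indexes is hypothetical.

```python
from datetime import date, timedelta


def expired_indexes(index_dates, today, retention_days=14):
    """Return the daily index dates older than the retention window, oldest first."""
    if retention_days is None:
        # Assumption: None stands for unlimited retention, so nothing expires.
        return []
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in index_dates if d < cutoff)


# Twenty consecutive daily indexes ending on a hypothetical "today".
today = date(2023, 1, 31)
daily = [today - timedelta(days=n) for n in range(20)]

expired = expired_indexes(daily, today)
print([d.isoformat() for d in expired])
```

With the default of 14 days, only the indexes from more than two weeks before the chosen date are reported; a longer retention period keeps more of them available for visualization and analytics.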

Create the Group and Start Indexing

When you create your object group, you can choose to run the indexing later (Create) or immediately when the object group is added (Create and Start). The Create option is the most common case. If you choose the Create option, click Start Indexing when you are ready to run the indexing services. Indexing performs a deep analysis of the files specified by the object group, and includes any field overrides and filters in the resulting indexed data.


After you start indexing an object group, the Group Contents tab updates to show more information about the index, and the index structure (fields and types). The pie chart summarizes the types.


After you create and index an object group, create a Refinery view to define the content available for visualization and analytics.

What’s Next

Create views to enable users to visualize the indexed data for your object group.
