Step 2. Define Object Groups

Create object groups to organize and associate similar objects (files) in your cloud storage buckets for ChaosSearch indexing.

Cloud storage and data lakes often contain a massive variety of objects and files. An object group is a virtual filter on your cloud-storage buckets that selects the specific objects that you want to index using ChaosSearch. Each object group can focus on specific files, and you define rules for how to index that particular content. ChaosSearch then uses those object group rules to index your data as you want.

👍

Object Group Planning: Keeping Analysis Simple

As a good practice, start slowly: create an object group to filter and index some of your object files. Review the object group content information to examine the structure of the index, and to see if column names and organization are intuitive. With ChaosSearch, it is easy to include or exclude columns, change names, and also change data types to refine the information for analysis. It is best to do this tuning with a small sample of files before you apply it to a wider set of storage files (and a larger resulting index).

It is very helpful to survey the contents of your cloud storage buckets before you start creating object groups. An inventory of the storage file types, any visibility or security concerns for the files (such as who needs access to the index data for the files), and factors like update frequency for new related files can help with good object group design. Start with a set of storage objects that contain valuable data for visualization by your analysts, and define an object group to index that data. That process often results in tuning and changes, and leads to Refinery view planning and design. Plan some cleanup tasks to remove any early object groups (and indexes) that are no longer useful. Too many object groups, especially if they overlap or are stale, could be confusing for the analytics users who create Refinery views, and less efficient for the site.

Select a bucket, review its contents for file format types and the content previews, and then click Create Object Group to begin. The Object Group Preview window appears.

See Creating Object Groups for the detailed instructions on how to create and manage object groups. The primary goal of the group preview window is to use the available options for file name prefixes, regular expression strings, and/or object filter controls to pinpoint the file(s) that you want to include in the object group for indexing.
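The combined effect of a prefix and a regular expression can be tried out locally before you build the group. The following is a minimal sketch, not ChaosSearch code: the `select_objects` helper and the sample keys are hypothetical, and it assumes the regular expression is applied as a search anywhere in the object key, which may differ from how the product evaluates it.

```python
import re

def select_objects(keys, prefix="", pattern=None):
    """Return the keys that start with `prefix` and, if given, match `pattern`.

    Assumption: the pattern is searched anywhere in the key (re.search).
    """
    regex = re.compile(pattern) if pattern else None
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the prefix filter
        if regex and not regex.search(key):
            continue  # inside the prefix, but rejected by the regex
        selected.append(key)
    return selected

# Hypothetical bucket listing used only for illustration.
keys = [
    "logs/elb/2023-01-01.log.gz",
    "logs/elb/2023-01-02.log.gz",
    "logs/app/2023-01-01.json",
    "exports/users.csv",
]

# Keep only the compressed ELB logs under the logs/ prefix.
elb_logs = select_objects(keys, prefix="logs/", pattern=r"elb/.*\.log\.gz$")
print(elb_logs)
```

Testing a filter against a short listing like this makes it easier to spot a pattern that is too broad before it is applied to a large bucket.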

After you have identified the files for the group, click Next to display a Content Preview window.

Content Preview Window

The content preview summarizes the format of the included files (such as log, JSON, CSV, or unknown) and the compression types (such as none, GZIP, or snappy). ChaosSearch can provide a content preview even if the data files are compressed. This allows you to stay in the window while constructing regular expressions to parse the fields for indexing.

Depending on the format of the files you selected, there might be options for delimiter values, or for processing JSON files. Click Schema Filter to override the data type for one or more columns (to change the type to string, number, period, or time value). You can also use a JSON file to more tightly control the data within the index by including columns or excluding columns.

Object Group Indexing Controls

After you specify the column content and controls, click Create Object Group. The final step for an object group is to name it, and to specify the indexing controls.

For each object group, you can choose how ChaosSearch indexes the files. You can run an on-demand index once (a static index), or you can use live indexing, in which the storage system sends ChaosSearch a notification via AWS SQS or Google Pub/Sub messaging (based on the configured account) to report the new files for indexing.
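To make the live indexing notification more concrete, the sketch below reads the object keys out of an Amazon S3 event notification, which is the JSON document that S3 places on an SQS queue when objects are created. It only illustrates what such a message carries; how ChaosSearch consumes the messages is not described here, Google Pub/Sub notifications use a different layout, and the `new_object_keys` helper, bucket name, and keys are hypothetical.

```python
import json
from urllib.parse import unquote_plus

def new_object_keys(message_body):
    """Extract (bucket, key) pairs for created objects from an S3 event notification."""
    event = json.loads(message_body)
    pairs = []
    for record in event.get("Records", []):
        # Only object-creation events report new files to index.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Object keys are URL-encoded in the notification (spaces arrive as '+').
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs

# Hypothetical message body with one created and one removed object.
sample_body = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-log-bucket"},
                "object": {"key": "logs/elb/2023-01-03+part+1.log.gz"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "my-log-bucket"},
                "object": {"key": "logs/elb/2022-12-01.log.gz"},
            },
        },
    ]
})

created = new_object_keys(sample_body)
print(created)
```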

📘

Index Intervals

At this time, ChaosSearch supports only a daily interval for the index files.

For each object group, you can control how long to keep the daily index files before they age out of the system. You can configure the index lifecycle to keep the indexed files for an object group for as long as your users need them for visualization and analytics. The default is 14 days, but you could set a shorter or longer (even unlimited) duration to keep the files.
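The retention setting can be thought of as a cutoff date applied to the daily index files. The sketch below is illustrative only: the `expired_index_days` helper is hypothetical, and the exact boundary ChaosSearch uses when it ages out a file (for example, whether the cutoff day itself is kept) is an assumption.

```python
from datetime import date, timedelta

def expired_index_days(index_days, today, retention_days=14):
    """Return the daily index dates that fall outside the retention window.

    retention_days=None models an unlimited lifecycle (nothing expires).
    Assumption: a day expires once it is strictly older than the cutoff.
    """
    if retention_days is None:
        return []
    cutoff = today - timedelta(days=retention_days)
    return [day for day in index_days if day < cutoff]

# One daily index per day of January 2023 (hypothetical data).
january = [date(2023, 1, 1) + timedelta(days=i) for i in range(31)]
today = date(2023, 1, 31)

expired_default = expired_index_days(january, today)             # 14-day default
expired_week = expired_index_days(january, today, 7)             # shorter window
expired_never = expired_index_days(january, today, None)         # unlimited
print(len(expired_default), len(expired_week), len(expired_never))
```

A shorter window removes more daily files, so choose the duration from how far back your users actually need to query.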

Create the Group and Start Indexing

When you create your object group, you can choose to run the indexing later (Create) or immediately when the object group is added (Create and Start). If you choose the Create option, click Start Indexing when you are ready to run the indexing services. Indexing performs a deep analysis of the files specified by the object group, and includes any column overrides and filters in the resulting indexed data.

After you start indexing an object group, the Group Contents tab updates to show more information about the index, and the index structure (columns and types). The pie chart summarizes the types

After you create and index an object group, create a Refinery view to define the content available for visualization and analytics.


What’s Next

Create views to enable users to visualize the indexed data for your object group.
