Step 2. Define Object Groups

Create object groups to organize and associate similar objects (files) in your cloud storage buckets for ChaosSearch indexing.

Cloud storage and data lakes usually contain a massive variety of objects and files. An object group is a virtual filter for your cloud storage buckets that identifies the specific objects that you want to index with ChaosSearch. Each object group filters on specific files, and you can define rules for how to index that content. ChaosSearch uses those object group rules to index your data for querying and analysis.


Tips and Good Practices: Keeping Analysis Simple

As a good practice, start slowly; create a static object group to filter and index some of your object files. Review the object group content information to examine the structure of the index, and to see if the field names for the data in the source files are intuitive.

With ChaosSearch, it is easy to create rules that include or exclude fields, change field names, set the data types of fields, and control the content of the indexed data. These features help to reduce or eliminate the churn and re-pipelining needed to tune the original source content when fields are not correct, intuitive, or useful for your data analysis requirements.

Evaluating and cleaning the indexed data works most smoothly when you can start with small representative samples to develop plans and rules for the types of content at your site. You can tune the indexing controls with a small sample of files before you create the plan for a wider set of storage files (and a larger resulting index that takes more time to create).

Object group planning can also identify Refinery view needs and possible materialization (that is, post-processing transformations of the indexed data) to produce the desired set of columns and filters for analysis and visualization.

Plan some cleanup tasks to remove any early object group test cases (and their indexes) that are no longer useful. Too many object groups, especially unused ones, can confuse the analytics users who create Refinery views and make the site less efficient.

Select the bucket where your log and event files of interest are located, and then click Create Object Group to begin. The Object Group Preview window appears.

See Creating Object Groups for the detailed instructions to create and manage object groups. The Object Group Preview window lets you use the available options for file name prefixes, regular expression strings, and/or advanced filter controls to pinpoint the files that you want to include in the object group for indexing.
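The prefix and regular expression filters described above can be prototyped locally before you enter them in the Object Group Preview window. The following is a minimal sketch, not ChaosSearch code: the function name, the sample object keys, and the choice of "search anywhere in the key" regex semantics are all assumptions for illustration.

```python
import re

def select_objects(keys, prefix="", pattern=None):
    """Return the keys that start with `prefix` and, if given, match `pattern`.

    Assumption: the regex is applied with search semantics (it may match
    anywhere in the key). The product's exact matching rules may differ.
    """
    regex = re.compile(pattern) if pattern else None
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        if regex is not None and not regex.search(key):
            continue
        selected.append(key)
    return selected

# Hypothetical object keys in a bucket.
keys = [
    "logs/app/2022/10/01/events-001.json.gz",
    "logs/app/2022/10/01/events-002.json.gz",
    "logs/app/2022/10/01/readme.txt",
    "backups/db/2022-10-01.dump",
]

matched = select_objects(keys, prefix="logs/app/", pattern=r"\.json\.gz$")
print(matched)
```

Testing a filter against a short list of known keys like this makes it easier to spot a pattern that is too broad or too narrow before a larger index is built.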

After you have identified the files for the group, click Next to display a Content Preview window.

Content Preview Window

The Content Preview window summarizes the discovered format of the included files (such as log, JSON, CSV, or unknown), the compression types (such as None, GZIP, or Snappy), and displays a preview sample of the content of the selected files. ChaosSearch can provide a content preview even if the files are compressed. This allows you to stay in the window while constructing regular expressions to parse the fields for indexing.

Depending on the format of the files you selected, the window might display other options such as delimiter values and a column heading field for CSV files, or array flattening options for JSON files. For Log files, there is a Formatted Preview area that shows a more user-friendly display of the field components of the log file.
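A regular expression for parsing log fields can also be drafted and checked outside the window. The sketch below assumes a hypothetical log layout of timestamp, level, service, and message; the pattern and field names are illustrative and do not come from ChaosSearch.

```python
import re

# Hypothetical log format: "<timestamp> <level> <service> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = "2022-10-01T12:00:00Z ERROR checkout payment declined for order 1234"
fields = parse_line(sample)
print(fields)
```

Named groups give each captured value a field name, which is similar in spirit to the field components shown in the Formatted Preview area.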

Click Field Overrides for controls to override the data type for one or more fields, or to specify a field selection file. You can create and input a JSON file of field selection and processing policies to more tightly control the fields that will be indexed (or omitted) with inclusion/exclusion rules, JSON file processing options, and similar controls.
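To reason about what inclusion/exclusion rules and data type overrides do to a record, a small local model can help. The sketch below is only a conceptual illustration: the policy dictionary and its keys (`includes`, `excludes`, `types`) are hypothetical and are not the schema of the ChaosSearch field selection file, which is described in the product documentation.

```python
def apply_field_policy(record, policy):
    """Apply a hypothetical include/exclude/type policy to one record."""
    includes = policy.get("includes")            # None means "keep all fields"
    excludes = set(policy.get("excludes", []))
    casts = policy.get("types", {})              # field name -> callable
    result = {}
    for name, value in record.items():
        if includes is not None and name not in includes:
            continue
        if name in excludes:
            continue
        cast = casts.get(name)
        result[name] = cast(value) if cast else value
    return result

# Hypothetical source record in which every value arrives as a string.
record = {"status": "200", "bytes": "5120", "debug_blob": "x" * 10, "host": "web-1"}
policy = {"excludes": ["debug_blob"], "types": {"status": int, "bytes": int}}

cleaned = apply_field_policy(record, policy)
print(cleaned)
```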

Object Group Indexing Controls

After you specify the field content and controls, click Create Object Group. The final step for an object group is to name it, and to specify some indexing options.

For each object group, you can choose whether ChaosSearch runs an on-demand index once (called a static index), or you can select Live Indexing to automatically index new matching files after they are written to your cloud storage locations. Live Indexing requires you to enter details for a storage event notification service, either AWS SQS or a Google Pub/Sub Project ID (depending on the storage configured for the account), which sends events when new files are written to storage.
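For context on what a storage event looks like, the sketch below parses the body of an Amazon S3 event notification as it might be delivered through SQS and extracts the bucket and key of each newly created object. How ChaosSearch consumes these events internally is not described here; the function name and the sample message are assumptions for illustration only.

```python
import json
from urllib.parse import unquote_plus

def created_objects(message_body):
    """Return (bucket, key) pairs for ObjectCreated records in an S3 event."""
    event = json.loads(message_body)
    found = []
    for rec in event.get("Records", []):
        if not rec.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = rec["s3"]
        # Object keys in S3 event notifications are URL-encoded.
        found.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return found

# Hypothetical, trimmed-down notification body.
body = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-log-bucket"},
                "object": {"key": "logs/app/new+file.json"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "my-log-bucket"},
                "object": {"key": "logs/app/old.json"},
            },
        },
    ]
})

new_files = created_objects(body)
print(new_files)
```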

For each object group, you can specify a retention policy to control how long to keep indexed data before it ages out and is removed. The default retention is 14 days, but you can deselect Retention Policy to keep indexed data indefinitely (no age out), or keep it selected and specify an alternative number of days or months to keep indexed data for analysis tasks.
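The effect of a retention period can be sketched as a simple date comparison. This is a conceptual model only: it assumes that age is measured in whole days from the date of the indexed data and that data ages out once it is strictly older than the retention period. The exact boundary behavior in ChaosSearch may differ.

```python
from datetime import date, timedelta

DEFAULT_RETENTION_DAYS = 14  # default retention stated in the text

def is_expired(data_date, today, retention_days=DEFAULT_RETENTION_DAYS):
    """Return True if data from `data_date` is older than the retention period.

    Passing retention_days=None models a deselected Retention Policy
    (keep indexed data indefinitely).
    """
    if retention_days is None:
        return False
    return today - data_date > timedelta(days=retention_days)

today = date(2022, 10, 20)
print(is_expired(date(2022, 10, 1), today))                      # 19 days old
print(is_expired(date(2022, 10, 10), today))                     # 10 days old
print(is_expired(date(2022, 10, 1), today, retention_days=None)) # no age out
```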

Create the Group and Start Indexing

After you create the group, the new group is added to the configuration, and the Storage > Properties page appears. Review the information for your new object group, and if everything looks correct, click Start Indexing to run the indexing services. Indexing performs a deep analysis of the files specified by the object group, and includes any instructions for field overrides and filters in the resulting indexed data.

After you start indexing an object group, the Properties tab updates to show more information about the index, and a pie chart summary of the data types for the discovered fields. When indexing is complete, the Start Indexing button changes to Restart Indexing.

Review the Properties tab for a closer look at the fields within the indexed data. The Indexed Structure list shows the name and data type of each field in the indexed data.

The Events tab lists any indexing warnings or issues to address. If any problems stopped or blocked indexing, this tab can provide more information about the problems for you or ChaosSearch Customer Success engineers to troubleshoot the indexing issues.

The Intervals tab lists the name and creation date of each daily interval, and the size in bytes of the cloud storage object files indexed for the group. By default, ChaosSearch creates one or more daily intervals with the name:

_<object-group-name>_<storage-date>_


About the Daily Interval Name

The date value is in yyyy-mm-dd format and is the day on which the matching object files were written to cloud storage. For example, a daily interval named _my-app-grp_2022_10_01_ contains the indexed data for files in the my-app-grp object group that have a cloud storage modification date of October 1, 2022. If matching files have different storage modification dates, ChaosSearch creates a daily interval for each modification date, such as _my-app-grp_2022_10_02_, and so on.
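The naming rule can be sketched as a small function that groups files by modification date. The separators in the generated name follow the example interval names shown above; the function names and sample files are hypothetical, and the code is an illustration of the naming rule rather than ChaosSearch's implementation.

```python
from collections import defaultdict
from datetime import date

def daily_interval_name(object_group, modified):
    """Build an interval name shaped like the examples in the text."""
    return f"_{object_group}_{modified:%Y_%m_%d}_"

def group_by_interval(object_group, files):
    """Map each daily interval name to the file keys modified on that day."""
    intervals = defaultdict(list)
    for key, modified in files:
        intervals[daily_interval_name(object_group, modified)].append(key)
    return dict(intervals)

# Hypothetical (key, modification date) pairs.
files = [
    ("logs/a.json", date(2022, 10, 1)),
    ("logs/b.json", date(2022, 10, 1)),
    ("logs/c.json", date(2022, 10, 2)),
]

intervals = group_by_interval("my-app-grp", files)
print(intervals)
```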

The Isolation tab lists any optional isolation keys configured for the object group. Isolation keys separate indexed data by a defined key that is derived from the cloud storage object pathnames. The key could be related to tenants/organizations, applications, regions, or similar relationships that are part of the pathname. The keys can be used in views to filter the indexed data to only the data that matches the specified key(s). All the data for the other keys is essentially invisible and will not appear in the results or visualizations for the view. Isolation keys can be helpful for multi-tenancy support to separate data for one team or group from others, or for performance reasons to focus a view to the indexed data segments that relate to the specific isolation keys used in the view.
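As a conceptual illustration of isolation keys, the sketch below derives a key from an object pathname and uses it to filter rows. The path layout (`tenants/<tenant>/...`), the regular expression, and the function names are assumptions; the actual key definition is configured in the object group.

```python
import re

# Hypothetical path layout: "tenants/<tenant>/..."
KEY_PATTERN = re.compile(r"^tenants/(?P<tenant>[^/]+)/")

def isolation_key(path):
    """Return the tenant segment of the path, or None if there is none."""
    match = KEY_PATTERN.match(path)
    return match.group("tenant") if match else None

def filter_by_keys(rows, keys):
    """Keep only the rows whose path yields one of the requested keys."""
    wanted = set(keys)
    return [row for row in rows if isolation_key(row["path"]) in wanted]

# Hypothetical indexed rows that remember their source path.
rows = [
    {"path": "tenants/acme/app.log", "msg": "a"},
    {"path": "tenants/globex/app.log", "msg": "b"},
    {"path": "misc/app.log", "msg": "c"},
]

visible = filter_by_keys(rows, ["acme"])
print(visible)
```

Rows for other keys never appear in the result, which mirrors how data for other isolation keys is invisible to a view that specifies a key.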

The Objects tab lists the files within the customer storage bucket that are indexed by the object group. You can review this list to confirm that all the files that you expect are indexed by the object group.


Objects listing could take some time.

When an object group has a very sparse regex that matches very few S3 objects, the Objects tab listing could take a long time to find and display matching objects included in the group. In some cases, the Objects listing UI could display No Matches.

After you create and index an object group, create a Refinery view to define the content available for visualization and analytics.


What's Next

Create Refinery views to enable users to visualize and query the indexed data for one or more object groups.