An object group is a virtual filter. It uses file name prefixes and similar rules to associate data files that share a common content format so that they can be indexed with the same rules.
Most sites have a few object groups that categorize different types of cloud-storage files for indexing. As a best practice, work with your ChaosSearch Customer Success team to plan a few object groups to organize your data sources.
To create an object group:
- Log in to the ChaosSearch console and click Storage.
- In the left list, select the bucket that contains the files that you want to index.
- Click Create Object Group in the top right corner.
The Object Group Preview window appears.
If you selected a bucket before you clicked Create Object Group, the window displays the folders and files in that bucket. If you have not yet selected a bucket, select one in the left list to continue. The Object Group Preview pane updates to show the files and folders in that bucket.
Select the files to include in your object group for indexing. You can specify the files to include by using the following methods:
- Prefix – Type a prefix string to filter the object pathnames to those that begin with the prefix string. See AWS prefix guidance for more information about specifying prefixes.
- Regex Filter – Type a regular expression that matches the object file names to include in the group. See Java documentation for more information about specifying regular expressions. Make sure that the regular expression matches the entire object file pathname, not just part of it.
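As a rough illustration of the two filter methods, the following Python sketch applies a prefix check and a full-pathname regex match to a list of object keys. The keys and the select_objects helper are hypothetical, the sketch assumes that both filters can be applied together, and Python's re module stands in for the Java regex engine. It shows only the matching logic, not how ChaosSearch implements it.

```python
import re

def select_objects(keys, prefix="", regex=None):
    """Return the keys that start with prefix and, if a regex is given, fully match it."""
    pattern = re.compile(regex) if regex else None
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        # fullmatch mirrors the requirement that the regex match the entire pathname
        if pattern is not None and pattern.fullmatch(key) is None:
            continue
        selected.append(key)
    return selected

# Hypothetical object keys, for illustration only
keys = [
    "logs/elb/2023/01/01/part-0001.log.gz",
    "logs/elb/2023/01/02/part-0002.log.gz",
    "logs/app/2023/01/01/app.json",
]

print(select_objects(keys, prefix="logs/elb/"))
print(select_objects(keys, regex=r"logs/elb/.*\.log\.gz"))
print(select_objects(keys, regex=r"elb"))  # partial pattern: does not cover the whole pathname
```

The last call returns no keys because the pattern matches only a fragment of each pathname, which is the situation the Regex Filter guidance warns about.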
Objects listing could take some time.
When an object group has a very sparse regex that matches very few S3 objects, the Objects listing could take a long time to find and display matching objects included in the group. In some cases, the Objects listing UI could display
Regular Expression Tips
When you are constructing a regex, it is best to construct the most specific regex values possible to find your cloud storage files of interest for the object group. If possible, avoid using wide (greedy) wildcards in the beginning or middle folder pathnames because a wildcard such as .* early in the path could result in unnecessary traversals of folders where no matches will be found. For example, consider the following bucket pathname structure:
If you want to index only the log files that are generated by app1, it can be better to define a regex like test/acct\-[12]/app1/2023/.* to focus the traversal on
A regex such as test/.*/app1/2023/01/.* with a wildcard in the second folder position could cause ChaosSearch to spend time traversing the test/acct-2/app2 paths even though there will be no matches found below them.
Wider folder wildcards could be helpful in sites where new accounts (test/acct-3/app1...) are expected to be added over time, and their files are intended to be processed by the same object group. Some discussion with your ChaosSearch Customer Success team can help to craft the best regex for the expected changes of the bucket structure and object group setup.
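The trade-off between a specific regex and a wider one can be checked offline before you create the group. The following Python sketch compares the two patterns discussed above against a handful of hypothetical object keys (the file names are invented for illustration). It compares only which keys match. It does not model how ChaosSearch traverses the bucket, and Python's re module stands in for the Java regex engine.

```python
import re

# [12] is a character class that matches a single "1" or "2"
SPECIFIC = re.compile(r"test/acct\-[12]/app1/2023/.*")
# The wildcard in the second folder position matches any account folder
WIDE = re.compile(r"test/.*/app1/2023/01/.*")

# Hypothetical object keys, for illustration only
keys = [
    "test/acct-1/app1/2023/01/a.log",
    "test/acct-2/app1/2023/01/b.log",
    "test/acct-2/app2/2023/01/c.log",
    "test/acct-3/app1/2023/01/d.log",  # an account added later
]

def matches(pattern, candidates):
    """Return the keys that the pattern matches in full."""
    return [key for key in candidates if pattern.fullmatch(key)]

specific_hits = matches(SPECIFIC, keys)
wide_hits = matches(WIDE, keys)

print("specific:", specific_hits)
print("wide:    ", wide_hits)
print("only matched by the wide regex:", sorted(set(wide_hits) - set(specific_hits)))
```

In this sample, both patterns select the app1 files for acct-1 and acct-2, and only the wide pattern picks up the later acct-3 account. That is the behavior to weigh against the extra traversal that a wide wildcard can cause.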
A sample window follows that shows a sample regex filter for ELB log files.
Regular Expression Editing
Selecting a file in the preview list populates the regular expression for that file name. This can be a helpful way to pre-populate a regex to edit for refinement. Click the pencil icon to open a regex editor to modify the expression as needed and to see the effect on the pathname matches. Click the X icon to clear the prefix and regex settings if needed.
Optionally, click Advanced Filtering to display a window of additional options such as filters by file modification date, file size, cloud storage class, isolation keys (which can separate index data based on customer/multi-tenant naming or other methods), or custom object tags/metadata values.
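Conceptually, the advanced filters narrow the set of matched objects by their attributes rather than by their pathnames. The following Python sketch shows that idea for modification date, file size, and storage class. The ObjectInfo type, the passes_filters helper, and its parameter names are hypothetical and do not correspond to ChaosSearch settings.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ObjectInfo:
    """Hypothetical summary of one cloud storage object."""
    key: str
    size: int                # bytes
    last_modified: datetime
    storage_class: str

def passes_filters(obj, modified_after=None, max_size=None, storage_classes=None):
    """Return True if obj satisfies every filter that is set (None means no filter)."""
    if modified_after is not None and obj.last_modified < modified_after:
        return False
    if max_size is not None and obj.size > max_size:
        return False
    if storage_classes is not None and obj.storage_class not in storage_classes:
        return False
    return True

# Hypothetical objects, for illustration only
objects = [
    ObjectInfo("logs/a.log", 1_000, datetime(2023, 1, 10, tzinfo=timezone.utc), "STANDARD"),
    ObjectInfo("logs/b.log", 5_000_000, datetime(2023, 1, 12, tzinfo=timezone.utc), "STANDARD"),
    ObjectInfo("logs/c.log", 2_000, datetime(2022, 6, 1, tzinfo=timezone.utc), "GLACIER"),
]

kept = [
    obj.key
    for obj in objects
    if passes_filters(
        obj,
        modified_after=datetime(2023, 1, 1, tzinfo=timezone.utc),
        max_size=1_000_000,
        storage_classes={"STANDARD"},
    )
]
print(kept)
```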
After you select the files and specify any optional advanced filtering controls for the object group, click Next. The Content Preview window appears. A sample window follows for log files; see About the Content Preview for other examples and more information.
The content preview summarizes the format of the selected file(s) (such as LOG, JSON, CSV, or Unknown) and the compression types (such as NONE, GZIP, SNAPPY, or SNAPPY-JAVA). CSV, JSON, and LOG files display options to help with their index processing.
Object groups and ingest services now support snappy-java framing ([magic header:16 bytes]([block size:int32][compressed data:byte array])*) for snappy-compressed files. See https://github.com/xerial/snappy-java#compatibility-notes for more information about framing formats in Snappy.
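To make the framing layout concrete, the following Python sketch walks a buffer that follows the structure above: a 16-byte header followed by zero or more blocks, each a 32-bit size and that many bytes of compressed data. It assumes that the block size is a big-endian int32, treats the header as opaque, and does not decompress anything. The split_snappy_java_blocks helper and the sample buffer are hypothetical.

```python
import struct

HEADER_SIZE = 16  # length of the magic header in the layout above

def split_snappy_java_blocks(data):
    """Split a framed buffer into (header, list of compressed blocks)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer is shorter than the 16-byte header")
    header = data[:HEADER_SIZE]
    blocks = []
    pos = HEADER_SIZE
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated block size")
        # Assumption: the int32 block size is stored big-endian
        (size,) = struct.unpack_from(">i", data, pos)
        pos += 4
        if size < 0 or pos + size > len(data):
            raise ValueError("invalid or truncated block")
        blocks.append(data[pos:pos + size])
        pos += size
    return header, blocks

# Synthetic buffer: a placeholder header (not the real magic bytes) and two fake blocks
sample = bytes(HEADER_SIZE) + struct.pack(">i", 3) + b"abc" + struct.pack(">i", 2) + b"de"
header, blocks = split_snappy_java_blocks(sample)
print(len(header), blocks)
```

Each returned block would still need to be passed to a Snappy decompressor to recover the original content.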
Optionally, click Field Overrides to override the data type auto-detection of one or more fields within the source files. ChaosSearch includes auto-detection routines that scan the matching storage files and auto-detect data types for numbers, strings, time values, and periods. Administrators can refine or lock in the data type for one or more fields using Field Overrides. This override enables virtual data transformations of the source content. Additional controls on the schema transformation page support the ability to include or exclude a list of specific fields for the index.
Click Create Object Group to complete the object group definition. The Create Object Group window appears.
Type a name for the object group.
Select whether you want to use static indexing or Live Indexing options for the group.
- Static indexing is the default. When you start indexing, Chaos Index runs one indexing pass to find and index the matching object storage files for the group.
- For live indexing, select Live Indexing and in the new field, specify an AWS ARN for an SQS messaging queue (or GCP ID). When you start indexing, Chaos Index watches for SQS or Pub/Sub notifications to index any new matching cloud storage files as they are written to cloud storage and when notification events are sent to ChaosSearch. Note that any matching cloud storage files that already exist in S3/GCP will not be indexed by the object group.
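The two indexing modes differ in what triggers indexing: static indexing starts from the files that already exist, while live indexing reacts to notifications about new files. As a minimal sketch of the live case for S3, the following Python code reads the body of an S3 event notification, as it might be delivered through SQS, and keeps the newly created object keys that match an object group regex. The matching_keys_from_s3_event helper, the bucket name, and the keys are hypothetical, and how ChaosSearch processes notifications internally is not described here.

```python
import json
import re
from urllib.parse import unquote_plus

def matching_keys_from_s3_event(message_body, regex):
    """Return keys of newly created objects in an S3 event that fully match regex."""
    pattern = re.compile(regex)
    event = json.loads(message_body)
    keys = []
    for record in event.get("Records", []):
        # Only object-creation events describe new files to index
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        # S3 URL-encodes object keys in event notifications
        key = unquote_plus(record["s3"]["object"]["key"])
        if pattern.fullmatch(key):
            keys.append(key)
    return keys

# Hypothetical notification body, for illustration only
body = json.dumps({
    "Records": [
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "logs/elb/2023/01/01/part-0001.log.gz"}}},
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "logs/app/2023/01/01/app.json"}}},
        {"eventName": "ObjectRemoved:Delete",
         "s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "logs/elb/2022/12/31/old.log.gz"}}},
    ]
})

new_keys = matching_keys_from_s3_event(body, r"logs/elb/.*\.log\.gz")
print(new_keys)
```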
In the Retention Policy field, select the age-out time for the daily intervals created for the object group. The default is to keep the daily intervals for cloud storage files that have modification dates in the last 14 days. You can set a different number of days or months, or deselect the retention option to keep the daily intervals for an unlimited timeframe (no automatic cleanup).
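The effect of the retention setting can be reasoned about with simple date arithmetic. The following Python sketch lists which daily intervals would age out for a given retention window. The expired_intervals helper is hypothetical, and the boundary rule (an interval dated exactly retention_days ago is kept) is an assumption made for the sketch, not documented ChaosSearch behavior.

```python
from datetime import date, timedelta

def expired_intervals(interval_days, today, retention_days=14):
    """Return the daily intervals that fall outside the retention window.

    retention_days=None models a deselected retention option: nothing expires.
    """
    if retention_days is None:
        return []
    cutoff = today - timedelta(days=retention_days)
    # Assumption: an interval dated exactly on the cutoff day is still kept
    return [day for day in interval_days if day < cutoff]

# Hypothetical daily intervals, for illustration only
intervals = [date(2023, 1, 10), date(2023, 1, 17), date(2023, 1, 30)]
today = date(2023, 1, 31)

print(expired_intervals(intervals, today))                     # default 14-day window
print(expired_intervals(intervals, today, retention_days=7))   # shorter window
print(expired_intervals(intervals, today, retention_days=None))  # unlimited retention
```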
Click Create to save the object group. The Storage > Properties window appears with the new group selected for review.
- Review the definition of the group to confirm that the information and configuration are correct, then click Start Indexing. See Indexing Your Data for the next steps.
After you create an object group, confirm the settings and start indexing.