Object Storage Log Organization Best Practices

Log files can have a variety of formats and come from different sources. When shipping those files to cloud storage, organize objects with common, repeatable pathname patterns so that related types of files stay together. In well-organized buckets, object groups can use regular expressions to find the common types of files to index, and ChaosSearch and compute resources are used more efficiently.
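As an illustration, the following Python sketch filters a list of object keys with a regular expression. The bucket layout, the keys, and the `select_keys` helper are hypothetical, and the pattern uses Python's `re` syntax, which may differ from the exact expression syntax an object group accepts. Because every key follows the same `<environment>/<source>/<log-type>/<YYYY>/<MM>/<DD>/<file>` pattern, one short expression selects exactly the nginx access logs.

```python
import re

# Hypothetical keys following <environment>/<source>/<log-type>/<YYYY>/<MM>/<DD>/<file>.
KEYS = [
    "prod/nginx/access/2024/05/01/host-a.json.gz",
    "prod/nginx/access/2024/05/02/host-b.json.gz",
    "prod/nginx/error/2024/05/01/host-a.log.gz",
    "prod/app/events/2024/05/01/part-0000.json.gz",
    "tmp/scratch/notes.txt",
]

# Hypothetical filter: gzipped JSON nginx access logs in the prod environment.
ACCESS_LOGS = re.compile(r"prod/nginx/access/\d{4}/\d{2}/\d{2}/[^/]+\.json\.gz")


def select_keys(keys, pattern):
    """Return the keys whose entire pathname matches the compiled pattern."""
    return [key for key in keys if pattern.fullmatch(key)]


if __name__ == "__main__":
    for key in select_keys(KEYS, ACCESS_LOGS):
        print(key)
```

With a flat or inconsistent layout, the same selection would typically need a longer and more fragile expression.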

Each customer's log sources, pipelines, and infrastructure are different, so there is no one-size-fits-all approach to file organization. This topic describes some best practices and recommendations to keep in mind when planning your log shipping destinations and object groups. It includes some examples of good organizational design and some of poor design. If you have questions about the best strategy for your files, contact ChaosSearch Customer Success for assistance.

Organization Objectives

In general, use the following guidelines:

Objective	Reasoning
Use as few object groups as possible, especially live indexing groups.	Each live indexing object group reserves a minimum of resources to watch for new files and index them. Limiting the number of object groups leaves more resources for other activities, such as servicing user queries.
Use isolation to index common files with a single object group while keeping their data separate.	If one object group can do the work of several, especially while live indexing, use isolation keys and regular expressions on log object pathnames to separate the indexed data into slices, one per category of file. This allows one object group to index all similar types of log and event files in one cloud storage bucket efficiently. The resulting data is kept separate, and views can be defined to filter on specific keys or categories to limit analysis to a specific area of interest. See Object Group Isolation Keys for information about how to configure isolation keys.
Design a directory hierarchy in which the objects at the end of each path are as similar as possible in logging format and file format.	Consider the following: Data retention: Will the indexed data for the files have the same retention timeframe? (Isolation keys can be configured to have their own retention settings.) Data format: Do the files have the same format type (JSON, CSV, Parquet)? Compression: Do the files use the same compression (or are they all uncompressed)?
Review the file schemas and group files into a single object group when different log files share a common schema.	An object group with thousands of columns does not perform as well as an object group with hundreds of columns, so log files that share a common schema are good candidates to be placed in the same object group.
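The isolation guideline in the table can be sketched in Python: a named group in a pathname expression captures a category (here, a service name), and each captured value becomes a separate slice of the data. This is a hypothetical illustration of the concept, not how ChaosSearch computes isolation keys; the layout `logs/<service>/<YYYY-MM-DD>/<file>.json.gz`, the `service` group name, and the `group_by_isolation_key` helper are all assumptions. See Object Group Isolation Keys for the actual configuration.

```python
import re
from collections import defaultdict

# Hypothetical layout: logs/<service>/<YYYY-MM-DD>/<file>.json.gz
KEY_PATTERN = re.compile(
    r"logs/(?P<service>[^/]+)/\d{4}-\d{2}-\d{2}/[^/]+\.json\.gz"
)


def group_by_isolation_key(keys, pattern, group_name="service"):
    """Split object keys into slices keyed by the value of a named regex group.

    Keys that do not match are collected under None so they are easy to
    spot instead of being silently dropped.
    """
    slices = defaultdict(list)
    for key in keys:
        match = pattern.fullmatch(key)
        slices[match.group(group_name) if match else None].append(key)
    return dict(slices)


if __name__ == "__main__":
    keys = [
        "logs/auth/2024-05-01/a.json.gz",
        "logs/billing/2024-05-01/b.json.gz",
        "logs/auth/2024-05-02/c.json.gz",
        "misc/readme.txt",
    ]
    for isolation_key, members in group_by_isolation_key(keys, KEY_PATTERN).items():
        print(isolation_key, members)
```

A single object group does the indexing in this model, while a view filtered on one key (for example, `auth`) would see only that slice.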

Factors Influencing Design

Consider the following requirements and limits when designing your log bucket organization:

Object group factors

An object group can index files from only one bucket.
An object group can process only files that share the same format (e.g., JSON, CSV, Parquet, custom).
An object group can monitor only one notification queue (Google Pub/Sub or AWS SQS).
An object group can process only files that share the same compression setting (e.g., gzip, snappy, none).
An object group can have a maximum of 10,000 data isolation keys.
An object group can have a maximum of 10,000 total columns per isolation.
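To make these object group factors concrete, the following Python sketch checks a planned set of files against them before an object group is created. The `LogFile` type and `check_object_group` function are hypothetical planning helpers, not part of any ChaosSearch API. The two 10,000 limits are copied from the list above, and the sketch assumes that columns accumulate as the union of columns across all files sharing an isolation key; confirm current values and behavior in Service Limits.

```python
from dataclasses import dataclass, field

# Limits copied from the object group factors; confirm current values in Service Limits.
MAX_ISOLATION_KEYS = 10_000
MAX_COLUMNS_PER_ISOLATION = 10_000


@dataclass(frozen=True)
class LogFile:
    """Hypothetical planning record for one log object (not a ChaosSearch type)."""
    bucket: str
    file_format: str   # e.g. "json", "csv", "parquet"
    compression: str   # e.g. "gzip", "snappy", "none"
    isolation_key: str
    columns: frozenset = field(default_factory=frozenset)


def check_object_group(files, queues):
    """Return a list of problems for a planned object group; empty means none found.

    `queues` is the set of notification queue names the group would watch.
    Columns are assumed to accumulate as the union across files that share
    an isolation key.
    """
    problems = []
    # Every file must share one bucket, one format, and one compression setting.
    for attr, label in (("bucket", "bucket"), ("file_format", "format"),
                        ("compression", "compression")):
        values = {getattr(f, attr) for f in files}
        if len(values) > 1:
            problems.append(f"mixed {label}: {sorted(values)}")
    if len(queues) > 1:
        problems.append(f"more than one notification queue: {sorted(queues)}")

    # Gather the set of columns seen under each isolation key.
    columns_by_key = {}
    for f in files:
        columns_by_key.setdefault(f.isolation_key, set()).update(f.columns)
    if len(columns_by_key) > MAX_ISOLATION_KEYS:
        problems.append(
            f"too many isolation keys: {len(columns_by_key)} (limit {MAX_ISOLATION_KEYS})"
        )
    for key in sorted(columns_by_key):
        count = len(columns_by_key[key])
        if count > MAX_COLUMNS_PER_ISOLATION:
            problems.append(f"isolation key {key!r} has {count} columns")
    return problems
```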

Cloud object storage factors
AWS and GCP both limit the number of notification configurations to 100. The prefixes associated with those notifications cannot overlap.
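One way to catch both notification problems while planning is to check the list of prefixes for a bucket, as in this hypothetical Python sketch. It treats two prefixes as overlapping when one is a string prefix of the other, which is an assumption about how the cloud providers compare prefixes; the limit of 100 is taken from the text above, and the `notification_prefix_problems` helper is illustrative only.

```python
from itertools import combinations

# Limit taken from the text above; confirm against current AWS and GCP documentation.
MAX_NOTIFICATION_CONFIGS = 100


def notification_prefix_problems(prefixes):
    """Return problems with a planned list of notification prefixes for one bucket.

    Assumes two prefixes overlap when one is a string prefix of the other.
    After sorting, a prefix always comes before any longer string that starts
    with it, so checking b.startswith(a) for each sorted pair is enough.
    Duplicate entries are reported as overlaps too.
    """
    problems = []
    if len(prefixes) > MAX_NOTIFICATION_CONFIGS:
        problems.append(
            f"too many prefixes: {len(prefixes)} (limit {MAX_NOTIFICATION_CONFIGS})"
        )
    for a, b in combinations(sorted(prefixes), 2):
        if b.startswith(a):
            problems.append(f"overlapping prefixes: {a!r} and {b!r}")
    return problems
```

Under this string-based assumption, `logs/nginx` is reported as overlapping `logs/nginx-error/`; ending every prefix with `/` is one simple way to avoid that kind of accidental overlap.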

ChaosSearch service factors
ChaosSearch has some additional service limits that can affect file organization planning. Refer to Service Limits to review the configuration limits and settings.