Object Storage Log Organization Best Practices

Best practices for organizing log files and storage buckets for indexing

Log files come in a variety of formats and from different sources. Organize the logs in your logging buckets so that the resulting indexing object groups make the most efficient use of the available ChaosSearch and cloud provider infrastructure resources.

Each customer's log sources, pipelines, and infrastructure are different, so there is no one-size-fits-all approach to file organization. This topic describes the strategic objectives and recommendations to keep in mind when designing your logging buckets and object groups, and includes examples of both good and poor organizational design. If you have questions about the best strategy for your files, contact ChaosSearch for assistance.

Organization Objectives

In general, use the following guidelines:

Objective: Limit the number of object groups to the fewest possible.
Reasoning: Each object group requires a minimal reserve of resources to maintain indexing, so limiting the number of object groups ensures that more resources can be used for other activities like servicing user queries.

Objective: Leverage isolation keys to separate data within a single object group instead of creating multiple object groups.
Reasoning: Isolation within an object group uses information located in the log objects' prefix path and the event records to scope queries to a subset of the entire object group. See Object Group Isolation Keys for information about how to configure isolation keys.

Objective: Design a directory hierarchy in which objects at the end of the hierarchy are as similar as possible in terms of logging formats and file formats.
Reasoning: The logging and file formats should be the same for all files landing at the end of the hierarchy. To limit the number of object groups and notifications, give a high priority to factors like:

  • Data retention - Are there similar retention needs for items in this object group? Note that the retention settings on different isolation keys in an object group can be configured independently of each other.
  • Data format - Do the files have a similar format, like JSON, CSV, Parquet or a custom log format?
  • Compression - Are all of the files compressed or uncompressed?

Objective: Review the file schemas and group files into a single object group where there is a shared schema between different log files.
Reasoning: An object group with thousands of columns will not be as performant when compared to an object group with hundreds of columns. Log files that share a common schema are good candidates to be placed in the same object group.
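As a rough planning aid, the objectives above can be sketched in a few lines of Python. This is only an illustration, not a ChaosSearch tool: the LogSource record, its field names, and the sample sources are all hypothetical. The sketch groups sources by data format and compression, and reports how many distinct columns each candidate group would end up with. Retention is left out of the grouping key because it can be configured per isolation key.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogSource:
    """Hypothetical description of one log source (not a ChaosSearch type)."""
    name: str
    data_format: str      # e.g. "json", "csv"
    compression: str      # e.g. "gzip", "none"
    columns: frozenset    # field names observed in this source's events


def candidate_groups(sources):
    """Group sources by (format, compression) and count the merged columns."""
    names = defaultdict(list)
    schema = defaultdict(set)
    for src in sources:
        key = (src.data_format, src.compression)
        names[key].append(src.name)
        schema[key] |= src.columns
    # Map each key to (source names, number of distinct columns in the group).
    return {key: (names[key], len(schema[key])) for key in names}


# Made-up sources, for illustration only.
sources = [
    LogSource("nginx-access", "json", "gzip",
              frozenset({"ts", "host", "status", "path"})),
    LogSource("app-events", "json", "gzip",
              frozenset({"ts", "host", "level", "msg"})),
    LogSource("billing-export", "csv", "none",
              frozenset({"invoice", "amount"})),
]

groups = candidate_groups(sources)
for key, (members, column_count) in groups.items():
    print(key, members, column_count)
```

A group whose merged column count grows into the thousands is a hint that its sources do not really share a schema and may be better split, or separated with isolation keys.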

Factors Influencing Design

Consider the following requirements and limits when designing your log bucket organization:

Object group factors

  • An object group can index files located in only one bucket.
  • An object group expects only one log event format (e.g., JSON, CSV, Parquet, custom).
  • An object group monitors only one notification queue (Google Pub/Sub or AWS SQS).
  • An object group requires only one log file compression setting (e.g., gzip, snappy, none).
  • An object group can have a maximum of 10,000 data isolation keys.
  • An object group can have a maximum of 10,000 total columns per isolation key.
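The object group factors above lend themselves to a simple pre-flight check during planning. The following is a minimal sketch under stated assumptions: ObjectGroupPlan and plan_violations are hypothetical names invented for this example, and nothing here calls a ChaosSearch API. Only the two numeric limits come from the list above.

```python
from dataclasses import dataclass

# Limits quoted from the object group factors listed above.
MAX_ISOLATION_KEYS = 10_000
MAX_COLUMNS_PER_ISOLATION = 10_000


@dataclass
class ObjectGroupPlan:
    """Hypothetical planning record; not part of any ChaosSearch API."""
    buckets: set
    formats: set
    compressions: set
    queues: set
    isolation_keys: int
    columns_per_isolation: int


def plan_violations(plan):
    """Return a list of problems; an empty list means the plan fits."""
    problems = []
    single_valued = [
        ("bucket", plan.buckets),
        ("log event format", plan.formats),
        ("compression setting", plan.compressions),
        ("notification queue", plan.queues),
    ]
    for label, values in single_valued:
        if len(values) > 1:
            problems.append(f"more than one {label}: {sorted(values)}")
    if plan.isolation_keys > MAX_ISOLATION_KEYS:
        problems.append(f"too many isolation keys: {plan.isolation_keys}")
    if plan.columns_per_isolation > MAX_COLUMNS_PER_ISOLATION:
        problems.append(f"too many columns: {plan.columns_per_isolation}")
    return problems


# Made-up plan that mixes two formats in one object group.
plan = ObjectGroupPlan(
    buckets={"acme-logs"},
    formats={"json", "csv"},
    compressions={"gzip"},
    queues={"acme-logs-queue"},
    isolation_keys=250,
    columns_per_isolation=400,
)
for problem in plan_violations(plan):
    print(problem)
```

In this example the plan is flagged because it mixes JSON and CSV files, which suggests splitting the data into two object groups or normalizing the format upstream.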

Cloud object storage factors

AWS and GCP both limit the number of notification configurations to 100 per bucket. The prefixes associated with those notifications cannot overlap.
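A planned set of notification prefixes can be checked against these two constraints before anything is configured. The sketch below is a simplification that assumes any prefix contained in another counts as an overlap; the cloud providers' actual rules may also take event types and suffix filters into account. The function name and the sample prefixes are hypothetical.

```python
MAX_NOTIFICATION_CONFIGS = 100  # limit stated above


def overlapping_prefixes(prefixes):
    """Return (shorter, longer) pairs where one prefix contains the other."""
    ordered = sorted(prefixes)
    pairs = []
    for i, shorter in enumerate(ordered):
        # After sorting, anything that starts with `shorter` appears later.
        for longer in ordered[i + 1:]:
            if longer.startswith(shorter):
                pairs.append((shorter, longer))
    return pairs


# Made-up prefixes, for illustration only.
planned = ["logs/app/", "logs/app/prod/", "logs/web/"]

if len(planned) > MAX_NOTIFICATION_CONFIGS:
    print("too many notification configurations:", len(planned))
for shorter, longer in overlapping_prefixes(planned):
    print(f"overlap: {shorter!r} contains {longer!r}")
```

Here "logs/app/" and "logs/app/prod/" would be reported as overlapping, so only one of them could carry a notification in this simplified model.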

ChaosSearch service factors

ChaosSearch has some additional service limits that can affect file organization planning. Refer to Service Limits to review the configuration limits and settings.