Organization Reference Example 2

A sample case study for log file and object group planning

Customer Scenario

A GCP customer collects logs from applications running on GKE. The logs land in a GCS bucket as compressed JSON objects.

Good Design

The log data is stored first by major application, then by GKE cluster, then by namespace within the cluster, and then by pod. Isolation is configured at the application, cluster, and namespace levels to allow query scoping. One Pub/Sub topic is created at the top level. Once set up, the customer can add new applications and clusters to the logging bucket without updating any GCP or ChaosSearch configuration. With only a single object group, more ChaosSearch workers remain available to service user queries rather than indexing.

📘

Pathname Spacing in Documentation

In the good and poor examples below, note that extra spaces are included around the forward slash (/) characters to highlight the file organization hierarchy within the documentation. Do not add these extra spaces to pathnames in the actual bucket, or to the regular expressions that you specify within the ChaosSearch UIs when creating object group filters or isolation key expressions.
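As a minimal sketch of the spacing rule above, the following Python helper turns a pathname written in the documentation's spaced notation into the form an object would actually have in the bucket. The function name `to_object_path` is hypothetical and exists only for illustration; it is not part of any ChaosSearch or GCP tooling.

```python
import re


def to_object_path(documented_path: str) -> str:
    """Collapse the ' / ' display spacing used in this page into a real pathname.

    Hypothetical helper for illustration only.
    """
    # Remove any whitespace that surrounds a forward slash.
    return re.sub(r"\s*/\s*", "/", documented_path.strip())


documented = "team-x-gke / app-1 / cluster-1 / namespace-1 / pod-1 / date / logs.gz"
print(to_object_path(documented))
```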

# GCP GCS "customer-logging-bucket"

# Object Group 1 - JSON use cases. Objects are compressed. 
# Pub/Sub topic is receiving notifications for the prefix "team-x-gke". 
# Isolation is configured on the second, third, and fourth directories to allow narrow query scoping with the following regex: "team-x-gke\/(\S+?)\/(\S+?)\/(\S+?)\/.*"
team-x-gke / app-1 / cluster-1 / namespace-1 / pod-1 / date / logs.gz
                                             / pod-2 / date / logs.gz
                               / namespace-2 / pod-3 / date / logs.gz
                                             / pod-4 / date / logs.gz
                   / cluster-n / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
                               / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
           / app-2 / cluster-n / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
                               / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
                   / cluster-n / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
                               / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
           / app-3 / cluster-n / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
                               / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
                   / cluster-n / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
                               / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
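To check how the isolation regex from the good design splits a pathname, it can be tried locally before it is entered in the ChaosSearch UI. The sketch below uses Python's `re` module, which is an assumption made for convenience: the regex engine that ChaosSearch applies may differ in details, so treat this as a quick sanity check rather than a definitive test. The function name `isolation_keys` is hypothetical.

```python
import re

# Isolation regex copied from the good-design example above.
# Each capture group corresponds to one directory level:
# application, cluster, namespace.
ISOLATION_PATTERN = re.compile(r"team-x-gke\/(\S+?)\/(\S+?)\/(\S+?)\/.*")


def isolation_keys(object_path):
    """Return (application, cluster, namespace), or None if the path does not match.

    Hypothetical helper for local experimentation only.
    """
    match = ISOLATION_PATTERN.match(object_path)
    return match.groups() if match else None


# Real pathnames contain no spaces around the slashes.
print(isolation_keys("team-x-gke/app-1/cluster-1/namespace-1/pod-1/date/logs.gz"))
print(isolation_keys("some-other-prefix/app-1/cluster-1/namespace-1/pod-1/date/logs.gz"))
```

Because the pattern starts at the top-level prefix and captures by position, a new application or cluster added under `team-x-gke` matches without any change to the expression.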

Poor Design

The log data is still stored first by major application, then by GKE cluster, then by namespace within the cluster, and then by pod. However, one object group is created per cluster instead of one object group with isolation keys. This design uses ChaosSearch resources inefficiently, counts significantly toward service limits, creates more Pub/Sub infrastructure to manage, and consumes more of the bucket's notification limit. It also requires new object groups, Pub/Sub topics, and notifications to be set up as more clusters are onboarded.

# GCP GCS "customer-logging-bucket"

# Object Group 1 - JSON use cases. Objects are compressed. 
# Pub/Sub topic is receiving notifications for the prefix "team-x-gke/app-1/cluster-1/". 
team-x-gke / app-1 / cluster-1 / namespace-1 / pod-1 / date / logs.gz
                                             / pod-2 / date / logs.gz
                               / namespace-2 / pod-3 / date / logs.gz
                                             / pod-4 / date / logs.gz

# Object Group 2 - JSON use cases. Objects are compressed. 
# Pub/Sub topic is receiving notifications for the prefix "team-x-gke/app-1/cluster-2/". 
                   / cluster-2 / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
                               / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz

# Object Group 3 - JSON use cases. Objects are compressed. 
# Pub/Sub topic is receiving notifications for the prefix "team-x-gke/app-2/cluster-n/". 
           / app-2 / cluster-n / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
                               / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz

# Object Group 7 - JSON use cases. Objects are compressed. 
# Pub/Sub topic is receiving notifications for the prefix "team-x-gke/app-2/cluster-n/". 
                   / cluster-n / namespace-n / pod-n / date / logs.gz
                                             / pod-n / date / logs.gz
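The scaling cost of the per-cluster design can be illustrated with a small sketch that counts how many distinct application/cluster prefixes exist in a set of pathnames. Each distinct prefix would need its own object group, Pub/Sub topic, and bucket notification in the poor design, while the good design needs one of each regardless of the count. The sample pathnames and the function name `per_cluster_prefixes` are hypothetical and chosen only for illustration.

```python
def per_cluster_prefixes(object_paths):
    """Return the distinct 'top-level/app/cluster/' prefixes found in the paths.

    Hypothetical helper: in the per-cluster design, each prefix returned here
    would require its own object group, Pub/Sub topic, and notification.
    """
    prefixes = set()
    for path in object_paths:
        parts = path.split("/")
        # Need at least top-level, application, cluster, plus something below.
        if len(parts) >= 4:
            prefixes.add("/".join(parts[:3]) + "/")
    return sorted(prefixes)


# Hypothetical sample of object pathnames in the logging bucket.
paths = [
    "team-x-gke/app-1/cluster-1/namespace-1/pod-1/date/logs.gz",
    "team-x-gke/app-1/cluster-1/namespace-2/pod-3/date/logs.gz",
    "team-x-gke/app-1/cluster-2/namespace-1/pod-1/date/logs.gz",
    "team-x-gke/app-2/cluster-1/namespace-1/pod-1/date/logs.gz",
]

poor_design = per_cluster_prefixes(paths)
print("Per-cluster design:", len(poor_design), "object groups and Pub/Sub topics")
print("Single-prefix design: 1 object group and Pub/Sub topic")
```

Every newly onboarded cluster adds one more entry to the per-cluster list, which is the setup work the single object group with isolation keys avoids.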