ChaosSearch FAQs

General FAQ

What is ChaosSearch?

ChaosSearch is a SaaS solution that transforms cloud storage into an analytics platform for historical log and event data. The ChaosSearch service automates the discovery, organization, and indexing of log and event data and provides Elasticsearch, OpenSearch Dashboards, and Superset interfaces for analysis. With the ChaosSearch service, you can extend the functionality of an Elasticsearch cluster onto cloud storage for easy, inexpensive access to the long tail of your data.

How does ChaosSearch extend my existing Elastic Stack?

The ChaosSearch service is designed to work alongside an existing Elastic Stack architecture. When an Elasticsearch cluster becomes cost-prohibitive for storing log and event data for longer-term analysis, simply move these logs and events into your S3 bucket and let the ChaosSearch service manage and index this data while providing Elasticsearch and OpenSearch Dashboards access. Conceptually, ChaosSearch turns cloud storage (AWS S3 or GCP buckets) into an Elasticsearch cluster for historical (aka warm) log and event analytics, allowing an actual Elasticsearch cluster to focus on real-time analysis.

What is the difference between Elasticsearch and ChaosSearch?

Elasticsearch and the ELK stack are used for real-time search and fault detection of log and event data, typically in the range of 5 to 10 days. ChaosSearch is designed to provide “long-term” log and event management and analysis in the range of weeks, months, and years, all at a significantly reduced cost. Storing gigabytes and terabytes of log and event data in Elasticsearch becomes expensive quickly, forcing users to either move that data to archive cloud storage with expensive retrieval costs, or delete it completely. Additionally, unlike Elasticsearch, the ChaosSearch service is the first of its kind to offer cost-effective log and event management and analytics, enabling trend, predictive, and machine learning analysis.

How does ChaosSearch work?

ChaosSearch is an elastic cloud service running on AWS requiring no provisioning or configuration in order to scale—a true SaaS offering. The ChaosSearch service transforms cloud storage into a log and event analytics platform. The service was built from the ground up using leading technologies such as Scala/Akka distributed framework, Docker Swarm cluster deployment, and our own Data Edge indexing technology. Each aspect of our solution is designed for performance and scale.

What can I do with ChaosSearch?

The primary focus of ChaosSearch is to provide simple and cost-efficient historical log and event data analysis. However, the service goes well beyond being a data analytics platform with Elasticsearch, OpenSearch Dashboards, and Superset interfaces. ChaosSearch also provides data management capabilities such as discovery, cataloging, organization, grouping, normalization, and indexing. With ChaosSearch, users can be confident their data lake does not become a data swamp. For more information about use cases, visit https://chaossearch.io and click Use Cases.

What use cases does ChaosSearch support?

The ChaosSearch service has been optimized for historical data analytics. We have identified several use cases all geared around the analysis of long-term time-based logs and events:

Live Long-Term Log & Event Storage
Historical Log & Event Analytics
Log & Event Data Management
Searchable Data Retention for Compliance
Machine Learning

For more information about use cases, visit https://chaossearch.io and click Use Cases.

How do I get started with ChaosSearch?

It’s easy to get started! First, request a free trial of the ChaosSearch service at https://chaossearch.io. Then make sure you have:

An existing AWS or GCP account with S3 or GCP Cloud Storage bucket privileges
AWS/GCP account access with read/write IAM privileges

ChaosSearch will provide you with a customer ID for IAM configuration. See Prerequisites for more information about cloud access configuration.

Data Storage

What type of data does ChaosSearch support?

With one click, ChaosSearch discovers and catalogs just about any type of data found in cloud storage, including CSV, JSON, and log files. ChaosSearch indexing can automatically model CSV, JSON, and log data, and the service understands many of the most common logging formats.

How much data can I store and use?

There are no imposed limits to the amount of data you can store or use with ChaosSearch; Amazon S3/GCP Cloud Storage is the primary and only backing store used within the service.

How is data moved from my Elasticsearch cluster to my ChaosSearch data store?

ChaosSearch chose Amazon S3 as its first go-to-market storage layer for many reasons, including cost, scale, and simplicity. A major benefit is that cloud storage such as Amazon S3 and GCP Cloud Storage has become the de facto standard for storing log and event data, either as an archive or as a temporary store before the data moves into an Elasticsearch cluster. Cloud services also often store their data in AWS/GCP buckets already. As a result, data might not have to be moved out of an Elasticsearch cluster at all, because it already resides within your cloud storage. When data exists only within Elasticsearch, there are several easy techniques and tools for exporting it to your cloud storage in JSON or CSV format. In a future release, the ChaosSearch service will discover and index archived Elasticsearch indices backed up within cloud storage.

Integrations & API

Which APIs does ChaosSearch support?

Aside from the ChaosSearch API itself, ChaosSearch allows you to access your data through two main interfaces: Amazon S3 REST API and Elasticsearch APIs. For raw data stored in Amazon S3, ChaosSearch can act as a passthrough to S3 for most regular Bucket/Object operations. For logical views of your data created using the ChaosSearch API, the service allows read-only access to your data via the following interfaces:

Amazon S3 REST API

GET Service (ListAllMyBuckets)
GET Bucket (List Objects) Version 2
GET Object

Elasticsearch APIs

Search
Multi Search
Aggregations
Field Capabilities

In addition to supporting the S3 interface, the ChaosSearch service has extended this API with relational operations in an S3-like style. See “What kind of relational queries can I do?” for more information.

How do I programmatically access the ChaosSearch service?

The ChaosSearch service can be accessed via the secure endpoint created for your site, such as https://service.chaossearch.io, using one of our supported REST APIs (ChaosSearch, Amazon S3, and Elasticsearch). All incoming requests must be signed by your ChaosSearch API access key (key ID and secret) using the Amazon Signature Version 4 signing process.

It is possible to use any standard HTTP(s) client to access ChaosSearch. However, we generally recommend that you use a client that supports Amazon V4 request signing to help generate the request signatures automatically. Below is a sample configuration profile for the AWS CLI to connect to the ChaosSearch service (replace X's and Y's with actual ChaosSearch access key ID / secret, respectively):

[chaos_search]
aws_access_key_id=XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key=YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
s3 =
  addressing_style = path
  signature_version = s3v4

And here is a sample GetObject request using the AWS CLI configured to use the profile above:

aws --profile chaos_search --endpoint-url https://service.chaossearch.io/V1/ s3api get-object --bucket [bucket] --key [key] [outfile]
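
If you prefer a plain HTTP client over the AWS CLI, the request headers have to be signed by hand. The following is a minimal sketch of the Amazon Signature Version 4 process for a GET request with an empty body and no query string, using only the Python standard library. The region ("us-east-1"), the service name ("s3"), the example path, and the function names are assumptions made for illustration; confirm the values for your deployment before relying on them.

```python
import datetime
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key from the secret key and the credential scope."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_get_request(host, path, access_key, secret_key, region, service, now):
    """Return headers for a SigV4-signed GET (empty body, no query string).

    `path` is assumed to be already URI-encoded; `now` is a UTC datetime.
    """
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(b"").hexdigest()

    # Canonical headers: lowercase names, sorted, each terminated by a newline.
    canonical_headers = (
        f"host:{host}\n"
        f"x-amz-content-sha256:{payload_hash}\n"
        f"x-amz-date:{amz_date}\n"
    )
    signed_headers = "host;x-amz-content-sha256;x-amz-date"
    canonical_request = "\n".join(
        ["GET", path, "", canonical_headers, signed_headers, payload_hash]
    )

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    signing_key = derive_signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()

    return {
        "Host": host,
        "x-amz-date": amz_date,
        "x-amz-content-sha256": payload_hash,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }


if __name__ == "__main__":
    # Placeholder credentials and object path, for illustration only.
    headers = sign_get_request(
        "service.chaossearch.io", "/V1/my-bucket/my-key",
        "XXXXXXXXXXXXXXXXXXXX", "YYYYYYYYYYYYYYYYYYYY",
        "us-east-1", "s3", datetime.datetime.now(datetime.timezone.utc),
    )
    for name, value in headers.items():
        print(f"{name}: {value}")
```

The returned headers can be attached to a GET request sent by any HTTP client. In practice, a client library with built-in V4 signing is less error-prone than signing by hand.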

Does ChaosSearch work with any of the ELK as a service companies like Logz.io?

Yes. ChaosSearch is focused on extending the functionality of Elastic Stack (ELK) onto cloud storage. The ChaosSearch platform is independent of Elastic Stack and will work with any of the Elasticsearch-based ELK as a service companies, such as Logz.io.

Capabilities & Performance

What kind of text searches can I do?

As part of its Elasticsearch support, the ChaosSearch service initially supports the text search functionality listed below. Text search is available through SQL queries, an Elasticsearch REST-based API, and OpenSearch Dashboards (DQL). The list will continue to grow as the service is built out:

Exact (i.e. term) match
Wildcard (i.e. phrase) match
+/- operators (i.e. must, must_not) match
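
As a rough illustration, the search types listed above could be expressed as a standard Elasticsearch query DSL body like the one built below. This is a sketch only: the field name "message" and the helper name are hypothetical, and the query clauses that a given ChaosSearch release accepts should be checked against the service itself.

```python
import json


def build_text_search(field, exact=None, wildcard=None, must_not_terms=()):
    """Build an Elasticsearch bool query body for simple text searches.

    exact          -> term query (exact match)
    wildcard       -> wildcard query (pattern with * or ?)
    must_not_terms -> terms that must NOT appear (the "-" operator)
    """
    must = []
    if exact is not None:
        must.append({"term": {field: exact}})
    if wildcard is not None:
        must.append({"wildcard": {field: wildcard}})
    must_not = [{"term": {field: term}} for term in must_not_terms]
    return {"query": {"bool": {"must": must, "must_not": must_not}}}


if __name__ == "__main__":
    # "message" is a hypothetical field name used for illustration.
    body = build_text_search("message", wildcard="time*", must_not_terms=["debug"])
    print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a signed request to the Elasticsearch Search API.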

What kind of relational queries can I do?

ChaosSearch indexes your data and exposes it in a tabular format similar to that of other relational systems. This tabular data can be queried via two interfaces: the ChaosSearch extensions to S3 GetObject / ListObjectsV2 operations and the Elasticsearch API. In both cases, Search supports:

Point and range queries for numeric, date, and string data types (=, <, >)
Common aggregations (all datatypes): COUNT, MIN, MAX
Numeric aggregations: SUM, AVG, STD
Logical operators (AND, OR, NOT)
Order By
Group by

Any column exposed by ChaosSearch indexing may be referenced by the query predicates and/or aggregations.
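
To make the list above concrete, here is a hedged sketch of how a range predicate, a group-by, an order-by, and a few aggregations might be combined in one Elasticsearch request body. The field names ("timestamp", "status", "bytes") and the helper name are hypothetical; the structure follows the standard Elasticsearch aggregation DSL rather than anything specific to ChaosSearch.

```python
import json


def build_aggregation_query(date_field, start, end, group_field, metric_field):
    """Range filter + group-by (terms) + SUM/AVG/MIN/MAX, ordered by the sum."""
    return {
        "size": 0,  # return aggregation buckets only, no individual documents
        "query": {
            "bool": {
                "filter": [
                    {"range": {date_field: {"gte": start, "lt": end}}}
                ]
            }
        },
        "aggs": {
            "by_group": {
                # Group by: one bucket per distinct value; each bucket also
                # carries a doc_count, which plays the role of COUNT.
                "terms": {"field": group_field, "order": {"total": "desc"}},
                "aggs": {
                    "total": {"sum": {"field": metric_field}},
                    "average": {"avg": {"field": metric_field}},
                    "smallest": {"min": {"field": metric_field}},
                    "largest": {"max": {"field": metric_field}},
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_aggregation_query(
        "timestamp", "2024-01-01", "2024-02-01", "status", "bytes"
    )
    print(json.dumps(body, indent=2))
```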

What type of performance should I expect for text and analytical queries?

The ChaosSearch service has been designed for large scale, historical log and event analytics. Based on ChaosSearch’s Data Edge technology, it has been shown that:

Text-based queries are up to 10x faster to index and up to 2x faster to search when compared to Lucene.
Analytic queries are up to 5x faster to index and up to 2x faster to query when compared to column stores.

The ChaosSearch service has an elastic data fabric that scales up or down based on performance and cost metrics.

Can I increase the number of Docs shown from 500?

ChaosSearch limits the number of documents shown in the OpenSearch Dashboards/Search Analytics Discover view to 500. In certain circumstances, we can remove this limit upon approval from our executive team.

Reliability & Security

Who owns the data in ChaosSearch?

All data is 100% owned by the customer. ChaosSearch is a data fabric and abstraction layer on top of S3/Cloud Storage. When configuring cloud access for ChaosSearch, simply create an IAM Role that gives the service “read-only” access. As part of this Role, specify the location that ChaosSearch can write its analytic metadata. You always own your data and any related information about your data.
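
The access split described above (read-only on your source data, write access only where the service keeps its metadata) might look roughly like the IAM policy document built below. This is an illustrative sketch, not the policy ChaosSearch requires: the bucket names, the statement IDs, and the exact action lists are assumptions, and the Prerequisites documentation remains the authoritative source.

```python
import json


def build_access_policy(source_bucket, metadata_bucket):
    """Build an IAM policy: read-only on source data, read/write on metadata."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlySourceData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{source_bucket}",
                    f"arn:aws:s3:::{source_bucket}/*",
                ],
            },
            {
                "Sid": "WriteAnalyticMetadata",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{metadata_bucket}",
                    f"arn:aws:s3:::{metadata_bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Both bucket names are hypothetical placeholders.
    policy = build_access_policy("my-log-bucket", "my-chaossearch-metadata")
    print(json.dumps(policy, indent=2))
```

Keeping the write permissions scoped to a separate location means the service never needs to modify the source log objects.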

How does ChaosSearch ensure security, reliability, and availability?

ChaosSearch is built on top of AWS and GCP, making your data highly available, scalable, durable, and secure. Amazon and GCP provide infrastructure for storing important data that is designed for 99.999999999% durability of objects. Your data is redundantly stored across multiple facilities and multiple devices in each facility. ChaosSearch uses S3/Cloud Storage as its backing store, providing you with the same security and reliability that AWS and GCP ensure for their customers. And because ChaosSearch never moves your data, there is no change in its security.

Is ChaosSearch multi-tenant?

Yes. ChaosSearch is a cloud analytics service running on AWS and built for multi-tenancy. Together with a host of other critical differentiators, this makes it superior to single-tenant cloud-hosted services and to on-premises deployments running old-school enterprise software.

The ChaosSearch service is deployed as a collection of Docker containers to a pool of shared compute resources. These containers are customer-specific and are isolated from one another using encrypted Docker Swarm overlay networks. All external API access to the service is done via authenticated HTTPS. Requests are signed client-side and routed by the service to the correct containers via a unique customer identifier (and are rejected if the signature cannot be verified against the customer's secret key).

Is ChaosSearch SOC compliant?

ChaosSearch has passed its SOC 2 Type 2 audit. The official letter can be provided upon request, with ChaosSearch reserving the right to put a mutual non-disclosure agreement (MNDA) in place. View the press release here.

AWS & S3

What AWS regions does ChaosSearch support?

The ChaosSearch service is an elastic, serverless solution and supports all AWS regions. Wherever your S3 buckets have been provisioned, the ChaosSearch service allocates EC2 compute resources to provide our unique discover, refine, and query functionality. There is no configuration or provisioning required. The ChaosSearch service ensures that all S3 data access stays within the same AWS region, so there is no additional cost for network/data access.

What is the impact on my AWS services (e.g., S3 costs)?

The ChaosSearch service is backed by a new and powerful indexing technology called “Data Edge.” Data Edge is an index file format that provides both relational queries and text search in one representation. This format significantly compresses data compared to existing index technologies. Written in Scala over an Akka distributed framework, Data Edge is uniquely designed to exploit the cost efficiency of object storage such as S3, while still providing high performance and elastic scale capabilities. For example, 10TB of raw source data indexed by ChaosSearch would typically result in a compressed data footprint of around 2TB. And with S3 pricing, ChaosSearch enables cost disruptive historical log and event analysis.
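
The 10TB-to-2TB example above implies an index footprint of about 20% of the raw data. The short calculation below turns that ratio into a rough monthly storage estimate. The per-GB price is a placeholder assumption, not a quoted rate, and the function names are hypothetical; substitute the current price for your region and storage class.

```python
def indexed_footprint_tb(raw_tb, index_ratio=2 / 10):
    """Estimate the index size in TB, using the 10TB -> ~2TB ratio from the text."""
    return raw_tb * index_ratio


def monthly_storage_cost_usd(size_tb, price_per_gb_month):
    """Estimate monthly object storage cost for a given size in TB."""
    return size_tb * 1024 * price_per_gb_month


if __name__ == "__main__":
    ASSUMED_PRICE_PER_GB_MONTH = 0.023  # placeholder; check current S3 pricing

    raw_tb = 10
    index_tb = indexed_footprint_tb(raw_tb)
    cost = monthly_storage_cost_usd(index_tb, ASSUMED_PRICE_PER_GB_MONTH)
    print(f"Raw data: {raw_tb} TB, estimated index: {index_tb:.1f} TB")
    print(f"Estimated index storage cost: ${cost:.2f} per month")
```

Actual compression depends on the data, so treat the ratio as a typical figure rather than a guarantee.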

Does ChaosSearch support other cloud providers?

Today ChaosSearch is available on AWS and Google Cloud Platform (GCP). ChaosSearch is architected to be cloud-agnostic. Contact ChaosSearch Customer Success to discuss cloud provider support options.