CHAOSSEARCH is a SaaS solution that transforms S3 storage into an analytics platform for historical log and event data. The CHAOSSEARCH service uniquely automates the discovery, organization, and indexing of log and event data and provides both an Elasticsearch and Kibana interface for analysis. With the CHAOSSEARCH service, you can extend the functionality of an Elasticsearch cluster onto S3 for easy, inexpensive access to the long-tail of your data.
The CHAOSSEARCH service is designed to work alongside an existing Elastic Stack architecture. When an Elasticsearch cluster becomes cost prohibitive to store log and event data for longer-term analysis, simply move these logs and events into your S3 bucket and let the CHAOSSEARCH service manage and index this data while providing Elasticsearch and Kibana access. CHAOSSEARCH conceptually turns S3 into an Elasticsearch cluster for historical (aka Warm) log and event analytics, allowing an actual Elasticsearch cluster to focus on real-time analysis.
Elasticsearch and the ELK stack are used for real-time search and fault detection of log and event data, typically in the range of 5 to 10 days. CHAOSSEARCH is designed to provide “long-term” log and event management and analysis in the range of weeks, months, and years, all at a significantly reduced cost. Storing gigabytes and terabytes of log and event data in Elasticsearch becomes expensive quickly, forcing users to either archive data to S3 or delete it completely. The CHAOSSEARCH service, by contrast, is the first of its kind to make long-term log and event management and analytics cost-effective, enabling trend, predictive, and machine learning analysis.
CHAOSSEARCH is an elastic cloud service running on AWS requiring no provisioning or configuration in order to scale -- a true SaaS offering. The CHAOSSEARCH service transforms S3 storage into a log and event analytics platform. The service was built from the ground up using leading technologies such as Scala/Akka distributed framework, Docker Swarm cluster deployment, and our own Data Edge indexing technology. Each aspect of our solution is designed for performance and scale.
The primary focus of CHAOSSEARCH is to provide simple and cost-efficient historical log and event data analysis. However, the service goes well beyond being a data analytics platform with Elasticsearch and Kibana interfaces. CHAOSSEARCH also provides data management capabilities such as discovery, cataloging, organization, grouping, normalization, and indexing. With CHAOSSEARCH, users can be confident their S3 data lake does not become a data swamp. For more information about use cases, visit https://chaossearch.io and click on Use Cases.
The CHAOSSEARCH service has been optimized for historical data analytics. We have identified several use cases all geared around the analysis of long-term time-based logs and events:
- Live Long-Term Log & Event Storage
- Historical Log & Event Analytics
- Log & Event Data Management
- Searchable Data Retention for Compliance
- Machine Learning
It’s easy to get started! First, request a trial of the CHAOSSEARCH service at https://info.chaossearch.io/request-trial. Then make sure you have:
- An existing AWS account with S3 bucket privileges
- AWS account access with read / write IAM privileges
CHAOSSEARCH will provide you with a customer ID for IAM configuration. See Prerequisites for more information about AWS S3 configuration.
Pricing for CHAOSSEARCH isn’t official yet but will be based on a data plan similar to S3 with annual tiered options ranging from 5TB to 250TB. Entry-level plans will be priced around $0.075/GB/year. Official pricing for the CHAOSSEARCH service will be available in late Q2.
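As a rough illustration of the entry-level figure above, the sketch below estimates an annual cost from a plan size. It assumes decimal units (1 TB = 1,000 GB) and a flat rate of $0.075/GB/year. The function name is purely illustrative, and actual tiers and official pricing may differ.

```python
# Hypothetical estimate only: official pricing had not been published.
ENTRY_RATE_USD_PER_GB_YEAR = 0.075  # "around $0.075/GB/year" from the text
GB_PER_TB = 1000                    # assumption: decimal terabytes


def estimate_annual_cost_usd(plan_tb: float,
                             rate: float = ENTRY_RATE_USD_PER_GB_YEAR) -> float:
    """Return the approximate yearly cost for a plan of `plan_tb` terabytes."""
    if plan_tb < 0:
        raise ValueError("plan size must be non-negative")
    return round(plan_tb * GB_PER_TB * rate, 2)


if __name__ == "__main__":
    print(estimate_annual_cost_usd(5))
```

Under these assumptions, a 5TB entry-level plan works out to about $375 per year.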
With one click, CHAOSSEARCH discovers and catalogs just about any type of data found in S3, including CSV, JSON, and LOG files. CHAOSSEARCH indexing can automatically model CSV, JSON, and LOG data, and the service understands many of the most common logging formats.
There are no imposed limits to the amount of data you can store or use with CHAOSSEARCH; Amazon S3 storage is the primary and only backing store used within the service.
CHAOSSEARCH specifically chose Amazon S3 as its first go-to-market storage layer. The reasons are many, including cost, scale, and simplicity. However, a major benefit is that S3 has become the de facto standard for storing log and event data either as an archive or as a temporary store before moving data into an Elasticsearch cluster; and it’s often the case that cloud services already store data in S3. As a result, data might not have to be moved out of an Elasticsearch cluster since it already resides within S3. In the case that data is only within Elasticsearch, there are several easy techniques and tools to export data to S3 as either JSON or CSV file format. In a future release, the CHAOSSEARCH service will discover and index archived Elasticsearch indices backed up within S3.
Aside from the CHAOSSEARCH API itself, CHAOSSEARCH allows you to access your data through two main interfaces: Amazon S3 REST API and Elasticsearch APIs. For raw data stored in Amazon S3, CHAOSSEARCH can act as a passthrough to S3 for most regular Bucket / Object operations. For logical views of your data created using the CHAOSSEARCH API, the service allows read-only access to your data via the following interfaces:
Amazon S3 REST API
- GET Service (ListAllMyBuckets)
- GET Bucket (List Objects) Version 2
- GET Object
Elasticsearch API
- Multi Search
- Field Capabilities
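The Multi Search operation listed above follows the Elasticsearch `_msearch` convention of newline-delimited JSON: each search is a header line followed by a body line, and the payload ends with a newline. A minimal sketch of building such a payload is shown here. The index name `my-logical-view` and the field names are hypothetical, and whether a given query is accepted depends on what the service supports.

```python
import json


def build_msearch_body(searches):
    """Serialize (header, query) pairs into an NDJSON Multi Search payload.

    `searches` is an iterable of (header_dict, body_dict) tuples.
    """
    lines = []
    for header, body in searches:
        # Each search contributes exactly two single-line JSON documents.
        lines.append(json.dumps(header, separators=(",", ":")))
        lines.append(json.dumps(body, separators=(",", ":")))
    # The _msearch format requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    payload = build_msearch_body([
        ({"index": "my-logical-view"}, {"query": {"match_all": {}}, "size": 10}),
        ({"index": "my-logical-view"}, {"query": {"term": {"status": "500"}}}),
    ])
    print(payload, end="")
```

The resulting string would be sent as the request body with a newline-delimited JSON content type.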
In addition to the S3 interface support, the CHAOSSEARCH service has extended this API to include relational operations in an S3-like style. See the question “What kind of relational queries can I do?” for more information.
The CHAOSSEARCH service can be accessed via our secure endpoint https://service.chaossearch.io using one of our supported REST APIs (CHAOSSEARCH, Amazon S3, and Elasticsearch). All incoming requests must be signed by your CHAOSSEARCH API access key (key ID and secret) using the Amazon Signature Version 4 signing process.
It is possible to use any standard HTTP(S) client to access CHAOSSEARCH. However, we generally recommend a client that supports Amazon Signature Version 4 request signing so that request signatures are generated automatically. Below is a sample configuration profile for the AWS CLI to connect to the CHAOSSEARCH service (replace the X's and Y's with your actual CHAOSSEARCH access key ID and secret, respectively):
[chaos_search]
aws_access_key_id=XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key=YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
s3 =
    addressing_style = path
    signature_version = s3
And here is a sample GetObject request using the AWS CLI configured to use the profile above:
aws --profile chaos_search --endpoint-url https://service.chaossearch.io/V1/ s3api get-object --bucket [bucket] --key [key] [outfile]
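For clients that do not sign requests automatically, the Signature Version 4 steps can be sketched with the Python standard library alone. The example below signs a GET request that has an empty body and no query string. The `region` and `service` defaults (`us-east-1`, `s3`) are assumptions, since the values the CHAOSSEARCH endpoint expects are not stated here, and the function names are illustrative rather than part of any CHAOSSEARCH client.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key from the secret and the credential scope."""
    k_date = _hmac_sha256(("AWS4" + secret).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_get_request(host, path, access_key, secret,
                     region="us-east-1", service="s3", now=None):
    """Return SigV4 headers for a GET with an empty body and no query string."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(b"").hexdigest()  # hash of the empty body

    headers = {
        "host": host,
        "x-amz-content-sha256": payload_hash,
        "x-amz-date": amz_date,
    }
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    canonical_request = "\n".join([
        "GET",
        quote(path, safe="/-_.~"),
        "",  # empty canonical query string
        canonical_headers,
        signed_headers,
        payload_hash,
    ])

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = derive_signing_key(secret, date_stamp, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers


if __name__ == "__main__":
    # Placeholder credentials and object path, for illustration only.
    signed = sign_get_request("service.chaossearch.io", "/V1/my-bucket/my-key",
                              "XXXXXXXXXXXXXXXXXXXX", "YYYYYYYY")
    for name, value in signed.items():
        print(f"{name}: {value}")
```

The returned headers would be attached to the HTTPS request as-is. In practice, an SDK or the AWS CLI is usually the safer choice because it also handles query strings, request bodies, and clock skew.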
Yes. CHAOSSEARCH is focused on extending the functionality of Elastic Stack (ELK) onto S3. The CHAOSSEARCH platform is independent of Elastic Stack and will work with any of the Elasticsearch-based ELK as a service companies, such as Logz.io.
As part of Elasticsearch support, the CHAOSSEARCH service initially supports the following text search functionality. Text search is supported both from an Elasticsearch REST based API as well as Kibana and Lucene syntax. The list will continue to grow as the service is built out:
- Exact (i.e. term) match
- Wildcard (i.e. phrase) match
- +/- operators (i.e. must, must_not) match
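To make the operators above concrete, the following sketch maps a small `+`/`-` token syntax onto an Elasticsearch-style `bool` query with `must`, `must_not`, `term`, and `wildcard` clauses. The field name `message` and the helper name are hypothetical, and this is not the service's own parser. It only illustrates how the listed match types correspond to query DSL clauses.

```python
def to_bool_query(expression: str, field: str = "message") -> dict:
    """Translate '+required -excluded optional*' tokens into a bool query."""
    clauses = {"must": [], "must_not": [], "should": []}
    for token in expression.split():
        if token.startswith("+"):
            occurrence, word = "must", token[1:]
        elif token.startswith("-"):
            occurrence, word = "must_not", token[1:]
        else:
            occurrence, word = "should", token
        if not word:
            continue  # ignore a bare '+' or '-'
        # Tokens containing * or ? become wildcard clauses, others exact terms.
        kind = "wildcard" if ("*" in word or "?" in word) else "term"
        clauses[occurrence].append({kind: {field: word}})
    # Drop empty clause lists so the query stays minimal.
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


if __name__ == "__main__":
    import json
    print(json.dumps(to_bool_query("+error -timeout conn*"), indent=2))
```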
CHAOSSEARCH indexes your data and exposes it in a tabular format similar to that of other relational systems. This tabular data can be queried via two interfaces: the CHAOSSEARCH extensions to S3 GetObject / ListObjectsV2 operations and the Elasticsearch API. In both cases, CHAOSSEARCH supports:
- Point and range queries for numeric, date, and string data types (=, <, >)
- Common aggregations (all datatypes): COUNT, MIN, MAX
- Numeric aggregations: SUM, AVG, STD
- Logical operators (AND, OR, NOT)
- Order By
- Group by
Any column exposed by CHAOSSEARCH indexing may be referenced by the query predicates and/or aggregations.
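For the Elasticsearch interface, the operations listed above map naturally onto standard query DSL constructs. The sketch below builds a request body that filters on a date range, groups by a column, and computes numeric aggregations. The column names (`timestamp`, `status`, `bytes`) are hypothetical, and which DSL features are accepted depends on the service's current support.

```python
import json


def build_analytics_query(start, end, time_field="timestamp",
                          group_field="status", metric_field="bytes"):
    """Build a range-filtered, grouped aggregation request body."""
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "query": {
            "bool": {
                "filter": [
                    # Range predicate: start <= time_field < end
                    {"range": {time_field: {"gte": start, "lt": end}}}
                ]
            }
        },
        "aggs": {
            "by_group": {
                # GROUP BY group_field, ORDER BY count descending;
                # each bucket's doc_count serves as COUNT.
                "terms": {"field": group_field, "order": {"_count": "desc"}},
                "aggs": {
                    "total": {"sum": {"field": metric_field}},
                    "average": {"avg": {"field": metric_field}},
                    "smallest": {"min": {"field": metric_field}},
                    "largest": {"max": {"field": metric_field}},
                    # extended_stats includes the standard deviation
                    "spread": {"extended_stats": {"field": metric_field}},
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_analytics_query("2018-01-01", "2018-02-01")
    print(json.dumps(body, indent=2))
```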
The CHAOSSEARCH service has been designed for large scale, historical log and event analytics. Based on CHAOSSEARCH’s Data Edge technology, it has been shown that:
- Text-based queries are up to 10x faster to index and up to 2x faster to search when compared to Lucene.
- Analytic queries are up to 5x faster to index and up to 2x faster to query when compared to column stores.
The CHAOSSEARCH service has an elastic data fabric that scales up or down based on performance and cost metrics.
All data is 100% owned by the customer. CHAOSSEARCH is a data fabric and abstraction layer on top of S3. When configuring AWS for CHAOSSEARCH, simply create an AWS IAM Role that gives the service “read-only” access. As part of this Role, specify the location that CHAOSSEARCH can write its analytic metadata. You always own your data and any related information about your data.
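The access arrangement described above could be expressed as an IAM policy along the following lines. This is a sketch under stated assumptions, not the service's required policy: the bucket names are placeholders, the choice of S3 actions is an assumption, and the role's trust relationship (which would involve the customer ID provided by CHAOSSEARCH) is omitted. The Prerequisites documentation remains the authoritative source for the actual configuration.

```python
import json


def build_policy(data_bucket: str, metadata_bucket: str) -> dict:
    """Build an IAM policy: read-only on source data, read/write on metadata."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlySourceData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{data_bucket}",
                    f"arn:aws:s3:::{data_bucket}/*",
                ],
            },
            {
                # Assumed location where analytic metadata may be written.
                "Sid": "WriteAnalyticMetadata",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{metadata_bucket}",
                    f"arn:aws:s3:::{metadata_bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket names, for illustration only.
    print(json.dumps(build_policy("my-log-bucket", "my-metadata-bucket"), indent=2))
```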
CHAOSSEARCH is built on top of AWS, making your data highly available, scalable, durable, and secure. Amazon S3 provides an infrastructure to store important data and is designed for durability of 99.999999999% of objects. Your data is redundantly stored across multiple facilities and multiple devices in each facility. CHAOSSEARCH uses S3 as its backing store, providing you with the same security and reliability AWS ensures its customers. And because CHAOSSEARCH never moves your data, there is no change in its security.
CHAOSSEARCH is a cloud analytics service running on AWS — built for multi-tenancy and with a host of other critical differentiators — and is superior to single-tenant cloud hosted services or on-premises deployment running old-school enterprise software.
The CHAOSSEARCH service is deployed as a collection of Docker containers to a pool of shared compute resources. These containers are customer-specific and are isolated from one another using encrypted Docker Swarm overlay networks. All external API access to the service is done via authenticated HTTPS. Requests are signed client-side and routed by the service to the correct containers via a unique customer identifier (and are rejected if the signature cannot be verified against the customer's secret key).
The CHAOSSEARCH service is an elastic, serverless solution and supports all AWS regions. Wherever your S3 buckets have been provisioned, the CHAOSSEARCH service allocates compute EC2 resources to provide our unique discover, refine, and query functionality. There is no configuration or provisioning required. The CHAOSSEARCH service ensures that all S3 data access is within the same AWS region such that there is no additional cost for network/data access.
The CHAOSSEARCH service is backed by a new and powerful indexing technology called “Data Edge”. Data Edge is an index file format that provides both relational queries and text search in one representation. This format significantly compresses data compared to existing index technologies. Written in Scala over an Akka distributed framework, Data Edge is uniquely designed to exploit the cost efficiency of object storage such as S3, while still providing high performance and elastic scale capabilities. For example, 10TB of raw source data indexed by CHAOSSEARCH would typically result in a compressed data footprint of around 2TB. And with S3 pricing, CHAOSSEARCH enables cost disruptive historical log and event analysis.
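The example figures above imply a ratio of roughly 5:1 between raw and indexed data. A small sketch of that arithmetic follows. The 0.2 ratio is taken directly from the 10TB-to-2TB example and is only typical, so real results will vary with the data, and the function name is illustrative.

```python
# Ratio implied by the example: about 2TB indexed per 10TB of raw source data.
TYPICAL_INDEX_RATIO = 2 / 10


def estimate_indexed_tb(raw_tb: float, ratio: float = TYPICAL_INDEX_RATIO) -> float:
    """Estimate the indexed footprint in TB for `raw_tb` of raw source data."""
    if raw_tb < 0:
        raise ValueError("raw size must be non-negative")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return raw_tb * ratio


if __name__ == "__main__":
    for raw in (10, 50, 250):
        print(f"{raw} TB raw -> ~{estimate_indexed_tb(raw):.1f} TB indexed")
```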
Today CHAOSSEARCH is only available on AWS. However, CHAOSSEARCH is architected to be cloud agnostic and will soon be available on Google Cloud Platform and Azure. CHAOSSEARCH is also integrated with Minio.io. Minio, Inc. is the primary developer of the Minio cloud storage stack, an Amazon S3-compatible cloud storage server released under Apache License v2.