Schema Transformations
Use ChaosSearch Schema Transformations to virtually transform and materialize columns for analytics and querying.
When creating a view, you can transform the schema of any of the fields within the indexed data structure. These features let analysts shape the indexed data within the view as a form of on-demand re-pipelining, without the toil of reprocessing and without any impact on the indexed data or storage.
Tailor data within the view, for use during analysis.
The ChaosSearch virtual transformations provide powerful options that allow users to create the schema that they need within the context of the view that they are creating. The source indexed data is unchanged, and thus can be used as-is, or with different transformations in different views as needed.
To create a transformation:
- While in the create view process, and in the Schema Transformation window, select or search for the column that you want to transform.
- Click the gear icon at the right end of the column row. The Schema Transformation window opens.
- In the Schema Transformation window, you can select from several predefined transformations.
- Select the desired transformation type, and supply any additional content as applicable, then click Save Transform.
Schema Transformation Options
The transformation options include the following:
Option | Description |
---|---|
Materialize with Regex | Use a custom regular expression to transform one column into one or more searchable columns with distinct content. |
Materialize with JSONPath | Use a JSONPath expression to select and transform one or more fields inside a JSON string blob as materialized columns that can be used for filtering and analytics. |
Materialize with JQ | Use a jq filter expression to select and transform one or more fields inside a JSON string blob as materialized columns that can be used for filtering and analytics. |
Treat as IP | For an IP address that is usually stored as a string value, select Treat as IP to virtually transform that value to an IP data column. |
Treat as GeoPoint | For geo-location data, treat the content of the indexed data as longitude or latitude geolocation data. |
Treat as Isolation Key | For ChaosSearch object groups that use isolation keys to split the indexed data into tenant-specific or other specific chunks, use this transform to specify that a column other than cs_partition_key_# is the filtering key that controls which chunk(s) of data are included in the search results or analytics of this view. The column must contain values that are identical to the isolation key values in the cs_partition_key_# column. |
Treat as Nested JSON | Configure a column with JSON string content to support querying with Elastic nested path expressions in a Search Analytics Discover filter or in Elastic API search calls, or for use with Search Analytics visualizations as Metric or Bucket values. |
The Treat as transforms essentially set the data type for the column to the specified data type for the purposes of visualization and analysis. The Materialize transforms are used to create additional specific columns in the view for analysis and filtering, as described in the following sections.
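To illustrate why setting a data type matters, the following sketch uses Python's standard `ipaddress` module to contrast string matching with typed IP matching. It is a conceptual illustration only, not how ChaosSearch implements Treat as IP, and the sample addresses are hypothetical.

```python
import ipaddress

# Hypothetical sample values, as they might appear in a string column.
raw_values = ["10.0.0.5", "10.0.12.7", "10.0.200.1", "192.168.1.20"]

# Treated as strings, the values only support text operations such as prefix matching.
prefix_matches = [v for v in raw_values if v.startswith("10.0.")]

# Treated as IP addresses, the same values support CIDR range checks.
network = ipaddress.ip_network("10.0.0.0/20")
cidr_matches = [v for v in raw_values if ipaddress.ip_address(v) in network]

print(prefix_matches)  # three addresses share the text prefix
print(cidr_matches)    # only two fall inside 10.0.0.0/20
```

The text prefix cannot express the /20 boundary, so it also matches 10.0.200.1, which lies outside the network.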
Regular Expression Transformations
As an example of a Materialize with Regex transformation, if you have a log file with a field that contains a string of web URL data, you might want to transform the field into three virtual/materialized columns that each contain a specific part such as the domain, port, and path components for use in analysis or filtering.
After selecting the URL field and clicking the gear icon to transform it:
- Select Materialize with Regex as the transformation.
- Type the regular expression pattern to use for each field, such as (\\S+[ :])(\\d+)(\\S+). See Refinery Transformation Regex for other sample patterns.
- Click Add Field to add three fields, and name them domain, port, and path. Make sure that domain and path are STRING types, while port is a NUMBER type.
- Click Refresh to update the Preview pane and review the transformation. If there are any errors or changes, you can update the regular expression and/or fields and refresh again.
- Click Save Transform when finished.
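To preview how a three-group pattern splits a value before entering it in the window, you can try it locally. The following is a minimal sketch using Python's `re` module. It assumes that the doubled backslashes in the sample pattern are escaping and that the intended pattern is `(\S+[ :])(\d+)(\S+)`. The sample input and the `materialize_url` helper are hypothetical, and Python's regex dialect may differ in details from the one ChaosSearch uses.

```python
import re

# Sample pattern from the steps above, written with single backslashes (assumption).
PATTERN = re.compile(r"(\S+[ :])(\d+)(\S+)")

def materialize_url(value):
    """Return the three capture groups as named fields, or None if there is no match."""
    match = PATTERN.search(value)
    if match is None:
        return None
    domain, port, path = match.groups()
    # port is converted to an integer to mirror the NUMBER field type.
    return {"domain": domain, "port": int(port), "path": path}

print(materialize_url("example.com:8080/index.html"))
```

With this pattern, the first group keeps the trailing separator, so the domain value for the sample input is `example.com:` rather than `example.com`. Adjust the grouping if you do not want the separator in the column.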
After saving your transformation changes, the Schema Transformation window shows the transformed fields.
JSONPath Transformations
If an indexed data field contains JSON string content, you can use this transformation to specify one or more specific JSON properties as a materialized column for analytics. In many implementations, JSON string content supports only text searching within their content, but the ChaosSearch Materialize with JSONPath transformation allows you to specify a JSON path value to select a property within the JSON string, and to create a materialized column for the view. The JSON string blob remains intact for text searches and other analysis.
After selecting the JSON string field and clicking the gear icon to transform it:
- Make sure that Materialize with JSONPath is selected as the transformation.
- Click Add Field to create a placeholder for each field that you want to transform.
- Type the JSONPath expression pattern to identify each property that you want to materialize as a column. You can use third-party tools such as the online JSON path formatter to create the required JSONPath value. For example, a simple JSONPath can isolate the connect:version property as a new column named version.
- Click Refresh to update the Preview pane and review the transformation. If there are any errors or changes, you can update the path expression and/or fields and refresh again.
- Click Save Transform when finished. The new virtual column is shown on the Schema Transformation window.
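Conceptually, the materialization selects one property from the JSON string and exposes it as its own column while leaving the original blob untouched. The sketch below shows that idea in Python for the simplest kind of path only. The `select_simple_path` helper handles just `$` followed by dot-separated keys and is not a full JSONPath engine. The sample document and its nested `connect.version` layout are assumptions for illustration.

```python
import json

def select_simple_path(blob, path):
    """Resolve a dotted path such as '$.connect.version' against a JSON string.

    Supports only '$' followed by dot-separated keys (a small JSONPath subset).
    Returns None when a key is missing.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = json.loads(blob)
    for key in filter(None, path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# Hypothetical JSON string column value.
blob = '{"connect": {"version": "1.4.2", "id": 7}, "status": "ok"}'

# The original blob stays in place; the selected property becomes a new column.
row = {"payload": blob, "version": select_simple_path(blob, "$.connect.version")}
print(row["version"])
```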
JQ Transformations
If an indexed data field contains JSON string content, you can use the jq transformation as an alternative to materializing with a JSONPath to specify one or more JSON properties as materialized columns for analytics. Materialize with JQ works identically to the JSONPath materialization, but uses jq filters to extract the field values for the new columns. The JSON string blob remains intact for text searches and other analysis.
After selecting the JSON string field and clicking the gear icon to transform it:
- Make sure that Materialize with JQ is selected as the transformation.
- Click Add Field to create a placeholder for each field that you want to transform.
- Type the jq expression pattern to identify each property that you want to materialize as a column. You can use third-party tools such as the JQ play editor to create the required jq path value. In this example, two simple jq filters extract a bucketName property and a readWriteType array member property.
- Click Refresh to update the Preview pane and review the transformation. If there are any errors or changes, you can update the path expression and/or fields and refresh again.
- Click Save Transform when finished. The new virtual column appears on the Schema Transformation window.
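As a rough picture of what two simple jq filters select, the sketch below emulates a very small subset of jq path syntax in Python: `.key` lookups and `[n]` array indexes. It does not invoke jq itself. The `apply_simple_filter` helper, the sample record, and the filter strings `.bucketName` and `.readWriteType[0]` are assumptions for illustration. As a simplification, the helper returns None for any failed lookup, whereas real jq raises an error in some of those cases.

```python
import json
import re

# Matches either a '.key' step or a '[n]' index step.
TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")

def apply_simple_filter(blob, jq_filter):
    """Evaluate a filter made only of '.key' and '[n]' steps against a JSON string."""
    node = json.loads(blob)
    pos = 0
    while pos < len(jq_filter):
        token = TOKEN.match(jq_filter, pos)
        if token is None:
            raise ValueError(f"unsupported filter syntax at position {pos}")
        key, index = token.groups()
        if key is not None:
            node = node.get(key) if isinstance(node, dict) else None
        else:
            i = int(index)
            node = node[i] if isinstance(node, list) and i < len(node) else None
        if node is None:
            return None
        pos = token.end()
    return node

# Hypothetical JSON string column value.
blob = '{"bucketName": "logs-archive", "readWriteType": ["ReadOnly", "WriteOnly"]}'

columns = {
    "bucketName": apply_simple_filter(blob, ".bucketName"),
    "readWriteType": apply_simple_filter(blob, ".readWriteType[0]"),
}
print(columns)
```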
Best Practices for Virtual Field Materializations
Virtual field materializations (Materialize with Regex, JSONPath, or JQ) are applied at query time, not at ingest time, so they do not benefit from ingestion-side performance optimizations. Additionally, if a view has multiple virtual field materializations, all of them are applied in each query, which can degrade overall performance in that view. When considering virtual field transformations, follow these best practices:
- Limit the total number of transformations per view to fewer than 5.
- Avoid using wildcards in Regex transformations when possible (e.g., if parsing a lowercase letter using Regex, use [a-z] instead of .).
- When a field is heavily used for filtering/aggregations, try to make the change ingest-side.
- Before adding a virtual field transformation to an existing view, consider creating a new view with the necessary transformation(s) to assess any potential performance implications across all use cases that leverage the view.