Schema Overrides - Field Data Types
Schema data type overrides refine how source fields are indexed and made available for analytics.
ChaosSearch uses auto-detection to identify the fields and their data types within the files that you select for indexing. Auto-detection checks each file as it is indexed and assigns a data type (such as string, number, timeval, or period) to each field in the indexed data.
You can use the Schema Overrides controls to override the auto-detected data type for one or more fields. For example, a numeric customer ID value is likely to be auto-detected as a number; you could override it to a string type so that it is treated as a string and is not used in numeric counts or aggregations.
Timeval and Period Data Types
A period data type is a string representation of date/time (e.g., 2018-02-13 20:03:53.589918) and is not converted into a numeric representation. Timeval is the numeric representation of date/time, expressed as milliseconds since the Epoch (that is, January 1st, 1970 at 00:00:00 UTC). This is also referred to as UNIX system time.
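To make the difference concrete, the following minimal sketch converts the example period string above into an Epoch-milliseconds value and back. It assumes that the string is in UTC and uses the format shown above; the function names are hypothetical and are not part of ChaosSearch.

```python
from datetime import datetime, timedelta, timezone

# The Epoch: January 1st, 1970 at 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
PERIOD_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the period string


def period_to_timeval(period: str) -> int:
    """Parse a date/time string (assumed UTC) into whole milliseconds since the Epoch."""
    parsed = datetime.strptime(period, PERIOD_FORMAT).replace(tzinfo=timezone.utc)
    # Integer division by one millisecond avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def timeval_to_period(timeval_ms: int) -> str:
    """Render Epoch milliseconds back into the same string layout."""
    return (EPOCH + timedelta(milliseconds=timeval_ms)).strftime(PERIOD_FORMAT)


timeval = period_to_timeval("2018-02-13 20:03:53.589918")
print(timeval)                     # 1518552233589
print(timeval_to_period(timeval))  # 2018-02-13 20:03:53.589000
```

Note that the round trip does not reproduce the original string: the microsecond digits beyond millisecond precision are lost. This is the kind of raw-format detail that only a string-based type retains.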
In most cases, it is best to use a Timeval data type because Timeval is more performant for most operations in both time and space. The Period data type is helpful in the less common case where there is an advantage to keeping a column as a string, so that the format of the raw source is known and preserved.
As ChaosSearch indexes new files for an object group, the overrides are applied even if the content of a field would be auto-detected as a different type. For example, a timestamp field with a configured override to timeval keeps that timeval data type, even if the field content is not detectable as a valid timestamp value.
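The precedence of overrides over auto-detection can be pictured with a small sketch. The detection rules, field names, and function names below are hypothetical stand-ins chosen for illustration; they are not how ChaosSearch implements auto-detection.

```python
from datetime import datetime
from typing import Mapping


def detect_type(value: str) -> str:
    """Naive stand-in for auto-detection (illustrative rules only)."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        return "timeval"
    except ValueError:
        return "string"


def resolve_type(field: str, value: str, overrides: Mapping[str, str]) -> str:
    """Return the override for a field if one exists; otherwise fall back to detection."""
    if field in overrides:
        return overrides[field]
    return detect_type(value)


# Hypothetical overrides for two fields.
overrides = {"customer_id": "string", "event_time": "timeval"}

print(resolve_type("customer_id", "10482", overrides))      # string
print(resolve_type("event_time", "not-a-date", overrides))  # timeval
print(resolve_type("status", "200", overrides))             # number
```

The second call shows the case described above: the content is not a valid timestamp, but the field keeps the overridden timeval type.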
Use Caution with Data Type Overrides
Avoid coercing the data type of object group fields unless necessary or as directed by Customer Success or Engineering. Data type overrides affect the storage and nature of the indexed data. You cannot edit an object group to alter or remove an override. It is highly recommended that you plan overrides carefully: first index a few sample source files, then review the results and the fields that are used for analysis.
Setting a Data Type Override
During the object group creation process, after you have selected a storage bucket and defined the expressions for the cloud storage objects to include, you can use the Schema Overrides button to define a data type override if needed. Make sure that you know the field name and the desired data type before you proceed.
To create a field override:
- In the object group content preview window, click Schema Overrides in the top right corner.
- In the Schema Overrides dialog, click Add Data Type Override to add an editable row with a Field and Data Type value.
- For Field, type the field name that you want to override.
- In the Data Type field, select the data type override from the drop-down list. The default is String.
- Optionally, if you want to define another override, click Add Override to add another row and specify the field and data types.
- The Schema Policy area is for importing a JSON file with rules that define how to process the fields within the files ingested by the object group. See Schema Policies for more information about creating the JSON file and its capabilities.
- Click Save to save the overrides.
When the object group is created, the data types that you assigned appear in the Properties section.
Data Types Could Change in New Daily Intervals
If you do not specify a field override, the data type for a field could change if new files for that object group contain data of a different type than previous files. Sometimes the source application that creates log and event data changes its schema. For example, a field that previously held a string could change to a number or enumerated value. ChaosSearch will update the daily intervals for the change in field data types, but this could have an impact on your views and any aggregations or visualizations that use those fields.
When you create a Refinery view for an object group, you select a timeval field (if one is available in the data) to be the source for time-based displays such as Discover histograms. If that timeval field changes to a different data type such as a string, any Refinery view that uses that field for its timestamp will return errors in Discover histograms for time periods when the field is classified as a string.
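As a way to reason about this, the following sketch scans a per-day record of field data types and reports where a field's type differs from the previous day. The dictionary layout, the sample dates, and the function name are assumptions made for illustration; ChaosSearch does not necessarily expose interval metadata in this form.

```python
from typing import Dict, List, Tuple


def find_type_changes(
    intervals: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str, str, str]]:
    """Return (field, day, old_type, new_type) for each day on which a
    field's type differs from the last type seen for that field."""
    last_seen: Dict[str, str] = {}
    changes: List[Tuple[str, str, str, str]] = []
    for day in sorted(intervals):  # ISO dates sort chronologically as strings
        for field, dtype in intervals[day].items():
            previous = last_seen.get(field)
            if previous is not None and previous != dtype:
                changes.append((field, day, previous, dtype))
            last_seen[field] = dtype
    return changes


# Made-up sample data: one entry per daily interval.
daily_types = {
    "2024-01-01": {"timestamp": "timeval", "status": "string"},
    "2024-01-02": {"timestamp": "timeval", "status": "number"},
    "2024-01-03": {"timestamp": "string", "status": "number"},
}

for change in find_type_changes(daily_types):
    print(change)
# ('status', '2024-01-02', 'string', 'number')
# ('timestamp', '2024-01-03', 'timeval', 'string')
```

In this sample, a view that uses timestamp as its time field would be affected from 2024-01-03 onward, which is the situation that a timeval override on that field is meant to prevent.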