Skip to main content

Field Type Detection

Overview

A key design principle of SparkLogs is to avoid configuration whenever possible. Rather than asking you to manually setup custom fields (specifying name and data type), the existence of data that contains a given field value is what defines that custom field.

Additionally, when data is ingested, each field value is analyzed, and a type is assigned.

Possible field types are:

  • text: Stores strings of arbitrary length
  • numeric: Floating point or integer numbers
  • boolean: Yes/No or True/False values
  • timestamp: Date/time with a timezone offset
  • severity: A severity level from 1-24 as defined in the OpenTelemetry specification
  • facility: A facility level from 0-24 as defined in the syslog specification
Tip: Fields with the same name but different types

It's possible to have data across different log events for the same field name that will result in different types. The system will handle the situation gracefully, allowing you to query data any of these fields. Even though they have the same name, the type name is also part of the field name through a special type suffix (e.g., #s or #n[]). In LQL queries you can refer to fields with or without the special type suffix.

Numeric Fields

Numeric fields are stored as 64-bit floating point numbers. If you want to treat a numeric field as a 64-bit integer field in your query expressions, you can append the special #i type suffix, which will force a conversion to a 64-bit integer type during the query process.

Boolean Fields

Boolean fields are either true or false. AutoExtract will convert a field with a value of either true or false (case insensitive) to a boolean field.

Timestamp Fields

Date/time information is formatted in various ways. Our goal is to automatically detect and parse as many valid date/time formats as possible.

Extracted information will detect and treat valid date/time values that have spaces as one value, even if the date/time value is not quoted. This ensures that log text like finished=October 20, 2023 6:06pm will produce a timestamp value finished: 2023-10-20T18:06:00, rather than just finished: October.

If a timezone is not specified in the value itself, the vector.dev logging agent will specify a default timezone to use (defaults to the timezone of the machine running the vector.dev agent).

Over 100 timestamp formats are supported, including many variants of the following common formats:

2022-06-13T07:59:49Z              RFC3339
2022-06-13T07:59:49.123456 EST RFC3339 with microseconds
02/Jan/2006:15:04:05 -0700 Apache
Sun Dec 04 04:47:44 2005 Apache
17/06/09 20:10:40 Spark
2013-Feb-03 Date only
Thu, 4 Jan 2018 17:53:36 +0000 Weekday with date/time and TZ offset
September 17, 2012 10:09pm Long date with am/pm
1332151919 UNIX timestamp in seconds
1384216367111222 UNIX timestamp in microseconds
2014年04月08日 Chinese

Refer to the extended examples of the excellent dateparse golang package for a full list.

Severity

Only the standard field severity can have the severity field type. In addition to the severity names in the OpenTelemetry specification, many other common severity names are valid and mapped to appropriate severity levels internally. For example, all syslog severity level names are accepted. Common abbreviations (e.g., err or warn) are also accepted.

LQL query expressions also recognize valid severity values when filtering against the severity field. The following are example LQL query expressions showing this in action:

severity in (trace, info)
severity >= warn
severity between crit and emergency