Field Type Detection
Overview
A key design principle of SparkLogs is to avoid configuration whenever possible. Rather than asking you to manually set up custom fields (specifying a name and data type), a custom field is defined simply by the existence of data that contains a value for that field.
Additionally, when data is ingested, each field value is analyzed, and a type is assigned.
Possible field types are:
text
: Stores strings of arbitrary length

numeric
: Floating point or integer numbers

boolean
: Yes/No or True/False values

timestamp
: Date/time with a timezone offset

severity
: A severity level from 1-24 as defined in the OpenTelemetry specification

facility
: A facility level from 0-24 as defined in the syslog specification
The same field name can appear in different log events with values of different types. The system handles this situation gracefully and lets you query the data in any of these fields. Even though they share a name, the type is also part of the field name through a special type suffix (e.g., #s or #n[]).
In LQL queries you can refer to fields with or without the special type suffix.
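As a rough illustration of per-value type detection, the sketch below classifies a raw string as boolean, numeric, or text. It is not the SparkLogs implementation: the function name `detect_field_type` is hypothetical, the ordering of the checks is an assumption, and timestamp, severity, and facility detection are left out for brevity.

```python
import math


def detect_field_type(raw: str) -> str:
    """Classify a raw value as 'boolean', 'numeric', or 'text' (illustrative only)."""
    value = raw.strip()

    # Booleans: "true" or "false", compared case-insensitively.
    if value.lower() in ("true", "false"):
        return "boolean"

    # Numerics: anything Python can parse as a finite float.
    # Treating "nan" and "inf" as text is an assumption of this sketch.
    try:
        number = float(value)
    except ValueError:
        return "text"
    return "numeric" if math.isfinite(number) else "text"


# Two events can carry the same field name with different detected types.
events = [{"status": "200"}, {"status": "OK"}]
detected = [detect_field_type(event["status"]) for event in events]
```

Here `detected` holds one type per event, which mirrors how the same field name can end up with more than one type across log events.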
Numeric Fields
Numeric fields are stored as 64-bit floating point numbers. If you want to treat a numeric field as a 64-bit integer field in your query expressions, you can append the special #i type suffix, which will force a conversion to a 64-bit integer type during the query process.
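One practical consequence of 64-bit floating point storage is that integers larger than 2^53 cannot all be represented exactly. The sketch below demonstrates this with plain Python floats, which are IEEE 754 doubles. It says nothing about how the #i suffix handles fractional values, since that behavior is not described here.

```python
# Python floats are IEEE 754 64-bit doubles, the same width as the stored type.
LARGEST_EXACT = 2**53  # every integer up to this magnitude is exactly representable

big_id = LARGEST_EXACT + 1   # 9007199254740993
stored = float(big_id)       # what a 64-bit float can actually hold
round_trip = int(stored)     # converting back to an integer afterwards

precision_lost = round_trip != big_id  # True: the value came back as 2**53

small_id = 1234567890
small_round_trip = int(float(small_id))  # values below 2**53 survive unchanged
```

If a field carries very large integer identifiers, it may be worth checking whether they exceed this range before relying on exact integer comparisons.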
Boolean Fields
Boolean fields are either true or false. AutoExtract will convert a field with a value of either true or false (case insensitive) to a boolean field.
Timestamp Fields
Date/time information is formatted in various ways. Our goal is to automatically detect and parse as many valid date/time formats as possible.
Field extraction detects valid date/time values that contain spaces and treats each one as a single value, even if the date/time value is not quoted. This ensures that log text like finished=October 20, 2023 6:06pm will produce a timestamp value finished: 2023-10-20T18:06:00, rather than just finished: October.
If a timezone is not specified in the value itself, the vector.dev logging agent will specify a default timezone to use (defaults to the timezone of the machine running the vector.dev agent).
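The default-timezone rule can be sketched as follows: a parsed value that already carries an offset is left alone, and a value without one is stamped with the configured default. The helper name `apply_default_timezone` and the UTC-5 default are hypothetical and chosen only for illustration; this is not the vector.dev agent's configuration or code.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def apply_default_timezone(parsed: datetime, default_tz: tzinfo) -> datetime:
    """Attach default_tz only when the parsed value has no timezone of its own."""
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=default_tz)
    return parsed


# Hypothetical default: a machine running at UTC-5.
agent_default = timezone(timedelta(hours=-5))

# "October 20, 2023 6:06pm" carries no timezone, so the default applies.
naive = datetime(2023, 10, 20, 18, 6)
with_default = apply_default_timezone(naive, agent_default)

# A value that already has an offset keeps it.
explicit = datetime(2022, 6, 13, 7, 59, 49, tzinfo=timezone.utc)
unchanged = apply_default_timezone(explicit, agent_default)
```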
Over 100 timestamp formats are supported, including many variants of the following common formats:
| Example | Format |
| --- | --- |
| 2022-06-13T07:59:49Z | RFC3339 |
| 2022-06-13T07:59:49.123456 EST | RFC3339 with microseconds |
| 02/Jan/2006:15:04:05 -0700 | Apache |
| Sun Dec 04 04:47:44 2005 | Apache |
| 17/06/09 20:10:40 | Spark |
| 2013-Feb-03 | Date only |
| Thu, 4 Jan 2018 17:53:36 +0000 | Weekday with date/time and TZ offset |
| September 17, 2012 10:09pm | Long date with am/pm |
| 1332151919 | UNIX timestamp in seconds |
| 1384216367111222 | UNIX timestamp in microseconds |
| 2014年04月08日 | Chinese |
Refer to the extended examples of the excellent dateparse Go package for a full list.
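To give a feel for multi-format detection, here is a minimal sketch that tries a handful of the formats listed above using only the Python standard library. It is not how SparkLogs or dateparse work internally. The name `parse_timestamp`, the short format list, and the rule that an all-digit value of 16 or more digits is in microseconds are all assumptions of this example. Month and weekday names are matched using the current locale, so an English locale is assumed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# A few of the listed formats expressed as strptime patterns.
FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",       # 2022-06-13T07:59:49Z (RFC3339)
    "%d/%b/%Y:%H:%M:%S %z",      # 02/Jan/2006:15:04:05 -0700 (Apache)
    "%a %b %d %H:%M:%S %Y",      # Sun Dec 04 04:47:44 2005 (Apache)
    "%a, %d %b %Y %H:%M:%S %z",  # Thu, 4 Jan 2018 17:53:36 +0000
]


def parse_timestamp(text: str) -> Optional[datetime]:
    """Return a datetime for a recognized value, or None if nothing matches."""
    text = text.strip()

    # All-digit values are treated as UNIX timestamps (assumed heuristic:
    # 16 or more digits means microseconds, otherwise seconds).
    if text.isascii() and text.isdigit():
        number = int(text)
        if len(text) >= 16:
            seconds, micros = divmod(number, 1_000_000)
            base = datetime.fromtimestamp(seconds, tz=timezone.utc)
            return base + timedelta(microseconds=micros)
        return datetime.fromtimestamp(number, tz=timezone.utc)

    # Otherwise try each known pattern in order.
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None
```

Values without an offset, such as the second Apache format, come back without timezone information, which is where a default timezone would be applied.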
Severity
Only the standard field severity can have the severity field type. In addition to the severity names in the OpenTelemetry specification, many other common severity names are valid and mapped to appropriate severity levels internally. For example, all syslog severity level names are accepted. Common abbreviations (e.g., err or warn) are also accepted.
LQL query expressions also recognize valid severity values when filtering against the severity field.
The following are example LQL query expressions showing this in action:
severity in (trace, info)
severity >= warn
severity between crit and emergency
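The comparisons above work because each severity name resolves to a number. The Python sketch below shows one way such a lookup could behave. The numeric values for trace, debug, info, warn, error, and fatal are the OpenTelemetry severity numbers. The syslog names follow the example mapping in the OpenTelemetry log data model appendix, which is an assumption here: the mapping SparkLogs uses internally may differ. The function names are hypothetical, and treating `between` as inclusive is also an assumption.

```python
# OpenTelemetry severity numbers for the base names.
SEVERITY_LEVELS = {
    "trace": 1,
    "debug": 5,
    "info": 9,
    "warn": 13,
    "error": 17,
    "fatal": 21,
    # Syslog names and abbreviations (assumed mapping, see the note above).
    "informational": 9,
    "notice": 10,
    "warning": 13,
    "err": 17,
    "crit": 18,
    "critical": 18,
    "alert": 19,
    "emergency": 21,
}


def severity_number(name: str) -> int:
    """Resolve a severity name to its number, ignoring case."""
    return SEVERITY_LEVELS[name.strip().lower()]


def severity_at_least(value: str, threshold: str) -> bool:
    """Rough equivalent of: severity >= threshold."""
    return severity_number(value) >= severity_number(threshold)


def severity_between(value: str, low: str, high: str) -> bool:
    """Rough equivalent of: severity between low and high (inclusive here)."""
    return severity_number(low) <= severity_number(value) <= severity_number(high)


def severity_in(value: str, names: list) -> bool:
    """Rough equivalent of: severity in (names...)."""
    return severity_number(value) in {severity_number(n) for n in names}
```

Under this mapping, an event logged with err satisfies both severity >= warn and severity between crit and emergency only if its number falls in the respective range, so the first check passes and the second does not.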