Standard Field Mapping
Standard fields
In addition to supporting an unlimited number of custom fields, SparkLogs also defines the following standard fields:
- `timestamp`: The timestamp of the log message itself.
- `ingested_timestamp`: The timestamp when the log data was actually ingested into the system.
- `event_index`: The index of the ingested event within the batch of events submitted in one ingestion request. This makes it possible to reconstruct the exact order of log events even when their timestamps are identical.
- `severity`: The severity level of the log message.
- `facility`: The facility level of the log message, if any (usually only set for syslog data).
- `subsource`: Log filename, pod stream (e.g., stdout), Windows Event Log channel name, or instrumentation scope name for OTLP logs.
- `source`: The name of the source of the event (e.g., Kubernetes pod name or hostname of the device that generated the log event).
- `service`: Logical service / workload identity (for example OTel `service.name`, ECS nested `service.name`, or Kubernetes workload names detected from resource attributes).
- `app`: A string field often set by log forwarding agents for broader application grouping (for example ECS-style `labels.application`, `service.namespace`, or syslog `appname`).
- `message`: The original and unmodified string value of the log event.
- `category`: One or more category labels (separated by `.`), as extracted by category extraction.
- `pattern` and `pattern_hash`: The pattern (and corresponding hash) assigned by AutoClassify.
- `trace_id`: A globally unique ID that tracks a single request across distributed systems.
- `span_id`: An ID unique within a given trace that tracks a single operation within that trace.
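As a concrete illustration, a single ingested event carrying several of these standard fields might look like the following. All values here are hypothetical examples, not taken from a real system:

```python
import json

# Hypothetical example of one ingested event using the standard fields above.
event = {
    "timestamp": "2024-05-01T12:00:00.123Z",
    "severity": "error",
    "source": "payments/checkout-7f9c4d-abcde",  # e.g., namespace/pod name
    "subsource": "stdout",
    "service": "checkout",
    "app": "storefront",
    "message": "payment authorization failed for order 1234",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7",
    # Any other keys are stored as custom fields:
    "order_id": 1234,
}
print(json.dumps(event, indent=2))
```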
The following reserved fields are also automatically populated based on the agent and organization that ingest the data:
- `organization_id`: The ID of the organization that owns the data.
- `agent_id`: The ID of the agent that ingested the data.
Automatic detection of standard fields
Various logging systems and log forwarding agents use widely different names for these standard fields,
and for some sources, the mapping may differ by event type (e.g., for Google Cloud Platform
events, the `source` field may be populated from `pod_name` for Kubernetes and from `instance_id` for VMs).
SparkLogs automatically detects and maps fields from many common log schemas, agents, platforms, and logger libraries.
Wire-format schemas:
- syslog (RFC 5424 / RFC 3164)
- OpenTelemetry semantic conventions
- Elastic Common Schema (ECS)
- Splunk HEC event format
- Heroku Logplex format
Agents and collectors:
- Vector
- Fluent Bit
- Fluentd (including `fluent-plugin-kubernetes_metadata_filter`)
- Filebeat / Elastic Beats
- Grafana Alloy (Loki label conventions)
- OpenTelemetry Collector (including the `windowseventlog`, `journald`, and `k8sattributes` components)
- Datadog Agent reserved attributes
- Splunk HEC clients
Platforms and data sources:
- Windows Event Log on Vector, OpenTelemetry Collector, and Winlogbeat
- Linux journald on Vector and OpenTelemetry Collector
- Docker container logs via Vector `docker_logs`
- Kubernetes on Vector, Fluent Bit, the Fluentd metadata plugin, OpenTelemetry Collector, Grafana Alloy, Filebeat, and Google Cloud Logging (`k8s_container`)
- AWS CloudTrail
- AWS Lambda via OpenTelemetry FaaS resource attributes
- AWS ECS / Fargate via OpenTelemetry `aws.ecs.*` attributes
- AWS EKS (Kubernetes shape, with `aws.eks.cluster.arn` as the cluster identifier)
- Google Cloud Logging with monitored resources including `gce_instance`, `k8s_container`, `cloud_run_revision`, `cloud_function`, and `gae_app`
- Azure Monitor for Azure Functions and Azure App Service
- Heroku Logplex
- Fly.io platform fields
Logger libraries:
- zap (Go)
- log4j / log4j2 (Java)
Standard-field mapping summary by source
The tables below list the vendor-emitted fields that populate each pivot for common sources. When a
field is read from your ingestion configuration (not the log payload itself), the cell calls that out.
Mapping is also performed for all other relevant fields (timestamp, message, etc.).
Wire-format schemas
| Source | app | service | source | subsource |
|---|---|---|---|---|
| OpenTelemetry | k8s.cluster.name; else service.namespace | service.name; else workload names; else faas.name | k8s.namespace.name/k8s.pod.name; else k8s.pod.name; else faas.instance; else host.name | scope.name; else log.file.path; else stream |
| Elastic Common Schema (ECS) | orchestrator.cluster.name; else service.namespace | service.name | service.node.name; else kubernetes.pod.name; else host.name | log.file.path; else log.logger |
| Syslog (RFC 5424) | (empty) | appname | hostname | msgid; else procid |
Agents and collectors
| Source | app | service | source | subsource |
|---|---|---|---|---|
| Vector kubernetes_logs | kubernetes.pod_labels."app.kubernetes.io/part-of" | kubernetes.pod_labels."app.kubernetes.io/name"; else kubernetes.pod_owner | kubernetes.pod_namespace/kubernetes.pod_name | stream; else file |
| Vector windows_event_log | (empty) | provider_name | computer | channel |
| Vector journald | (empty) | _systemd_unit; else syslog_identifier | _hostname; else host | syslog_identifier; else _comm |
| Vector syslog | (empty) | appname | hostname; else host | msgid; else procid |
| Vector docker_logs | label.com.docker.compose.project | (container_name, fallback only) | host | stream; else container_id |
| Fluent Bit (kubernetes filter) | kubernetes.labels.app.kubernetes.io/part-of | kubernetes.labels.app.kubernetes.io/name | kubernetes.namespace_name/kubernetes.pod_name; else kubernetes.host | stream; else kubernetes.docker_id |
| Fluentd (in_tail + kubernetes metadata filter) | kubernetes.labels.app.kubernetes.io/part-of | kubernetes.labels.app.kubernetes.io/name | kubernetes.namespace_name/kubernetes.pod_name; else kubernetes.host | tailed_path; else tag |
| Filebeat / Elastic Beats | orchestrator.cluster.name; else service.namespace | service.name | kubernetes.pod.name; else host.name | log.file.path; else log.logger |
| Grafana Alloy / Loki labels | app | service_name; else job | namespace/pod; else pod; else instance | filename; else stream |
| Datadog Agent | (encoded in ddtags) | service / dd.service | host | ddsource |
| Splunk HEC | (from index via ingestion config) | sourcetype | host | the HEC source field, which is typically a file path |
Platforms and data sources
| Source | app | service | source | subsource |
|---|---|---|---|---|
| Windows Event Log (root-field shape) | (empty) | provider_name (when emitted as a root field) | Computer | Channel |
| macOS unified logging (post-remap) | reverse-DNS prefix of subsystem | subsystem | host | os.category |
| AWS CloudTrail | recipientAccountId | eventSource | composite of recipientAccountId and awsRegion | resources[0].ARN; else eventName |
| AWS Lambda | cloud.account.id | faas.name | faas.instance | scope.name |
| AWS ECS / Fargate | aws.ecs.cluster.name; else aws.ecs.cluster.arn | aws.ecs.task.family | aws.ecs.task.arn | aws.ecs.container.name (fallback only) |
| AWS EKS | aws.eks.cluster.arn (plus standard Kubernetes app candidates) | (Kubernetes shape) | (Kubernetes shape) | (Kubernetes shape) |
| Google Cloud Logging — GKE (k8s_container) | resource.labels.cluster_name | (resource.labels.container_name, fallback only) | composite of resource.labels.namespace_name and resource.labels.pod_name | (empty) |
| Google Cloud Logging — Cloud Run (cloud_run_revision) | resource.labels.project_id | resource.labels.service_name | resource.labels.instanceId | (empty) |
| Google Cloud Logging — Cloud Functions (cloud_function) | resource.labels.project_id | resource.labels.function_name | resource.labels.execution_id | (empty) |
| Google Cloud Logging — App Engine (gae_app) | resource.labels.project_id | resource.labels.module_id | gae_instance.instance_id | (empty) |
| Google Cloud Logging — Compute Engine (gce_instance) | resource.labels.project_id | (empty) | instance_id; else vm_name | (empty) |
| Azure Monitor (Functions / App Service) | (Resource Group, from ingestion config) | WEBSITE_SITE_NAME | WEBSITE_INSTANCE_ID | Category (e.g. AppServiceConsoleLogs, FunctionAppLogs) |
| Heroku Logplex | (Heroku app name, set by your drain configuration) | dyno type (web) | dyno (e.g. web.1) | the Heroku source field (app, heroku-router, etc.) |
| Fly.io | fly.app.name | process group | fly.machine_id; else fly.alloc_id | process group; else stream |
If no input field maps to a given pivot, that pivot is left empty rather than auto-filled from
another pivot. This is expected for some SaaS-only data where no machine-level source exists.
Empty values still work for filtering, grouping, and scoping queries, so the
subsource / source / service / app / organization_id hierarchy
for context exploration narrows naturally based on which pivots are populated.
Original vendor fields: copied vs moved
When SparkLogs detects a standard field from a vendor-specific field name, the value is either copied or moved into the standard field, depending on which standard field it populates:
- Pivot fields (`source`, `service`, `app`, `subsource`) and trace identifiers (`trace_id`, `span_id`, `category`) are copied. The original vendor field stays as a custom field so you can query on either the SparkLogs standard name or the original vendor name. For example, if your payload has `pod`, the value populates the `source` pivot AND the `pod` field remains queryable.
- Normalized fields (`timestamp`, `severity`, `facility`, `message`, `pattern`) are moved. The original vendor field is replaced by the normalized SparkLogs value (e.g., text `"info"` → a numeric OTel severity), so a duplicate raw value would diverge from the canonical form.
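A minimal sketch of the copy-vs-move behavior, with a hypothetical `normalize_event` helper and a severity table reduced to a few entries (the real ingestion pipeline is more involved than this):

```python
# Hypothetical sketch of copied vs. moved field handling; not the actual
# SparkLogs implementation. Vendor-key tables are illustrative examples.
PIVOT_SOURCES = {"pod": "source", "appname": "service"}       # copied: vendor key kept
NORMALIZED_SOURCES = {"level": "severity", "msg": "message"}  # moved: vendor key removed
SEVERITY_NUMBERS = {"trace": 1, "debug": 5, "info": 9, "warn": 13, "error": 17, "fatal": 21}

def normalize_event(raw):
    event = dict(raw)  # everything starts as a custom field
    for vendor_key, pivot in PIVOT_SOURCES.items():
        if vendor_key in event:
            event[pivot] = event[vendor_key]   # copy: original stays queryable
    for vendor_key, standard in NORMALIZED_SOURCES.items():
        if vendor_key in event:
            value = event.pop(vendor_key)      # move: original is replaced
            if standard == "severity":
                value = SEVERITY_NUMBERS.get(str(value).lower(), 9)
            event[standard] = value
    return event

event = normalize_event({"pod": "checkout-7f9c4d", "level": "info", "msg": "started"})
# event keeps "pod" (copied) but no longer has "level" or "msg" (moved)
```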
One exception: if your payload uses the exact SparkLogs canonical name at root (e.g. you sent
{service: "checkout"} directly), it's moved even though service is a pivot — keeping it would
just duplicate the value against the standard pivot.
Mapping notes
A few specific mappings are worth highlighting:
- Syslog `appname` populates `service`. RFC 5424 APP-NAME is the program name (`sshd`, `nginx`, `postgres`) and is treated as a logical service identity.
- AWS CloudTrail `eventSource` populates `service`. Values like `s3.amazonaws.com` name the AWS service that was called, which matches the SaaS `service` convention.
- `deployment.environment` (prod/staging/dev) is not mapped to `app`. Environment is a separate axis. Any unmapped field is stored as a custom field, and you can filter on it as needed. You may want to map data from different environments to different organizations.
- `container.name` and `container.id` are fallback-only. Container name fills `service` only when no canonical service identity (`service.name`, deployment name, `faas.name`) is present. Container ID fills `subsource` only when no canonical subsource (scope name, log file, stream, channel) is present.
- Specific workload identifiers beat host identifiers, which beat generic `host`/`hostname`. When a payload contains multiple candidates for `source`, the most specific identifier wins deterministically:
  1. Workload-instance name (`k8s.pod.name`, `aws.ecs.task.arn`, `faas.instance`, `dyno`, `bucket_name`)
  2. Workload-instance UUID / hash (`service.instance.id`, `fly.machine_id`, `kubernetes.pod.uid`, Cloud Run `instanceId`)
  3. Underlying host / node name (`host.name`, `kubernetes.host`, `computer`, `_HOSTNAME`, `vm_name`)
  4. Underlying host UUID / hash (`host.id`, `_machine_id`, `instance_id`, `ec2_instance_id`)
  5. Generic root-level `host` or `hostname`, used only as a last resort, because different shippers fill these with different things: the Vector collector's host vs. the application's container hostname vs. the original log producer's host.

  Example: an OpenTelemetry SDK on EKS typically emits both `k8s.pod.name` and `host.name`. The pod name wins `source` (more specific) and `host.name` stays as a custom field. An ECS payload with both `com.amazonaws.ecs.task-arn` and `host` resolves `source` to the task ARN.
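This precedence can be sketched as a first-match scan over an ordered candidate list. The field names come from the precedence list above; the helper function itself is a hypothetical illustration, not the actual implementation:

```python
# Ordered source candidates, most specific first (sketch of the deterministic
# precedence described above; candidate subset is illustrative).
SOURCE_CANDIDATES = [
    "k8s.pod.name", "aws.ecs.task.arn", "faas.instance", "dyno",    # workload-instance name
    "service.instance.id", "fly.machine_id", "kubernetes.pod.uid",  # workload-instance UUID
    "host.name", "kubernetes.host", "computer", "_HOSTNAME",        # host / node name
    "host.id", "_machine_id", "instance_id", "ec2_instance_id",     # host UUID
    "host", "hostname",                                             # generic last resort
]

def resolve_source(fields):
    """Return the first populated candidate, or None if no pivot applies."""
    for key in SOURCE_CANDIDATES:
        value = fields.get(key)
        if value:
            return value
    return None

# The pod name wins over host.name because it appears earlier in the list:
resolve_source({"host.name": "ip-10-0-1-5", "k8s.pod.name": "checkout-7f9c4d"})
```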
Customizing severity mappings
For the field detected as the severity field, SparkLogs interprets the value from any
supported text (case-insensitive) or numeric value. Numeric severity values must be in the range 1-24
and are interpreted as defined by the OpenTelemetry standard for severity values.
Standard textual severities include trace, debug, info, notice or display (info3), warn,
error or fail, critical (error4), fatal, alert (fatal2), panic (fatal3), and emergency (fatal4).
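A toy interpreter for these textual aliases might look like the following. The numeric values are an assumption based on the OpenTelemetry severity-number spec (TRACE=1, DEBUG=5, INFO=9, WARN=13, ERROR=17, FATAL=21, with offsets for fine-grained levels like info3 or fatal4), not the exact SparkLogs internals:

```python
# Textual severity aliases from the list above, mapped to assumed OTel numbers.
TEXT_SEVERITIES = {
    "trace": 1, "debug": 5, "info": 9,
    "notice": 11, "display": 11,   # info3
    "warn": 13,
    "error": 17, "fail": 17,
    "critical": 20,                # error4
    "fatal": 21,
    "alert": 22,                   # fatal2
    "panic": 23,                   # fatal3
    "emergency": 24,               # fatal4
}

def parse_severity(value):
    """Interpret text (case-insensitive) or numeric severities in the 1-24 range."""
    if isinstance(value, int) and 1 <= value <= 24:
        return value
    return TEXT_SEVERITIES.get(str(value).lower())

parse_severity("WARN")       # text alias, any case
parse_severity(17)           # numeric OTel severity passes through
```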
If you have non-standard severity values, you can either transform them before shipping the logs (e.g., with Vector VRL transformations) or, more conveniently, use the custom severity mapping feature.
To specify a custom mapping, use the `X-Severity-Map` HTTP header.
This is a comma-delimited list of key=value pairs that specifies additional mappings from a custom severity
level to a standard severity level.
For example, the Node.js bunyan logging library
uses numeric log levels from 10 (trace) to 60 (fatal). You could remap these with an `X-Severity-Map`
HTTP header value of `10=TRACE,20=DEBUG,30=INFO,40=WARN,50=ERROR,60=FATAL`.
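The bunyan remapping above can also be assembled programmatically before sending your ingestion request. The header name is from this document; the surrounding request plumbing is left to your client or agent:

```python
# Build the X-Severity-Map header value from bunyan's numeric levels.
bunyan_levels = {10: "TRACE", 20: "DEBUG", 30: "INFO", 40: "WARN", 50: "ERROR", 60: "FATAL"}
header_value = ",".join(f"{level}={name}" for level, name in bunyan_levels.items())
headers = {"X-Severity-Map": header_value}
# header_value is "10=TRACE,20=DEBUG,30=INFO,40=WARN,50=ERROR,60=FATAL"
```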
Timestamp constraints
When log data is ingested, if the timestamp is older than your configured retention period (or if
you are using the SparkLogs cloud and the event is older than 50 days), then the timestamp will
be set to ingested_timestamp and the original timestamp will be stored into a original_timestamp
custom field.
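A rough sketch of this retention clamp, assuming the 50-day SparkLogs-cloud limit (the helper is hypothetical, not the actual ingestion code):

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=50)  # SparkLogs cloud limit; self-hosted uses your retention period

def clamp_timestamp(event, ingested_at):
    """If the event timestamp is older than MAX_AGE, replace it with the
    ingestion time and preserve the original in original_timestamp."""
    ts = datetime.fromisoformat(event["timestamp"])
    if ingested_at - ts > MAX_AGE:
        event["original_timestamp"] = event["timestamp"]
        event["timestamp"] = ingested_at.isoformat()
    return event

now = datetime(2024, 7, 1, tzinfo=timezone.utc)
old = clamp_timestamp({"timestamp": "2024-01-01T00:00:00+00:00"}, now)
# old["timestamp"] is now the ingestion time; the original is kept separately
```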
Also, if the structured log data provided by the logging agent contains no timestamp,
AutoExtract attempts to determine the event timestamp from the log message. If nothing
relevant is detected, it uses the current date/time for the ingested log event. If you prefer
that AutoExtract use the timestamp in the message first and only then fall back to the timestamp
sent by your log forwarding agent, configure your log forwarding agent to use `observed_timestamp`
as the name of the timestamp field that it sends.