Standard Field Mapping

Standard fields

In addition to supporting infinite custom fields, SparkLogs also defines the following standard fields:

  • timestamp: The timestamp of the log message itself.
  • ingested_timestamp: The timestamp when the log data was actually ingested into the system.
  • event_index: The index of the ingested event within the batch of events submitted in one ingestion request. This makes it possible to reconstruct the exact order of log events even when their timestamps are identical.
  • severity: The severity level of the log message.
  • facility: The facility level of the log message, if any (usually only set for syslog data).
  • subsource: The log filename, pod stream (e.g., stdout), Windows Event Log channel name, or instrumentation scope name (for OTLP logs).
  • source: The name of the source of the event (e.g., Kubernetes podname or hostname of the device that generated the log event).
  • service: Logical service / workload identity (for example OTel service.name, ECS nested service.name, or Kubernetes workload names detected from resource attributes).
  • app: A string field often set by log forwarding agents for broader application grouping (for example ECS-style labels.application, service.namespace, or syslog appname).
  • message: The original and unmodified string value of the log event.
  • category: One or more category labels (separated by .), as extracted by category extraction.
  • pattern and pattern_hash: The pattern (and corresponding hash) assigned by AutoClassify.
  • trace_id: A globally unique ID that tracks a single request across distributed systems.
  • span_id: An ID unique within a given trace that tracks a single operation within that trace.
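
To make the timestamp / event_index interaction concrete, here is an illustrative Python sketch (not SparkLogs code; the field values are made up):

```python
# Reconstructing exact event order when several events in one ingestion
# batch share the same timestamp: event_index breaks the tie.
events = [
    {"timestamp": "2024-05-01T12:00:00Z", "event_index": 2, "message": "done"},
    {"timestamp": "2024-05-01T12:00:00Z", "event_index": 0, "message": "start"},
    {"timestamp": "2024-05-01T12:00:00Z", "event_index": 1, "message": "working"},
]

# Sorting by (timestamp, event_index) restores the submission order
# even though the timestamps are identical.
ordered = sorted(events, key=lambda e: (e["timestamp"], e["event_index"]))
print([e["message"] for e in ordered])  # ['start', 'working', 'done']
```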

The following reserved fields are also automatically populated based on the agent and organization that ingested the data:

  • organization_id: The ID of the organization that owns the data.
  • agent_id: The ID of the agent that ingested the data.

Automatic detection of standard fields

Various logging systems and log forwarding agents use widely different names for these standard fields, and for some sources the mapping differs by event type (e.g., for Google Cloud Platform events, the source field may be set from pod_name for K8s and from instance_id for VMs).

SparkLogs automatically detects and maps fields from many common log schemas, agents, platforms, and logger libraries.


Standard-field mapping summary by source

The tables below list the vendor-emitted fields that populate each pivot for common sources. When a field is read from your ingestion configuration (not the log payload itself), the cell calls that out. Mapping is also performed for all other relevant fields (timestamp, message, etc.).

Wire-format schemas

| Source | app | service | source | subsource |
| --- | --- | --- | --- | --- |
| OpenTelemetry | k8s.cluster.name; else service.namespace | service.name; else workload names; else faas.name | k8s.namespace.name/k8s.pod.name; else k8s.pod.name; else faas.instance; else host.name | scope.name; else log.file.path; else stream |
| Elastic Common Schema (ECS) | orchestrator.cluster.name; else service.namespace | service.name | service.node.name; else kubernetes.pod.name; else host.name | log.file.path; else log.logger |
| Syslog (RFC 5424) | (empty) | appname | hostname | msgid; else procid |

Agents and collectors

| Source | app | service | source | subsource |
| --- | --- | --- | --- | --- |
| Vector kubernetes_logs | kubernetes.pod_labels."app.kubernetes.io/part-of" | kubernetes.pod_labels."app.kubernetes.io/name"; else kubernetes.pod_owner | kubernetes.pod_namespace/kubernetes.pod_name | stream; else file |
| Vector windows_event_log | (empty) | provider_name | computer | channel |
| Vector journald | (empty) | _systemd_unit; else syslog_identifier | _hostname; else host | syslog_identifier; else _comm |
| Vector syslog | (empty) | appname | hostname; else host | msgid; else procid |
| Vector docker_logs | label.com.docker.compose.project | (container_name, fallback only) | host | stream; else container_id |
| Fluent Bit (kubernetes filter) | kubernetes.labels.app.kubernetes.io/part-of | kubernetes.labels.app.kubernetes.io/name | kubernetes.namespace_name/kubernetes.pod_name; else kubernetes.host | stream; else kubernetes.docker_id |
| Fluentd (in_tail + kubernetes metadata filter) | kubernetes.labels.app.kubernetes.io/part-of | kubernetes.labels.app.kubernetes.io/name | kubernetes.namespace_name/kubernetes.pod_name; else kubernetes.host | tailed_path; else tag |
| Filebeat / Elastic Beats | orchestrator.cluster.name; else service.namespace | service.name | kubernetes.pod.name; else host.name | log.file.path; else log.logger |
| Grafana Alloy / Loki labels | app | service_name; else job | namespace/pod; else pod; else instance | filename; else stream |
| Datadog Agent | (encoded in ddtags) | service / dd.service | host | ddsource |
| Splunk HEC | (from index via ingestion config) | sourcetype | host | the HEC source field, which is typically a file path |

Platforms and data sources

| Source | app | service | source | subsource |
| --- | --- | --- | --- | --- |
| Windows Event Log (root-field shape) | (empty) | provider_name (when emitted as a root field) | Computer | Channel |
| macOS unified logging (post-remap) | reverse-DNS prefix of subsystem | subsystem | host | os.category |
| AWS CloudTrail | recipientAccountId | eventSource | composite of recipientAccountId and awsRegion | resources[0].ARN; else eventName |
| AWS Lambda | cloud.account.id | faas.name | faas.instance | scope.name |
| AWS ECS / Fargate | aws.ecs.cluster.name; else aws.ecs.cluster.arn | aws.ecs.task.family | aws.ecs.task.arn | aws.ecs.container.name (fallback only) |
| AWS EKS | aws.eks.cluster.arn (plus standard Kubernetes app candidates) | (Kubernetes shape) | (Kubernetes shape) | (Kubernetes shape) |
| Google Cloud Logging — GKE (k8s_container) | resource.labels.cluster_name | (resource.labels.container_name, fallback only) | composite of resource.labels.namespace_name and resource.labels.pod_name | (empty) |
| Google Cloud Logging — Cloud Run (cloud_run_revision) | resource.labels.project_id | resource.labels.service_name | resource.labels.instanceId | (empty) |
| Google Cloud Logging — Cloud Functions (cloud_function) | resource.labels.project_id | resource.labels.function_name | resource.labels.execution_id | (empty) |
| Google Cloud Logging — App Engine (gae_app) | resource.labels.project_id | resource.labels.module_id | gae_instance.instance_id | (empty) |
| Google Cloud Logging — Compute Engine (gce_instance) | resource.labels.project_id | (empty) | instance_id; else vm_name | (empty) |
| Azure Monitor (Functions / App Service) | (Resource Group, from ingestion config) | WEBSITE_SITE_NAME | WEBSITE_INSTANCE_ID | Category (e.g. AppServiceConsoleLogs, FunctionAppLogs) |
| Heroku Logplex | (Heroku app name, set by your drain configuration) | dyno type (web) | dyno (e.g. web.1) | the Heroku source field (app, heroku-router, etc.) |
| Fly.io | fly.app.name | process group | fly.machine_id; else fly.alloc_id | process group; else stream |
tip

If no input field maps to a given pivot, that pivot is left empty rather than auto-filled from another pivot. This is expected for some SaaS-only data where no machine-level source exists. Empty values still work for filtering, grouping, and scoping queries, so the subsource / source / service / app / organization_id hierarchy for context exploration narrows naturally based on which pivots are populated.

Original vendor fields: copied vs moved

When SparkLogs detects a standard field from a vendor-specific field name, the value is either copied or moved into the standard field, depending on which standard field it populates:

  • Pivot fields (source, service, app, subsource) and trace identifiers (trace_id, span_id, category) are copied. The original vendor field stays as a custom field so you can query on either the SparkLogs standard name or the original vendor name. For example, if your payload has pod, the value populates the source pivot AND the pod field remains queryable.
  • Normalized fields (timestamp, severity, facility, message, pattern) are moved. The original vendor field is replaced by the normalized SparkLogs value (e.g. text "info" → numeric OTel severity), so a duplicate raw value would diverge from the canonical form.

One exception: if your payload uses the exact SparkLogs canonical name at root (e.g. you sent {service: "checkout"} directly), it's moved even though service is a pivot — keeping it would just duplicate the value against the standard pivot.
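
The copied-vs-moved behavior can be illustrated with a minimal Python sketch. The vendor field names (pod, level) and the mapping tables here are hypothetical stand-ins chosen for the example, not SparkLogs internals:

```python
# Hypothetical vendor-name -> standard-field tables for this illustration.
PIVOT_TARGETS = {"pod": "source"}            # pivots are copied
NORMALIZED_TARGETS = {"level": "severity"}   # normalized fields are moved
# Abridged text -> OTel numeric severity table (assumed values).
SEVERITY_TEXT_TO_OTEL = {"info": 9, "warn": 13, "error": 17}

def map_event(event: dict) -> dict:
    out = dict(event)
    for vendor, std in PIVOT_TARGETS.items():
        if vendor in out:
            out[std] = out[vendor]        # copied: vendor field stays queryable
    for vendor, std in NORMALIZED_TARGETS.items():
        if vendor in out:
            raw = out.pop(vendor)         # moved: vendor field is removed
            out[std] = SEVERITY_TEXT_TO_OTEL.get(str(raw).lower(), raw)
    return out

mapped = map_event({"pod": "checkout-7f9c", "level": "info", "message": "ok"})
print(mapped["source"], mapped["pod"], mapped["severity"])  # checkout-7f9c checkout-7f9c 9
print("level" in mapped)  # False
```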

Mapping notes

A few specific mappings are worth highlighting:

  • Syslog appname populates service. RFC 5424 APP-NAME is the program name (sshd, nginx, postgres) and is treated as a logical service identity.

  • AWS CloudTrail eventSource populates service. Values like s3.amazonaws.com name the AWS service that was called, which matches the SaaS service convention.

  • deployment.environment (prod/staging/dev) is not mapped to app. Environment is a separate axis. Any unmapped fields are stored as custom fields, and you can filter on them as needed. You may want to map data from different environments to different organizations.

  • container.name and container.id are fallback-only. Container name fills service only when no canonical service identity (service.name, deployment name, faas.name) is present. Container id fills subsource only when no canonical subsource (scope name, log file, stream, channel) is present.

  • Specific workload identifiers beat host identifiers beat generic host / hostname. When a payload contains multiple candidates for source, the most-specific identifier wins deterministically:

    1. Workload-instance name (k8s.pod.name, aws.ecs.task.arn, faas.instance, dyno, bucket_name)
    2. Workload-instance UUID / hash (service.instance.id, fly.machine_id, kubernetes.pod.uid, Cloud Run instanceId)
    3. Underlying host / node name (host.name, kubernetes.host, computer, _HOSTNAME, vm_name)
    4. Underlying host UUID / hash (host.id, _machine_id, instance_id, ec2_instance_id)
    5. Generic root-level host or hostname (used only as a last resort, because different shippers fill these with different things: the Vector collector's host vs. the application's container hostname vs. the original log producer's host)

    Example: an OpenTelemetry SDK on EKS typically emits both k8s.pod.name and host.name. The pod name wins source (more specific) and host.name stays as a custom field. An ECS payload with both com.amazonaws.ecs.task-arn and host resolves source to the task ARN.
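
The precedence above can be sketched as a simple resolver. The tiers are abridged from the list above, and the function itself is a hypothetical illustration, not SparkLogs code:

```python
# Candidate fields for the source pivot, most specific tier first (abridged).
SOURCE_PRECEDENCE = [
    ["k8s.pod.name", "aws.ecs.task.arn", "faas.instance", "dyno"],         # 1. workload-instance name
    ["service.instance.id", "fly.machine_id", "kubernetes.pod.uid"],       # 2. workload-instance UUID/hash
    ["host.name", "kubernetes.host", "computer", "_HOSTNAME", "vm_name"],  # 3. host/node name
    ["host.id", "_machine_id", "instance_id", "ec2_instance_id"],          # 4. host UUID/hash
    ["host", "hostname"],                                                  # 5. generic, last resort
]

def resolve_source(payload: dict):
    # The first non-empty candidate in the highest tier wins deterministically.
    for tier in SOURCE_PRECEDENCE:
        for field in tier:
            if payload.get(field):
                return payload[field]
    return None

# An OTel SDK on EKS emits both the pod name and the node host name;
# the more specific pod name wins, and host.name stays a custom field.
print(resolve_source({"host.name": "ip-10-0-0-7", "k8s.pod.name": "checkout-7f9c"}))
# checkout-7f9c
```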

Customizing severity mappings

SparkLogs interprets the value of the detected severity field from any supported text (case-insensitive) or numeric value. Numeric severity values must be in the range 1-24 and are interpreted as defined by the OpenTelemetry standard for severity values. Standard textual severities include trace, debug, info, notice or display (info3), warn, error or fail, critical (error4), fatal, alert (fatal2), panic (fatal3), and emergency (fatal4).

If you have non-standard severity values, you can either transform them before shipping the logs (e.g., using Vector VRL transformations) or, more conveniently, use the custom severity mapping feature.

To specify a custom mapping, use the X-Severity-Map HTTP header. This is a comma-delimited list of key=value pairs that specifies additional mappings from a custom severity level to a standard severity level.

For example, the Node.js bunyan logging library uses numeric log levels from 10 (trace) to 60 (fatal). You could remap these with an X-Severity-Map HTTP header value of 10=TRACE,20=DEBUG,30=INFO,40=WARN,50=ERROR,60=FATAL.
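
A sketch of how such a header value could be parsed and applied, assuming the OTel numeric values for the standard text levels (trace=1, debug=5, info=9, warn=13, error=17, fatal=21); this is illustrative Python, not SparkLogs internals:

```python
# Abridged standard text severities -> OTel severity numbers.
STANDARD = {"trace": 1, "debug": 5, "info": 9, "warn": 13, "error": 17, "fatal": 21}

def parse_severity_map(header: str) -> dict:
    # "10=TRACE,20=DEBUG,..." -> {"10": 1, "20": 5, ...}
    custom = {}
    for pair in header.split(","):
        key, _, value = pair.partition("=")
        custom[key.strip().lower()] = STANDARD[value.strip().lower()]
    return custom

def to_otel_severity(raw, custom: dict) -> int:
    key = str(raw).strip().lower()
    if key in custom:          # custom mappings take precedence
        return custom[key]
    if key in STANDARD:        # then the built-in text severities
        return STANDARD[key]
    n = int(key)               # numeric values must already be in the OTel range
    if not 1 <= n <= 24:
        raise ValueError(f"severity {n} outside OTel range 1-24")
    return n

bunyan = parse_severity_map("10=TRACE,20=DEBUG,30=INFO,40=WARN,50=ERROR,60=FATAL")
print(to_otel_severity(50, bunyan))  # 17 (bunyan "error")
```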

Timestamp constraints

tip

When log data is ingested, if the timestamp is older than your configured retention period (or if you are using the SparkLogs cloud and the event is older than 50 days), then the timestamp will be set to the ingested_timestamp and the original timestamp will be stored in an original_timestamp custom field.

Also, if there is no timestamp in the structured log data provided by the logging agent, AutoExtract will attempt to determine the event timestamp from the log message. If nothing relevant is detected, it will use the current date/time for the ingested log event. If you prefer that AutoExtract use the timestamp in the message first and then fall back to the timestamp sent by your log forwarding agent, configure your log forwarding agent to use observedtimestamp for the timestamp field that it sends.
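
The retention rule can be sketched as follows; this is assumed semantics based on the description above (using the 50-day cloud window as the example retention), not SparkLogs source:

```python
from datetime import datetime, timedelta, timezone

def apply_retention(event: dict, ingested: datetime, retention: timedelta) -> dict:
    # Timestamps older than the retention window are replaced with the
    # ingestion time, and the original value is preserved as a custom field.
    out = dict(event)
    ts = datetime.fromisoformat(out["timestamp"])
    if ts < ingested - retention:
        out["original_timestamp"] = out["timestamp"]
        out["timestamp"] = ingested.isoformat()
    out["ingested_timestamp"] = ingested.isoformat()
    return out

now = datetime(2024, 6, 30, tzinfo=timezone.utc)
old = apply_retention({"timestamp": "2024-01-01T00:00:00+00:00"}, now, timedelta(days=50))
print(old["original_timestamp"])            # 2024-01-01T00:00:00+00:00
print(old["timestamp"] == now.isoformat())  # True
```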