Syslog Data Ingestion

Overview

Syslog is a widespread standard for transmitting log messages from network devices and servers, and is both a protocol and a message format. A variety of syslog standards have emerged over its 40+ years of use. Syslog messages are sent over UDP, TCP, or TLS over TCP.

While a variety of standards exist for syslog message formatting, many vendors and devices generate log messages in a semi-compliant or custom format. Historically, this has made it tedious and time-consuming to configure parsing rules for each vendor and device that sends syslog data.

SparkLogs solves this by automatically parsing a wide variety of standard syslog formats and by using AutoExtract technology to extract structured field data from log text with zero configuration.

Supported Syslog Formats

SparkLogs is tested to be compatible with the many variants of RFC3164, RFC5424 and proprietary log formats from Cisco, Juniper, SonicWall, WatchGuard, and Fortinet. We expect that most syslog formats will parse automatically without issue, and we welcome feedback on any incompatibilities.

Capturing Syslog Data

The syslog protocol is not directly supported by the SparkLogs cloud, so you should use an open source syslog log forwarding agent to receive data via UDP, TCP, or TLS and then forward it to SparkLogs via HTTPS.

We recommend using Vector or Fluent Bit to receive and forward syslog data.

Collection Topology

Decide where you want to place the agents that will receive syslog data, whether in each LAN that has devices that transmit syslog data, or in a central location that will receive syslog data from all devices across the Internet.

If you are sending syslog data over UDP or plain TCP, we recommend placing log agents within the same LAN as the sending devices where possible, so that unencrypted syslog data is not sent over the Internet. If you do transmit syslog data over UDP on the Internet, choose a non-standard port and/or consider IP allow-listing restrictions.

If you are aggregating syslog data centrally, consider hosting your log forwarding agent (e.g., Vector) in a public cloud for reliability. For low-scale environments you can use an inexpensive Linux VM such as AWS t3.nano (~$4/month), GCP f1-micro (~$4/month), or Azure B1s (~$10/month). Even fractional vCPU VMs will be able to handle thousands of syslog messages per second.

For higher-scale environments you could deploy a single VM with more resources, or consider using network load balancers to distribute syslog data across multiple log forwarding agent VMs.

Vector Syslog Configuration

We recommend using the Vector socket source to receive syslog data over UDP, TCP, or TLS. Even though Vector does have a syslog source, it will drop packets that it cannot parse, and it does not support non-standard syslog formats. It is simpler to use the socket source to receive unparsed syslog data, forward that data to SparkLogs, and let AutoExtract automatically parse the syslog data as appropriate.

If you are transmitting syslog over TCP or TLS, there are different framing options, which determine how the receiver tells where each individual syslog message ends. The most robust method is octet counting, which prepends the length of each syslog message before the message itself. Another common method is the so-called non-transparent framing (as defined in RFC6587), which ends each message with a newline character. This naturally limits each syslog message to a single line. We recommend selecting the octet counting framing method if supported by your syslog source.
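To make the difference between the two framing methods concrete, here is a minimal Python sketch of how a receiver could split a buffered TCP stream into messages. The function names (`octet_frame`, `split_octet_counted`, `split_newline_framed`) are illustrative only and are not part of Vector or SparkLogs; a log forwarding agent handles this for you.

```python
def octet_frame(message: bytes) -> bytes:
    """Frame one message with octet counting: b"<length> <message>"."""
    return str(len(message)).encode("ascii") + b" " + message


def split_octet_counted(buffer: bytes):
    """Split a byte buffer framed with octet counting.

    Returns (complete_messages, remainder), where remainder holds any
    trailing bytes that do not yet form a complete frame.
    """
    messages = []
    pos = 0
    while pos < len(buffer):
        space = buffer.find(b" ", pos)
        if space == -1:
            break  # length prefix not fully received yet
        length = int(buffer[pos:space])  # ValueError on a malformed prefix
        start = space + 1
        end = start + length
        if end > len(buffer):
            break  # message body not fully received yet
        messages.append(buffer[start:end])
        pos = end
    return messages, buffer[pos:]


def split_newline_framed(buffer: bytes):
    """Split a byte buffer that uses non-transparent (newline) framing."""
    *complete, remainder = buffer.split(b"\n")
    return complete, remainder
```

Note that with octet counting a message can safely contain embedded newlines, because the length prefix alone decides where the message ends. With newline framing the same message would be split in two.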

You may need to set up multiple Vector sources on different ports and protocols if you have syslog sources that support different transmission protocols (UDP, TCP, or TLS) and framing methods (octet counting or non-transparent).

Here is an example Vector configuration that receives syslog data over UDP on port 9139:

sources:
  socket_udp_input:
    type: "socket"
    address: "0.0.0.0:9139"
    mode: udp

Use this Vector source as part of your overall Vector config file, as documented here.
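To check that the UDP source is receiving data, you can send it a test message. The following is a minimal Python sketch, assuming the agent listens on UDP port 9139 as in the example above. The helper names, hostname, and tag are made up for illustration.

```python
import socket
from datetime import datetime


def build_test_message(priority: int, hostname: str, tag: str,
                       text: str, when: datetime) -> bytes:
    """Build an RFC3164-style test message with a space-padded day."""
    timestamp = f"{when.strftime('%b')} {when.day:>2} {when.strftime('%H:%M:%S')}"
    return f"<{priority}>{timestamp} {hostname} {tag}: {text}".encode("utf-8")


def send_udp(message: bytes, host: str = "127.0.0.1", port: int = 9139) -> None:
    """Send one datagram to the syslog receiver (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))


if __name__ == "__main__":
    # 134 = facility 16 (local use 0), severity 6 (informational)
    send_udp(build_test_message(134, "test-host", "testapp",
                                "hello from the test sender", datetime.now()))
```

Because UDP is connectionless, `send_udp` succeeds even if nothing is listening, so confirm delivery by checking that the message arrives downstream.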

Syslog Format Reference

All the common syslog standard formats are supported, as well as many proprietary formats. See how SparkLogs will parse using AutoExtract on your own syslog messages here:

Core Syslog Format

The most basic form of a syslog message begins with <PRIORITY>, where PRIORITY is an integer (0..191). PRIORITY encodes a facility level (0..23) and a severity level (0..7), using integer division and remainder:

FACILITY = floor(PRIORITY / 8)
SEVERITY = PRIORITY % 8

For example, a PRIORITY value of <134> indicates a facility of 16 (local use 0) and severity of 6 (informational).
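The same arithmetic can be written as a short Python sketch. The function name `decode_priority` is illustrative only.

```python
def decode_priority(priority: int):
    """Split a syslog PRIORITY value into (facility, severity)."""
    if not 0 <= priority <= 191:
        raise ValueError(f"PRIORITY out of range 0..191: {priority}")
    # divmod performs the integer division and remainder in one step
    facility, severity = divmod(priority, 8)
    return facility, severity


print(decode_priority(134))  # (16, 6)
```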

Syslog severity levels are mapped to the appropriate SparkLogs severity and are defined as follows:

Syslog Severity	Mapped SparkLogs Severity	Description
0	Emergency (24)	System is unusable
1	Alert (22)	Action must be taken immediately
2	Critical (20)	Critical conditions
3	Error (17)	Error conditions
4	Warning (13)	Warning conditions
5	Notice (11)	Normal but significant condition
6	Informational (9)	Informational messages
7	Debug (5)	Debug-level messages

SparkLogs uses the standard syslog facility levels, which are defined as follows:

Facility Level	Description
0	Kernel messages
1	User-level messages
2	Mail system
3	System daemons
4	Security/authorization messages
5	Syslog messages
6	Line printer subsystem
7	Network news subsystem
8	UUCP subsystem
9	Clock daemon (cron)
10	Security/authorization messages (private)
11	FTP daemon
12	NTP subsystem
13	Audit logs
14	Logged alerts
15	Clock daemon
16-23	Local use 0-7

RFC3164

RFC3164 documented the many variations of syslog messages in use at the time and is only loosely followed. The format takes the general shape of one of the following:

<PRIORITY>TIMESTAMP HOSTNAME TAG: MESSAGE
<PRIORITY>TIMESTAMP HOSTNAME TAG[PID]: MESSAGE

TIMESTAMP is canonically MMM DD HH:MM:SS (e.g., Apr 9 18:37:00) but there are countless variations, some of which include the year. Many vendors do not include a : after the TAG. The RFC dictates there should be no whitespace before the TIMESTAMP but many devices do not follow this.

For this format, SparkLogs captures the PRIORITY into the standard fields severity and facility, TIMESTAMP as the standard timestamp field, HOSTNAME as the standard source field, and TAG as the standard app field. If PID is present it is captured in the proc_id field.
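As an illustration of this field mapping, here is a minimal Python sketch that parses the canonical RFC3164 shape with a regular expression. This is not how SparkLogs is implemented; it handles only the canonical timestamp, and the sample hostnames and tags in the usage are made up.

```python
import re
from typing import Optional

RFC3164_PATTERN = re.compile(
    r"^<(?P<priority>\d{1,3})>\s*"                                 # <PRIORITY>
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"  # MMM DD HH:MM:SS
    r"(?P<hostname>\S+)\s"
    r"(?P<tag>[^\s:\[]+)"
    r"(?:\[(?P<pid>\d+)\])?"                                       # optional [PID]
    r":?\s?"                                                       # many vendors omit ':'
    r"(?P<message>.*)$",
    re.DOTALL,
)


def parse_rfc3164(line: str) -> Optional[dict]:
    """Return the captured fields, or None if the line does not match."""
    m = RFC3164_PATTERN.match(line)
    if m is None:
        return None
    priority = int(m["priority"])
    fields = {
        "facility": priority // 8,
        "severity": priority % 8,
        "timestamp": m["timestamp"],
        "source": m["hostname"],
        "app": m["tag"],
        "message": m["message"],
    }
    if m["pid"] is not None:
        fields["proc_id"] = int(m["pid"])
    return fields
```

For example, `parse_rfc3164("<134>Apr  9 18:37:00 fw01 sshd[4721]: Accepted publickey for admin")` yields facility 16, severity 6, source `fw01`, app `sshd`, and proc_id 4721. Real-world variations (years in the timestamp, missing hostnames) would need additional patterns.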

RFC5424

RFC5424 is a strict and structured syslog format that also allows for arbitrary key-value pairs of data to be captured. If your device supports this format, we recommend using it. RFC5424 follows this format:

<PRIORITY>VERSION TIMESTAMP HOSTNAME APPNAME PROC_ID MSG_ID STRUCTURED-DATA MESSAGE

Key points:

VERSION should be 1.
TIMESTAMP should be in RFC3339 format (e.g., 2023-04-09T18:37:00Z).
HOSTNAME is the hostname or IP of the device that generated the message or - (none).
APPNAME is the name of the application or service that generated the message or - (none).
PROC_ID is the process ID of the application or - (none).
MSG_ID is an identifier for the type of message or - (none).
STRUCTURED-DATA is either - (none) or a series of named lists of key-value pairs each enclosed in brackets:

[SD-ID foo="bar" f2="hello world" ...]

SD-ID is a unique identifier for the structured data and is either a standard name (timeQuality, origin, or meta) or is a custom ID followed by @ENTERPRISE_NUMBER where ENTERPRISE_NUMBER is an integer that uniquely identifies the vendor.

In each bracketed section, after the SD-ID you can have zero or more key-value pairs. If there are zero pairs, then it is considered as if the value named by SD-ID for that section is true. The key-value pairs are separated by spaces and the values must be quoted.

Field values are captured as they are for RFC3164. In addition, MSG_ID is captured into the msg_id field and also forms the start of the AutoClassify category. Structured data is captured into custom fields, with each SD-ID forming a top-level field that is either set to true (for zero key-value pairs) or set to an object containing each of the key-value pairs.
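The structured data mapping described above can be sketched in Python as follows. This is an illustrative parser, not the SparkLogs implementation, and `parse_structured_data` is a made-up name. It assumes the STRUCTURED-DATA portion has already been separated from the rest of the message.

```python
import re

# One [SD-ID key="value" ...] element; values may contain \" \\ and \] escapes.
SD_ELEMENT = re.compile(
    r'\[([^\s\]="]+)((?:\s+[^\s=\]"]+="(?:[^"\\]|\\.)*")*)\s*\]'
)
SD_PARAM = re.compile(r'([^\s=\]"]+)="((?:[^"\\]|\\.)*)"')


def parse_structured_data(sd: str) -> dict:
    """Map each SD-ID to True (no pairs) or to a dict of its key-value pairs."""
    if sd == "-":
        return {}
    result = {}
    for element in SD_ELEMENT.finditer(sd):
        sd_id, params_text = element.group(1), element.group(2)
        params = {
            key: re.sub(r'\\(["\\\]])', r"\1", value)  # undo backslash escapes
            for key, value in SD_PARAM.findall(params_text)
        }
        result[sd_id] = params if params else True
    return result
```

For instance, `[exampleSDID@32473 iut="3" eventSource="Application"][meta]` becomes a top-level `exampleSDID@32473` object with two string values, plus `meta` set to true.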

Example RFC5424 with structured data:

Cisco

Syslog messages generated by Cisco devices are surprisingly varied in format. Almost every device type sends syslog data in a non-standard format, and it may or may not send the <PRIORITY> syslog header. Messages generally follow the shape:

<PRIORITY>SEQNO: HOSTNAME: TIMESTAMP: %TEXTFACILITY-SEVERITY-MNEMONIC: MESSAGE

Key points:

SEQNO is a message sequence number that may not be present.
HOSTNAME is the hostname or IP of the originating device and may not be present.
TIMESTAMP is the message time, and may be preceded by a . or * if the device's clock is unsynchronized.
TEXTFACILITY is a text description of the device module that generated the message.
SEVERITY is the syslog severity level (0..7) and will be mapped to a SparkLogs severity level.
MNEMONIC is a message ID that should be unique for the given TEXTFACILITY.

Even across the many varied formats, most Cisco message fields will be recognized and captured automatically, especially TIMESTAMP, TEXTFACILITY (into the net_facility field), SEVERITY (into the net_severity field, and mapped to the severity field if the PRIORITY header is missing), and MNEMONIC (into the msg_id field).
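As a rough illustration of how the %TEXTFACILITY-SEVERITY-MNEMONIC marker can be located regardless of which optional fields precede it, here is a minimal Python sketch. It is not the SparkLogs implementation, the function name is made up, and the pattern assumes facility and mnemonic names made of uppercase letters, digits, and underscores.

```python
import re
from typing import Optional

CISCO_MARKER = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*"
    r"(?P<message>.*)",
    re.DOTALL,
)


def parse_cisco_marker(line: str) -> Optional[dict]:
    """Find the %FACILITY-SEVERITY-MNEMONIC marker anywhere in the line."""
    m = CISCO_MARKER.search(line)
    if m is None:
        return None
    return {
        "net_facility": m["facility"],
        "net_severity": int(m["severity"]),
        "msg_id": m["mnemonic"],
        "message": m["message"],
    }
```

Using `search` rather than `match` means the marker is found whether or not the <PRIORITY> header, SEQNO, HOSTNAME, or TIMESTAMP are present. For an illustrative line such as `<189>45: router1: *Mar  1 18:46:11: %SYS-5-CONFIG_I: Configured from console`, this captures net_facility `SYS`, net_severity 5, and msg_id `CONFIG_I`.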

Check the Cisco syslog configuration manual for your device. Enable extended timestamps (to include the full year) when possible, and also prefer to send messages with the <PRIORITY> header and include the hostname. Select RFC5424 or RFC3164 format if available.

Example Cisco syslog message:

Juniper

Juniper network appliances send syslog data in RFC3164 format.

Example Juniper syslog message:

SonicWall

SonicWall network appliances can be configured to send syslog messages via UDP. We recommend selecting the default syslog format. Messages have the basic <PRIORITY> header and then follow a proprietary format of key-value pairs with mixed types. These will automatically be captured by AutoExtract.

You can optionally configure SonicWall to include a custom id field value. If this is configured, SparkLogs will use it as the hostname and store it in the standard source field. If it is not available, SparkLogs will look for the sn field (serial number) and then fall back to the fw field (IP address of the firewall).
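The fallback order for the source field can be sketched in Python as follows. This is an illustration only: the function names are made up, the key-value parsing is deliberately simplified, and the sample message in the usage note is invented rather than taken from a real device.

```python
import re
from typing import Optional

# key=value pairs where the value is either double-quoted or a bare token
KV_PATTERN = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_key_values(text: str) -> dict:
    """Extract key=value pairs, stripping the quotes from quoted values."""
    fields = {}
    for key, value in KV_PATTERN.findall(text):
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


def pick_source(fields: dict) -> Optional[str]:
    """Choose the hostname: custom id first, then serial number, then firewall IP."""
    for key in ("id", "sn", "fw"):
        if fields.get(key):
            return fields[key]
    return None
```

Given an invented message like `<134>id=firewall sn=0017C5000000 fw=10.0.0.1 pri=6 msg="Connection Opened"`, `pick_source(parse_key_values(...))` returns `firewall`. Without the id pair it returns the serial number, and with neither id nor sn it returns the firewall IP.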

Example SonicWall syslog message:

WatchGuard

WatchGuard network appliances can be configured to send syslog data. Choose the "syslog" format so that it will send data following RFC3164.

Example WatchGuard syslog message:

Fortinet

Fortinet network appliances send a core syslog header <PRIORITY> followed by a proprietary format with key/value pairs. Also, you should configure the device to send the hostname field as that is not on by default. For example:

config log syslogd
  set status enable
  set severity information
  set policy "my-syslog-policy"
  config custom-field
    edit 1
      set name hostname
      set value $hostname
    next
  end
end

Example Fortinet syslog message:
