Syslog Data Ingestion
Overview
Syslog is a widespread standard for transmitting log messages from network devices and servers, and is both a protocol and a message format. A variety of syslog standards have emerged over its 40+ years of use. Syslog messages are sent over UDP, TCP, or TLS over TCP.
While a variety of standards exist for syslog message formatting, many vendors and devices generate log messages in a semi-compliant or custom format. This has historically made it tedious and time-consuming to configure parsing rules for each vendor and device that sends syslog data.
SparkLogs solves this by automatically parsing a wide variety of standard syslog formats, and by using AutoExtract technology to extract structured field data from log text with zero configuration.
Supported Syslog Formats
SparkLogs is tested for compatibility with many variants of RFC3164 and RFC5424, as well as proprietary log formats from Cisco, Juniper, SonicWall, WatchGuard, and Fortinet. We expect most syslog formats to parse automatically, and we welcome feedback on any incompatibilities.
Capturing Syslog Data
The SparkLogs cloud does not accept the syslog protocol directly, so use an open source log forwarding agent to receive syslog data via UDP, TCP, or TLS and then forward it to SparkLogs via HTTPS.
We recommend using Vector or Fluent Bit to receive and forward syslog data.
Collection Topology
Decide where to place the agents that will receive syslog data: either in each LAN that has devices transmitting syslog data, or in a central location that receives syslog data from all devices across the Internet.
If you are sending syslog data over UDP or plain TCP, we recommend placing log agents within the same LAN where possible, so that unencrypted syslog data is not sent over the Internet. If you do transmit syslog data over UDP on the Internet, choose a non-standard port and/or consider IP allow-listing restrictions.
If you are aggregating syslog data centrally, consider hosting your log forwarding agent (e.g., Vector) in a public cloud for reliability. For low-scale environments you can use an inexpensive Linux VM such as AWS t3.nano (~$4/month), GCP f1-micro (~$4/month), or Azure B1s (~$10/month). Even VMs with a fractional vCPU can handle thousands of syslog messages per second.
For higher-scale environments you could deploy a single VM with more resources, or consider using network load balancers to distribute syslog data across multiple log forwarding agent VMs.
Vector Syslog Configuration
We recommend using the Vector socket source to receive syslog data over UDP, TCP, or TLS. Although Vector has a syslog source, it drops messages that it cannot parse and does not support non-standard syslog formats. It is simpler to use the socket source to receive unparsed syslog data, forward that data to SparkLogs, and let AutoExtract parse the syslog data automatically.
If you are transmitting syslog over TCP or TLS, there are different framing options, which determine how the receiver tells where each individual syslog message ends. The most robust method is octet counting, which prefixes each syslog message with its length in bytes followed by a space. Another common method is so-called non-transparent framing (as defined in RFC6587), which ends each message with a newline character; this naturally limits each syslog message to a single line. We recommend selecting octet counting if your syslog source supports it.
You may need to set up multiple Vector sources on different ports if you have syslog sources that use different transmission protocols (UDP, TCP, or TLS) or framing methods (octet counting or non-transparent).
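To illustrate the difference between the two framing methods, here is a minimal Python sketch of how a receiver might split a TCP byte stream into messages. Vector handles framing for you, so this only shows the idea; the function names are hypothetical, and the non-transparent splitter assumes LF as the trailer character.

```python
from __future__ import annotations


def split_octet_counted(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split octet-counted frames (RFC 6587: MSG-LEN SP SYSLOG-MSG).

    Returns the complete messages plus any trailing bytes that belong to a
    frame that has not fully arrived yet; the caller keeps those bytes and
    prepends them to the next chunk read from the socket.
    """
    messages: list[bytes] = []
    pos = 0
    while True:
        space = buffer.find(b" ", pos)
        if space == -1:
            break  # length prefix not complete yet
        length = int(buffer[pos:space])  # ValueError if the prefix is not a number
        start = space + 1
        end = start + length
        if end > len(buffer):
            break  # message body not complete yet
        messages.append(buffer[start:end])
        pos = end
    return messages, buffer[pos:]


def split_non_transparent(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split non-transparent frames, assuming LF is the trailer character."""
    *complete, remainder = buffer.split(b"\n")
    return complete, remainder


# The second octet-counted frame declares 9 bytes but only 8 have arrived.
msgs, rest = split_octet_counted(b"12 <134>1 hello9 <14>test")
print(msgs, rest)  # [b'<134>1 hello'] b'9 <14>test'
```

Because octet counting reads an explicit length, a message body may contain newlines. The newline-based splitter cannot support that, which is why non-transparent framing limits each message to a single line.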
Here is an example Vector configuration that receives syslog data over UDP on port 9139:
sources:
  socket_udp_input:
    type: "socket"
    address: "0.0.0.0:9139"
    mode: udp
Add this source to your overall Vector configuration file, as described in the Vector documentation.
Syslog Format Reference
All common standard syslog formats are supported, as well as many proprietary formats.
Core Syslog Format
The most basic form of a syslog message begins with <PRIORITY>, where PRIORITY is an integer (0..191). PRIORITY encodes a facility level (0..23) and a severity level (0..7):
FACILITY = PRIORITY / 8 (integer division)
SEVERITY = PRIORITY % 8
For example, a PRIORITY value of <134> indicates a facility of 16 (local use 0) and a severity of 6 (informational).
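The same arithmetic in Python, as a small sketch (the helper names are hypothetical). `divmod` returns the integer quotient and remainder together, which are exactly FACILITY and SEVERITY:

```python
def decode_priority(priority: int) -> tuple[int, int]:
    """Split a syslog PRIORITY value into (facility, severity)."""
    if not 0 <= priority <= 191:
        raise ValueError(f"PRIORITY out of range: {priority}")
    # divmod performs the integer division and the remainder in one step.
    return divmod(priority, 8)


def encode_priority(facility: int, severity: int) -> int:
    """Inverse operation: PRIORITY = FACILITY * 8 + SEVERITY."""
    return facility * 8 + severity


facility, severity = decode_priority(134)
print(facility, severity)  # 16 6
```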
Syslog severity levels are mapped to the appropriate SparkLogs severity and are defined as follows:
Syslog Severity | Mapped SparkLogs Severity | Description |
---|---|---|
0 | Emergency (24) | System is unusable |
1 | Alert (22) | Action must be taken immediately |
2 | Critical (20) | Critical conditions |
3 | Error (17) | Error conditions |
4 | Warning (13) | Warning conditions |
5 | Notice (11) | Normal but significant condition |
6 | Informational (9) | Informational messages |
7 | Debug (5) | Debug-level messages |
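A minimal Python sketch of this mapping, with values transcribed from the table above. `SEVERITY_MAP` and `sparklogs_severity` are hypothetical names for illustration, not part of any SparkLogs API:

```python
from __future__ import annotations

# Syslog severity (0..7) -> (SparkLogs severity name, numeric level),
# transcribed from the table above.
SEVERITY_MAP: dict[int, tuple[str, int]] = {
    0: ("Emergency", 24),
    1: ("Alert", 22),
    2: ("Critical", 20),
    3: ("Error", 17),
    4: ("Warning", 13),
    5: ("Notice", 11),
    6: ("Informational", 9),
    7: ("Debug", 5),
}


def sparklogs_severity(priority: int) -> tuple[str, int]:
    """Map a syslog PRIORITY value to its SparkLogs severity."""
    # Only PRIORITY % 8 carries the severity; the rest is the facility.
    return SEVERITY_MAP[priority % 8]


print(sparklogs_severity(134))  # ('Informational', 9)
```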
SparkLogs uses the standard syslog facility levels, which are defined as follows:
Facility Level | Description |
---|---|
0 | Kernel messages |
1 | User-level messages |
2 | Mail system |
3 | System daemons |
4 | Security/authorization messages |
5 | Syslog messages |
6 | Line printer subsystem |
7 | Network news subsystem |
8 | UUCP subsystem |
9 | Clock daemon (cron) |
10 | Security/authorization messages (private) |
11 | FTP daemon |
12 | NTP subsystem |
13 | Audit logs |
14 | Logged alerts |
15 | Clock daemon |
16-23 | Local use 0-7 |
RFC3164
RFC3164 documented the many variations of syslog messages in use at the time and is only loosely followed. The format takes the general shape of one of the following:
<PRIORITY>TIMESTAMP HOSTNAME TAG: MESSAGE
<PRIORITY>TIMESTAMP HOSTNAME TAG[PID]: MESSAGE
TIMESTAMP is canonically MMM DD HH:MM:SS (e.g., Apr 9 18:37:00), but there are countless variations, some of which include the year. Many vendors do not include a : after the TAG. The RFC dictates that there should be no whitespace before the TIMESTAMP, but many devices do not follow this.
For this format, SparkLogs captures the PRIORITY into the standard fields severity and facility, TIMESTAMP as the standard timestamp field, HOSTNAME as the standard source field, and TAG as the standard app field. If PID is present, it is captured in the proc_id field.
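The following Python sketch shows how the two RFC3164 shapes and the field mapping above could be expressed with a regular expression. It is a simplified illustration, not SparkLogs' parser: it assumes the canonical MMM DD HH:MM:SS timestamp, the names `RFC3164` and `parse_rfc3164` are hypothetical, and it returns the raw syslog severity (0..7) rather than the mapped SparkLogs severity.

```python
from __future__ import annotations

import re

# A lenient pattern for the two RFC3164 shapes. It assumes the canonical
# "MMM DD HH:MM:SS" timestamp (one or two spaces before the day), tolerates
# whitespace before the timestamp and a missing ":" after TAG, and does not
# attempt the many other timestamp variants seen in practice.
RFC3164 = re.compile(
    r"^<(?P<priority>\d{1,3})>\s*"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<hostname>\S+)\s+"
    r"(?P<tag>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:?\s+"
    r"(?P<message>.*)$"
)


def parse_rfc3164(line: str) -> dict[str, object] | None:
    """Return fields named as in this document, or None if the line does not match."""
    m = RFC3164.match(line)
    if m is None:
        return None
    facility, severity = divmod(int(m["priority"]), 8)
    fields: dict[str, object] = {
        "facility": facility,
        "severity": severity,  # raw syslog severity, not the mapped SparkLogs value
        "timestamp": m["timestamp"],
        "source": m["hostname"],
        "app": m["tag"],
        "message": m["message"],
    }
    if m["pid"] is not None:
        fields["proc_id"] = m["pid"]
    return fields


print(parse_rfc3164("<134>Apr  9 18:37:00 fw01 sshd[1234]: Accepted password"))
```

A missing HOSTNAME or a vendor-specific timestamp will make this pattern fail or assign fields to the wrong names, so real-world parsing needs many more cases than shown here.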
RFC5424
RFC5424 is a strict, structured syslog format that also allows arbitrary key-value pairs of data to be captured. If your device supports this format, we recommend using it. RFC5424 follows this format:
<PRIORITY>VERSION TIMESTAMP HOSTNAME APPNAME PROC_ID MSG_ID STRUCTURED-DATA MESSAGE
Key points:
- VERSION should be 1.
- TIMESTAMP should be in RFC3339 format (e.g., 2023-04-09T18:37:00Z).
- HOSTNAME is the hostname or IP address of the device that generated the message, or - (none).
- APPNAME is the name of the application or service that generated the message, or - (none).
- PROC_ID is the process ID of the application, or - (none).
- MSG_ID identifies the type of message, or is - (none).
- STRUCTURED-DATA is either - (none) or a series of named lists of key-value pairs, each enclosed in square brackets:
[SD-ID foo="bar" f2="hello world" ...]
SD-ID is a unique identifier for the structured data, and is either a standard name (timeQuality, origin, or meta) or a custom ID followed by @ENTERPRISE_NUMBER, where ENTERPRISE_NUMBER is an integer that uniquely identifies the vendor.
In each bracketed section, the SD-ID is followed by zero or more key-value pairs. If there are zero pairs, the section is treated as if the value named by the SD-ID is true. Key-value pairs are separated by spaces, and values must be enclosed in double quotes.
Field values are captured similarly to RFC3164. MSG_ID is additionally captured into the msg_id field and forms the start of the AutoClassify category. Structured data is captured into custom fields, with each SD-ID forming a top-level field that is either set to true (for zero key-value pairs) or an object containing each of the key-value pairs.
Example RFC5424 with structured data:
Cisco
Syslog messages generated by Cisco devices are surprisingly varied in format. Almost every device type sends syslog data in a non-standard format, and may or may not include the <PRIORITY> syslog header.
Messages generally follow the shape:
<PRIORITY>SEQNO: HOSTNAME: TIMESTAMP: %TEXTFACILITY-SEVERITY-MNEMONIC: MESSAGE
Key points:
- SEQNO is a message sequence number and may not be present.
- HOSTNAME is the hostname or IP address of the originating device and may not be present.
- TIMESTAMP is the message time, and may be preceded by a . or * if the device's clock is unsynchronized.
- TEXTFACILITY is a text description of the device module that generated the message.
- SEVERITY is the syslog severity level (0..7) and is mapped to a SparkLogs severity level.
- MNEMONIC is a message ID that should be unique for the given TEXTFACILITY.
Even across the many varied formats, most Cisco message fields are recognized and captured automatically, especially TIMESTAMP, TEXTFACILITY (into the net_facility field), SEVERITY (into net_severity, and also mapped to the severity field if the PRIORITY header is missing), and MNEMONIC (into the msg_id field).
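A Python sketch of how the mnemonic tag and the severity fallback described above could be implemented. It only locates the %TEXTFACILITY-SEVERITY-MNEMONIC tag and leaves SEQNO, HOSTNAME, and TIMESTAMP unparsed in a hypothetical `header` field, since any of them may be missing. The regular expressions, the sample messages, and all names are assumptions for illustration.

```python
from __future__ import annotations

import re

PRIORITY = re.compile(r"^<(\d{1,3})>")
# %TEXTFACILITY-SEVERITY-MNEMONIC: assumes upper-case letters, digits and
# underscores in TEXTFACILITY and MNEMONIC, which will not cover every device.
MNEMONIC_TAG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*"
)


def parse_cisco(line: str) -> dict[str, object] | None:
    """Extract the Cisco mnemonic tag; return None if the line has none."""
    tag = MNEMONIC_TAG.search(line)
    if tag is None:
        return None
    pri = PRIORITY.match(line)
    start = pri.end() if pri else 0
    fields: dict[str, object] = {
        "net_facility": tag["facility"],
        "net_severity": int(tag["severity"]),
        "msg_id": tag["mnemonic"],
        "message": line[tag.end():],
        # SEQNO, HOSTNAME and TIMESTAMP (each optional) are left unparsed here.
        "header": line[start:tag.start()].strip(" :"),
    }
    # Use the <PRIORITY> severity when present; otherwise fall back to the tag.
    fields["severity"] = int(pri.group(1)) % 8 if pri else fields["net_severity"]
    return fields


print(parse_cisco("*Apr  9 18:37:00.123: %LINK-3-UPDOWN: Interface Gi0/1, changed state to down"))
```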
Check the Cisco syslog configuration manual for your device. When possible, enable extended timestamps (to include the full year), send messages with the <PRIORITY> header, and include the hostname. Select RFC5424 or RFC3164 format if available.
Example Cisco syslog message:
Juniper
Juniper network appliances send syslog data in RFC3164 format.
Example Juniper syslog message:
SonicWall
SonicWall network appliances can be configured to send syslog messages via UDP.
We recommend selecting the default syslog format. Messages have the basic <PRIORITY> header followed by a proprietary format of key-value pairs with mixed value types, which AutoExtract captures automatically.
You can optionally configure SonicWall to include a custom id field value. If it is present, SparkLogs uses it as the hostname and stores it in the standard source field. Otherwise, SparkLogs uses the sn field (serial number), and then falls back to the fw field (IP address of the firewall).
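The source-selection order above can be sketched in Python as follows. Parsing key-value pairs with `shlex` is a simplifying assumption (it handles double-quoted values but rejects unbalanced quotes), the function names are hypothetical, and the sample message is illustrative rather than captured from a real device.

```python
from __future__ import annotations

import re
import shlex


def parse_key_values(line: str) -> dict[str, str]:
    """Parse space-separated key=value pairs after an optional <PRIORITY>."""
    body = re.sub(r"^<\d{1,3}>", "", line)  # drop the <PRIORITY> header
    fields: dict[str, str] = {}
    # shlex honours double quotes, so time="2023-04-09 18:37:00" stays one token.
    # It raises ValueError on unbalanced quotes (e.g. a stray apostrophe).
    for token in shlex.split(body):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def sonicwall_source(fields: dict[str, str]) -> str | None:
    """Pick the source: custom id, then serial number (sn), then firewall IP (fw)."""
    for key in ("id", "sn", "fw"):
        if fields.get(key):
            return fields[key]
    return None


fields = parse_key_values(
    '<134>id=hq-fw sn=0017C5A1B2C3 time="2023-04-09 18:37:00" '
    'fw=203.0.113.1 pri=6 msg="Connection Closed"'
)
print(sonicwall_source(fields))  # hq-fw
```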
Example SonicWall syslog message:
WatchGuard
WatchGuard network appliances can be configured to send syslog data. Choose the "syslog" format so that it will send data following RFC3164.
Example WatchGuard syslog message:
Fortinet
Fortinet network appliances send a core syslog <PRIORITY> header followed by a proprietary format of key/value pairs.
You should also configure the device to send the hostname field, as it is not sent by default. For example:
config log syslogd
    set status enable
    set severity information
    set policy "my-syslog-policy"
    config custom-field
        edit 1
            set name hostname
            set value $hostname
        next
    end
end
Example Fortinet syslog message: