Vector Agent

Overview

Vector is a cross-platform, high-performance log aggregation, transformation, and forwarding agent written in Rust. Vector receives data from various sources, optionally performs data transformations, and then ships data to destinations (like SparkLogs).

Over 40 log sources are supported, including files, journald, syslog, Kubernetes, and Docker. Vector can ship logs to SparkLogs using the http sink.

Although AutoExtract will automatically extract structured field data from your raw logs and is recommended in the typical case, you can also manually parse log data within the Vector agent into structured fields using Vector transformations, including the powerful VRL language. Examples of common transformations can be found here.
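
For example, here is a minimal sketch of a remap transform that parses JSON-formatted log lines into structured fields (the source name app_logs is an illustrative assumption, and the raw line is assumed to be in the default .message field):

transforms:
  parse_app_json:
    type: remap
    inputs:
      - app_logs   # assumed name of a source defined elsewhere
    source: |-
      # Parse the raw message as JSON; if it isn't a valid JSON object,
      # leave the event unchanged.
      structured, err = parse_json(.message)
      if err == null && is_object(structured) {
        . = merge(., object!(structured))
      }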

How to Use

Follow these steps for each logical agent that will receive data from Vector:

1. Consider Vector deployment design and topology

In its simplest form, Vector is deployed as an agent on each machine and will send data directly to SparkLogs. SparkLogs is highly scalable and can receive data from distributed Vector agents without bottlenecks.

Vector can also be deployed to aggregate log data locally, and then ship this aggregated log data to SparkLogs. If you have complex requirements, review the Vector deployment guide, including the available roles and topologies.
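
For instance, here is a rough sketch of the aggregator topology, where each agent forwards events to a central Vector instance via the vector sink/source pair (hostnames, ports, and component names are illustrative assumptions; the aggregator would then apply the add_timezone transform and the SparkLogs sink from the template below):

# On each agent: forward events to the central aggregator
sinks:
  to_aggregator:
    type: vector
    inputs:
      - source1
    address: "aggregator.internal:6000"   # assumed aggregator host and port

# On the aggregator: receive events from the agents
sources:
  from_agents:
    type: vector
    address: "0.0.0.0:6000"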

2. Create agent and get config template

In the app, click the Configure sidebar button, and then click the Agents tab.

As appropriate, create a new agent, or highlight an existing agent and click View API Key. In the dialog that shows the agent configuration template, click the Vector tab and copy the configuration template.

3. Customize configuration

Customize the copied configuration template to fit your needs. At a minimum, add your desired sources to the YAML config, and modify the add_timezone transform so that its inputs include any new sources you add (see the sketch after the template below).

Example Vector configuration template

Make sure to get your configuration template from the app, as your ingestion endpoint can vary based on your provisioned region. This is an example of what it will look like:

log_schema:
  # Use AutoExtract timestamp if found, otherwise use ingested time.
  timestamp_key: "observedtimestamp"

sources:
  # Add your desired log sources here...
  source1:
    type: "..."
    ...

transforms:
  add_timezone:
    type: remap
    # Use the agent local timezone for timestamps without an explicit TZ
    source: |-
      .__agent_timezone = get_timezone_name!()
    # Reference all of your sources here...
    inputs:
      - source1

sinks:
  itlightning:
    type: http
    inputs:
      - add_timezone
    uri: "https://ingest-<REGION>.engine.sparklogs.app/ingest/v1"
    request:
      headers:
        # Customize headers as desired, e.g., set to "true" to disable AutoExtract
        X-No-AutoExtract: "false"
    method: post
    compression: gzip
    encoding:
      codec: json
    auth:
      strategy: basic
      user: "<AGENT-ID>"
      password: "<AGENT-ACCESS-TOKEN>"
    batch:
      # Ingestion is optimized for 1 MB batches
      max_bytes: 1048576
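
For instance, here is a sketch of customizing the template above with a journald source and a file source (the source names and file path are illustrative assumptions), with both referenced from add_timezone:

sources:
  host_journal:
    type: journald
  app_files:
    type: file
    include:
      - "/var/log/myapp/*.log"   # assumed path to your application logs

transforms:
  add_timezone:
    type: remap
    source: |-
      .__agent_timezone = get_timezone_name!()
    inputs:
      - host_journal
      - app_files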

4. Deploy Vector agents

On each system that will ship data to SparkLogs for this agent, install the Vector agent with the appropriate configuration, and make sure it starts on system boot.

While it's recommended to install Vector in daemon mode on the same server that produces your log data, it can also be deployed as a sidecar.

If you're using Kubernetes, consider deploying using Helm.

Performance

Vector is lightweight yet optimized for high throughput, low latency, and CPU efficiency.

Vector's adaptive request concurrency architecture works well with SparkLogs, delivering very high throughput even from a single Vector agent (e.g., 2 TB/day or more) and dynamically adapting to your network conditions.
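
Adaptive concurrency is typically Vector's default request behavior for the http sink, so no extra configuration should be needed; if you prefer to set it explicitly, it would look roughly like this (a sketch against the sink from the template above):

sinks:
  itlightning:
    type: http
    # ...other options as in the template above...
    request:
      concurrency: "adaptive"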

tip

SparkLogs is optimized to receive log data in batches of roughly 1 MiB, and to accept as many parallel requests as needed to ingest data quickly. Note that log batches larger than 25 MiB will fail. Our Vector configuration template is fully optimized and has been tested to support high throughput.

Advanced Use Cases

Multiline aggregation

You may wish to have multiple lines of log text joined together into one log event, for example when your application logs a long stack trace after an error message. While this isn't required, you may find it easier to explore and analyze your log data when multi-line log messages are properly merged into a single log event.

vector.dev has a number of options to do this. The simplest is to use the multiline configuration options of your vector.dev source (e.g., file source). The options support the strategy where all lines that do not begin with a certain pattern are assumed to be part of the prior event, as well as the opposite strategy where all lines are assumed to be independent unless they begin with a pattern that is known to be a line continuation.
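
For example, here is a sketch of a file source whose multiline options fold indented continuation lines (such as stack trace frames) into the preceding event (the path and patterns are illustrative assumptions):

sources:
  app_files:
    type: file
    include:
      - "/var/log/myapp/*.log"   # assumed path
    multiline:
      # A new event starts at any line that is not indented...
      start_pattern: '^[^\s]'
      mode: continue_through
      # ...and indented lines are folded into the current event.
      condition_pattern: '^[\s]+'
      timeout_ms: 1000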

The multiline configuration options should be sufficient for all except the most unusual cases. A more advanced technique is to use Lua scripting in a vector transformation to merge lines together based on dynamic/custom conditions. See the guide.

Ingesting into child organizations

As a very advanced use case, an agent can send data to any child organization of the organization it is associated with, on a per-request basis during ingestion. This allows a single agent to be set up at a higher level of the hierarchy and then route data to child organizations to group it as needed.

For example, you could configure a different vector.dev http sink for each child organization you want the agent to ingest into, and then use VRL to filter the source events so that events are only picked up by the SparkLogs http sink associated with the child organization relevant to that log event.

To do this, the ingestion API request (e.g., in the vector http sink) should specify the org_id query parameter with the ID of the organization the event data should be ingested into. The authorization check will ensure that the agent is authorized to write data to the specified organization.
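
As a rough sketch (the .team field and its value are illustrative assumptions, and <CHILD-ORG-ID> stands for the target child organization's ID), one filter transform plus one http sink per child organization could look like the following; repeat the pair for each additional child organization:

transforms:
  # Only pass events that belong to this child organization
  only_child_a:
    type: filter
    inputs:
      - add_timezone
    condition: '.team == "a"'   # assumed field and value identifying the child org

sinks:
  sparklogs_child_a:
    type: http
    inputs:
      - only_child_a
    # The org_id query parameter selects the child organization to ingest into
    uri: "https://ingest-<REGION>.engine.sparklogs.app/ingest/v1?org_id=<CHILD-ORG-ID>"
    method: post
    compression: gzip
    encoding:
      codec: json
    auth:
      strategy: basic
      user: "<AGENT-ID>"
      password: "<AGENT-ACCESS-TOKEN>"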