Elasticsearch Bulk API

Overview

Log data can be ingested via the Elasticsearch bulk indexing API. The OpenSearch bulk indexing API is also supported.

Many tools support exporting or streaming data to Elasticsearch using the bulk indexing API. Compatibility has been specifically tested with Filebeat and Logstash. Other tools that conform to the API specification should also be compatible.

SparkLogs overcomes many traditional pains of using Elasticsearch to ingest and query log data:

  • The schemaless design, with support for an unlimited number of fields, means zero time spent configuring and managing index templates.
  • SparkLogs scales from zero to petabytes in an instant; its serverless architecture allows it to handle extremely bursty or high throughput log volumes without load issues.
  • There are no scaling limitations to worry about, such as shards per node, watermark thresholds, index maintenance, or cluster health after reboots and cold starts; enjoy petabyte-scale logging with zero management overhead.
  • Querying is fast even with billions of matching events, and you can explore and filter large result sets with ease.

With SparkLogs, you can keep your existing log aggregation and forwarding tools and simply point them to the SparkLogs cloud instead. Then enjoy logging without limits.

Implementation Details

This API is implemented on a version-specific subdomain of the main ingest HTTPS endpoint on port 443: es6., es7., or es8.. For example, https://es8.ingest-us.engine.sparklogs.app/ provides an API endpoint for the US region compatible with the Elasticsearch V8 protocol.

The API is also available on the main ingest HTTPS endpoint on port 443 under the /es/v<version> URI path prefix (e.g., /es/v6, /es/v7, /es/v8). We recommend the subdomain approach, as it is compatible with more tools.
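For illustration, both addressing styles below reach the same V8-compatible API for the US region (a sketch; hostnames are examples, substitute your region's endpoint):

```python
# Two equivalent ways to address the Elasticsearch v8-compatible API
# for the US region (illustrative; substitute your region's endpoint).

# 1. Version-specific subdomain (recommended; most broadly compatible):
base_url = "https://es8.ingest-us.engine.sparklogs.app"

# 2. URI path prefix on the main ingest endpoint:
alt_base_url = "https://ingest-us.engine.sparklogs.app/es/v8"

# Bulk requests go to <base>/_bulk in either case, e.g.:
bulk_url = base_url + "/_bulk"
```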

Since SparkLogs is schemaless, it responds to API requests as if any ILM policy is valid (GET /_ilm/policy/<policy>) and as if any index template exists (HEAD /_index_template/<indexname>). This ensures tools that expect these to exist will continue to work as normal, even though neither is necessary with SparkLogs.
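For example, a sketch of probing these compatibility stubs with Python's requests library (the endpoint, template name, policy name, and credentials are all placeholders):

```python
import requests

# Placeholder credentials: use your agent ID and agent access token.
auth = ("AGENT_ID", "AGENT_ACCESS_TOKEN")
base = "https://es8.ingest-us.engine.sparklogs.app"

# SparkLogs answers as if any index template exists...
r = requests.head(f"{base}/_index_template/my-logs-template", auth=auth)
print(r.status_code)  # expected: success, even though no template was created

# ...and as if any ILM policy is valid.
r = requests.get(f"{base}/_ilm/policy/my-policy", auth=auth)
print(r.status_code)  # expected: success
```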

Data is ingested through the /_bulk or /<indextarget>/_bulk REST endpoint. Refer to the bulk API documentation for details. Only the create and index bulk actions are supported.
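A minimal sketch of a bulk request in Python, assuming placeholder credentials and an illustrative index name (my-logs); the body is NDJSON, alternating action lines with document lines per the standard bulk format:

```python
import json
import requests

auth = ("AGENT_ID", "AGENT_ACCESS_TOKEN")  # placeholder agent ID / access token
url = "https://es8.ingest-us.engine.sparklogs.app/my-logs/_bulk"

# NDJSON body: an action line ("create" or "index") followed by the document.
events = [
    {"index": {}},
    {"@timestamp": "2024-01-01T00:00:00Z", "message": "service started"},
    {"create": {}},
    {"@timestamp": "2024-01-01T00:00:01Z", "message": "listening on :8080"},
]
body = "\n".join(json.dumps(line) for line in events) + "\n"

r = requests.post(
    url,
    data=body,
    auth=auth,
    headers={"Content-Type": "application/x-ndjson"},
)
r.raise_for_status()
```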

Indexes and Document IDs

Additional fields in each ingested log event are populated based on Elasticsearch metadata:

  • _es_index - records the name of the target index of the data
  • _es_id - records the ID of the document (only present if a doc ID was explicitly specified)
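For instance, if a bulk action line specifies an explicit document ID, the resulting event fields might look like this (a sketch; the index name and ID values are illustrative):

```python
# Bulk action line targeting index "app-logs" with an explicit document ID:
#   {"index": {"_index": "app-logs", "_id": "evt-123"}}
#   {"message": "user login"}
#
# Illustrative resulting event fields in SparkLogs:
event = {
    "message": "user login",
    "_es_index": "app-logs",  # name of the target index
    "_es_id": "evt-123",      # present only because a doc ID was specified
}
```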

How to Use

Either create an agent or use View API Key on the appropriate existing agent. This gives you the full HTTPS ingestion endpoint for your region, as well as the agent ID and agent access token you need for authorization.

Configure your client program to output data via the Elasticsearch API to the proper HTTPS endpoint on port 443, using the appropriate subdomain for the desired protocol version. For example: es8.ingest-<REGION>.engine.sparklogs.app:443

Configure your client program to use HTTP basic authentication, with the agent ID as the username and the agent access token as the password.
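If your tool only accepts raw headers, basic authentication reduces to a base64-encoded Authorization header; a minimal sketch with placeholder credentials:

```python
import base64

# HTTP basic auth: username = agent ID, password = agent access token.
agent_id = "AGENT_ID"                 # placeholder
access_token = "AGENT_ACCESS_TOKEN"   # placeholder

credentials = base64.b64encode(f"{agent_id}:{access_token}".encode()).decode()
headers = {"Authorization": f"Basic {credentials}"}
# Attach these headers to every request sent to the ingest endpoint.
```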

For example, see instructions for configuring Beats agents or configuring Logstash.

Performance

Tip: SparkLogs is optimized to receive log data in batches roughly 1MiB in size, and to accept as many parallel requests as needed to ingest data quickly. Note that log batches larger than 25MiB will fail. Our configuration templates for popular log agents are tuned for optimal throughput.
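A sketch of client-side batching under these limits (the helper functions are hypothetical, not part of any SparkLogs SDK):

```python
import json

TARGET_BATCH_BYTES = 1 * 1024 * 1024   # ~1MiB: the throughput sweet spot
MAX_BATCH_BYTES = 25 * 1024 * 1024     # batches above this will fail

def to_bulk_lines(event: dict) -> str:
    """Render one event as an action line plus a document line (NDJSON)."""
    return json.dumps({"index": {}}) + "\n" + json.dumps(event) + "\n"

def batch_events(events):
    """Yield NDJSON bodies close to the 1MiB target, never above the cap."""
    body, size = [], 0
    for event in events:
        lines = to_bulk_lines(event)
        n = len(lines.encode("utf-8"))
        if n > MAX_BATCH_BYTES:
            raise ValueError("single event exceeds the 25MiB batch limit")
        if body and size + n > TARGET_BATCH_BYTES:
            yield "".join(body)
            body, size = [], 0
        body.append(lines)
        size += n
    if body:
        yield "".join(body)
```

Each yielded body can then be POSTed to the bulk endpoint in parallel, as in the earlier request example.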