Archive Overview

Why Use SparkLogs Archiving & Replication?

  • Unlimited Data Retention: Retain an additional backup copy of your observability data for as long as you need, with no hidden retention fees.
  • Strong Compliance Support: All data objects are immutable after creation, allowing safe replication to WORM, bucket-locked, or object-locked storage for regulatory requirements.
  • Maximum Flexibility: Seamlessly replicate archives to your own cloud storage (AWS S3, GCS, Azure Blob, or any S3-compatible provider) for compliance, DR, and sovereignty needs.
  • Queryable Archive: Archived data is stored as partitioned Parquet files, so it's instantly queryable in tools like AWS Athena, BigQuery, Azure Synapse, and Apache Spark (see the query sketch after this list).
  • Zero Hassle: No complex tiering or rehydration process: query historical and current data in a unified platform.
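
For example, once the archive is replicated to your own S3 bucket and registered as an external table, you can query it from AWS Athena. A minimal sketch with boto3; the database, table, bucket, and dt partition column are hypothetical placeholders, not names defined by SparkLogs:

```python
import time

import boto3

# Hypothetical names: replace with your own Athena database/table and
# the S3 location where Athena should write query results.
DATABASE = "sparklogs_archive"
OUTPUT = "s3://my-athena-results/sparklogs/"
SQL = "SELECT COUNT(*) AS events FROM logs WHERE dt = '2024-06-01'"

athena = boto3.client("athena")

# Start the query against the external table over the Parquet archive.
qid = athena.start_query_execution(
    QueryString=SQL,
    QueryExecutionContext={"Database": DATABASE},
    ResultConfiguration={"OutputLocation": OUTPUT},
)["QueryExecutionId"]

# Poll until Athena finishes, then fetch the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=qid)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=qid)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```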

With SparkLogs archiving and replication, it's fast, simple, and cost-effective to build your own cold Data Lake for extensive historical analysis, machine learning, point-in-time forensic investigations, or meeting the strictest data governance needs.

Archiving Details

SparkLogs automatically retains your ingested data for live querying for months or even years according to your chosen retention period, with no extra retention charges. You can verify your workspace's retention policy in the Data Sovereignty section of the home dashboard. Our unique architecture makes it cost-effective to store and query data for much longer than legacy systems.

In addition, an archived backup of all your data is created daily and kept for 1 year (on Cloud plans) or any custom period (on Private Cloud deployments). This is maintained as a compressed, hive-partitioned Parquet Data Lake, yielding 10x+ space savings and direct compatibility with leading analytics systems like AWS Athena, BigQuery, Azure Synapse, Apache Spark, and more.
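
Because the archive is a hive-partitioned Parquet tree, any Parquet-aware tool can read it directly. As a minimal sketch, here is how pyarrow can load a single day's partition; the local path and the dt partition column name are assumptions about the layout for illustration, not documented guarantees:

```python
import pyarrow.dataset as ds

# Assumed layout: a hive-partitioned tree such as .../dt=2024-06-01/*.parquet.
# The path and partition column name are placeholders.
dataset = ds.dataset("sparklogs-archive/", format="parquet", partitioning="hive")

# Partition pruning: only files under dt=2024-06-01 are actually read.
table = dataset.to_table(filter=(ds.field("dt") == "2024-06-01"))
print(table.num_rows, table.schema.names)
```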

Private Cloud customers have direct access to their archived data in their configured Google Cloud Storage bucket.

SparkLogs Cloud customers can directly access their archived data by replicating it to their own object storage bucket at no extra cost.
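
As an illustration of that direct access, the sketch below lists archive objects with the google-cloud-storage client; the bucket name and prefix are hypothetical and depend on how your deployment is configured:

```python
from google.cloud import storage

# Placeholder bucket and prefix: use the bucket configured for your deployment.
client = storage.Client()
for blob in client.list_blobs("my-sparklogs-archive", prefix="archive/"):
    print(f"{blob.name}  ({blob.size} bytes)")
```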

Replication Details

With replication, you can copy archived data to one or more additional storage buckets, including AWS S3, other S3-compatible providers, GCS, and Azure Blob Storage. Follow the setup guide, test the connection to your target bucket(s), and then wait for the first daily replication to complete.

Archiving and replication operations run daily, usually within an hour after midnight UTC.
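
Since runs complete on a daily cadence, a simple way to confirm replication is healthy is to check that the previous day's objects exist in the target bucket. A minimal sketch with boto3, assuming an S3 target and a hypothetical dt=YYYY-MM-DD partition prefix:

```python
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical bucket name and partition layout; adjust to your setup.
BUCKET = "my-replica-bucket"
yesterday = (datetime.now(timezone.utc) - timedelta(days=1)).strftime("%Y-%m-%d")
prefix = f"archive/dt={yesterday}/"

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=prefix, MaxKeys=1)

if resp.get("KeyCount", 0) > 0:
    print(f"Replication for {yesterday} is present under {prefix}")
else:
    print(f"No objects yet for {yesterday}; the daily run may not have finished")
```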