Replication Setup

SparkLogs supports replicating data to one or more additional cloud storage buckets under your control. Supported replication destinations include AWS S3, any S3-compatible storage bucket, GCS, Azure Blob Storage, and more.

Setup Process

To configure replication settings in the app, click the Configure sidebar button and then open the Archiving tab.

This screen shows a list of all configured replication targets and their status (enabled state, last replication time and result). If you have the required permissions, you can add or remove replication targets using the toolbar buttons. You can also test a connection to a given replication target to verify that the configured authorization and authentication settings are correct.

When configuring a new replication target, you give the target a name and optional description for your reference, select the target type (e.g., AWS S3, GCS, etc.), and then provide the required connection and authentication details for the selected target type.

For any replication target, you also specify:

  • An optional subdirectory (object name prefix) within the target bucket.
  • The replication time window (how many recently archived days of data to replicate). The default is to replicate the last 60 days of data; this can be configured to as few as 30 days or to include all archived data.

Best Practices

Create one or more object storage bucket(s) in the cloud provider(s) of your choice to receive the replicated data. When creating the bucket(s), consider the following best practices:

  • Data Retention: The archived data is stored in a highly compressed format, minimizing storage costs. Unless you want to retain data indefinitely, create a lifecycle rule for the bucket to delete objects older than your desired retention period. Make sure that the replication time window you select is shorter than the retention period you have configured for the object storage bucket that receives replicated data. (Otherwise, older data that was deleted from the bucket by your retention policy lifecycle rule will be re-replicated during the next daily replication run.) See the lifecycle sketch after this list.
  • Cost Reduction: Depending on how often you plan to read the replicated data, consider using auto-tiering storage classes, or setting up lifecycle rules to transition older objects to colder storage tiers. If you plan to frequently access or query your replicated data, use a hotter tier as appropriate. If your replicated archive bucket will only rarely be accessed, consider setting the bucket's default storage class to a colder tier so that objects are stored cold from the start.
  • Compliance: Based on your compliance requirements, you may want to enable bucket lock, object versioning, and/or soft-delete features offered by your storage provider. This can help address WORM (Write Once Read Many) and other data retention and compliance requirements, such as those associated with HIPAA, FINRA, SEC, or CFTC.
  • Location and Durability: Choose the appropriate geographic location/region for the bucket based on your data residency requirements, and select a redundancy/durability option that meets your needs for data availability and resiliency. For example, you may choose to replicate to a single multi-region bucket, or you could create separate replication targets for separate buckets, each located in different regions to meet geographic distribution requirements.
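
As a concrete illustration of the retention and cost-reduction practices above, here is a minimal sketch using boto3 against a hypothetical AWS S3 replica bucket. The bucket name, storage tier, and day counts are placeholder assumptions, not defaults; the key point is that the expiration period (90 days here) stays longer than the replication time window (60 days by default):

import boto3

# Apply a lifecycle policy to the replica bucket: transition objects to a
# colder tier after 30 days, and delete them after 90 days (longer than the
# default 60-day replication window, so expired data is not re-replicated).
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-replica-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "replicated-logs-retention",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # match every object in the bucket
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER_IR"}],
                "Expiration": {"Days": 90},
            }
        ]
    },
)

Equivalent lifecycle rules can be configured in each provider's console; adjust the day counts to your own retention requirements.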

Replication Target Types

AWS S3 Replication Target

Replicate data to any AWS S3 bucket by providing the target bucket name, AWS region, and authentication credentials.

Authentication is configured by specifying an AWS Access Key ID and Secret Access Key with sufficient permissions to enumerate and write objects to the target bucket.

  • Create a new IAM user specifically for authorization access to the replication bucket, and create an access key for that user.
  • Attach a policy to the IAM user that grants access to the specific bucket, including the ability to list, read, and modify objects. For example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TargetBucketListContents",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
    },
    {
      "Sid": "TargetBucketObjectAccess",
      "Effect": "Allow",
      "Action": ["s3:*Object", "s3:*ObjectAcl", "s3:*ObjectTagging"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}
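
Before entering the new access key into the app, you can optionally sanity-check it yourself. The following boto3 sketch (key values, bucket name, and region are placeholders) exercises the same list and write permissions the replication target needs:

import boto3

# Connect with the new IAM user's credentials (placeholders shown here).
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    region_name="us-east-1",  # your bucket's region
)

# Exercise the permissions granted by the policy above.
s3.list_objects_v2(Bucket="YOUR_BUCKET_NAME", MaxKeys=1)                          # s3:ListBucket
s3.put_object(Bucket="YOUR_BUCKET_NAME", Key="replication-test.txt", Body=b"ok")  # s3:PutObject
s3.delete_object(Bucket="YOUR_BUCKET_NAME", Key="replication-test.txt")           # s3:DeleteObject
print("Access key has the required bucket permissions.")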

You can optionally override the storage class of replicated objects. If your AWS bucket is a next-generation directory-style bucket, be sure to set the appropriate option.

S3-Compatible Replication Target

Our S3-Compatible target allows you to replicate data to any S3-compatible object storage bucket, and is known to work with the following providers:

  • Alibaba Cloud Object Storage System (OSS)
  • Arvan Cloud Object Storage (AOS)
  • Ceph Object Storage
  • China Mobile Ecloud Elastic Object Storage (EOS)
  • Cloudflare R2 Storage
  • Cubbit DS3 Object Storage
  • DigitalOcean Spaces
  • Dreamhost DreamObjects
  • Exaba Object Storage
  • FileLu S5 (S3-Compatible Object Storage)
  • Pure Storage FlashBlade Object Storage
  • Hetzner Object Storage
  • Huawei Object Storage Service
  • IBM COS S3
  • IDrive e2
  • Intercolo Object Storage
  • IONOS Cloud
  • Leviia Object Storage
  • Liara Object Storage
  • Linode Object Storage
  • Seagate Lyve Cloud
  • Magalu Object Storage
  • MEGA S4 Object Storage
  • Minio Object Storage
  • Netease Object Storage (NOS)
  • OUTSCALE Object Storage (OOS)
  • OVHcloud Object Storage
  • Petabox Object Storage
  • Qiniu Object Storage (Kodo)
  • Rabata Cloud Storage
  • RackCorp Object Storage
  • Rclone S3 Server
  • Scaleway Object Storage
  • SeaweedFS S3
  • Selectel Object Storage
  • Servercore Object Storage
  • Spectra Logic Black Pearl
  • StackPath Object Storage
  • Storj (S3 Compatible Gateway)
  • Synology C2 Object Storage
  • Tencent Cloud Object Storage (COS)
  • Wasabi Object Storage
  • Zata (S3 compatible Gateway)

Any other provider with a compatible S3 API should also work.

To configure, select the S3 Provider type from the drop-down, specify the bucket name, and specify either the URI of the S3 API endpoint for your provider or the provider-specific region code. For more details, refer to your provider's documentation or to the rclone config documentation for your relevant provider.

For example, for Cloudflare R2 object storage, the endpoint URI takes the form https://YOURACCOUNTID.r2.cloudflarestorage.com.

Usually the Advanced options can be left at their default settings, but if you selected the Other provider type you may need to configure them.

Finally, configure the Access Key ID and Secret Access Key authentication credentials so that the app has sufficient permissions to enumerate and write objects to the target bucket.
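
To see how these settings fit together, here is a hedged boto3 sketch of an equivalent connection, using the Cloudflare R2 endpoint form shown above (the account ID, bucket name, and keys are placeholders). If this succeeds, the same values entered in the app should work as well:

import boto3

# Connect to an S3-compatible provider by endpoint URI (Cloudflare R2 shown;
# substitute your own provider's endpoint and credentials).
s3 = boto3.client(
    "s3",
    endpoint_url="https://YOURACCOUNTID.r2.cloudflarestorage.com",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)
s3.list_objects_v2(Bucket="YOUR_BUCKET_NAME", MaxKeys=1)  # verify list access
print("Endpoint and credentials are valid.")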

Google Cloud Storage Replication Target

To replicate data to a Google Cloud Storage bucket, configure the bucket name and optionally override the default storage class. We recommend using IAM Authentication and granting the run-backend@itl-p-app-backend-n3cv.iam.gserviceaccount.com principal the Storage Object User role on the target bucket. You may need to add our organization ID to your organization policy to grant permissions to our service account (see below). Alternatively, you can provide credentials in JSON format for a service account that has permissions to enumerate and write objects on the target bucket. This method is less secure, so we recommend using IAM Authentication unless you have specific requirements.
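
A minimal sketch of the IAM grant using the google-cloud-storage Python client, run with your own credentials (the bucket name is a placeholder):

from google.cloud import storage

# Grant the SparkLogs service account the Storage Object User role
# on the target bucket.
client = storage.Client()
bucket = client.bucket("your-replica-bucket")  # hypothetical bucket name
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectUser",
    "members": {"serviceAccount:run-backend@itl-p-app-backend-n3cv.iam.gserviceaccount.com"},
})
bucket.set_iam_policy(policy)

The same grant can be made in the Google Cloud console on the bucket's Permissions tab.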

Add our organization ID to the allowed list in your organization policy

Follow the instructions to set the organization domain restricted sharing policy to add our organization ID C00pfa94t to allowed organizations.

Danger: Do not remove your own organization from the list of allowed organizations. If you remove your own organization ID from the policy, you can lock yourself out of your Google Cloud organization.

In summary, use the Google Cloud console for Organization Policies to edit the constraints/iam.allowedPolicyMemberDomains constraint and add our organization ID C00pfa94t to the list of allowed organizations.

Microsoft Azure Blob Storage Replication Target

To replicate data to an Azure Blob Storage Container, configure the target by specifying the storage container name and optionally override the default access tier for created objects.

Supported authentication types are as follows:

  • Service Principal: Specify the Storage Account Name, Tenant ID, Client ID, and Client Secret of an Azure AD Service Principal with sufficient permissions to enumerate and write blobs to the target container (see the sketch after this list).
    • First create a new Entra ID application, which will give you the Tenant ID, Client ID, and Client Secret.
    • Next, assign the Entra ID application the Storage Blob Data Contributor role on the target storage container to grant it sufficient permissions.
  • Storage Account Key: Specify the Storage Account Name and one of its Access Keys.
  • Shared Access Signature (SAS) token: Specify a SAS token (URI) with sufficient permissions to enumerate and write blobs to the target container. Set the expiration of the SAS URI appropriately to ensure it does not expire while you still need to replicate to the target container.
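
For the Service Principal method, the fields map onto an Azure connection as in this hedged sketch (all names are placeholders), which you can use to confirm the role assignment took effect:

from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

# Authenticate as the Entra ID application using its service principal fields.
credential = ClientSecretCredential(
    tenant_id="YOUR_TENANT_ID",
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
)
service = BlobServiceClient(
    account_url="https://YOURSTORAGEACCOUNT.blob.core.windows.net",
    credential=credential,
)

# Listing blobs requires the Storage Blob Data Contributor (or Reader) role.
container = service.get_container_client("your-container")
next(iter(container.list_blobs()), None)
print("Service principal can access the container.")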

Microsoft Azure Files Share Replication Target

To replicate data to an Azure Files Share, configure the share name and authentication. This target type supports the same authentication methods as the Azure Blob Storage target type, and additionally accepts a connection string.

Backblaze B2 Replication Target

To replicate data to a Backblaze B2 Cloud Storage bucket, configure the bucket name and an application key. Do not use an account key or master application key. The key should grant sufficient permissions to enumerate and write objects to the target bucket.
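
A short b2sdk sketch (key and bucket values are placeholders) to confirm the application key is scoped correctly before entering it in the app:

from b2sdk.v2 import B2Api, InMemoryAccountInfo

# Authorize with the bucket-scoped application key (not the master key).
api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", "YOUR_KEY_ID", "YOUR_APPLICATION_KEY")

# Verify the key can reach the bucket and write an object.
bucket = api.get_bucket_by_name("your-replica-bucket")
bucket.upload_bytes(b"ok", "replication-test.txt")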

Verifying Replication Target Settings

When creating a new replication target or making changes, you can use the Test Connection button to verify that the configured connection and authentication settings are correct. This will attempt to connect to the target bucket and verify that the configured authentication credentials have sufficient permissions to enumerate and write objects to the target bucket. If the test succeeds, replication should operate correctly and will run during the next daily replication cycle.

Encryption of Authentication Credentials

All authentication credentials are stored encrypted with an encryption key that is escrowed in a secure cloud key management service. Once any credentials are saved in the app, they are never viewable again, but they can be optionally updated when editing settings for a given replication target.