
Private Cloud Setup

When you create an account, you choose whether you want to use the SparkLogs cloud or have a private cloud deployment that uses your own Google Cloud account and resources. Each has its advantages.

A private cloud deployment is serverless and requires zero ongoing maintenance and care. Data ingestion, extraction, and query acceleration are still handled in the SparkLogs cloud.

In a private cloud setup, you authorize the SparkLogs cloud to store and process data in your own Google Cloud account and project. No data is persisted outside of your private GCP project.

This gives you complete control and access to your ingested data, and allows you to meet any unique requirements of your organization (e.g., mandated use of customer-managed encryption keys).

There are two types of private cloud service plans:

  1. Private Cloud ($0.22/GB-ingested, $440/month minimum): You store the data in your own Google Cloud account in your own BigQuery dataset; you do not pay for query costs.
  2. Private Cloud Self-Hosted Querying ($0.10/GB-ingested, $2000/month minimum): You store the data in your own Google Cloud account in your own BigQuery dataset and you also pay Google for query costs.

Setup Overview

  1. Have a Google Cloud account set up with billing activated and with rights to grant IAM roles
  2. Prepare the appropriate Google Cloud project
  3. Grant our service accounts least-privilege access to your desired GCP project
  4. Use the app to create a SparkLogs workspace, which will set up everything else

If you plan to retain more than 100 TB of ingested log data, see this additional requirement.

danger

When using Self-Hosted Querying, you must use a BigQuery Enterprise reservation that auto-scales from zero slots. A BigQuery long-term commitment is NOT required; set the auto-scaling baseline to 0 slots and the maximum to 500 slots. In combination with the unique massive-scale adaptive querying technology in SparkLogs, this ensures that each query generally costs only a few pennies.

On-demand query mode in BigQuery is not cost-effective, and SparkLogs will not provision the workspace if you do not have an auto-scaling reservation assigned to your host project.

If you are not using Self-Hosted Querying, then a BigQuery auto-scaling reservation is not required. In this case, to avoid unintentional large costs when querying your own BigQuery data, make sure to set a quota for query usage per day. If you only plan on querying your data from within SparkLogs, we recommend setting the quota to 0 or 1 TB to ensure you do not accidentally incur additional costs by manually querying your data directly. This quota will not affect SparkLogs's ability to query your data, as the query costs incurred for queries performed within the product are not charged to your Google Cloud account.

If you do plan on querying your own data directly, or if you are using Self-Hosted Querying, make sure to assign a BigQuery Enterprise reservation that auto-scales from 0 slots to 500 slots. This is not necessary if you will only query your data through SparkLogs.
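
If you prefer the command line, here is a rough sketch of creating and assigning such a reservation with the bq CLI (the sparklogs-autoscale name, US location, and YOUR_PROJECT_ID are placeholders; check that your bq version supports the editions and autoscaling flags):

```bash
# Create an Enterprise edition reservation with a 0-slot baseline that
# auto-scales up to 500 slots.
bq mk --reservation \
  --project_id=YOUR_PROJECT_ID \
  --location=US \
  --edition=ENTERPRISE \
  --slots=0 \
  --autoscale_max_slots=500 \
  sparklogs-autoscale

# Assign the reservation to the project so its query jobs use it.
bq mk --reservation_assignment \
  --project_id=YOUR_PROJECT_ID \
  --location=US \
  --reservation_id=YOUR_PROJECT_ID:US.sparklogs-autoscale \
  --job_type=QUERY \
  --assignee_type=PROJECT \
  --assignee_id=YOUR_PROJECT_ID
```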

Detailed instructions

Prerequisites

You will need an active Google Cloud account that has an active billing account.

tip

Make sure the GCP project you intend to use for SparkLogs resources has billing enabled. Google BigQuery requires it, and workspace provisioning will fail if billing is not enabled on the project.
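
If you have the gcloud CLI installed, one quick way to confirm this (YOUR_PROJECT_ID is a placeholder):

```bash
# billingEnabled should be true when an active billing account is linked.
gcloud billing projects describe YOUR_PROJECT_ID
```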

If your GCP account has an organization, and if your organization has enabled Domain Restricted Sharing (on by default for orgs created after May 3, 2024), then you will need to add our organization to your policy:

Add our organization ID to the allowed list in your organization policy

Follow the instructions to set the organization domain restricted sharing policy to add our organization ID C00pfa94t to allowed organizations.

danger

Do not remove your own organization from the list of allowed organizations. If you remove your own organization ID from the policy, you can lock yourself out of your Google Cloud organization.

In summary, use the Google Cloud console for Organization Policies to edit the constraints/iam.allowedPolicyMemberDomains constraint and add our organization ID C00pfa94t to the list of allowed organizations.
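
As a sketch, the same change can be made with the gcloud CLI using an org policy file (ORGANIZATION_ID and YOUR_OWN_CUSTOMER_ID are placeholders; your directory customer ID is shown by `gcloud organizations list`):

```bash
# Write a complete allow-list that KEEPS your own organization and adds SparkLogs.
# Include any other customer IDs already allowed by your existing policy.
cat > policy.yaml <<'EOF'
name: organizations/ORGANIZATION_ID/policies/iam.allowedPolicyMemberDomains
spec:
  rules:
    - values:
        allowedValues:
          - YOUR_OWN_CUSTOMER_ID   # your own directory customer ID -- do not remove
          - C00pfa94t              # SparkLogs organization ID
EOF

# Requires the Organization Policy Administrator role on the organization.
gcloud org-policies set-policy policy.yaml
```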

Prepare the appropriate Google Cloud project

You can use an existing Google Cloud project or create a new one just for SparkLogs. If you use an existing project, the instructions below grant SparkLogs access only to its own data; SparkLogs will not have access to any other data in your project.

Make sure that BigQuery is activated for your chosen GCP project by visiting the BigQuery console.
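
If the BigQuery API is not yet enabled, you can also enable it with the gcloud CLI (YOUR_PROJECT_ID is a placeholder):

```bash
# Enable the BigQuery API for the project SparkLogs will use.
gcloud services enable bigquery.googleapis.com --project=YOUR_PROJECT_ID
```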

Grant our service accounts access

Use the IAM permissions editor for your project to grant the roles/bigquery.user (BigQuery User) role to the SparkLogs service accounts:

run-ingestor@itl-p-app-core-base-i9gm.iam.gserviceaccount.com
run-query-api@itl-p-app-core-base-i9gm.iam.gserviceaccount.com

This role allows SparkLogs to create a new BigQuery dataset within your project and have full control over it. It does not allow our app access to data in other datasets. See Google's role list for details.
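
If you prefer the gcloud CLI over the IAM console, the equivalent grants look roughly like this (YOUR_PROJECT_ID is a placeholder):

```bash
# Grant the BigQuery User role to both SparkLogs service accounts.
for SA in run-ingestor run-query-api; do
  gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:${SA}@itl-p-app-core-base-i9gm.iam.gserviceaccount.com" \
    --role="roles/bigquery.user"
done
```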

Use the app to create a private cloud workspace

Permissions are now set up properly, and you can use SparkLogs to create a new workspace linked to your GCP resources:

  1. Open the app and go to the create workspace page (this is where you automatically land as a new user).
  2. Choose the Private Cloud or Private Cloud (Self-Hosted Querying) button, and fill in the Google Project ID and Dataset Name fields.
  3. Click Test Connection to Google Cloud to verify permissions and correct any error conditions reported.
  4. Proceed with the rest of the create workspace process as normal.

Cost considerations

You will pay us a fee of $0.22/GB-ingested (Private Cloud) or $0.10/GB-ingested (Self-Hosted Querying), which covers use of our platform for unlimited users. This includes our cloud receiving and processing ingested log data (including AutoExtract technology) and storing it to your private Google Cloud project.

You will also pay directly to Google Cloud the cost of BigQuery resources used by your SparkLogs account. For example, here is Google Cloud BigQuery pricing for the US region:

  • BigQuery Storage Write API: $0.025/GB-ingested
  • BigQuery Storage: SparkLogs will configure BigQuery compression and activate billing for compressed (physical) storage: $0.04/GB/mo, dropping to $0.02/GB/mo for data ingested more than 90 days ago. Log data is highly compressible, and usually compresses at 10:1 or better, so storage costs are usually minimal.
  • BigQuery Compute (only for Self-Hosted Querying): With the Private Cloud plan, you will NOT pay for any query costs, as the cost of the queries are covered by SparkLogs. With the Self-Hosted Querying plan, SparkLogs's query acceleration and optimization technologies are included for unlimited queries, and then you will also pay Google for BigQuery compute costs ($0.06/slot-hour). Combining SparkLogs's massive-scale adaptive querying technology with BigQuery auto-scaling from zero slots that instantly scales up/down in one-minute increments means that most queries cost just pennies, regardless of how much data is queried.

tip

If you are NOT using the Self-Hosted Querying plan, your use of SparkLogs will NOT incur BigQuery Compute costs for query processing; SparkLogs pays for all query costs for queries initiated from within the product. If you query your own data directly from within Google Cloud using BigQuery SQL, you will pay Google Cloud the normal compute costs associated with those queries. In that case we strongly recommend setting up a BigQuery Enterprise edition reservation with auto-scaling (you can scale from 0) for optimal cost savings, and we also strongly recommend setting up a Google Cloud quota for BigQuery query usage per day.

Your Google Cloud usage scales from zero: your Google Cloud costs will be directly proportional to the amount of data ingested and retained. You can take advantage of Google Cloud budget alerts to monitor your Google Cloud costs.
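
For example, a budget alert could be created with the gcloud CLI along these lines (the billing account ID, display name, amount, and thresholds are placeholders; recent gcloud versions include the billing budgets commands):

```bash
# Notify billing account admins at 50%, 90%, and 100% of a $200/month budget.
gcloud billing budgets create \
  --billing-account=000000-AAAAAA-000000 \
  --display-name="SparkLogs private cloud" \
  --budget-amount=200USD \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.9 \
  --threshold-rule=percent=1.0
```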

If you prefer simple all-inclusive pricing, you can choose to create a workspace that uses the SparkLogs Cloud instead of a private cloud deployment. Your cost will then be a flat $0.39/GB-ingested (subject to a minimum of $100/month).

Deployments larger than 100 TB

tip

In a private cloud configuration, if you exceed 100 TB of retained log data, you will need to subscribe to a Google BigQuery Enterprise or Enterprise Plus reservation and use slots in that reservation for index maintenance in order to keep using index-accelerated queries (see instructions). You can still use an auto-scaling reservation for this purpose to minimize costs.

This is only required for indexed queries if you exceed 100 TB of retained log data. If you exceed this limit and have not set up a slot reservation for index maintenance, you can still perform queries, and they will still benefit from the query-acceleration technology in SparkLogs; however, these queries may take longer or fall back to an adaptive-scale query more often than before.

Advanced configurations

Creating the BigQuery dataset manually

Normally, you grant the SparkLogs service accounts just enough access to create a new BigQuery dataset within a GCP project, and these service accounts then have full access to this newly created dataset. With this technique, our service accounts cannot access any other datasets, even within the same GCP project.

In some cases you may want to customize certain parameters of the dataset that is created (e.g., to use CMEK).

Instructions

  1. Create the BigQuery dataset in the desired GCP project (a CLI sketch of steps 1–3 follows this list). We recommend using the PHYSICAL storage billing model.
  2. In the BigQuery console, open the new dataset, and click the Sharing menu in the top-right, then Permissions. Grant the roles/bigquery.dataEditor role to our service accounts:
    run-ingestor@itl-p-app-core-base-i9gm.iam.gserviceaccount.com
    run-query-api@itl-p-app-core-base-i9gm.iam.gserviceaccount.com
  3. At the project level, use IAM to grant these same service accounts the following roles: roles/bigquery.dataViewer and roles/bigquery.jobUser. Note that when granted at the project level, the roles/bigquery.dataViewer role only grants permissions to view metadata within the project; it does not grant access to data within other datasets. See Google's role list for details.
  4. Proceed with the rest of the workspace creation process as normal.
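
Here is a rough CLI sketch of steps 1–3 (the sparklogs_data dataset name, US location, and YOUR_PROJECT_ID are placeholders):

```bash
# Step 1: create the dataset with the physical (compressed) storage billing model.
bq mk --dataset \
  --location=US \
  --storage_billing_model=PHYSICAL \
  YOUR_PROJECT_ID:sparklogs_data

# Step 2: grant both SparkLogs service accounts WRITER access on the dataset
# (WRITER is the basic-role equivalent of roles/bigquery.dataEditor).
bq show --format=prettyjson YOUR_PROJECT_ID:sparklogs_data > dataset.json
# Edit dataset.json and add entries like these to the "access" array:
#   {"role": "WRITER", "userByEmail": "run-ingestor@itl-p-app-core-base-i9gm.iam.gserviceaccount.com"},
#   {"role": "WRITER", "userByEmail": "run-query-api@itl-p-app-core-base-i9gm.iam.gserviceaccount.com"}
bq update --source dataset.json YOUR_PROJECT_ID:sparklogs_data

# Step 3: grant the project-level roles to both service accounts.
for SA in run-ingestor run-query-api; do
  for ROLE in roles/bigquery.dataViewer roles/bigquery.jobUser; do
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
      --member="serviceAccount:${SA}@itl-p-app-core-base-i9gm.iam.gserviceaccount.com" \
      --role="${ROLE}"
  done
done
```
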
note

The region of a newly created dataset must match the region that you are selecting for the SparkLogs workspace. This constraint will be enforced when the workspace and your custom BigQuery dataset are linked.

Customer-managed encryption keys

Google Customer-Managed Encryption Keys (CMEK) allow you to use your own encryption keys to encrypt your data, rather than Google-managed encryption keys. This allows enterprises to meet unique requirements for data governance, compliance, and privacy.

Customer-managed encryption keys may be stored in software, in hardware (HSM cluster), or even externally outside of Google. See Google security key management for details.

If you wish to use CMEK with the data stored by SparkLogs in your GCP project, you will need to create the BigQuery dataset manually yourself, making sure to specify your CMEK configuration. Then grant the SparkLogs service accounts roles on this BigQuery dataset as described above.

Once the dataset is created, make sure to set a dataset default key so that any tables created by SparkLogs are using CMEK and your desired encryption key. After your SparkLogs workspace is fully provisioned, you can check if the created tables are using CMEK.
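
A minimal sketch of creating such a dataset with a default CMEK key and checking it afterwards (the key path, sparklogs_data dataset name, US location, and YOUR_PROJECT_ID are placeholders; the BigQuery service agent for your project must also hold the Cloud KMS CryptoKey Encrypter/Decrypter role on the key):

```bash
# Create the dataset with a default customer-managed encryption key.
bq mk --dataset \
  --location=US \
  --storage_billing_model=PHYSICAL \
  --default_kms_key=projects/YOUR_PROJECT_ID/locations/us/keyRings/YOUR_RING/cryptoKeys/YOUR_KEY \
  YOUR_PROJECT_ID:sparklogs_data

# After the workspace is provisioned, confirm a table created by SparkLogs uses the key.
bq show --format=prettyjson YOUR_PROJECT_ID:sparklogs_data.TABLE_NAME | grep kmsKeyName
```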

You will pay Google additional usage fees related to Customer-Managed Encryption Keys. SparkLogs does not charge extra for using CMEK.