Skip to main content

Materialized Views

Concept

In a private cloud setup, log data is stored within BigQuery tables in your own Google Cloud Platform project. This means you can directly analyze your data with the full power of BigQuery, including using Looker Studio, machine learning, Vertex AI, and ODBC/JDBC data access.

Materialized views are a powerful way to efficiently calculate and cache summarized information from your log data that is automatically updated as new data is ingested.

With SparkLogs, log data is ingested into a time-partitioned table in BigQuery. When you create a materialized view that is also time-partitioned, BigQuery only needs to recalculate ("refresh") the materialized view data for a partition that has changed. In the typical case, this is only the most recent partition (representing the most recent hour or day of data). This allows for very efficient computation and retention of key data points that you want to keep over time.

Practical example

Imagine you have an application that logs when a user unlocks an achievement in a game with a message like achievement awarded | user=someuser achievement=lucky_dodger score=97.8.

Based on AutoExtract, it will populate the x.user, x.achievement, and x.score fields, and the AutoClassify pattern will be achievement awarded |.

If you wanted to be able to easily query the most common achievements that were awarded every hour, then you could create a materialized view over your logs table as follows:

-- NOTE: replace mydataset and w12345678_abcd_1234_5678_12345678abcd with your workspace ID as appropriate
CREATE MATERIALIZED VIEW `my_dataset`.achievements_summary
PARTITION BY TIMESTAMP_TRUNC(t_hour, HOUR)
AS (
SELECT
LAX_STRING(JSON_QUERY(event_json, '$.x.achievement')) AS achievement,
TIMESTAMP_TRUNC(t, HOUR) as t_hour,
COUNT(1) AS event_count,
AVG(COALESCE(LAX_FLOAT64(JSON_QUERY(event_json, '$.x.score')), 0)) AS score_average
FROM `mydataset`.w12345678_abcd_1234_5678_12345678abcd_logs
WHERE LOWER(pattern) LIKE '%achievement awarded%'
GROUP BY 1, 2
);

Once the materialized view is created, you could efficiently query the materialized view directly, and use it for further analysis. For example, you could use BigQuery ML to perform machine learning over data extracted by AutoExtract from your logs.

This is also an easy method to hookup summarized data from your logs into a visual dashboard, like Looker Studio.

Google BigQuery materialized views support a variety of aggregate and filtering functions (details).

tip

There is a Google Cloud quota limit on the number of materialized views that can be created for each base table, so seek to create broader views to meet your needs where possible. Refer also to best practices for materialized views.