Connecting and Pushing Data to BigQuery

JoeM · 08-14-2023

Introduction

As part of Incorta's open data delivery focus, keeping your data free of vendor lock-in is a core part of the platform. This is evident in capabilities like data destinations, which let you not only ingest data into Incorta but also push it to other applications, third-party BI tools, and cloud platforms like Google BigQuery.

What you need to know before reading this article

The Google BigQuery data destination was added in Incorta 2023.7. Earlier versions of Incorta do not have this destination available.

Let's Go

Pushing data to BigQuery follows these steps:

Configuring Incorta to write to Google BigQuery
Setting up the data destination
Pushing data to BigQuery

Configuring Incorta to write to Google BigQuery

First, we must ensure that Google BigQuery will allow Incorta to write to the platform.

In Incorta, go to the 'Data' Tab and select '+New.'
Select 'Add a data destination,' then select 'BigQuery.'
In the destination setup, there will be a service account. Copy this service account.
Go to https://console.cloud.google.com/, and ensure the correct project is selected.
Go to IAM & Admin and select 'Grant Access.' Note that you must have sufficient project permissions to assign roles to other principals. If you do not see this service, contact your cloud admin and ask them to complete the remaining steps.
Grant the 'BigQuery Data Editor' and 'BigQuery Job User' roles to the Incorta service account you copied from the destination setup.
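
For admins who prefer to script the role grants rather than click through the console, the same bindings can be made with the gcloud CLI. The following is a minimal Python sketch that only builds the commands. It assumes the role IDs `roles/bigquery.dataEditor` and `roles/bigquery.jobUser` correspond to the two roles named above; the project ID and service account email are placeholders, and the helper name `build_grant_commands` is hypothetical.

```python
import shlex

# Role IDs assumed to match 'BigQuery Data Editor' and 'BigQuery Job User'.
ROLES = ("roles/bigquery.dataEditor", "roles/bigquery.jobUser")


def build_grant_commands(project_id, service_account_email):
    """Return one gcloud command (as an argument list) per role to grant."""
    member = f"serviceAccount:{service_account_email}"
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        for role in ROLES
    ]


if __name__ == "__main__":
    # Placeholder values: replace with your own project ID and the
    # service account copied from the Incorta destination setup.
    for cmd in build_grant_commands(
        "my-project-id", "incorta-sa@example.iam.gserviceaccount.com"
    ):
        print(shlex.join(cmd))
```

Printing the commands instead of running them lets a cloud admin review the exact bindings before applying them.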

Setting up the data destination

Go to the BigQuery service and copy the project name to write to:

Note: The project name shown elsewhere in the console sometimes differs from the project name shown in the BigQuery explorer. Use the project name from the explorer.
Return to Incorta and enter the project name into the data destination.
Test and save the destination.
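
If the destination test fails, one quick thing to rule out is having pasted a display name instead of the identifier shown in the explorer. The sketch below assumes the explorer value is a standard-format Google Cloud project ID (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen). Older or domain-scoped projects may not follow this pattern, and the helper name `looks_like_project_id` is hypothetical.

```python
import re

# Standard project ID shape: 6-30 characters, lowercase letters, digits,
# and hyphens; starts with a letter; does not end with a hyphen.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(value):
    """Return True if value has the shape of a standard project ID."""
    return PROJECT_ID_RE.fullmatch(value) is not None


if __name__ == "__main__":
    # A display name with spaces and capitals does not match the pattern.
    print(looks_like_project_id("My Sales Project"))
    # A lowercase, hyphenated identifier does.
    print(looks_like_project_id("my-sales-project-1234"))
```

This check is only about format; it does not confirm that the project exists or that Incorta can write to it.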

Pushing data to BigQuery

It's time to choose which schemas will be pushed to Google BigQuery.

Go to the 'Schema' tab and open a schema to push to BigQuery.
In the schema settings (cog), select 'Set a Data Destination.'
Select the destination that was just created in the data tab.
Next, enter a target schema name. This name will be written as a dataset in BigQuery. Multiple schemas may write into a single BigQuery dataset.
Optional: Add a prefix to apply to all tables created in BigQuery.
Optional: Available after 2024.1, select which tables in the schema should be pushed to BigQuery.
When viewing the schemas list, you'll see a BigQuery icon denoting which schemas will push to BigQuery.
From now on, incremental and full loads of the schema in Incorta will perform incremental and full loads into BigQuery, respectively.
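
To picture where a table ends up, the sketch below composes a fully qualified BigQuery table ID from the settings described in the steps above. It assumes the prefix is prepended to the Incorta table name as-is, with no separator added; that detail is not stated here, so check a loaded table to confirm. The function name and all sample values are hypothetical.

```python
def bigquery_table_id(project, target_schema, table, prefix=""):
    """Build a fully qualified table ID of the form project.dataset.table.

    Assumption: the optional prefix is prepended to the table name
    unchanged, without any separator.
    """
    return f"{project}.{target_schema}.{prefix}{table}"


if __name__ == "__main__":
    # Two Incorta schemas sharing one target schema name land in the
    # same BigQuery dataset; prefixes keep their tables apart.
    print(bigquery_table_id("my-project", "analytics", "orders", prefix="sales_"))
    print(bigquery_table_id("my-project", "analytics", "orders", prefix="hr_"))
```

When several schemas share one dataset, a distinct prefix per schema is a simple way to avoid two tables with the same name colliding.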

Performance Tips

Consider making the tables in a schema non-optimized to skip the post-load phase.
If relevant, allocate less memory to the analytics service in the CMC and give the rest to the loader service, since Incorta dashboards would not be in use.

Bonus Tips

From time to time, BigQuery will have rate limiting in place. Incorta will automatically attempt to load a table three times before failing.
If a table fails to load, the only way to load an individual table to a destination is to invoke its ingestion into Incorta.
You can also consider passing business schemas to BigQuery by creating an MV in a schema that queries a business schema view.
Inspect load jobs in the BigQuery console by selecting the project history.
If a table schema has changed between loads, a full load to BigQuery will be required.
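
The three-attempt behavior mentioned in the first tip can be pictured with a small retry loop. This is an illustrative sketch only, not Incorta's actual implementation: the function name, the `delay_seconds` parameter, and the choice to re-raise the last error are all assumptions.

```python
import time


def load_with_retries(load_table, attempts=3, delay_seconds=0.0):
    """Call load_table() up to `attempts` times; re-raise the last error."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return load_table()
        except Exception as error:  # a real loader would catch only rate-limit errors
            last_error = error
            if attempt < attempts:
                # Pause before the next attempt (no pause after the final one).
                time.sleep(delay_seconds)
    raise last_error
```

A caller would pass a function that performs one table load; if it still fails on the third attempt, the error propagates and the table is reported as failed.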