Integrate BigQuery Data in Pentaho Data Integration



Build ETL pipelines based on BigQuery data in the Pentaho Data Integration tool.

The CData JDBC Driver for Google BigQuery enables access to live data from data pipelines. Pentaho Data Integration is an Extraction, Transformation, and Loading (ETL) engine that data, cleanses the data, and stores data using a uniform format that is accessible.This article shows how to connect to BigQuery data as a JDBC data source and build jobs and transformations based on BigQuery data in Pentaho Data Integration.

About BigQuery Data Integration

CData simplifies access and integration of live Google BigQuery data. Our customers leverage CData connectivity to:

  • Simplify access to BigQuery with broad out-of-the-box support for authentication schemes, including OAuth, OAuth JWT, and GCP Instance.
  • Enhance data workflows with Bi-directional data access between BigQuery and other applications.
  • Perform key BigQuery actions like starting, retrieving, and canceling jobs; deleting tables; or insert job loads through SQL stored procedures.

Most CData customers are using Google BigQuery as their data warehouse and so use CData solutions to migrate business data from separate sources into BigQuery for comprehensive analytics. Other customers use our connectivity to analyze and report on their Google BigQuery data, with many customers using both solutions.

For more details on how CData enhances your Google BigQuery experience, check out our blog post: https://www.cdata.com/blog/what-is-bigquery


Getting Started


Configure to BigQuery Connectivity

Google uses the OAuth authentication standard. To access Google APIs on behalf of individual users, you can use the embedded credentials or you can register your own OAuth app.

OAuth also enables you to use a service account to connect on behalf of users in a Google Apps domain. To authenticate with a service account, register an application to obtain the OAuth JWT values.

In addition to the OAuth values, specify the DatasetId and ProjectId. See the "Getting Started" chapter of the help documentation for a guide to using OAuth.

Built-in Connection String Designer

For assistance in constructing the JDBC URL, use the connection string designer built into the BigQuery JDBC Driver. Either double-click the JAR file or execute the jar file from the command-line.

java -jar cdata.jdbc.googlebigquery.jar

Fill in the connection properties and copy the connection string to the clipboard.

When you configure the JDBC URL, you may also want to set the Max Rows connection property. This will limit the number of rows returned, which is especially helpful for improving performance when designing reports and visualizations.

Below is a typical JDBC URL:

jdbc:googlebigquery:DataSetId=MyDataSetId;ProjectId=MyProjectId;InitiateOAuth=GETANDREFRESH

Save your connection string for use in Pentaho Data Integration.

Connect to BigQuery from Pentaho DI

Open Pentaho Data Integration and select "Database Connection" to configure a connection to the CData JDBC Driver for Google BigQuery

  1. Click "General"
  2. Set Connection name (e.g. BigQuery Connection)
  3. Set Connection type to "Generic database"
  4. Set Access to "Native (JDBC)"
  5. Set Custom connection URL to your BigQuery connection string (e.g.
    jdbc:googlebigquery:DataSetId=MyDataSetId;ProjectId=MyProjectId;InitiateOAuth=GETANDREFRESH
  6. Set Custom driver class name to "cdata.jdbc.googlebigquery.GoogleBigQueryDriver"
  7. Test the connection and click "OK" to save.

Create a Data Pipeline for BigQuery

Once the connection to BigQuery is configured using the CData JDBC Driver, you are ready to create a new transformation or job.

  1. Click "File" >> "New" >> "Transformation/job"
  2. Drag a "Table input" object into the workflow panel and select your BigQuery connection.
  3. Click "Get SQL select statement" and use the Database Explorer to view the available tables and views.
  4. Select a table and optionally preview the data for verification.

At this point, you can continue your transformation or jb by selecting a suitable destination and adding any transformations to modify, filter, or otherwise alter the data during replication.

Free Trial & More Information

Download a free, 30-day trial of the CData JDBC Driver for Google BigQuery and start working with your live BigQuery data in Pentaho Data Integration today.

Ready to get started?

Download a free trial of the Google BigQuery Driver to get started:

 Download Now

Learn more:

Google BigQuery Icon Google BigQuery JDBC Driver

Rapidly create and deploy powerful Java applications that integrate with Google BigQuery data including Tables and Datasets.