Replicate Multiple Google Data Catalog Accounts



Replicate multiple Google Data Catalog accounts to one or many databases.

CData Sync for Google Data Catalog is a stand-alone application that supports a variety of replication scenarios, such as replicating sandbox and production instances into your database. Both Sync for Windows and Sync for Java include a command-line interface (CLI) that makes it easy to manage multiple Google Data Catalog connections. This article shows how to use the CLI to replicate multiple Google Data Catalog accounts.

Configure Google Data Catalog Connections

You can save connection and email notification settings in an XML configuration file. To replicate multiple Google Data Catalog accounts, use multiple configuration files. Below is an example configuration to replicate Google Data Catalog to SQLite:

Windows

<?xml version="1.0" encoding="UTF-8" ?>
<CDataSync>
  <DatabaseType>SQLite</DatabaseType>
  <DatabaseProvider>System.Data.SQLite</DatabaseProvider>
  <ConnectionString>ProjectId=YourProjectId;</ConnectionString>
  <ReplicateAll>False</ReplicateAll>
  <NotificationUserName></NotificationUserName>
  <DatabaseConnectionString>Data Source=C:\my.db</DatabaseConnectionString>
  <TaskSchedulerStartTime>09:51</TaskSchedulerStartTime>
  <TaskSchedulerInterval>Never</TaskSchedulerInterval>
</CDataSync>

Java

<?xml version="1.0" encoding="UTF-8" ?>
<CDataSync>
  <DatabaseType>SQLite</DatabaseType>
  <DatabaseProvider>org.sqlite.JDBC</DatabaseProvider>
  <ConnectionString>ProjectId=YourProjectId;</ConnectionString>
  <ReplicateAll>False</ReplicateAll>
  <NotificationUserName></NotificationUserName>
  <DatabaseConnectionString>Data Source=C:\my.db</DatabaseConnectionString>
</CDataSync>
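Because each account needs its own configuration file, it can help to generate the files from one template. The following Python script is a minimal sketch of that idea, not part of CData Sync. It writes only the elements that appear in both examples above. The account names, the project IDs, and the My<Account>GoogleDataCatalogConfig.xml naming pattern are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical account labels mapped to placeholder project IDs; replace with your own.
ACCOUNTS = {
    "Production": "YourProductionProjectId",
    "Sandbox": "YourSandboxProjectId",
}


def build_config(project_id, database_path, provider="System.Data.SQLite"):
    """Build a <CDataSync> element with the settings shown in the examples above."""
    root = ET.Element("CDataSync")
    settings = {
        "DatabaseType": "SQLite",
        "DatabaseProvider": provider,  # use "org.sqlite.JDBC" for Sync for Java
        "ConnectionString": f"ProjectId={project_id};",
        "ReplicateAll": "False",
        "NotificationUserName": "",
        "DatabaseConnectionString": f"Data Source={database_path}",
    }
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = value
    return root


def write_configs(accounts, out_dir, database_path=r"C:\my.db"):
    """Write one configuration file per account and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, project_id in accounts.items():
        # Assumed naming pattern, chosen to match the file names used later in this article.
        path = out_dir / f"My{name}GoogleDataCatalogConfig.xml"
        tree = ET.ElementTree(build_config(project_id, database_path))
        tree.write(str(path), encoding="UTF-8", xml_declaration=True)
        paths.append(path)
    return paths
```

Calling write_configs(ACCOUNTS, "configs") would produce one file per entry in ACCOUNTS. Every file here points at the same database path, so pass a different database_path per call if each account should replicate to its own database.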

Google Data Catalog uses the OAuth authentication standard. Authorize access to Google APIs on behalf of individual users or on behalf of users in a domain.

Before connecting, specify the following to identify the organization and project you would like to connect to:

  • OrganizationId: The ID associated with the Google Cloud Platform organization resource you would like to connect to. Find this by navigating to the cloud console.

    Click the project selection drop-down, and select your organization from the list. Then, click More -> Settings. The organization ID is displayed on this page.

  • ProjectId: The ID associated with the Google Cloud Platform project resource you would like to connect to.

    Find this by navigating to the cloud console dashboard and selecting your project from the Select from drop-down. The project ID appears in the Project info card.

When you connect, the OAuth endpoint opens in your default browser. Log in and grant permissions to the application to complete the OAuth process. For more information, refer to the OAuth section in the Help documentation.

Configure Queries for Each Google Data Catalog Instance

Sync enables you to control replication with standard SQL. The REPLICATE statement is a high-level command that caches and maintains a table in your database. You can define any SELECT query supported by the Google Data Catalog API. The statement below caches and incrementally updates a table of Google Data Catalog data:

REPLICATE Schemas;

You can specify a file containing the replication queries you want to use to update a particular database. Separate replication statements with semicolons. The following options are useful if you are replicating multiple Google Data Catalog accounts into the same database:

You can use a different table prefix in the REPLICATE SELECT statement:

REPLICATE PROD_Schemas SELECT * FROM Schemas

Alternatively, you can use a different schema:

REPLICATE PROD.Schemas SELECT * FROM Schemas
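When several accounts share one database, the per-account query files differ only in the prefix or schema. The Python helper below is a rough sketch, not a CData utility, that builds the statements shown above for a list of tables. The function name and its parameters are hypothetical, and Schemas is the only table name taken from this article.

```python
def replicate_statements(tables, prefix=None, schema=None):
    """Return REPLICATE statements, one per line, each terminated by a semicolon.

    With a schema, the target is <schema>.<table>; with a prefix, it is
    <prefix>_<table>; with neither, the plain REPLICATE <table> form is used.
    If both are given, the schema takes precedence.
    """
    statements = []
    for table in tables:
        if schema:
            statements.append(f"REPLICATE {schema}.{table} SELECT * FROM {table};")
        elif prefix:
            statements.append(f"REPLICATE {prefix}_{table} SELECT * FROM {table};")
        else:
            statements.append(f"REPLICATE {table};")
    return "\n".join(statements)


def write_query_file(path, tables, prefix=None, schema=None):
    """Save the generated statements to a query file for use with the CLI."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(replicate_statements(tables, prefix=prefix, schema=schema) + "\n")
```

For example, replicate_statements(["Schemas"], prefix="PROD") returns the prefixed statement shown above with a trailing semicolon, so statements for several tables can be saved in one file.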

Run Sync

After you have configured the connection strings and replication queries, you can run Sync with the following command-line options:

Windows

GoogleDataCatalogSync.exe -g MyProductionGoogleDataCatalogConfig.xml -f MyProductionGoogleDataCatalogSync.sql

Java

java -Xbootclasspath/p:c:\sqlitejdbc.jar -jar GoogleDataCatalogSync.jar -g MyProductionGoogleDataCatalogConfig.xml -f MyProductionGoogleDataCatalogSync.sql
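To replicate several accounts in one pass, you can run the command once per configuration file. The Python wrapper below is a sketch under stated assumptions rather than a documented feature. It uses only the -g and -f options shown above, the account labels and file-name pattern are hypothetical, and it assumes the Sync executable or JAR is in the working directory.

```python
import subprocess


def sync_command(account, platform="windows"):
    """Build the argument list for one account, mirroring the commands above."""
    # Assumed naming pattern: My<Account>GoogleDataCatalogConfig.xml / ...Sync.sql
    config = f"My{account}GoogleDataCatalogConfig.xml"
    queries = f"My{account}GoogleDataCatalogSync.sql"
    if platform == "windows":
        launcher = ["GoogleDataCatalogSync.exe"]
    else:
        launcher = [
            "java",
            r"-Xbootclasspath/p:c:\sqlitejdbc.jar",
            "-jar",
            "GoogleDataCatalogSync.jar",
        ]
    return launcher + ["-g", config, "-f", queries]


def run_all(accounts, platform="windows"):
    """Run Sync once per account, in order, and collect the process exit codes."""
    exit_codes = {}
    for account in accounts:
        completed = subprocess.run(sync_command(account, platform))
        # How Sync reports failure is not covered in this article; the exit
        # code is recorded here so you can inspect it yourself.
        exit_codes[account] = completed.returncode
    return exit_codes
```

Calling run_all(["Production", "Sandbox"]) would run the two replications one after the other. Running them in sequence is a deliberate choice here, since both accounts may write to the same database.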