Using IBM Cloud Object Storage as an External Data Source with PolyBase

Use the CData ODBC Driver for IBM Cloud Object Storage with PolyBase in SQL Server 2019 to access live IBM Cloud Object Storage data as an external data source.

PolyBase for SQL Server lets you query external data with the same Transact-SQL syntax you use to query a database table. Combined with the CData ODBC Driver for IBM Cloud Object Storage, it gives you access to IBM Cloud Object Storage data in the same way as your SQL Server data. This article walks through configuring IBM Cloud Object Storage data as a PolyBase external data source and then accessing that data with T-SQL queries.

The CData ODBC drivers offer unmatched performance for interacting with live IBM Cloud Object Storage data using PolyBase due to optimized data processing built into the driver. When you issue complex SQL queries from SQL Server to IBM Cloud Object Storage, the driver pushes down supported SQL operations, like filters and aggregations, directly to IBM Cloud Object Storage and utilizes the embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. And with PolyBase, you can also join SQL Server data with IBM Cloud Object Storage data, using a single query to pull data from distributed sources.

Connecting to IBM Cloud Object Storage

If you have not already, first specify connection properties in an ODBC DSN (data source name). This is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs. To create an external data source in SQL Server using PolyBase, configure a System DSN (CData IBM Cloud Object Storage Sys is created automatically).

Register a New Instance of Cloud Object Storage

If you do not already have Cloud Object Storage in your IBM Cloud account, follow the procedure below to create an instance of Cloud Object Storage in your account:

  1. Log in to your IBM Cloud account.
  2. Navigate to the page, choose a name for your instance and click Create. You will be redirected to the instance of Cloud Object Storage you just created.

Connecting using OAuth Authentication

There are certain connection properties you need to set before you can connect. You can obtain these as follows:

API Key

To connect with IBM Cloud Object Storage, you need an API Key. You can obtain this as follows:

  1. Log in to your IBM Cloud account.
  2. Navigate to the Platform API Keys page.
  3. In the middle-right corner, click "Create an IBM Cloud API Key" to create a new API Key.
  4. In the pop-up window, specify the API Key name and click "Create". Record the API Key now, because you cannot retrieve it from the dashboard afterward.

Cloud Object Storage CRN

If you have multiple accounts, you will need to specify the CloudObjectStorageCRN explicitly. To find the appropriate value, you can:

  • Query the Services view. This will list your IBM Cloud Object Storage instances along with the CRN for each.
  • Locate the CRN directly in IBM Cloud. To do so, navigate to your IBM Cloud Dashboard. In the Resource List, under Storage, select your Cloud Object Storage resource to get its CRN.

Connecting to Data

You can now set the following to connect to data:

  • InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.
  • ApiKey: Set this to your API key which was noted during setup.
  • CloudObjectStorageCRN (Optional): Set this to the CRN of the Cloud Object Storage instance you want to work with. The connector attempts to retrieve this automatically, but specifying it explicitly is recommended if you have more than one Cloud Object Storage account.

When you connect, the connector completes the OAuth process as follows:

  1. Extracts the access token and authenticates requests.
  2. Saves OAuth values in OAuthSettingsLocation to be persisted across connections.

Click "Test Connection" to ensure that the DSN is connected to IBM Cloud Object Storage properly. Navigate to the Tables tab to review the table definitions for IBM Cloud Object Storage.

Creating an External Data Source for IBM Cloud Object Storage Data

After configuring the connection, you need to create a master encryption key and a database scoped credential for the external data source.
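Before creating these objects, it can help to confirm that PolyBase is installed and enabled on the instance. The following is a minimal sketch of that check, assuming you have permission to change server configuration; it is a common prerequisite step rather than part of the CData setup itself.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Enable PolyBase if it is not already enabled.
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;
```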

Creating a Master Encryption Key

Execute the following SQL command to create a database master key, which is used to encrypt the credentials for the external data source. Replace 'password' with a strong password of your own.

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password';
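A database can hold only one master key, so the statement above fails if a key already exists. A guarded variant might look like the following sketch; the password shown is a placeholder, and the check assumes the standard name SQL Server gives the database master key.

```sql
-- Create the database master key only if the current database does not have one.
IF NOT EXISTS (
    SELECT 1
    FROM sys.symmetric_keys
    WHERE name = '##MS_DatabaseMasterKey##'
)
BEGIN
    -- Placeholder password: replace with a strong password of your own.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password';
END;
```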

Creating a Database Scoped Credential

Execute the following SQL command to create a database scoped credential for the external data source connected to IBM Cloud Object Storage data.

NOTE: Since IBM Cloud Object Storage does not require a User or Password to authenticate, you may use whatever values you wish for IDENTITY and SECRET.

CREATE DATABASE SCOPED CREDENTIAL ibmcloudobjectstorage_creds
WITH IDENTITY = 'username', SECRET = 'password';
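To confirm that the credential was created, you can query the catalog view for database scoped credentials. This is an optional check, shown here as a sketch using the credential name from the statement above.

```sql
-- List the credential created above; the secret itself is never exposed.
SELECT name, credential_identity, create_date
FROM sys.database_scoped_credentials
WHERE name = 'ibmcloudobjectstorage_creds';
```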

Create an External Data Source for IBM Cloud Object Storage

Execute the following SQL command to create an external data source for IBM Cloud Object Storage with PolyBase, using the DSN and credentials configured earlier.

For IBM Cloud Object Storage, set SERVERNAME to 'localhost' or '127.0.0.1' and leave PORT empty. PUSHDOWN is set to ON by default, meaning the ODBC Driver can leverage server-side processing for complex queries.

CREATE EXTERNAL DATA SOURCE cdata_ibmcloudobjectstorage_source
WITH ( 
  LOCATION = 'odbc://SERVERNAME[:PORT]',
  CONNECTION_OPTIONS = 'DSN=CData IBM Cloud Object Storage Sys',
  -- PUSHDOWN = ON | OFF,
  CREDENTIAL = ibmcloudobjectstorage_creds
);
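As an illustration, the template above with the placeholders filled in might look like the following sketch. It assumes the DSN lives on the same machine as SQL Server (hence 'localhost' with no port) and that pushdown is left enabled; the IF NOT EXISTS guard is an addition so the script can be rerun safely.

```sql
-- Create the external data source only if it does not already exist.
IF NOT EXISTS (
    SELECT 1
    FROM sys.external_data_sources
    WHERE name = 'cdata_ibmcloudobjectstorage_source'
)
    CREATE EXTERNAL DATA SOURCE cdata_ibmcloudobjectstorage_source
    WITH (
        LOCATION = 'odbc://localhost',                              -- assumed: DSN on the local machine, no port
        CONNECTION_OPTIONS = 'DSN=CData IBM Cloud Object Storage Sys',
        PUSHDOWN = ON,                                              -- the default, stated explicitly here
        CREDENTIAL = ibmcloudobjectstorage_creds
    );

-- Confirm that the data source is registered.
SELECT name, location
FROM sys.external_data_sources
WHERE name = 'cdata_ibmcloudobjectstorage_source';
```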

Creating External Tables for IBM Cloud Object Storage

After creating the external data source, use CREATE EXTERNAL TABLE statements to link to IBM Cloud Object Storage data from your SQL Server instance. The table column definitions must match those exposed by the CData ODBC Driver for IBM Cloud Object Storage. You can refer to the Tables tab of the DSN Configuration Wizard to see the table definition.

Sample CREATE TABLE Statement

The statement to create an external table based on the IBM Cloud Object Storage Objects table would look similar to the following:

CREATE EXTERNAL TABLE Objects(
  [Key] [nvarchar](255) NULL,
  Etag [nvarchar](255) NULL,
  ...
) WITH ( 
  LOCATION='Objects',
  DATA_SOURCE=cdata_ibmcloudobjectstorage_source
);
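Once the external table exists, a quick check such as the following sketch can confirm that it is registered and returns rows. It assumes the table was created with the name Objects and the two columns shown above; Key is bracketed because KEY is a reserved keyword in T-SQL.

```sql
-- Confirm that the external table is registered in the current database.
SELECT name, location
FROM sys.external_tables
WHERE name = 'Objects';

-- Preview a few rows fetched live through the ODBC driver.
SELECT TOP (10) [Key], Etag
FROM Objects;
```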

Having created external tables for IBM Cloud Object Storage in your SQL Server instance, you are now able to query local and remote data simultaneously. Thanks to built-in query processing in the CData ODBC Driver, you know that as much query processing as possible is being pushed to IBM Cloud Object Storage, freeing up local resources and computing power. Download a free, 30-day trial of the ODBC Driver for IBM Cloud Object Storage and start working with live IBM Cloud Object Storage data alongside your SQL Server data today.
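The kind of combined query described above could be sketched as follows. The local table dbo.Documents, its columns, and the 'logs/' key prefix are hypothetical names used only for illustration; whether a given filter is actually pushed down depends on what the driver supports.

```sql
-- Filter the external table; with PUSHDOWN enabled, PolyBase may hand
-- this predicate to the ODBC driver instead of filtering locally.
SELECT [Key], Etag
FROM Objects
WHERE [Key] LIKE 'logs/%'          -- hypothetical key prefix
OPTION (FORCE EXTERNALPUSHDOWN);   -- optional hint; DISABLE EXTERNALPUSHDOWN does the opposite

-- Join live external data with a local SQL Server table.
-- dbo.Documents, DocumentId, and StorageKey are hypothetical.
SELECT d.DocumentId, d.StorageKey, o.Etag
FROM dbo.Documents AS d
INNER JOIN Objects AS o
    ON o.[Key] = d.StorageKey;
```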

 
 