How to connect PolyBase to IBM Cloud Object Storage
Use CData drivers and PolyBase to create an external data source in SQL Server 2019 with access to live IBM Cloud Object Storage data.
PolyBase for SQL Server allows you to query external data by using the same Transact-SQL syntax used to query a database table. When paired with the CData ODBC Driver for IBM Cloud Object Storage, you get access to your IBM Cloud Object Storage data directly alongside your SQL Server data. This article describes creating an external data source and external tables to grant access to live IBM Cloud Object Storage data using T-SQL queries.
NOTE: PolyBase is only available on SQL Server 2019 and above, and only for Standard SQL Server.
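Before continuing, you can check that PolyBase is installed and enabled on the instance. This is a minimal sketch using the built-in SERVERPROPERTY function and sp_configure. It assumes you have permission to change server settings (for example, membership in the sysadmin role).

```sql
-- Returns 1 if the PolyBase feature is installed on this instance
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Enable PolyBase (SQL Server 2019 and later)
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

-- Confirm the value now in use (1 = enabled)
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'polybase enabled';
```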
The CData ODBC drivers offer unmatched performance for interacting with live IBM Cloud Object Storage data using PolyBase due to optimized data processing built into the driver. When you issue complex SQL queries from SQL Server to IBM Cloud Object Storage, the driver pushes down supported SQL operations, like filters and aggregations, directly to IBM Cloud Object Storage and utilizes the embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. And with PolyBase, you can also join SQL Server data with IBM Cloud Object Storage data, using a single query to pull data from distributed sources.
Connect to IBM Cloud Object Storage
If you have not already, first specify connection properties in an ODBC DSN (data source name). This is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs. To create an external data source in SQL Server using PolyBase, configure a System DSN (a System DSN named CData IBM Cloud Object Storage Sys is created automatically).
Register a New Instance of Cloud Object Storage
If you do not already have Cloud Object Storage in your IBM Cloud account, follow the procedure below to create a Cloud Object Storage instance in your account:
- Log in to your IBM Cloud account.
- Navigate to the Cloud Object Storage page in the IBM Cloud catalog, choose a name for your instance, and click Create. You will be redirected to the Cloud Object Storage instance you just created.
Connecting using OAuth Authentication
There are certain connection properties you need to set before you can connect. You can obtain these as follows:
API Key
To connect with IBM Cloud Object Storage, you need an API Key. You can obtain this as follows:
- Log in to your IBM Cloud account.
- Navigate to the Platform API Keys page.
- In the middle-right of the page, click "Create an IBM Cloud API Key" to create a new API key.
- In the pop-up window, specify the API key name and click "Create". Make a note of the API key, as you cannot access it again from the dashboard.
Cloud Object Storage CRN
If you have multiple accounts, you will need to specify the CloudObjectStorageCRN explicitly. To find the appropriate value, you can:
- Query the Services view. This will list your IBM Cloud Object Storage instances along with the CRN for each.
- Locate the CRN directly in IBM Cloud. To do so, navigate to your IBM Cloud Dashboard. In the Resource List, under Storage, select your Cloud Object Storage resource to get its CRN.
Connecting to Data
You can now set the following to connect to data:
- InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.
- ApiKey: Set this to your API key which was noted during setup.
- CloudObjectStorageCRN (Optional): Set this to the Cloud Object Storage CRN you want to work with. While the connector attempts to retrieve this automatically, specifying it explicitly is recommended if you have more than one Cloud Object Storage account.
When you connect, the connector completes the OAuth process:
- Extracts the access token and authenticates requests.
- Saves OAuth values in OAuthSettingsLocation to be persisted across connections.
Click "Test Connection" to ensure that the DSN is connected to IBM Cloud Object Storage properly. Navigate to the Tables tab to review the table definitions for IBM Cloud Object Storage.
Create an External Data Source for IBM Cloud Object Storage Data
After configuring the connection, you need to create a database master key and a database scoped credential for the external data source.
Creating a Master Encryption Key
Execute the following SQL command to create a database master key, which encrypts the credential secret used by the external data source. Replace 'password' with a strong password.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password';
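A database can hold only one master key, so running the command a second time fails. If you script the setup and may re-run it, a guarded version such as the following sketch creates the key only when it is missing. The password shown is a placeholder you should replace.

```sql
-- Run in the database that will hold the external tables.
-- The database master key is stored in sys.symmetric_keys under this fixed name.
IF NOT EXISTS (
    SELECT 1
    FROM sys.symmetric_keys
    WHERE name = '##MS_DatabaseMasterKey##'
)
BEGIN
    -- Placeholder: substitute a strong password that meets your policy
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword123!>';
END;
```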
Creating a Database Scoped Credential
Execute the following SQL command to create credentials for the external data source connected to IBM Cloud Object Storage data.
NOTE: Since IBM Cloud Object Storage does not require a user name or password to authenticate (the DSN authenticates with your API key), you may use any values you wish for IDENTITY and SECRET.
CREATE DATABASE SCOPED CREDENTIAL ibmcloudobjectstorage_creds WITH IDENTITY = 'username', SECRET = 'password';
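To confirm that the credential exists, you can query the sys.database_scoped_credentials catalog view. The commented-out DROP statement is only a sketch for the case where you want to recreate the credential with different values; dropping it fails while an external data source still references it.

```sql
-- List the database scoped credentials in the current database
SELECT name, credential_identity, create_date
FROM sys.database_scoped_credentials;

-- Uncomment to remove the credential before recreating it
-- DROP DATABASE SCOPED CREDENTIAL ibmcloudobjectstorage_creds;
```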
Create an External Data Source for IBM Cloud Object Storage
Execute a CREATE EXTERNAL DATA SOURCE SQL command to create an external data source for IBM Cloud Object Storage with PolyBase:
- Set the LOCATION parameter to the ODBC address of the server, CONNECTION_OPTIONS to the DSN, and CREDENTIAL to the credential configured earlier.
For IBM Cloud Object Storage, replace SERVER_URL with the URL or address of your server (e.g. 'localhost' or '127.0.0.1' for local servers; the remote URL for remote servers). Omit the port. PUSHDOWN is ON by default, meaning SQL Server can push supported operations in complex queries down to the ODBC Driver for processing.
CREATE EXTERNAL DATA SOURCE cdata_ibmcloudobjectstorage_source
WITH (
  LOCATION = 'odbc://SERVER_URL',
  CONNECTION_OPTIONS = 'DSN=CData IBM Cloud Object Storage Sys',
  -- PUSHDOWN = ON | OFF,
  CREDENTIAL = ibmcloudobjectstorage_creds
);
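To check how SQL Server recorded the new data source, you can query the sys.external_data_sources catalog view. This sketch assumes the connection_options and pushdown columns, which are available in SQL Server 2019 and later.

```sql
-- Show the stored location, ODBC connection options, and pushdown setting
SELECT name, location, connection_options, pushdown
FROM sys.external_data_sources
WHERE name = 'cdata_ibmcloudobjectstorage_source';
```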
Create External Tables for IBM Cloud Object Storage
After creating the external data source, use CREATE EXTERNAL TABLE statements to link to IBM Cloud Object Storage data from your SQL Server instance. The table column definitions must match those exposed by the CData ODBC Driver for IBM Cloud Object Storage. You can refer to the Tables tab of the DSN Configuration Wizard to see the table definition.
Sample CREATE EXTERNAL TABLE Statement
The statement to create an external table based on the IBM Cloud Object Storage Objects table would look similar to the following (Key is a reserved word in T-SQL, so it is enclosed in brackets):
CREATE EXTERNAL TABLE Objects(
  [Key] [nvarchar](255) NULL,
  Etag [nvarchar](255) NULL,
  ...
) WITH (
  LOCATION='Objects',
  DATA_SOURCE=cdata_ibmcloudobjectstorage_source
);
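Once the Objects external table exists, it can be queried like a local table. The sketch below shows a filtered aggregate and a join with local data. The key prefix 'reports/%' and the local table dbo.ObjectOwners (with columns ObjectKey and OwnerName) are hypothetical placeholders, and which operations are actually pushed down depends on the driver and the PUSHDOWN setting.

```sql
-- Filter and aggregate against the external table.
-- With PUSHDOWN = ON, SQL Server may hand supported operations to the ODBC driver.
SELECT COUNT(*) AS ObjectCount
FROM Objects
WHERE [Key] LIKE 'reports/%';  -- hypothetical key prefix

-- Join remote object metadata with a hypothetical local table
SELECT o.[Key], o.Etag, w.OwnerName
FROM Objects AS o
INNER JOIN dbo.ObjectOwners AS w   -- hypothetical local table
    ON w.ObjectKey = o.[Key];
```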
Having created external tables for IBM Cloud Object Storage in your SQL Server instance, you are now able to query local and remote data simultaneously. Thanks to built-in query processing in the CData ODBC Driver, as much query processing as possible is pushed to IBM Cloud Object Storage, freeing up local resources and computing power. Download a free, 30-day trial of the ODBC Driver for IBM Cloud Object Storage and start working with live IBM Cloud Object Storage data alongside your SQL Server data today.