Connect and Query Live IBM Cloud Object Storage Data in Databricks with CData Connect AI

Mohsin Turki
Technical Marketing Engineer
Use CData Connect AI to integrate live IBM Cloud Object Storage data into Databricks and enable direct, live querying and analysis without replication.

Databricks is a leading cloud-native AI platform that unifies data engineering, machine learning, and analytics at scale. Its data lakehouse architecture combines the performance of data warehouses with the flexibility of data lakes. Integrating Databricks with CData Connect AI gives organizations live access to IBM Cloud Object Storage data without complex ETL pipelines or data duplication, which streamlines operations and shortens time to insight.

In this article, we'll walk through how to configure a secure, live connection from Databricks to IBM Cloud Object Storage using CData Connect AI. Once configured, you'll be able to access IBM Cloud Object Storage data directly from Databricks notebooks using standard SQL—enabling unified, real-time analytics across your data ecosystem.

Overview

Here is an overview of the simple steps:

  1. Step 1 — Connect and Configure: In CData Connect AI, create a connection to your IBM Cloud Object Storage source, configure user permissions, and generate a Personal Access Token (PAT).
  2. Step 2 — Query from Databricks: Install the CData JDBC driver in Databricks, configure your notebook with the connection details, and run SQL queries to access live IBM Cloud Object Storage data.

Prerequisites

Before you begin, make sure you have the following:

  1. An active IBM Cloud Object Storage account.
  2. A CData Connect AI account. You can log in or sign up for a free trial here.
  3. A Databricks account. Sign up or log in here.

Step 1: Connect and Configure an IBM Cloud Object Storage Connection in CData Connect AI

1.1 Add a Connection to IBM Cloud Object Storage

CData Connect AI uses a straightforward, point-and-click interface to connect to available data sources.

  1. Log into Connect AI, click Sources on the left, and then click Add Connection in the top-right.
  2. Select "IBM Cloud Object Storage" from the Add Connection panel.
  3. Enter the necessary authentication properties to connect to IBM Cloud Object Storage.

    Register a New Instance of Cloud Object Storage

    If you do not already have Cloud Object Storage in your IBM Cloud account, follow the procedure below to create an instance in your account:

    1. Log in to your IBM Cloud account.
    2. Navigate to the Cloud Object Storage page, choose a name for your instance, and click Create. You will be redirected to the instance of Cloud Object Storage you just created.

    Connecting using OAuth Authentication

    There are certain connection properties you need to set before you can connect. You can obtain these as follows:

    API Key

    To connect with IBM Cloud Object Storage, you need an API Key. You can obtain this as follows:

    1. Log in to your IBM Cloud account.
    2. Navigate to the Platform API Keys page.
    3. In the middle-right corner, click "Create an IBM Cloud API Key" to create a new API Key.
    4. In the pop-up window, specify the API Key name and click "Create". Record the API Key, because you cannot access it again from the dashboard.

    Cloud Object Storage CRN

    If you have multiple accounts, specify the CloudObjectStorageCRN explicitly. To find the appropriate value, you can:

    • Query the Services view. This will list your IBM Cloud Object Storage instances along with the CRN for each.
    • Locate the CRN directly in IBM Cloud. To do so, navigate to your IBM Cloud Dashboard. In the Resource List, under Storage, select your Cloud Object Storage resource to get its CRN.

    Connecting to Data

    You can now set the following to connect to data:

    • InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.
    • ApiKey: Set this to the API key you noted during setup.
    • CloudObjectStorageCRN (Optional): Set this to the Cloud Object Storage CRN you want to work with. While the connector attempts to retrieve this automatically, specifying it explicitly is recommended if you have more than one Cloud Object Storage account.

    When you connect, the connector completes the OAuth process:

    1. Extracts the access token and authenticates requests.
    2. Saves OAuth values in OAuthSettingsLocation so that they persist across connections.
  4. Click Save & Test in the top-right.
  5. Navigate to the Permissions tab on the IBM Cloud Object Storage Connection page and update the user-based permissions based on your preferences.

1.2 Generate a Personal Access Token (PAT)

When connecting to Connect AI through the REST API, the OData API, or the Virtual SQL Server, a Personal Access Token (PAT) is used to authenticate the connection to Connect AI. The PAT functions as an alternative to your login credentials for secure, token-based authentication. It is a best practice to create a separate PAT for each service to maintain granularity of access.

  1. Click the gear icon at the top right of the Connect AI app to open the settings page.
  2. On the Settings page, go to the Access Tokens section and click Create PAT.
  3. Give the PAT a name and click Create.
  4. Note: The personal access token is only visible at creation, so be sure to copy it and store it securely for future use.
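
Because the token is shown only once, it can help to keep it out of notebook cells altogether. The following is a minimal sketch, assuming the PAT has been placed in an environment variable; the variable name CDATA_CONNECT_PAT and the helper load_pat are hypothetical and not part of Connect AI.

```python
import os

# Hypothetical variable name; choose whatever fits your environment.
PAT_ENV_VAR = "CDATA_CONNECT_PAT"


def load_pat(env=os.environ, name=PAT_ENV_VAR):
    """Return the PAT stored under `name`, failing loudly if it is missing."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {name} environment variable to your Connect AI PAT"
        )
    return token
```

On Databricks, a secret scope read with dbutils.secrets.get(scope=..., key=...) is a common alternative to an environment variable; the scope and key names would be your own.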

Step 2: Connect and Query IBM Cloud Object Storage Data in Databricks

Follow these steps to establish a connection from Databricks to IBM Cloud Object Storage. You'll install the CData JDBC Driver for Connect AI, add the JAR file to your cluster, configure your notebooks, and run SQL queries to access live IBM Cloud Object Storage data.

2.1 Install the CData JDBC Driver for Connect AI

  1. In CData Connect AI, click the Integrations page on the left. Search for JDBC or Databricks, click Download, and select the installer for your operating system.
  2. Once downloaded, run the installer and follow the instructions:
    • For Windows: Run the setup file and follow the installation wizard.
    • For Mac/Linux: Unpack the archive and move the folder to /opt or /Applications. Make sure you have execute permissions.
  3. After installation, locate the JAR file in the installation directory:
    • Windows:
      C:\Program Files\CData\CData JDBC Driver for Connect AI\lib\cdata.jdbc.connect.jar
    • Mac/Linux:
      /Applications/CData/CData JDBC Driver for Connect AI/lib/cdata.jdbc.connect.jar

2.2 Install the JAR File on Databricks

  1. Log in to Databricks. In the navigation pane, click Compute on the left. Start or create a compute cluster.
  2. Click on the running cluster, go to the Libraries tab, and click Install New at the top right.
  3. In the Install Library dialog, select DBFS, and drag and drop the cdata.jdbc.connect.jar file. Click Install.

2.3 Query IBM Cloud Object Storage Data in a Databricks Notebook

Notebook Script 1 — Define JDBC Connection:

  1. Paste the following script into the notebook cell:
driver = "cdata.jdbc.connect.ConnectDriver"
url = "jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;"
  2. Replace:
    • your_username - With your CData Connect AI username
    • your_pat - With your CData Connect AI Personal Access Token (PAT)
    • Your_Connection_Name - With the name of your Connect AI data source, from the Sources page
  3. Run the script.
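
The connection URL in Script 1 is a list of Key=Value properties separated by semicolons. As a rough sketch, the same string can be assembled from its parts, which makes a missing value easier to spot. The helper build_connect_url is hypothetical, and it simply rejects values containing a semicolon rather than attempting any escaping.

```python
def build_connect_url(username, pat, connection_name,
                      base_url="https://cloud.cdata.com/api/"):
    """Assemble the JDBC URL in the same property order as Script 1."""
    properties = [
        ("AuthScheme", "Basic"),
        ("User", username),
        ("Password", pat),
        ("URL", base_url),
        ("DefaultCatalog", connection_name),
    ]
    for key, value in properties:
        if not value:
            raise ValueError(f"{key} must not be empty")
        # Semicolons delimit properties, so this sketch refuses them outright.
        if ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    return "jdbc:connect:" + "".join(f"{k}={v};" for k, v in properties)


driver = "cdata.jdbc.connect.ConnectDriver"
url = build_connect_url("your_username", "your_pat", "Your_Connection_Name")
```

With the placeholder values shown, url is identical to the literal string in Script 1.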

Notebook Script 2 — Load DataFrame from IBM Cloud Object Storage data:

  1. Add a new cell for this second script. From the menu on the right side of your notebook, click Add cell below.
  2. Paste the following script into the new cell:
remote_table = spark.read.format("jdbc") \
  .option("driver", "cdata.jdbc.connect.ConnectDriver") \
  .option("url", "jdbc:connect:AuthScheme=Basic;User=your_username;Password=your_pat;URL=https://cloud.cdata.com/api/;DefaultCatalog=Your_Connection_Name;") \
  .option("dbtable", "YOUR_SCHEMA.YOUR_TABLE") \
  .load()
  3. Replace:
    • your_username - With your CData Connect AI username
    • your_pat - With your CData Connect AI Personal Access Token (PAT)
    • Your_Connection_Name - With the name of your Connect AI data source, from the Sources page
    • YOUR_SCHEMA.YOUR_TABLE - With your schema and table, for example, IBMCloudObjectStorage.Objects
  4. Run the script.
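
Script 2 repeats the driver and URL literals that Script 1 already stored in the driver and url variables. One possible variation, sketched here, passes those variables through instead, so the credentials live in a single cell. The helper names jdbc_options and load_remote_table are hypothetical; the sketch relies on the options(**kwargs) method of Spark's DataFrameReader.

```python
def jdbc_options(driver, url, table):
    """Collect the reader options used in Script 2 into one dictionary."""
    return {"driver": driver, "url": url, "dbtable": table}


def load_remote_table(spark, driver, url, table):
    """Load `table` through the JDBC driver using an existing SparkSession."""
    reader = spark.read.format("jdbc").options(**jdbc_options(driver, url, table))
    return reader.load()


# In a notebook, after running Script 1:
# remote_table = load_remote_table(spark, driver, url, "YOUR_SCHEMA.YOUR_TABLE")
```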

Notebook Script 3 — Preview Columns:

  1. Similarly, add a new cell for this third script.
  2. Paste the following script into the new cell:
display(remote_table.select("ColumnName1", "ColumnName2"))
  3. Replace ColumnName1 and ColumnName2 with the actual columns from your IBM Cloud Object Storage structure (for example, Key and Etag).
  4. Run the script.
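
Script 3 uses the DataFrame API. To query the same data with standard SQL, one common approach is to register the DataFrame as a temporary view and pass a statement to spark.sql. This is a sketch rather than part of the original scripts; the helper query_with_sql and the view name cos_objects are hypothetical.

```python
def query_with_sql(spark, df, view_name, sql):
    """Register `df` as a temporary view and run `sql` against it."""
    df.createOrReplaceTempView(view_name)
    return spark.sql(sql)


# In a notebook (view name and column names are placeholders):
# result = query_with_sql(
#     spark, remote_table, "cos_objects",
#     "SELECT ColumnName1, ColumnName2 FROM cos_objects LIMIT 10",
# )
# display(result)
```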

You can now explore, join, and analyze live IBM Cloud Object Storage data directly within Databricks notebooks—without needing to know the complexities of the back-end API and without replicating IBM Cloud Object Storage data.


Try CData Connect AI Free for 14 Days

Ready to simplify real-time access to IBM Cloud Object Storage data? Start your free 14-day trial of CData Connect AI today and experience seamless, live connectivity from Databricks to IBM Cloud Object Storage.

Low code, zero infrastructure, zero replication — just seamless, secure access to your most critical data and insights.
