Connect and Visualize Live IBM Cloud Object Storage Data in Databricks Lakehouse Federation with CData Connect AI

Dibyendu Datta
Lead Technology Evangelist
Use CData Connect AI to integrate live IBM Cloud Object Storage data into the Databricks platform and create visualization dashboards with real-time IBM Cloud Object Storage data.

Databricks Lakehouse Federation enables organizations to query and integrate data from multiple sources without requiring data movement. It allows federated queries across databases, data warehouses, and lakehouses, providing a unified interface for data analysis and management within Databricks. When combined with CData Connect AI, it enables seamless access to IBM Cloud Object Storage data for data virtualization, while also supporting data lineage and fine-grained access control.

This article explains how to use CData Connect AI to establish a live connection to IBM Cloud Object Storage and how to access live IBM Cloud Object Storage data from the Databricks platform.

CData Connect AI exposes IBM Cloud Object Storage through a cloud-to-cloud, SQL Server-compatible interface, so you can create dashboards and visualizations in Databricks from live IBM Cloud Object Storage data. Databricks issues SQL queries to retrieve the data behind each visualization. With built-in optimized data processing, CData Connect AI pushes all supported SQL operations (such as filters and JOINs) directly to IBM Cloud Object Storage, using server-side processing to return the requested data quickly.

Configure IBM Cloud Object Storage connectivity for Databricks in CData Connect AI

To work with IBM Cloud Object Storage data in Databricks Lakehouse Federation, you need to connect to IBM Cloud Object Storage from Connect AI and grant users access to the connection.

  1. Log into Connect AI, click Sources, and then click Add Connection.
  2. Select "IBM Cloud Object Storage" from the Add Connection panel.
  3. Enter the necessary authentication properties to connect to IBM Cloud Object Storage.

    Register a New Instance of Cloud Object Storage

    If you do not already have Cloud Object Storage in your IBM Cloud account, follow the procedure below to create an instance in your account:

    1. Log in to your IBM Cloud account.
    2. Navigate to the Cloud Object Storage page, choose a name for your instance, and click Create. You will be redirected to the instance of Cloud Object Storage you just created.

    Connecting using OAuth Authentication

    You need to set certain connection properties before you can connect. You can obtain them as follows:

    API Key

    To connect to IBM Cloud Object Storage, you need an API key. You can obtain one as follows:

    1. Log in to your IBM Cloud account.
    2. Navigate to the Platform API Keys page.
    3. In the middle-right of the page, click "Create an IBM Cloud API Key" to create a new API key.
    4. In the pop-up window, specify the API key name and click "Create". Copy the API key and store it securely, because you cannot access it again from the dashboard.

    Cloud Object Storage CRN

    If you have multiple accounts, specify the CloudObjectStorageCRN explicitly. To find the appropriate value, you can:

    • Query the Services view, which lists your IBM Cloud Object Storage instances along with the CRN for each.
    • Locate the CRN directly in IBM Cloud: navigate to your IBM Cloud Dashboard and, in the Resource List under Storage, select your Cloud Object Storage resource to get its CRN.

    Connecting to Data

    You can now set the following to connect to data:

    • InitiateOAuth: Set this to GETANDREFRESH. This avoids repeating the OAuth exchange and setting the OAuthAccessToken manually.
    • ApiKey: Set this to the API key you noted during setup.
    • CloudObjectStorageCRN (Optional): Set this to the CRN of the Cloud Object Storage instance you want to work with. While the connector attempts to retrieve this automatically, specifying it explicitly is recommended if you have more than one Cloud Object Storage account.

    When you connect, the connector completes the OAuth process as follows:

    1. Extracts the access token and authenticates requests.
    2. Saves OAuth values in OAuthSettingsLocation to be persisted across connections.
  4. Click Save & Test.
  5. Navigate to the Permissions tab in the Add IBM Cloud Object Storage Connection page and update the User-based permissions.
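
As a quick reference, the connection properties listed above can be collected in one place before you type them into the Connect AI form. The following is a minimal sketch only: the property names (InitiateOAuth, ApiKey, CloudObjectStorageCRN) come from this article, while the function name and the semicolon-delimited `key=value` layout are assumptions made for illustration, not a format this article specifies.

```python
def build_connection_properties(api_key, crn=None):
    """Collect the connection properties named in this article.

    The semicolon-delimited output format is an assumption for illustration.
    """
    if not api_key:
        raise ValueError("ApiKey is required")
    props = {
        "InitiateOAuth": "GETANDREFRESH",  # lets the connector manage the OAuth exchange
        "ApiKey": api_key,                 # the IBM Cloud API key noted during setup
    }
    if crn:
        # Optional, but recommended when you have more than one account.
        props["CloudObjectStorageCRN"] = crn
    return ";".join(f"{key}={value}" for key, value in props.items()) + ";"


# Placeholder values; substitute your own key and CRN.
print(build_connection_properties("<your-api-key>", crn="<your-crn>"))
```

Leaving `crn` unset mirrors the optional status of CloudObjectStorageCRN: the connector attempts to discover it on its own.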

Add a Personal Access Token

When connecting to Connect AI through the REST API, the OData API, or the Virtual SQL Server, a Personal Access Token (PAT) is used to authenticate the connection to Connect AI. It is best practice to create a separate PAT for each service to maintain granularity of access.

  1. Click the gear icon at the top right of the Connect AI app to open the Settings page.
  2. On the Settings page, go to the Access Tokens section and click Create PAT.
  3. Give the PAT a name and click Create.
  4. The personal access token is only visible at creation, so be sure to copy it and store it securely for future use.
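
Because the PAT acts as a password, it helps to keep it out of source files. The sketch below reads the username and PAT from environment variables and builds a standard HTTP Basic `Authorization` header. It rests on assumptions: the environment variable names are hypothetical, and the article does not state which HTTP authentication scheme the Connect AI REST and OData APIs expect, so confirm that against the Connect AI documentation before relying on it.

```python
import base64
import os


def basic_auth_header(username, pat):
    """Build an HTTP Basic Authorization header from a username and PAT.

    Assumption: the target API accepts Basic authentication with the PAT
    used as the password.
    """
    if not username or not pat:
        raise ValueError("both username and PAT are required")
    raw = f"{username}:{pat}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Hypothetical variable names; the defaults are placeholders, not real credentials.
user = os.environ.get("CONNECT_AI_USER", "<username>")
pat = os.environ.get("CONNECT_AI_PAT", "<personal-access-token>")
headers = basic_auth_header(user, pat)
print(sorted(headers))  # print the header names only, never the credential itself
```

Creating one PAT per service, as recommended above, means each service gets its own environment variable and a token can be revoked without affecting the others.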

With the connection configured and a PAT generated, you are ready to connect to IBM Cloud Object Storage data from Databricks.

Connecting live IBM Cloud Object Storage data in Databricks

Follow these steps to establish a connection from Databricks to the CData Connect AI Virtual SQL Server API.

  1. Log into Databricks.
  2. Navigate to SQL Warehouses and start a warehouse of your choice.
  3. In the navigation pane, select Catalog, and then select Create a connection.
  4. In the Connection basics section (or Step 1 of Set up connection page), enter the following connection details and click Next:
    • Connection name: a user-defined connection name.
    • Connection type: select SQL Server from the drop-down list.
    • Auth type: select Username and password.
  5. In the Authentication section (or Step 2), enter the required authentication details, and click Next:
    • Host: tds.cdata.com
    • Port: 14333
    • User: enter your CData Connect AI username (an email address), displayed in the top-right corner of the CData Connect AI interface.
    • Password: enter the PAT generated and copied in the previous section.
  6. In the Connection details section (or Step 3), enable the Trust server certificate checkbox and select the appropriate Application intent. Click Create Connection.
  7. In the Catalog basics section (or Step 4), enter the required details and click Create catalog:
    • Catalog name: enter a name of your choice
    • Connection: this will be the Databricks connection you defined earlier
    • Database: enter your IBM Cloud Object Storage connection name (for example, IBM Cloud Object Storage1)
  8. In the Access section (or Step 5), assign the workspace and user access rights, and grant read or edit privileges on the catalog.
  9. Click Next > Save to save all the details for the catalog.
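
The same connection and catalog can also be scripted rather than created through the UI. The sketch below generates Databricks SQL `CREATE CONNECTION` and `CREATE FOREIGN CATALOG` statements using the host and port from the steps above. Treat it as an illustration under assumptions: the connection, catalog, and secret names are hypothetical, the PAT is assumed to be stored in a Databricks secret and referenced with `secret()`, and the Trust server certificate and Application intent settings from the UI steps are not reflected here.

```python
def _safe(value):
    """Return value as text, rejecting characters that would need SQL escaping."""
    text = str(value)
    if any(ch in text for ch in "'`\\"):
        raise ValueError(f"unsupported character in {text!r}")
    return text


def create_connection_sql(name, user, secret_scope, secret_key,
                          host="tds.cdata.com", port=14333):
    """Build a CREATE CONNECTION statement for the Virtual SQL Server endpoint."""
    return (
        f"CREATE CONNECTION `{_safe(name)}` TYPE sqlserver\n"
        "OPTIONS (\n"
        f"  host '{_safe(host)}',\n"
        f"  port '{_safe(port)}',\n"
        f"  user '{_safe(user)}',\n"
        # The PAT is read from a Databricks secret instead of being inlined.
        f"  password secret('{_safe(secret_scope)}', '{_safe(secret_key)}')\n"
        ")"
    )


def create_catalog_sql(catalog, connection, database):
    """Build a CREATE FOREIGN CATALOG statement; database is the Connect AI connection name."""
    return (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS `{_safe(catalog)}` "
        f"USING CONNECTION `{_safe(connection)}`\n"
        f"OPTIONS (database '{_safe(database)}')"
    )


# Hypothetical names throughout; only the host, port, and database example come from the steps above.
print(create_connection_sql("cdata_connect", "<username>", "cdata", "connect_ai_pat"))
print(create_catalog_sql("ibm_cos", "cdata_connect", "IBM Cloud Object Storage1"))
```

The helper rejects quotes and backslashes outright instead of trying to escape them, which keeps the generated statements predictable for simple names like the ones shown.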

Access the catalog and visualize live IBM Cloud Object Storage data in Databricks

To access the newly created catalog and create a dashboard to visualize live IBM Cloud Object Storage data in Databricks, follow these steps:

  1. Select the catalog and expand it. A list of tables from IBM Cloud Object Storage appears.
  2. Choose the desired table and click the Overview tab to view the table metadata.
  3. Click the Sample Data tab to view live data in the table.
  4. Click Create at the top-right corner and select Dashboard.
  5. Create a visualization manually by selecting at least one field in the widget's visualization editor, or choose one of the visualization options suggested by Databricks AI.
  6. Once the visualization is created, edit its details in the widget settings of the dashboard.
  7. Click Publish to publish the dashboard.
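
The tables in the federated catalog can also be queried with SQL, which is what a dashboard does behind the scenes. The following sketch builds a query against a three-level `catalog.schema.table` name and, optionally, runs it with the separately installed `databricks-sql-connector` package. The catalog, schema, and table names are hypothetical placeholders, and the warehouse hostname, HTTP path, and access token must come from your own workspace.

```python
def build_sample_query(catalog, schema, table, limit=10):
    """Build a SELECT against a three-level name in the federated catalog."""
    for part in (catalog, schema, table):
        if "`" in part:
            raise ValueError(f"unsupported character in {part!r}")
    return f"SELECT * FROM `{catalog}`.`{schema}`.`{table}` LIMIT {int(limit)}"


def run_query(query, server_hostname, http_path, access_token):
    """Run a query on a SQL warehouse; requires the databricks-sql-connector package."""
    from databricks import sql  # imported here so the builder works without the package

    with sql.connect(server_hostname=server_hostname,
                     http_path=http_path,
                     access_token=access_token) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()


# Hypothetical names; replace them with the catalog, schema, and table you see in Catalog.
query = build_sample_query("ibm_cos", "<schema>", "<table>", limit=5)
print(query)
# rows = run_query(query, "<workspace-hostname>", "<warehouse-http-path>", "<databricks-token>")
```

Keeping the `LIMIT` small while exploring reduces how much data is pulled through the federated connection for each request.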

Live access to IBM Cloud Object Storage data from cloud applications

At this stage, you have established a direct, cloud-to-cloud connection to live IBM Cloud Object Storage data in Databricks. This enables you to create dashboards to monitor and visualize your data seamlessly.

For more details on accessing live data from over 100 SaaS, Big Data, and NoSQL sources through cloud applications like Databricks, visit our Connect AI page. As always, let us know if you have any questions during your evaluation. Our world-class CData Support Team is always available to help!
