Integrate Databricks Data in Your Informatica Cloud Instance



Use CData JDBC drivers with the Informatica Cloud Secure Agent to access live Databricks data from Informatica Cloud.

Informatica Cloud allows you to perform extract, transform, and load (ETL) tasks in the cloud. With the Cloud Secure Agent and the CData JDBC Driver for Databricks, you get live access to Databricks data, directly within Informatica Cloud. In this article, we will walk through downloading and registering the Cloud Secure Agent, connecting to Databricks through the JDBC Driver and generating a mapping that can be used in any Informatica Cloud process.

About Databricks Data Integration

Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:

  • Access all versions of Databricks, from Runtime versions 9.1 through 13.x to both the Pro and Classic Databricks SQL versions.
  • Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
  • Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
  • Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.

While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.

Read more about common Databricks use-cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.


Getting Started


Informatica Cloud Secure Agent

To work with the Databricks data through the JDBC Driver, install the Cloud Secure Agent.

  1. Navigate to the Administrator page in Informatica Cloud
  2. Select the Runtime Environments tab
  3. Click "Download Secure Agent"
  4. Make note of the Install Token
  5. Run the installer on the client machine and register the Cloud Secure Agent with your username and install token

NOTE: It may take some time for all of the Cloud Secure Agent services to get up and running.

Connecting to the Databricks JDBC Driver

With the Cloud Secure Agent installed and running, you are ready to connect to Databricks through the JDBC Driver.

Adding the JDBC Driver to the Secure Agent Machine

  1. Navigate to the following directory on the Secure Agent machine:

    %Secure Agent installation directory%/ext/connectors/thirdparty/
  2. Create a folder and add the driver JAR file (cdata.jdbc.databricks.jar) based on the type of mapping that you want to configure.

    For mappings, create the following folder and add the driver JAR file:

    informatica.jdbc_v2/common

    For mappings in advanced mode, also create the following folder and add the driver JAR file:

    informatica.jdbc_v2/spark
  3. Restart the Secure Agent.
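
The folder layout above can also be scripted. The following is a minimal sketch, not an official Informatica or CData tool: the class name DriverJarInstaller and its install method are hypothetical, and the sketch assumes the Secure Agent installation directory and the path to cdata.jdbc.databricks.jar are passed in by the caller. It creates the informatica.jdbc_v2/common folder (and informatica.jdbc_v2/spark when advanced mode is requested) under ext/connectors/thirdparty and copies the JAR into each. Restarting the Secure Agent is still a manual step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: copies the driver JAR into the Secure Agent connector folders. */
final class DriverJarInstaller {

    // Folder names come from the steps above, relative to the Secure Agent installation directory.
    static final String THIRD_PARTY = "ext/connectors/thirdparty";
    static final String COMMON_DIR = "informatica.jdbc_v2/common";
    static final String SPARK_DIR = "informatica.jdbc_v2/spark";

    /** Copies driverJar into the common folder, and into the spark folder when advancedMode is true. */
    static List<Path> install(Path agentHome, Path driverJar, boolean advancedMode) throws IOException {
        Path base = agentHome.resolve(THIRD_PARTY);

        List<String> folders = new ArrayList<>();
        folders.add(COMMON_DIR);
        if (advancedMode) {
            folders.add(SPARK_DIR);
        }

        List<Path> copied = new ArrayList<>();
        for (String folder : folders) {
            // createDirectories is a no-op when the folder already exists.
            Path dir = Files.createDirectories(base.resolve(folder));
            Path target = dir.resolve(driverJar.getFileName());
            Files.copy(driverJar, target, StandardCopyOption.REPLACE_EXISTING);
            copied.add(target);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: DriverJarInstaller <agent-home> <driver-jar> [advanced]");
            return;
        }
        boolean advanced = args.length > 2 && args[2].equalsIgnoreCase("advanced");
        for (Path p : install(Paths.get(args[0]), Paths.get(args[1]), advanced)) {
            System.out.println("Copied driver to " + p);
        }
    }
}
```

Passing "advanced" as the third argument mirrors the advanced-mode case, where the JAR is needed in both folders.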

Connecting to Databricks in Informatica Cloud

After installing the driver JAR file, you are ready to configure your connection to Databricks in Informatica Cloud. Start by clicking the Connections tab and clicking New Connection. Fill in the following properties for the connection:

  • Connection Name: Name your connection (e.g., CData Databricks Connection)
  • Type: Select "JDBC_V2"
  • Runtime Environment: Select the runtime environment where you installed the Secure Agent
  • JDBC Driver Class Name: The name of the JDBC driver class: cdata.jdbc.databricks.DatabricksDriver
  • JDBC Connection URL: Set this to the JDBC URL for Databricks. Your URL will look similar to the following:

    jdbc:databricks:Server=127.0.0.1;Port=443;TransportMode=HTTP;HTTPPath=MyHTTPPath;UseSSL=True;User=MyUser;Password=MyPassword;

    To connect to a Databricks cluster, set the properties as described below.

    Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.

    • Server: Set to the Server Hostname of your Databricks cluster.
    • HTTPPath: Set to the HTTP Path of your Databricks cluster.
    • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).

    Built-In Connection String Designer

    For assistance in constructing the JDBC URL, use the connection string designer built into the Databricks JDBC Driver. Either double-click the .jar file or execute it from the command line:

    java -jar cdata.jdbc.databricks.jar

    Fill in the connection properties and copy the connection string to the clipboard.

  • Username: Set this to the username for Databricks
  • Password: Set this to the password for Databricks
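
If you prefer to assemble the JDBC Connection URL programmatically rather than by hand, the sketch below shows one way to do it. It is an illustration only: the class DatabricksJdbcUrl and its build method are hypothetical names, the Port, TransportMode, and UseSSL values are copied from the sample URL above, and it assumes the Token property described above is supplied in the URL in place of User and Password. The values passed in main are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: assembles a JDBC URL from the cluster properties described above. */
final class DatabricksJdbcUrl {

    /** Builds a URL from the Server Hostname, HTTP Path, and personal access token. */
    static String build(String server, String httpPath, String token) {
        // LinkedHashMap keeps the properties in insertion order, matching the sample URL.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Server", server);
        props.put("Port", "443");            // value taken from the sample URL
        props.put("TransportMode", "HTTP");  // value taken from the sample URL
        props.put("HTTPPath", httpPath);
        props.put("UseSSL", "True");         // value taken from the sample URL
        props.put("Token", token);           // assumption: token is passed in the URL

        StringBuilder url = new StringBuilder("jdbc:databricks:");
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Placeholder values; replace with the values from the JDBC/ODBC tab of your cluster.
        System.out.println(build("127.0.0.1", "MyHTTPPath", "MyToken"));
    }
}
```

The resulting string can be pasted into the JDBC Connection URL field. Avoid committing a real token to source control; read it from an environment variable or a secret store instead.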

Create a Mapping for Databricks Data

With the connection to Databricks configured, we can now access Databricks data in any Informatica process. The steps below walk through creating a mapping for Databricks to another data target.

  1. Navigate to the Data Integration page
  2. Click New... and select Mapping from the Mappings tab
  3. Click the Source Object and in the Source tab, select the Connection and set the Source Type
  4. Click "Select" to choose the table to map
  5. In the Fields tab, select the fields from the Databricks table to map
  6. Click the Target object and configure the target connection, table, and fields. In the Field Mapping tab, map the source fields to the target fields.
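
Before choosing a table in the mapping, it can be useful to confirm outside Informatica which tables the driver exposes for a given connection URL. The sketch below uses only the standard java.sql API (DriverManager and DatabaseMetaData); the class name DatabricksTableLister is hypothetical, and the sketch assumes that cdata.jdbc.databricks.jar is on the classpath and that the URL passed as the first argument is valid for your cluster.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists table names visible through a JDBC URL. */
final class DatabricksTableLister {

    /** Opens a connection, reads the table metadata, and returns the table names. */
    static List<String> listTables(String jdbcUrl) throws SQLException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the connection and result set even on failure.
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            DatabaseMetaData meta = conn.getMetaData();
            // null catalog, schema, and types mean "do not filter"; "%" matches any table name.
            try (ResultSet rs = meta.getTables(null, null, "%", null)) {
                while (rs.next()) {
                    names.add(rs.getString("TABLE_NAME"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 1) {
            System.err.println("usage: DatabricksTableLister <jdbc-url>");
            return;
        }
        for (String table : listTables(args[0])) {
            System.out.println(table);
        }
    }
}
```

If no driver on the classpath accepts the URL, DriverManager throws a SQLException, which usually points to a missing JAR or a malformed URL rather than a problem in Informatica Cloud.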

With the mapping configured, you are ready to start integrating live Databricks data with any of the supported connections in Informatica Cloud. Download a free, 30-day trial of the CData JDBC Driver for Databricks and start working with your live Databricks data in Informatica Cloud today.
