Enable the Databricks JDBC Driver in KNIME

Use standard data access components in KNIME to create charts and reports with Databricks data.

One of the strengths of the CData JDBC Driver for Databricks is its cross-platform support, enabling integration with major BI tools. Follow the procedure below to access Databricks data in KNIME and to create a chart from Databricks data using the report designer.

Define a New JDBC Connection to Databricks Data

  1. Install the Report Designer extension: click File -> Install KNIME Extensions and filter on "Report".
  2. In a new workflow, click File -> Preferences and expand the KNIME -> Databases node to add cdata.jdbc.databricks.jar. The driver JAR is located in the lib subfolder of the installation directory.
  3. In the Node Repository view, expand the Database -> Read/Write node and drag a Database Reader onto the workflow editor.
  4. Double-click the Database Reader and set the following properties:

    • Database Driver: In the menu, select the driver name, cdata.jdbc.databricks.DatabricksDriver
    • Database URL: Enter the connection properties. The JDBC URL begins with jdbc:databricks: and is followed by a semicolon-separated list of connection properties.

      To connect to a Databricks cluster, set the properties as described below.

      Note: You can find the needed values in your Databricks instance: navigate to Clusters, select the desired cluster, and open the JDBC/ODBC tab under Advanced Options.

      • Server: Set to the Server Hostname of your Databricks cluster.
      • HTTPPath: Set to the HTTP Path of your Databricks cluster.
      • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).

      Built-in Connection String Designer

      To help construct the JDBC URL, use the connection string designer built into the Databricks JDBC Driver. Either double-click the JAR file or run it from the command line:

      java -jar cdata.jdbc.databricks.jar

      Fill in the connection properties and copy the connection string to the clipboard.

      When you configure the JDBC URL, you may also want to set the Max Rows connection property. It limits the number of rows returned, which can improve performance while you design reports and visualizations.

      A typical JDBC URL is shown below. This example authenticates with User and Password rather than a personal access token.

      jdbc:databricks:Server=127.0.0.1;Port=443;TransportMode=HTTP;HTTPPath=MyHTTPPath;UseSSL=True;User=MyUser;Password=MyPassword;
    • User Name: The username used to authenticate.
    • Password: The password used to authenticate.
    • SQL Statement: Enter an SQL query in the SQL Statement box or double-click a table. This article uses the following query to create a chart: SELECT City, CompanyName FROM Customers WHERE Country = 'US'
  5. Test the connection by clicking Fetch Metadata.

  6. Connect the Database Reader to a Data to Report node to supply the dataset to a range of data visualization controls. Click Execute and then click Edit Report at the top of the workflow to open the report designer perspective.
  7. You can now generate reports based on live data. To create a chart, drag the chart control from the palette to the report designer. In the resulting wizard, you can use the filtering and aggregation controls available in KNIME.
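The Database URL described in step 4 consists of the jdbc:databricks: prefix followed by Name=Value pairs separated by semicolons. Below is a minimal Java sketch for assembling that URL in code instead of with the connection string designer. The class and method names are hypothetical. The sketch joins an ordered map of properties into that format, and its main method prints the typical URL shown above. It does not quote or escape values, so a value containing a semicolon or an equals sign would need the driver's own quoting rules, which the sketch does not attempt.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a jdbc:databricks: URL from connection properties.
public class DatabricksUrlBuilder {

    // Appends each property as Name=Value followed by a semicolon.
    // Values are not quoted or escaped (an assumption of this sketch).
    public static String buildUrl(Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:databricks:");
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the URL comes out predictably.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Server", "127.0.0.1");
        props.put("Port", "443");
        props.put("TransportMode", "HTTP");
        props.put("HTTPPath", "MyHTTPPath");
        props.put("UseSSL", "True");
        props.put("User", "MyUser");
        props.put("Password", "MyPassword");
        // Prints the same URL as the typical example in step 4.
        System.out.println(buildUrl(props));
    }
}
```

Using an ordered map keeps the property order you chose visible in the output. You can then paste that output into the Database URL field.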

Troubleshooting

The following list shows how to resolve common errors:

  • Encountered duplicate row Id "Row1": To resolve this error, add the following line to the knime.ini file in your KNIME installation directory, placing it after the -vmargs line: -Dknime.database.fetchsize=0
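If Fetch Metadata fails, it can help to check the driver JAR and the JDBC URL outside KNIME. Below is a minimal sketch that uses only the standard java.sql API. It runs the article's example query and prints the column labels and rows, roughly the work the Database Reader performs. The sketch makes these assumptions:

  • cdata.jdbc.databricks.jar is on the classpath.
  • The JDBC URL is passed as the only command-line argument.
  • The class and helper names are hypothetical.
  • The Customers table may not exist in your workspace, since it comes from the example query.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical standalone check: connect, run a query, print the results.
public class DatabricksQuerySketch {

    // Driver class name from the Database Driver setting in step 4.
    public static final String DRIVER_CLASS = "cdata.jdbc.databricks.DatabricksDriver";

    // The example query used in this article; adjust it for your own tables.
    public static final String QUERY =
            "SELECT City, CompanyName FROM Customers WHERE Country = 'US'";

    // Joins column values into one display line.
    public static String formatRow(String[] values) {
        return String.join(" | ", values);
    }

    // Connects with the given URL, runs the SQL, and prints column labels, then rows.
    public static void printQuery(String url, String sql)
            throws ClassNotFoundException, SQLException {
        // Load the driver explicitly; harmless if the JAR also self-registers.
        Class.forName(DRIVER_CLASS);
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            ResultSetMetaData md = rs.getMetaData();
            int cols = md.getColumnCount();
            String[] values = new String[cols];
            // Header line: column labels from the result set metadata.
            for (int i = 1; i <= cols; i++) {
                values[i - 1] = md.getColumnLabel(i);
            }
            System.out.println(formatRow(values));
            // Data lines: every value read as a string (SQL NULL prints as "null").
            while (rs.next()) {
                for (int i = 1; i <= cols; i++) {
                    values[i - 1] = rs.getString(i);
                }
                System.out.println(formatRow(values));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java DatabricksQuerySketch <jdbc:databricks: URL>");
            return;
        }
        printQuery(args[0], QUERY);
    }
}
```

On Linux or macOS, you might compile the class with javac DatabricksQuerySketch.java. Then run java -cp cdata.jdbc.databricks.jar:. DatabricksQuerySketch followed by the quoted JDBC URL. On Windows, use a semicolon instead of a colon as the classpath separator. If the sketch connects but KNIME does not, the problem is more likely in the KNIME driver registration than in the URL.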