How to connect to Databricks Data from MS Excel on Mac OS X



Create a Data Source Name in iODBC with the CData ODBC Driver for Databricks and work with Databricks data in Microsoft Excel on Mac OS X.

Microsoft Excel features calculations, graphing tools, pivot tables, and a macro programming language that let users work with data in whatever way suits their needs, on either Windows or Mac. This article walks through creating a DSN for Databricks data in iODBC and accessing Databricks data in Microsoft Excel, all on a machine running Mac OS X.

About Databricks Data Integration

Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:

  • Access all versions of Databricks from Runtime Versions 9.1 - 13.X to both the Pro and Classic Databricks SQL versions.
  • Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
  • Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
  • Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.

While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.

Read more about common Databricks use-cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.


Getting Started


Installing the CData ODBC Drivers on Mac OS X

The CData ODBC Driver for Databricks is preconfigured for the iODBC driver manager, as are many other products like Microsoft Excel. This makes the driver easy to use with these tools.

Licensing the Driver

In a terminal, run the following commands to license the driver. To activate a trial license, omit the key input.

cd "/Applications/CData ODBC Driver for Databricks/bin"
sudo ./install-license <key>

Defining a DSN for iODBC with odbc.ini

You can define ODBC data sources in sections in the odbc.ini file. User data sources can only be accessed by the user account whose home folder the odbc.ini is located in. System data sources can be accessed by all users. You can find the correct odbc.ini in the following paths:

Privileges  Path
User        /Users/myuser/Library/ODBC/odbc.ini
System      /Library/ODBC/odbc.ini
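
To see which of these files already exist on a given machine, a small shell sketch like the one below can help. The function name odbc_ini_report is hypothetical and is not part of iODBC or the driver; the user path is derived from $HOME unless another home folder is passed as the first argument.

```shell
# Report whether the user-level and system-level odbc.ini files exist.
# odbc_ini_report is a hypothetical helper name used only for illustration.
odbc_ini_report() (
  user_ini="${1:-$HOME}/Library/ODBC/odbc.ini"   # user data sources
  system_ini="${2:-/Library/ODBC/odbc.ini}"      # system data sources
  for ini in "$user_ini" "$system_ini"; do
    if [ -f "$ini" ]; then
      printf 'found    %s\n' "$ini"
    else
      printf 'missing  %s\n' "$ini"
    fi
  done
)
```

Calling odbc_ini_report with no arguments checks the two default locations from the table above.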

Modifying iODBC's system-wide settings requires elevated permissions; to do so, you can use the following command to open a text editor from the terminal:

sudo nano /Library/ODBC/odbc.ini
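
Before editing, it can be worth keeping a copy of the existing file. The sketch below is one way to do that; backup_ini is a hypothetical helper name, and the .bak suffix is an arbitrary choice rather than anything the driver or iODBC expects.

```shell
# Copy an ini file to <file>.bak before editing; do nothing if it is absent.
# backup_ini is a hypothetical helper name used only for illustration.
backup_ini() (
  ini="$1"
  [ -f "$ini" ] || { printf 'nothing to back up: %s\n' "$ini"; exit 0; }
  cp -p "$ini" "$ini.bak" && printf 'backup written: %s.bak\n' "$ini"
)
```

For the user-level file, backup_ini "$HOME/Library/ODBC/odbc.ini" is enough. The system-wide file needs elevated permissions, so the equivalent there is a plain sudo cp -p of /Library/ODBC/odbc.ini to /Library/ODBC/odbc.ini.bak.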

To connect to a Databricks cluster, set the properties as described below.

Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.

  • Server: Set to the Server Hostname of your Databricks cluster.
  • HTTPPath: Set to the HTTP Path of your Databricks cluster.
  • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).

When you configure the DSN, you may also want to set the Max Rows connection property. This will limit the number of rows returned, which is especially helpful for improving performance when designing reports and visualizations.

In addition to the connection properties required to connect to Databricks, the Driver property specifies either a driver definition in the odbcinst.ini file or the path to the driver library. Place your connection properties at the beginning of odbc.ini:

[CData Databricks Source]
Driver = CData ODBC Driver for Databricks
Server = 127.0.0.1
Port = 443
TransportMode = HTTP
HTTPPath = MyHTTPPath
UseSSL = True
User = MyUser
Password = MyPassword

If you wish to authenticate using OAuth, you will need to add an additional connection property to ensure that the OAuth flow can execute properly:

Other = CheckPromptMode=False

Mac OS validates our drivers separately so you need to copy the license file to the appropriate path as well. After you have configured odbc.ini, run the following command.

sudo cp "/Applications/CData ODBC Driver for Databricks/lib/CData.ODBC.Databricks.lic" "/Users/<YOUR_USER>/Library/Containers/com.microsoft.Excel/Data/.cdata/"
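
The copy fails if the source path is not quoted (it contains spaces) or if the target folder does not exist. The following sketch handles both cases. It assumes the .cdata folder may be missing, which this article does not state either way, and copy_license is a hypothetical helper name; the source and target paths are the ones from the command above, passed in as arguments.

```shell
# Copy the driver license into Excel's container folder, creating the
# target folder first. copy_license is a hypothetical helper name.
copy_license() (
  src="$1"        # path to CData.ODBC.Databricks.lic
  dest_dir="$2"   # .../com.microsoft.Excel/Data/.cdata
  [ -f "$src" ] || { printf 'license file not found: %s\n' "$src" >&2; exit 1; }
  mkdir -p "$dest_dir" && cp "$src" "$dest_dir/"
)
```

A call might look like copy_license "/Applications/CData ODBC Driver for Databricks/lib/CData.ODBC.Databricks.lic" "$HOME/Library/Containers/com.microsoft.Excel/Data/.cdata". If the copy is refused, run the equivalent commands with sudo as in the original command.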

Additionally, in the ODBC Data Sources section, the DSN must be set to a driver defined in the odbcinst.ini file. For example, below is the entry for the DSN created during the driver install:

[ODBC Data Sources]
CData Databricks Source = CData ODBC Driver for Databricks
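
As a rough illustration of how the two odbc.ini sections fit together, the sketch below prints a fragment for a token-based DSN. print_dsn is a hypothetical helper name, and the fragment uses only the Driver, Server, HTTPPath, and Token properties described in this article; your connection may need others, such as Port or UseSSL from the sample above.

```shell
# Print an odbc.ini fragment: the [ODBC Data Sources] entry plus the DSN section.
# print_dsn is a hypothetical helper name used only for illustration.
print_dsn() (
  name="$1"; server="$2"; http_path="$3"; token="$4"
  driver="CData ODBC Driver for Databricks"
  printf '[ODBC Data Sources]\n%s = %s\n\n' "$name" "$driver"
  printf '[%s]\nDriver = %s\n' "$name" "$driver"
  printf 'Server = %s\nHTTPPath = %s\nToken = %s\n' "$server" "$http_path" "$token"
)
```

The function only writes to standard output. If odbc.ini already has an [ODBC Data Sources] section, merge the printed lines into it by hand rather than appending the whole fragment, so the file does not end up with duplicate sections.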

Registering a DSN for iODBC with odbcinst.ini

You may need to modify the installed driver definition if you change the path to the driver library. To register an ODBC driver, modify the odbcinst.ini file. With iODBC, drivers can be available to only one user account or drivers can be available system wide. You can find the correct odbcinst.ini in the following paths:

Privileges  Path
User        /Users/myuser/Library/ODBC/odbcinst.ini
System      /Library/ODBC/odbcinst.ini

Drivers are defined in sections in the odbcinst.ini file. The section name specifies the name of the driver. In this section, the Driver property specifies the path to the driver library. The driver library is the .dylib file located in the lib subfolder of the installation directory, by default in /Applications/CData ODBC Driver for Databricks.

[CData ODBC Driver for Databricks]
Driver = /Applications/CData ODBC Driver for Databricks/lib/libdatabricks.odbc.dylib

The ODBC Drivers section must also contain a property with the driver name, set to "Installed".

[ODBC Drivers]
CData ODBC Driver for Databricks = Installed
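
If a connection fails after the driver library has been moved, one quick check is whether the Driver path registered in odbcinst.ini still points at a real file. The sketch below reads that path out of the file. driver_library is a hypothetical helper name, and the parsing is deliberately simple: it assumes section headers and the Driver key start at the beginning of a line, as in the examples above.

```shell
# Print the Driver path registered for a driver section in an odbcinst.ini file.
# driver_library is a hypothetical helper name used only for illustration.
driver_library() (
  ini="$1"; section="$2"
  awk -v want="[$section]" '
    /^\[/ { in_section = ($0 == want) }      # track whether we are in the wanted section
    in_section && /^Driver[ \t]*=/ {         # the Driver key inside that section
      sub(/^Driver[ \t]*=[ \t]*/, "")        # strip the key, keep the path
      print
      exit
    }
  ' "$ini"
)
```

For example, lib=$(driver_library /Library/ODBC/odbcinst.ini "CData ODBC Driver for Databricks") followed by [ -f "$lib" ] tells you whether the registered .dylib exists.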

Testing the Connection

You can test your connection using the iODBC administrator.

  1. Open a terminal and enter the following command to start the iODBC Administrator with the necessary permissions:
    sudo /Applications/iODBC/iODBC\ Administrator64.app/Contents/MacOS/iODBC\ Administrator64
    
  2. On the Users tab, select CData Databricks Source.
  3. Click the Test button.
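
As an alternative to the graphical test, iODBC also provides a command-line tool, iodbctest, that accepts an ODBC connection string. The sketch below assumes your iODBC installation includes iodbctest and that it is on your PATH, which may not be the case on every machine; try_dsn is a hypothetical helper name.

```shell
# Try a connection with iODBC's command-line test tool, if it is on PATH.
# try_dsn is a hypothetical helper name used only for illustration.
try_dsn() (
  dsn="$1"
  if ! command -v iodbctest >/dev/null 2>&1; then
    printf 'iodbctest not found on PATH\n' >&2
    exit 127
  fi
  iodbctest "DSN=$dsn"   # opens an interactive SQL prompt on success
)
```

Running try_dsn "CData Databricks Source" should either report a connection error or drop you at an SQL prompt, where you can run a short query to confirm that data comes back.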

Accessing Databricks Data from Microsoft Excel

You can use the DSN configured above to access Databricks data from Microsoft Excel.

  1. Open Microsoft Excel and open a spreadsheet (new or existing).
  2. Navigate to the Data ribbon, click the drop-down next to "Get Data (Power Query)," and select "From Database (Microsoft Query)."
  3. Select the User or System DSN that you previously configured and click OK.
  4. Build your SQL query in the Microsoft Query wizard.
  5. Click Return Data to execute the query and pull data into Excel.

Using the CData ODBC Driver for Databricks, you can pull your Databricks data directly into Excel. Once there, you can use Excel's native features to analyze, report on, and transform your Databricks data.

Ready to get started?

Download a free trial of the Databricks ODBC Driver to get started.

Learn more:

Databricks ODBC Driver

The Databricks ODBC Driver is a powerful tool that allows you to connect with live data from Databricks, directly from any applications that support ODBC connectivity.

Access Databricks data like you would a database - read, write, and update through a standard ODBC Driver interface.