How to create Databricks federated tables in MySQL
Use the SQL Gateway and the ODBC Driver to set up federated tables for Databricks data in MySQL.
You can use the SQL Gateway to configure a MySQL remoting service and set up federated tables for Databricks data. The service is a daemon process that provides a MySQL interface to the CData ODBC Driver for Databricks. After you have started the service, you can create a server and tables using the FEDERATED Storage Engine in MySQL. You can then work with Databricks data just as you would with local MySQL tables.
About Databricks Data Integration
Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:
- Access all versions of Databricks from Runtime Versions 9.1 - 13.X to both the Pro and Classic Databricks SQL versions.
- Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
- Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
- Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.
While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.
Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.
Getting Started
Connect to Databricks Data
If you have not already done so, provide values for the required connection properties in the data source name (DSN). You can use the built-in Microsoft ODBC Data Source Administrator to configure the DSN. This is also the last step of the driver installation. See the "Getting Started" chapter in the help documentation for a guide to using the Microsoft ODBC Data Source Administrator to create and configure a DSN.
To connect to a Databricks cluster, set the properties as described below.
Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.
- Server: Set to the Server Hostname of your Databricks cluster.
- HTTPPath: Set to the HTTP Path of your Databricks cluster.
- Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
Configure the SQL Gateway
See the SQL Gateway Overview to set up connectivity to Databricks data as a virtual MySQL database. You will configure a MySQL remoting service that listens for MySQL requests from clients. The service can be configured in the SQL Gateway UI.
Create a FEDERATED Server and Tables for Databricks Data
After you have configured and started the service, create a FEDERATED server to simplify the process of creating FEDERATED tables:
Create a FEDERATED Server
The following statement will create a FEDERATED server based on the ODBC Driver for Databricks. Note that the username and password of the FEDERATED server must match a user account you defined on the Users tab of the SQL Gateway.
CREATE SERVER fedDatabricks FOREIGN DATA WRAPPER mysql OPTIONS (USER 'sql_gateway_user', PASSWORD 'sql_gateway_passwd', HOST 'sql_gateway_host', PORT ####, DATABASE 'CData Databricks Sys');
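If the statement above fails, or if federated tables created later return errors, two checks on the local MySQL server may help narrow things down. The following is a minimal sketch, assuming a stock MySQL installation and an account allowed to read the mysql system schema. In stock MySQL the FEDERATED engine is not enabled by default and is typically turned on by starting the server with the --federated option.

```sql
-- Check whether the FEDERATED engine is usable on the local MySQL server.
-- A SUPPORT value other than YES or DEFAULT means the engine is not
-- available as the server is currently started.
SELECT ENGINE, SUPPORT
FROM information_schema.ENGINES
WHERE ENGINE = 'FEDERATED';

-- Confirm that the server definition from CREATE SERVER was stored.
-- Reading mysql.servers requires sufficient privileges on the mysql schema.
SELECT Server_name, Host, Db, Username, Port, Wrapper
FROM mysql.servers
WHERE Server_name = 'fedDatabricks';
```

The second query should return a single row whose Host, Port, and Db values match the SQL Gateway service you configured.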
Create a FEDERATED Table
To create a FEDERATED table using our newly created server, use the CONNECTION keyword and pass the name of the FEDERATED server and the remote table (Customers). Refer to the following template for the statement to create a FEDERATED table:
CREATE TABLE fed_customers ( ..., city TYPE(LEN), companyname TYPE(LEN), ... ) ENGINE=FEDERATED DEFAULT CHARSET=latin1 CONNECTION='fedDatabricks/customers';
NOTE: The table schema for the FEDERATED table must match the remote table schema exactly. You can always connect directly to the MySQL remoting service using any MySQL client and run a SHOW CREATE TABLE query to get the table schema.
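As an illustration of the workflow in the note above, the sketch below first reads the schema from the remoting service and then fills in the template. The two column definitions are hypothetical placeholders, not the actual Databricks schema. Replace them with the full column list, types, and lengths that SHOW CREATE TABLE reports.

```sql
-- Step 1: run while connected to the SQL Gateway remoting service
-- (not the local MySQL server) to see the remote column definitions.
SHOW CREATE TABLE customers;

-- Step 2: run on the local MySQL server.
-- The column list below is HYPOTHETICAL and only shows the shape of the
-- statement; copy the exact columns, types, and lengths from step 1.
CREATE TABLE fed_customers (
  city        VARCHAR(255),  -- assumed type and length
  companyname VARCHAR(255)   -- assumed type and length
)
ENGINE=FEDERATED
DEFAULT CHARSET=latin1
CONNECTION='fedDatabricks/customers';
```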
Execute Queries
You can now execute queries against the Databricks FEDERATED tables from any tool that can connect to MySQL. This is particularly useful if you need to JOIN data from a local table with data from Databricks. Refer to the following example:
SELECT fed_customers.city, local_table.custom_field FROM local_table JOIN fed_customers ON local_table.foreign_city = fed_customers.city;
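Before relying on a join like the one above, it can be useful to confirm that the federated table returns rows on its own and to look at how MySQL plans the join. The following is a sketch that reuses the table and column names from the example. It assumes fed_customers and local_table already exist on the local server.

```sql
-- Smoke test: fetch a few rows through the federated table.
SELECT city, companyname
FROM fed_customers
LIMIT 10;

-- Inspect how MySQL plans the join between the local and federated tables.
EXPLAIN
SELECT fed_customers.city, local_table.custom_field
FROM local_table
JOIN fed_customers ON local_table.foreign_city = fed_customers.city;
```

If the smoke test fails with a connection error, recheck the FEDERATED server definition and confirm that the SQL Gateway service is running.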