How to create Visio diagrams from Databricks Data
Automatically update Databricks data with the changes you make to Visio master shapes.
Automate the process of entering data into Visio diagrams and keeping your diagrams up to date with the CData ODBC Driver for Databricks. The driver surfaces Databricks data as an ODBC data source that can be accessed by applications with built-in ODBC support like Microsoft Office. This article shows how to create a simple diagram to start brainstorming about Visio projects linked to Databricks data.
About Databricks Data Integration
Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:
- Access all versions of Databricks, from Runtime versions 9.1 through 13.x to both the Pro and Classic Databricks SQL versions.
- Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
- Authenticate securely in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
- Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.
While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.
Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.
Getting Started
Connect to Databricks as an ODBC Data Source
If you have not already, first specify connection properties in an ODBC DSN (data source name). This is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.
To connect to a Databricks cluster, set the properties as described below.
Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.
- Server: Set to the Server Hostname of your Databricks cluster.
- HTTPPath: Set to the HTTP Path of your Databricks cluster.
- Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
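If you want to check the three properties outside the ODBC Data Source Administrator, the sketch below assembles them into a DSN-less connection string. It is only an illustration: the driver name registered on your machine is assumed to be "CData ODBC Driver for Databricks", the helper names are hypothetical, and the optional connection check assumes the third-party pyodbc package is installed.

```python
def build_connection_string(server, http_path, token,
                            driver="CData ODBC Driver for Databricks"):
    """Assemble a DSN-less ODBC connection string from the three properties.

    The default driver name is an assumption; use the name shown in the
    Drivers tab of the ODBC Data Source Administrator.
    """
    parts = {
        "Driver": "{" + driver + "}",
        "Server": server,        # Server Hostname of the cluster
        "HTTPPath": http_path,   # HTTP Path of the cluster
        "Token": token,          # personal access token
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


def can_connect(conn_str):
    """Open and close a connection. Needs pyodbc and the installed driver."""
    import pyodbc  # imported lazily so the string builder works without it

    conn = pyodbc.connect(conn_str)
    conn.close()
    return True
```

Keeping the string builder separate from the connection check lets you inspect the values without touching the network or exposing the token in a saved DSN.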
Connect Databricks Data to Diagrams
Follow the steps below to use the Data Selector Wizard to import data into your diagram.
- Open Visio and click File -> New. Open the Brainstorming template.
- On the Data tab, click Custom Import and select Other OLEDB or ODBC data source.
- Select the ODBC option and select ODBC DSN.
- Select the DSN for Databricks, select the table you want to import, and finish the wizard. This article uses Customers as an example.
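Before importing a table, it can help to preview a few rows through the same DSN to confirm the columns you expect are there. The following is a minimal sketch, not part of the Visio workflow: the helper names are hypothetical, the DSN name you pass in is whatever you configured earlier, pyodbc is assumed to be installed, and the LIMIT clause is assumed to be accepted by the driver.

```python
import re


def preview_query(table, limit=5):
    """Build a small SELECT for a table; rejects anything but a plain name."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT * FROM {table} LIMIT {limit}"


def preview_rows(dsn, table, limit=5):
    """Return (column names, rows) for the first few rows of a table.

    Requires pyodbc and a configured DSN; the DSN name is supplied by you.
    """
    import pyodbc  # imported lazily so preview_query works without it

    conn = pyodbc.connect(f"DSN={dsn}")
    try:
        cursor = conn.cursor()
        cursor.execute(preview_query(table, limit))
        columns = [col[0] for col in cursor.description]
        return columns, cursor.fetchall()
    finally:
        conn.close()
```

For example, calling preview_rows with your DSN name and "Customers" would return the column names Visio will list in the External Data window, along with a handful of sample rows.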
Link Databricks Entities to Shapes
Follow the steps below to build a simple diagram by creating shapes from your data, which is one of the ways to link Databricks entities to shapes:
- Click Brainstorming Shapes and drag a main topic onto the drawing page. Enter Customers as the text of the main topic.
- Click Topic.
- Select a row in the External Data window and drag it onto the drawing page.
- Right-click the Topic shape and click Data -> Edit Data Graphic.
- Click New Item.
- In the Data Field menu, select a column. In the Displayed As menu, select how to display it.
- Drag a few other Databricks entities onto the drawing page and add association lines back to the main topic, Customers. New topics have the same configuration: numeric columns displayed in data bars make each entity stand out in contrast to the other Customers entities.
You can refresh your diagram from the Data tab, synchronizing your shapes with the external Databricks data.