Connect to Live Databricks Data in MicroStrategy through CData Connect Cloud
Create a live connection to Databricks Data in CData Connect Cloud and connect to your Databricks data from MicroStrategy.
MicroStrategy is an analytics and mobility platform that enables data-driven innovation. When you pair MicroStrategy with CData Connect Cloud, you gain database-like access to live Databricks data from MicroStrategy, expanding your reporting and analytics capabilities. In this article, we walk through connecting to Databricks in Connect Cloud and connecting to Connect Cloud in MicroStrategy to create a simple visualization of Databricks data.
As a cloud-based integration platform, Connect Cloud is ideal for working with cloud-based BI and analytics tools. With no servers to configure or data proxies to set up, you can simply use the web-based UI to create a live connection to Databricks and connect from MicroStrategy to start performing analytics based on live Databricks data.
About Databricks Data Integration
Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:
- Access all versions of Databricks, from Runtime versions 9.1 through 13.X to both the Pro and Classic Databricks SQL versions.
- Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
- Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
- Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.
While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several customers use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.
Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.
Getting Started
Configure Databricks Connectivity for MicroStrategy
Connectivity to Databricks from MicroStrategy is made possible through CData Connect Cloud. To work with Databricks data from MicroStrategy, we start by creating and configuring a Databricks connection.
- Log into Connect Cloud, click Connections and click Add Connection
- Select "Databricks" from the Add Connection panel
- Enter the necessary authentication properties to connect to Databricks.
To connect to a Databricks cluster, set the properties as described below.
Note: To find the required values in your Databricks instance, navigate to Clusters, select the desired cluster, and open the JDBC/ODBC tab under Advanced Options.
- Server: Set to the Server Hostname of your Databricks cluster.
- HTTPPath: Set to the HTTP Path of your Databricks cluster.
- Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
- Click Create & Test
- Navigate to the Permissions tab in the Add Databricks Connection page and update the User-based permissions.
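If you only have the cluster's JDBC URL to hand, the Server and HTTPPath values described above can usually be read out of it. The following is a minimal Python sketch, assuming the URL has the common `jdbc:<subprotocol>://host:port/schema;key=value;...` shape. The function name `parse_databricks_jdbc_url` and the sample URL are hypothetical placeholders, not values from a real workspace.

```python
def parse_databricks_jdbc_url(url: str) -> dict:
    """Extract the Server and HTTPPath values from a JDBC-style URL.

    Assumption: the URL looks like
    jdbc:<subprotocol>://<host>:<port>/<schema>;key=value;key=value
    """
    # Drop the "jdbc:<subprotocol>://" prefix.
    _, _, rest = url.partition("://")
    # The first ';'-separated segment is host[:port][/schema];
    # the remaining segments are key=value options.
    location, *options = rest.split(";")
    host = location.split("/", 1)[0].split(":", 1)[0]

    props = {}
    for option in options:
        key, sep, value = option.partition("=")
        if sep:  # skip segments that are not key=value pairs
            props[key.strip().lower()] = value.strip()

    return {"Server": host, "HTTPPath": props.get("httppath", "")}


# Hypothetical sample URL with placeholder values only.
SAMPLE_URL = (
    "jdbc:spark://example.cloud.databricks.com:443/default;"
    "transportMode=http;ssl=1;"
    "httpPath=sql/protocolv1/o/0/0000-000000-example;"
    "AuthMech=3;UID=token;PWD=<personal-access-token>"
)
connection_values = parse_databricks_jdbc_url(SAMPLE_URL)
```

The resulting dictionary holds the two values to paste into the Server and HTTPPath fields. The Token is deliberately not extracted, since it should be copied from the Access Tokens tab instead.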
Add a Personal Access Token
If you are connecting from a service, application, platform, or framework that does not support OAuth authentication, you can create a Personal Access Token (PAT) to use for authentication. As a best practice, create a separate PAT for each service so that access can be granted and revoked per service.
- Click on your username at the top right of the Connect Cloud app and click User Profile.
- On the User Profile page, scroll down to the Personal Access Tokens section and click Create PAT.
- Give your PAT a name and click Create.
- The personal access token is only visible at creation, so be sure to copy it and store it securely for future use.
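One way to keep the copied PAT out of scripts and configuration files is to read it from an environment variable at run time. The following is a minimal Python sketch of that idea. The variable name `CONNECT_CLOUD_PAT` and the helper `load_pat` are hypothetical, and a dedicated secrets manager may suit your environment better.

```python
import os


def load_pat(env_var: str = "CONNECT_CLOUD_PAT") -> str:
    """Return the PAT stored in the given environment variable.

    Assumption: the variable name is illustrative; use whatever name
    your deployment conventions require.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty password.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Connect Cloud PAT."
        )
    return token
```

Calling `load_pat()` in a script that connects to Connect Cloud keeps the token itself out of source control.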
With the connection configured, you are ready to connect to Databricks data from MicroStrategy.
Connect to and Visualize Databricks Data Using MicroStrategy
You can connect to Databricks in MicroStrategy by adding a data source based on the native SQL Server functionality. Once you have created a data source, you can build dynamic visualizations of Databricks data in MicroStrategy.
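Because MicroStrategy reaches Connect Cloud through its SQL Server functionality, the same connection can be sketched outside MicroStrategy as an ODBC-style connection string. The Python sketch below only assembles the string from the server name, port, connection name, user, and PAT used in the steps that follow. The function name `build_connection_string` and the default driver name are assumptions, and your environment may need additional options (for example, encryption settings).

```python
def build_connection_string(
    user: str,
    pat: str,
    connection_name: str,
    driver: str = "ODBC Driver 17 for SQL Server",  # assumed driver name
) -> str:
    """Assemble a SQL Server-style ODBC connection string for Connect Cloud."""
    parts = {
        "DRIVER": "{" + driver + "}",
        # SQL Server ODBC syntax separates host and port with a comma.
        "SERVER": "tds.cdata.com,14333",
        # The database name is the name of the Connect Cloud connection.
        "DATABASE": connection_name,
        "UID": user,
        # Wrap the PAT in braces (doubling any '}') so that characters
        # such as ';' in the token do not break the string.
        "PWD": "{" + pat.replace("}", "}}") + "}",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Placeholder values for illustration only.
example = build_connection_string("user@example.com", "example-pat", "Databricks1")
```

The resulting string could be passed to a SQL Server client library of your choice. That step is left out here so the sketch runs without extra packages.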
- Open MicroStrategy and select your account.
- Click Add External Data, select Databases, and use Select Tables as the Import Option.
- In the Import from Tables wizard, click to add a new Data Source.
- Select "SQL Server" in the Database menu and select "SQL Server 2017" in the Version menu.
- Set the connection properties as follows:
- Server Name: tds.cdata.com
- Port Number: 14333
- Database Name: the name of your Databricks connection (e.g. Databricks1)
- User: a Connect Cloud user
- Password: the PAT for your Connect Cloud user
- Data Source Name: a name for the new external data source, like "CData Cloud Databricks"
- Expand the menu for the new data source and choose "Edit Catalog Options"
- Edit the "SQL statement retrieve columns ..." query to include TABLE_SCHEMA = '#?Schema_Name?#' in the WHERE clause, and click Apply and then OK (the complete query is below).
SELECT DISTINCT TABLE_SCHEMA NAME_SPACE, TABLE_NAME TAB_NAME, COLUMN_NAME COL_NAME, (CASE WHEN (DATA_TYPE LIKE '%char' AND (CHARACTER_SET_NAME='utf8' OR CHARACTER_SET_NAME='usc2')) THEN CONCAT('a',DATA_TYPE) ELSE DATA_TYPE END) DATA_TYPE, CHARACTER_MAXIMUM_LENGTH DATA_LEN, NUMERIC_PRECISION DATA_PREC, NUMERIC_SCALE DATA_SCALE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME IN (#TABLE_LIST#) AND TABLE_SCHEMA='#?Schema_Name?#' ORDER BY 1,2,3
- Select the new data source and select the Namespace that corresponds to your virtual Databricks database (like Databricks1).
- Drag tables into the pane to insert them.
Note: Since we create a live connection, we can insert whole tables and utilize the filtering and aggregation features native to the MicroStrategy products to customize our datasets.
- Click Finish, choose the option to connect live, save the query, and choose the option to create a new dossier. Live connections are possible and effective, thanks to high-performance data processing native to CData Connect Cloud.
- Choose a visualization, choose fields to display, and apply any filters to create a new visualization of Databricks data. Data types are discovered automatically through dynamic metadata discovery. Where possible, the complex queries generated by the filters and aggregations will be pushed down to Databricks, while any unsupported operations (which can include SQL functions and JOIN operations) will be managed by the CData SQL engine embedded in Connect Cloud.
- Once you have finished configuring the dossier, click File -> Save.
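To make the catalog query edit from the steps above more concrete, the Python sketch below fills in the two placeholders of its WHERE clause the way a client might. This is an illustration only: how MicroStrategy actually substitutes `#TABLE_LIST#` and `#?Schema_Name?#` is internal to the product, and the table names and the helper `render_where_clause` are hypothetical.

```python
# The WHERE clause of the edited catalog query, with its placeholders intact.
WHERE_TEMPLATE = (
    "WHERE TABLE_NAME IN (#TABLE_LIST#) AND TABLE_SCHEMA='#?Schema_Name?#'"
)


def sql_quote(value: str) -> str:
    """Escape single quotes for use inside a SQL string literal."""
    return value.replace("'", "''")


def render_where_clause(tables: list[str], schema: str) -> str:
    """Substitute the placeholders (assumed behavior, for illustration)."""
    table_list = ", ".join(f"'{sql_quote(name)}'" for name in tables)
    return (
        WHERE_TEMPLATE
        .replace("#TABLE_LIST#", table_list)
        .replace("#?Schema_Name?#", sql_quote(schema))
    )


# Hypothetical table names; "Databricks1" is the example connection name.
rendered = render_where_clause(["customers", "orders"], "Databricks1")
```

With the schema filter in place, column metadata is limited to the selected namespace, so tables with the same name in another Connect Cloud connection would not be mixed into the result.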
Using CData Connect Cloud with MicroStrategy, you can easily create robust visualizations and reports on Databricks data. For more information on connecting to Databricks (and more than 100 other data sources), visit the Connect Cloud page. Sign up for a free trial and start working with live Databricks data in MicroStrategy.