Object-Relational Mapping (ORM) with Databricks Data Entities in Java
Object-relational mapping (ORM) techniques make it easier to work with relational data sources and can bridge your logical business model with your physical storage model. Follow this tutorial to integrate connectivity to Databricks data into a Java-based ORM framework, Hibernate.
You can use Hibernate to map object-oriented domain models to a traditional relational database. The tutorial below shows how to use the CData JDBC Driver for Databricks to generate an ORM of your Databricks repository with Hibernate.
Though Eclipse is the IDE of choice for this article, the CData JDBC Driver for Databricks works in any product that supports the Java Runtime Environment. In the Knowledge Base you will find tutorials to connect to Databricks data from IntelliJ IDEA and NetBeans.
About Databricks Data Integration
Accessing and integrating live data from Databricks has never been easier with CData. Customers rely on CData connectivity to:
- Access all versions of Databricks, from Runtime Versions 9.1 through 13.X to both the Pro and Classic Databricks SQL versions.
- Leave Databricks in their preferred environment thanks to compatibility with any hosting solution.
- Securely authenticate in a variety of ways, including personal access token, Azure Service Principal, and Azure AD.
- Upload data to Databricks using Databricks File System, Azure Blob Storage, and AWS S3 Storage.
While many customers use CData's solutions to migrate data from different systems into their Databricks data lakehouse, several use our live connectivity solutions to federate connectivity between their databases and Databricks. These customers use SQL Server Linked Servers or PolyBase to get live access to Databricks from within their existing RDBMS.
Read more about common Databricks use cases and how CData's solutions help solve data problems in our blog: What is Databricks Used For? 6 Use Cases.
Getting Started
Install Hibernate
Follow the steps below to install the Hibernate plug-in in Eclipse.
- In Eclipse, navigate to Help -> Install New Software.
- Enter "http://download.jboss.org/jbosstools/neon/stable/updates/" in the Work With box.
- Enter "Hibernate" into the filter box.
- Select Hibernate Tools.
Start A New Project
Follow the steps below to add the driver JARs in a new project.
- Create a new project. Select Java Project as your project type and click Next. Enter a project name and click Finish.
- Right-click the project and click Properties. Click Java Build Path and then open the Libraries tab.
- Click Add External JARs to add the cdata.jdbc.databricks.jar library, located in the lib subfolder of the installation directory.
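Before moving on, it can help to confirm that the driver JAR is actually visible to the project. The following is a minimal sketch, not part of the CData tooling: DriverCheck and isOnClasspath are hypothetical names, and the driver class name is the one this article later enters in the Hibernate configuration wizard.

```java
/** Hypothetical pre-flight check: is a JDBC driver class visible on the classpath? */
final class DriverCheck {
    // Driver class name as given later in this article (an assumption for other driver versions).
    static final String DATABRICKS_DRIVER = "cdata.jdbc.databricks.DatabricksDriver";

    private DriverCheck() {}

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(DATABRICKS_DRIVER + " on classpath: "
                + isOnClasspath(DATABRICKS_DRIVER));
    }
}
```

If the check reports false when run from inside the project, the JAR is probably missing from the Libraries tab of the Java Build Path.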
Add a Hibernate Configuration File
Follow the steps below to configure connection properties to Databricks data.
- Right-click on the new project and select New -> Hibernate -> Hibernate Configuration File (cfg.xml).
- Select src as the parent folder and click Next.
Input the following values:
- Hibernate version: 5.2
- Database dialect: Derby
- Driver class: cdata.jdbc.databricks.DatabricksDriver
- Connection URL: A JDBC URL, starting with jdbc:databricks: and followed by a semicolon-separated list of connection properties.
To connect to a Databricks cluster, set the properties as described below.
Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.
- Server: Set to the Server Hostname of your Databricks cluster.
- HTTPPath: Set to the HTTP Path of your Databricks cluster.
- Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
Built-in Connection String Designer
For assistance in constructing the JDBC URL, use the connection string designer built into the Databricks JDBC Driver. Either double-click the JAR file or execute it from the command line.
java -jar cdata.jdbc.databricks.jar
Fill in the connection properties and copy the connection string to the clipboard.
A typical JDBC URL is below:
jdbc:databricks:Server=127.0.0.1;Port=443;TransportMode=HTTP;HTTPPath=MyHTTPPath;UseSSL=True;User=MyUser;Password=MyPassword;
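If you prefer to assemble the URL in code rather than with the designer, the format described above (the jdbc:databricks: prefix followed by semicolon-separated name=value pairs) can be sketched as follows. DatabricksJdbcUrl and build are hypothetical names, and the property values are placeholders. The sketch does not escape values that themselves contain semicolons.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: assembles a jdbc:databricks: URL from connection properties. */
final class DatabricksJdbcUrl {
    private static final String PREFIX = "jdbc:databricks:";

    private DatabricksJdbcUrl() {}

    /** Appends each name=value pair, in insertion order, terminated by a semicolon. */
    static String build(Map<String, String> properties) {
        StringBuilder url = new StringBuilder(PREFIX);
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            url.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Property names come from the list in this article; the values are placeholders.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Server", "127.0.0.1");
        props.put("HTTPPath", "MyHTTPPath");
        props.put("Token", "MyToken");
        // Prints: jdbc:databricks:Server=127.0.0.1;HTTPPath=MyHTTPPath;Token=MyToken;
        System.out.println(build(props));
    }
}
```

A LinkedHashMap is used so the properties appear in the order they were added, which keeps the resulting string predictable when you compare it against the designer's output.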
Connect Hibernate to Databricks Data
Follow the steps below to select the configuration you created in the previous step.
- Switch to the Hibernate Configurations perspective: Window -> Open Perspective -> Hibernate.
- Right-click on the Hibernate Configurations panel and click Add Configuration.
- Set the Hibernate version to 5.2.
- Click the Browse button and select the project.
- For the Configuration file field, click Setup -> Use Existing and select the location of the hibernate.cfg.xml file (inside the src folder in this demo).
- In the Classpath tab, if there is nothing under User Entries, click Add External JARs and add the driver JAR once more. Click OK once the configuration is done.
- Expand the Database node of the newly created Hibernate configuration.
Reverse Engineer Databricks Data
Follow the steps below to generate the reveng.xml configuration file. You will specify the tables you want to access as objects.
- Switch back to the Package Explorer.
- Right-click your project, select New -> Hibernate -> Hibernate Reverse Engineering File (reveng.xml). Click Next.
- Select src as the parent folder and click Next.
- In the Console configuration drop-down menu, select the Hibernate configuration file you created above and click Refresh.
- Expand the node and choose the tables you want to reverse engineer. Click Finish when you are done.
Configure Hibernate to Run
Follow the steps below to generate plain old Java objects (POJOs) for the Databricks tables.
- From the menu bar, click Run -> Hibernate Code Generation -> Hibernate Code Generation Configurations.
- In the Console configuration drop-down menu, select the Hibernate configuration file you created in the previous section. Click Browse by Output directory and select src.
- Enable the Reverse Engineer from JDBC Connection checkbox. Click the Setup button, click Use Existing, and select the location of the hibernate.reveng.xml file (inside the src folder in this demo).
- In the Exporters tab, check Domain code (.java) and Hibernate XML Mappings (hbm.xml).
- Click Run.
One or more POJOs are created, based on the reverse-engineering settings from the previous step.
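For orientation, a generated class typically resembles the sketch below. This is an illustration, not actual generator output: the Customers class and its CompanyName, City, and Country columns are taken from the query example at the end of this article, and the String types are assumptions. Your generated code depends on your own table schema.

```java
import java.io.Serializable;

/** Illustrative sketch of a reverse-engineered entity; field names and types are assumed. */
class Customers implements Serializable {
    private static final long serialVersionUID = 1L;

    // One private field per mapped column (assumed columns: CompanyName, City, Country).
    private String companyName;
    private String city;
    private String country;

    /** Hibernate instantiates entities through a no-argument constructor. */
    public Customers() {}

    public String getCompanyName() { return companyName; }
    public void setCompanyName(String companyName) { this.companyName = companyName; }

    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }

    public String getCountry() { return country; }
    public void setCountry(String country) { this.country = country; }
}
```

The accompanying hbm.xml file ties each property to its column, which is why both exporters are checked in the code generation step.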
Insert Mapping Tags
For each mapping you have generated, you need to add a mapping tag to hibernate.cfg.xml that points Hibernate to the mapping resource. Open hibernate.cfg.xml and insert the mapping tags as follows:
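<hibernate-configuration>
  <session-factory>
    <property name="hibernate.connection.driver_class">cdata.jdbc.databricks.DatabricksDriver</property>
    <property name="hibernate.connection.url">jdbc:databricks:Server=127.0.0.1;Port=443;TransportMode=HTTP;HTTPPath=MyHTTPPath;UseSSL=True;User=MyUser;Password=MyPassword;</property>
    <property name="hibernate.dialect">org.hibernate.dialect.SQLServerDialect</property>
    <!-- One mapping tag per generated hbm.xml file; Customers.hbm.xml matches the entity used below -->
    <mapping resource="Customers.hbm.xml" />
  </session-factory>
</hibernate-configuration>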
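The three values in the configuration above correspond to the standard Hibernate settings hibernate.connection.driver_class, hibernate.connection.url, and hibernate.dialect. If you would rather keep them out of the XML file, one option is to collect them in a java.util.Properties object, as in the minimal sketch below. HibernateSettings and forDatabricks are hypothetical names. Hibernate's Configuration class accepts such an object through its setProperties and addProperties methods, but that wiring is not shown or tested here.

```java
import java.util.Properties;

/** Hypothetical helper: collects the Hibernate connection settings used in this article. */
final class HibernateSettings {
    private HibernateSettings() {}

    /** Returns the driver class, connection URL, and dialect as Hibernate property keys. */
    static Properties forDatabricks(String jdbcUrl) {
        Properties settings = new Properties();
        // Driver class as entered in the configuration wizard in this article.
        settings.setProperty("hibernate.connection.driver_class",
                "cdata.jdbc.databricks.DatabricksDriver");
        settings.setProperty("hibernate.connection.url", jdbcUrl);
        // Dialect as it appears in this article's hibernate.cfg.xml.
        settings.setProperty("hibernate.dialect", "org.hibernate.dialect.SQLServerDialect");
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute the connection string for your own cluster.
        Properties settings = forDatabricks("jdbc:databricks:Server=127.0.0.1;HTTPPath=MyHTTPPath;");
        settings.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```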
Execute SQL
Using the entity you created in the last step, you can now search and modify Databricks data:
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;
import org.hibernate.query.Query;

public class App {
    public static void main(final String[] args) {
        // Reads hibernate.cfg.xml from the classpath; the factory and session are closed on exit.
        try (SessionFactory factory = new Configuration().configure().buildSessionFactory();
             Session session = factory.openSession()) {
            // HQL query against the generated Customers entity, filtered by a named parameter.
            String select = "FROM Customers C WHERE Country = :Country";
            Query<Customers> q = session.createQuery(select, Customers.class);
            q.setParameter("Country", "US");
            List<Customers> resultList = q.list();
            for (Customers s : resultList) {
                System.out.println(s.getCity());
                System.out.println(s.getCompanyName());
            }
        }
    }
}