Access Live HDFS Data in Spring Boot

Connect to HDFS in a Spring Boot Application using CData JDBC HDFS Driver

Spring Boot is a framework that makes engineering Java web applications easier. It offers the ability to create standalone applications with minimal configuration. When paired with the CData JDBC driver for HDFS, Spring Boot can work with live HDFS data. This article shows how to configure data sources and retrieve data in your Java Spring Boot Application, using the CData JDBC Driver for HDFS.

The CData JDBC Driver includes optimized data processing for interacting with live HDFS data. When you issue complex SQL queries to HDFS, the driver pushes supported SQL operations, such as filters and aggregations, directly to HDFS and uses its embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. Its built-in dynamic metadata querying lets you work with and analyze HDFS data using native data types.

Creating the Spring Boot Project in Java

In an IDE (this tutorial uses IntelliJ), create a new Maven project. In the generated project, open the pom.xml file and add the required dependencies for Spring Boot:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.7.2</version>
    <relativePath/>
  </parent>
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>demo</name>
  <description>Demo project for Spring Boot</description>
  <properties>
    <java.version>1.8</java.version>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-install-plugin</artifactId>
        <version>2.5.1</version>
        <executions>
          <execution>
            <id>id.install-file</id>
            <phase>clean</phase>
            <goals>
              <goal>install-file</goal>
            </goals>
            <configuration>
              <file>C:\Program Files\CData\CData JDBC Driver for HDFS ####\lib\cdata.jdbc.hdfs.jar</file>
              <groupId>org.cdata.connectors</groupId>
              <artifactId>cdata-hdfs-connector</artifactId>
              <version>21</version>
              <packaging>jar</packaging>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-jdbc</artifactId>
      <version>2.7.0</version>
    </dependency>
    <dependency>
      <groupId>org.cdata.connectors</groupId>
      <artifactId>cdata-hdfs-connector</artifactId>
      <version>21</version>
    </dependency>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-test</artifactId>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <distributionManagement>
    <repository>
      <uniqueVersion>false</uniqueVersion>
      <id>test</id>
      <name>My Repository</name>
      <url>scp://repo/maven2</url>
      <layout>default</layout>
    </repository>
  </distributionManagement>
</project>

Project Structure

In the java directory, create a new package. By convention, the package name is the groupId followed by the artifactId; for the pom.xml above, that is com.example.demo.

Make sure that the "java" directory is the sources root; IntelliJ indicates this with a blue folder icon. You may need to right-click the java directory and select Mark Directory as -> Sources Root. Likewise, the resources directory should be marked as Resources Root.

Database Connection Properties

In the application.properties file, we set the configuration properties for the HDFS JDBC Driver: the driver class name and the JDBC URL. Each property goes on its own line, and the driver class is set with the spring.datasource.driver-class-name key:

spring.datasource.driver-class-name=cdata.jdbc.hdfs.HDFSDriver
spring.datasource.url=jdbc:hdfs:Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;User=root;
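To sanity-check the file outside of Spring, the same key=value format can be read with java.util.Properties. The following is only a sketch: the class name PropertiesCheck and its method are hypothetical, and the text is passed in as a string so that the example is self-contained. In a real project you would read the application.properties file from the resources directory instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Spring Boot or the CData driver.
final class PropertiesCheck {
    private PropertiesCheck() {}

    // Parses application.properties-style text and returns the value of
    // spring.datasource.url, or null if the key is absent.
    static String readJdbcUrl(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("spring.datasource.url");
    }

    public static void main(String[] args) {
        String text = "spring.datasource.url=jdbc:hdfs:Host=sandbox-hdp.hortonworks.com;"
                + "Port=50070;Path=/user/root;User=root;\n";
        System.out.println(readJdbcUrl(text));
    }
}
```

Only the first = on a line separates the key from the value, so the = and : characters inside the JDBC URL are preserved as part of the value.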

Built-in Connection String Designer

For assistance in constructing the JDBC URL, use the connection string designer built into the HDFS JDBC Driver. Either double-click the JAR file or run it from the command line.

java -jar cdata.jdbc.hdfs.jar

In order to authenticate, set the following connection properties:

  • Host: Set this value to the host of your HDFS installation.
  • Port: Set this value to the port of your HDFS installation. Default port: 50070
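Before handing a URL to the driver, it can help to confirm that it actually contains these properties. The sketch below splits a jdbc:hdfs: URL into its Name=Value pairs and checks for Host and Port. HdfsUrlParser is a hypothetical helper, not part of the driver, and it assumes that property values contain no semicolons.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; the driver parses the URL itself.
final class HdfsUrlParser {
    private static final String PREFIX = "jdbc:hdfs:";

    private HdfsUrlParser() {}

    // Splits "jdbc:hdfs:Name=Value;Name=Value;" into an ordered map of properties.
    // Assumption: values never contain ';'.
    static Map<String, String> parse(String url) {
        if (url == null || !url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Expected a URL starting with " + PREFIX);
        }
        Map<String, String> props = new LinkedHashMap<>();
        for (String pair : url.substring(PREFIX.length()).split(";")) {
            if (pair.trim().isEmpty()) {
                continue; // tolerate empty segments
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed property: " + pair);
            }
            props.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return props;
    }

    // True when both Host and Port are present in the URL.
    static boolean hasHostAndPort(String url) {
        Map<String, String> props = parse(url);
        return props.containsKey("Host") && props.containsKey("Port");
    }

    public static void main(String[] args) {
        String url = "jdbc:hdfs:Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;User=root;";
        System.out.println(parse(url));
        System.out.println("Host and Port set: " + hasHostAndPort(url));
    }
}
```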

After setting the properties in the application.properties file, we configure the data source that uses them.

Data Source Configuration

First, we mark the HDFS data source as our primary data source. Then, we create a Data Source Bean.

Create a DriverManagerDataSource.java file and create a Bean within it, as shown below. If @Bean gives an error, Spring Boot may not have loaded properly. To fix this, go to File -> Invalidate Caches and restart. Additionally, make sure that Maven has added the Spring Boot dependencies.

The DriverManagerDataSource class below defines the data source bean. It sets the driver class name and the JDBC URL of the data source through Spring Boot's DataSourceBuilder. Define one such bean for each driver you use.

package com.example.demo;

import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

import javax.sql.DataSource;

// Registers the HDFS data source with Spring.
@Configuration
public class DriverManagerDataSource {

    // Static so that the main method can also call it directly.
    @Bean(name = "HDFS")
    @Primary
    public static DataSource HDFSDataSource() {
        DataSourceBuilder<?> dataSourceBuilder = DataSourceBuilder.create();
        // Same driver class and JDBC URL as in application.properties.
        dataSourceBuilder.driverClassName("cdata.jdbc.hdfs.HDFSDriver");
        dataSourceBuilder.url("jdbc:hdfs:Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;User=root;");
        return dataSourceBuilder.build();
    }
}
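If building the data source fails, one thing worth ruling out is that the driver class is missing from the classpath. A minimal sketch of such a check in plain Java follows; DriverCheck is a hypothetical helper name, and the driver class name is the one used in this article.

```java
// Hypothetical helper for illustration only.
final class DriverCheck {
    private DriverCheck() {}

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "cdata.jdbc.hdfs.HDFSDriver"; // class name used in this article
        System.out.println(driver + " on classpath: " + isOnClasspath(driver));
    }
}
```

A result of false usually means that the driver JAR has not been installed in the local Maven repository yet, or that the dependency coordinates in pom.xml do not match the installed artifact.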

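Next, install the HDFS driver JAR in your local Maven repository so that the cdata-hdfs-connector dependency in pom.xml can be resolved. Click the Maven icon (top right corner of IntelliJ) and click "Execute Maven Goal." Then run the following command, adjusting the -Dfile path if you have moved the JAR. The path is quoted because it contains spaces, and the groupId, artifactId, and version must match the dependency declared in pom.xml:

mvn install:install-file -Dfile="C:\Program Files\CData\CData JDBC Driver for HDFS ####\lib\cdata.jdbc.hdfs.jar" -DgroupId=org.cdata.connectors -DartifactId=cdata-hdfs-connector -Dversion=21 -Dpackaging=jar

After you press Enter, Maven reports the result; a successful installation ends with a BUILD SUCCESS message.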

Testing the Connection

The last step is testing the connection. We call the data source in the main method of MDSApplication.java:

package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;

import java.sql.Connection;
import java.sql.SQLException;

import static com.example.demo.DriverManagerDataSource.HDFSDataSource;

// DataSourceAutoConfiguration is excluded because the data source is defined manually.
@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class})
public class MDSApplication {

    public static void main(String[] args) throws SQLException {
        SpringApplication.run(MDSApplication.class, args);

        // Open a connection through the HDFS data source and print its catalog.
        try (Connection conn = HDFSDataSource().getConnection()) {
            System.out.println("Catalog: " + conn.getCatalog());
        }
    }
}

When the application runs, the console should show the Spring Boot startup log followed by a line that starts with "Catalog: ".
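The connection can also be checked independently of Spring with plain JDBC, which helps separate driver or URL problems from Spring configuration problems. The sketch below is an illustration under stated assumptions: ConnectionCheck is a hypothetical helper, the five-second timeout is arbitrary, and it assumes the driver supports Connection.isValid.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper for illustration only.
final class ConnectionCheck {
    private ConnectionCheck() {}

    // Returns true if a connection can be opened and reports itself valid
    // within 5 seconds; returns false on any SQLException.
    static boolean canConnect(String jdbcUrl) {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            return conn.isValid(5);
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // JDBC URL from this article; replace it with your own connection string.
        String url = "jdbc:hdfs:Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;User=root;";
        System.out.println("Connected: " + canConnect(url));
    }
}
```

If no registered driver accepts the URL, DriverManager throws an SQLException and the method returns false, so the same check also catches a driver JAR that is missing from the classpath.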

Free Trial & More Information

Download a free, 30-day trial of the CData JDBC Driver for HDFS and start working with your live HDFS data in Spring Boot.