HDFS Reporting and Star Schemas in OBIEE



Create a star schema that shows perspectives on HDFS facts in real time.

The CData ODBC Driver for HDFS is a standard database driver that can integrate real-time access to HDFS data into your data warehouse or directly into your reporting tool. This article shows how to bypass the data warehouse and import operational HDFS data into Oracle Business Intelligence Enterprise Edition (OBIEE).

See the knowledge base for ODBC integrations with ETL tools like Informatica PowerCenter. For an ETL solution into Oracle Warehouse Builder, use the driver with the Oracle ODBC Gateway to access HDFS data as a remote Oracle database.

Connect to HDFS as an ODBC Data Source

If you have not already done so, specify connection properties in an ODBC DSN (data source name). This is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.

In order to authenticate, set the following connection properties:

  • Host: Set this value to the host of your HDFS installation.
  • Port: Set this value to the port of your HDFS installation. Default port: 50070

When you configure the DSN, you may also want to set the Max Rows connection property. This will limit the number of rows returned, which is especially helpful for improving performance when designing reports and visualizations.
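For applications that accept an ODBC connection string instead of a preconfigured DSN, the same properties can be written as semicolon-separated key=value pairs. The following is a minimal sketch, not the driver's documented syntax: the DSN name "CData HDFS Source", the host value, and the row limit are placeholders, and the exact spelling of the Max Rows key in a connection string is an assumption to check against the driver's help.

```python
def build_connection_string(dsn, properties):
    """Join a DSN name and extra properties into an ODBC-style connection string."""
    parts = [f"DSN={dsn}"]
    for key, value in properties.items():
        text = str(value)
        # ODBC convention: wrap values that contain a semicolon in braces.
        if ";" in text:
            text = "{" + text + "}"
        parts.append(f"{key}={text}")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # All values below are placeholders (assumptions), not settings from a real system.
    conn_str = build_connection_string(
        "CData HDFS Source",  # hypothetical DSN name
        {"Host": "localhost", "Port": 50070, "Max Rows": 1000},
    )
    print(conn_str)
    # The string could then be handed to an ODBC client library,
    # for example pyodbc.connect(conn_str), if pyodbc is installed.
```

Keeping the row limit in the property dictionary makes it easy to lower it while designing a report and raise or remove it afterwards.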

Import HDFS Metadata

Follow the steps below to use the OBIEE Client Tools to import HDFS metadata into an OBIEE repository. You can then integrate HDFS data into your business models.

  1. Open the Administration Tool and click File -> New Repository.
  2. In the Connection Type menu, select ODBC 3.5 and select the CData DSN.
  3. Select the metadata types you want to import under the Relational Sources option and then select HDFS tables.

You can now create star schemas based on HDFS tables.
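As a rough illustration of what a star schema looks like, the sketch below builds one in an in-memory SQLite database: a central fact table joined to two dimension tables, then aggregated along one dimension. Every table name, column name, and row here is hypothetical. The tables the driver actually exposes for HDFS are not described in this article, and in OBIEE the joins are defined in the repository's business model rather than in hand-written SQL.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables."""
    # All names and rows are hypothetical, chosen only for illustration.
    conn.executescript("""
        CREATE TABLE dim_owner (owner_id INTEGER PRIMARY KEY, owner_name TEXT);
        CREATE TABLE dim_file_type (type_id INTEGER PRIMARY KEY, type_name TEXT);
        CREATE TABLE fact_file (
            file_id    INTEGER PRIMARY KEY,
            owner_id   INTEGER REFERENCES dim_owner(owner_id),
            type_id    INTEGER REFERENCES dim_file_type(type_id),
            size_bytes INTEGER
        );
        INSERT INTO dim_owner VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO dim_file_type VALUES (1, 'FILE'), (2, 'DIRECTORY');
        INSERT INTO fact_file VALUES
            (1, 1, 1, 100), (2, 1, 1, 250), (3, 2, 1, 40), (4, 2, 2, 0);
    """)


def size_by_owner(conn):
    """Aggregate the fact measure (size_bytes) along the owner dimension."""
    return conn.execute("""
        SELECT o.owner_name, SUM(f.size_bytes)
        FROM fact_file AS f
        JOIN dim_owner AS o ON o.owner_id = f.owner_id
        GROUP BY o.owner_name
        ORDER BY o.owner_name
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    for owner, total in size_by_owner(conn):
        print(owner, total)
```

The fact table holds the measure and foreign keys only. Each perspective on the facts, such as owner or file type, comes from joining to a different dimension table.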
