Build Charts with HDFS Data in Clear Analytics



Create dynamic charts and perform analytics based on HDFS data in Clear Analytics.

The CData ODBC Driver for HDFS enables access to live data from HDFS under the ODBC standard, allowing you to work with HDFS data in a wide variety of BI, reporting, and ETL tools, or directly using familiar SQL queries. This article shows how to use Clear Analytics, a Microsoft Excel Add-In, to connect to HDFS as an ODBC source and create queries, tables, and charts (including PivotTables) based on HDFS data.

Connect to HDFS Data


Configure the ODBC Data Source Name

If you have not already done so, provide values for the required connection properties in the data source name (DSN). You can use the built-in Microsoft ODBC Data Source Administrator to configure the DSN. This is also the last step of the driver installation. See the "Getting Started" chapter in the help documentation for a guide to using the Microsoft ODBC Data Source Administrator to create and configure a DSN.

To authenticate, set the following connection properties:

  • Host: Set this value to the host of your HDFS installation.
  • Port: Set this value to the port of your HDFS installation. The default port is 50070.

When you configure the DSN, you may also want to set the Max Rows connection property. This will limit the number of rows returned, which is especially helpful for improving performance when designing reports and visualizations.
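
For scripted checks outside Excel, the same properties can also be written as an ODBC connection string. The following is a minimal sketch rather than anything taken from the driver documentation. It assumes the driver is registered under the name "CData ODBC Driver for HDFS", that the Max Rows property is spelled MaxRows in a connection string, and that the third-party pyodbc package is installed. The host name is a placeholder and the helper function names are hypothetical.

```python
def build_connection_string(host, port=50070, max_rows=None,
                            driver="CData ODBC Driver for HDFS"):
    """Assemble a DSN-less ODBC connection string from Host, Port, and Max Rows."""
    parts = [f"Driver={{{driver}}}", f"Host={host}", f"Port={port}"]
    if max_rows is not None:
        # Assumed spelling of the Max Rows property inside a connection string.
        parts.append(f"MaxRows={max_rows}")
    return ";".join(parts) + ";"


def open_connection(conn_str):
    """Open the connection with pyodbc (third-party package, imported lazily)."""
    import pyodbc
    return pyodbc.connect(conn_str)


# Placeholder host; replace it with the host of your HDFS installation.
conn_str = build_connection_string("hdfs.example.com", max_rows=1000)
print(conn_str)
```

If a DSN is already configured, passing a string of the form DSN=<your DSN name> to pyodbc.connect should work in place of the full connection string.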

Configure the Data Source in Clear Analytics

  1. Open Excel and navigate to the CLEAR ANALYTICS ribbon. Once there, open the Data Manager.
  2. Select Database as the data source.
  3. In the Set Connection section, click the option to create a new database.
  4. Select Microsoft ODBC Data Source as the data source and click OK.
  5. Select the DSN you already configured from the drop-down menu.
  6. Back on the Set Connection section, select Standard (ANSI ODBC) Query Builder as the SQL Builder Provider and click Next.
  7. Select the Schema/Owner and choose the domains (tables) that you wish to use in Clear Analytics.
  8. Prepare your data objects as needed by customizing the display names and descriptions of the tables and columns.
  9. Leave the key date for your domains unset; the vast majority of CData ODBC Drivers do not need one.
  10. In the Domain Relations section, add any relational information between tables.
  11. In the Domain Tree section, create groups for your data and add the available items to the groups.
  12. Review the summary of your data and click Finish.

Create a Chart with HDFS Data

You are now ready to create a chart with HDFS data.

Create a New Query

  1. Click Repository in the CLEAR ANALYTICS ribbon.
  2. Create a new query.
  3. Select the columns you wish to retrieve.
  4. Set the aggregation type for your data (use the blank entry if you do not wish to aggregate the data).
  5. Set filters and formulas by dragging columns to the lower window.
  6. Name your query and click Save.
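
The choices made in the query builder (columns, aggregation type, filters) correspond roughly to a standard SQL SELECT statement. The sketch below shows one plausible shape of that statement. The table and column names (Files, Owner, Length, Type) are hypothetical stand-ins, not names confirmed by the driver, and the SQL that Clear Analytics actually generates may differ.

```python
def build_query(table, group_by, aggregates, filters=None):
    """Return an ANSI SQL SELECT string.

    group_by:   columns selected without aggregation
    aggregates: dict mapping column name -> aggregate function (e.g. "SUM")
    filters:    list of WHERE conditions combined with AND
    """
    select_parts = list(group_by)
    select_parts += [
        f"{func}({col}) AS {func.lower()}_{col.lower()}"
        for col, func in aggregates.items()
    ]
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    if aggregates and group_by:
        # Non-aggregated columns must appear in GROUP BY.
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


# Hypothetical table and columns, used for illustration only.
query = build_query("Files", ["Owner"], {"Length": "SUM"}, ["Type = 'FILE'"])
print(query)
```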

Build a Chart Based on a Query Report

With the query created, you are now ready to execute a report and display a chart.

  1. Click Report Explorer in the CLEAR ANALYTICS ribbon.
  2. In the Report Explorer pane, click the 'New Report' icon in the toolbar.
  3. Select the query you just created.
  4. Name the report and click 'Save and Execute'.
  5. Click the Results tab within the Report Explorer.
  6. Expand your report and drag the chart to the Excel spreadsheet.
  7. In the resulting PivotChart window, drag the fields (columns) to the Filters, Legend (Series), Axis (Categories), and Values areas.
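
To make the roles of the PivotChart areas concrete, here is a small sketch of the same grouping done in plain Python. It is an illustration of the idea, not how Excel or Clear Analytics implements it, and the rows are made-up sample values with hypothetical column names, not real HDFS data.

```python
from collections import defaultdict


def pivot(rows, axis, legend, value, keep=None):
    """Sum `value` for each (axis, legend) pair, mimicking a PivotChart layout.

    keep: optional predicate that plays the role of the Filters area.
    """
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if keep is not None and not keep(row):
            continue
        table[row[axis]][row[legend]] += row[value]
    return {a: dict(series) for a, series in table.items()}


# Made-up rows standing in for a query result.
rows = [
    {"Owner": "alice", "Type": "FILE", "Length": 120},
    {"Owner": "alice", "Type": "FILE", "Length": 80},
    {"Owner": "bob", "Type": "FILE", "Length": 50},
    {"Owner": "bob", "Type": "DIRECTORY", "Length": 0},
]

# Axis = Owner, Legend = Type, Values = sum of Length.
chart_data = pivot(rows, axis="Owner", legend="Type", value="Length")
print(chart_data)
```

Each outer key corresponds to a category on the axis, and each inner key corresponds to one series in the legend.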

With a new data source in Clear Analytics established and a chart created, you are ready to begin analysis of HDFS data. With the ODBC Driver for HDFS and Clear Analytics, you can perform self-service analytics in Excel with live data, directly from HDFS.
