Create Datasets from HDFS in Domo Workbench and Build Visualizations of HDFS Data in Domo

Use the CData ODBC Driver for HDFS to create datasets from HDFS data in Domo Workbench and then build visualizations in the Domo service.

Domo helps you manage, analyze, and share data across your entire organization, enabling decision makers to identify and act on strategic opportunities. Domo Workbench provides a secure, client-side solution for uploading your on-premises data to Domo. The CData ODBC Driver for HDFS links Domo Workbench to operational HDFS data. You can build datasets from HDFS data using standard SQL queries in Workbench and then create visualizations of that data in the Domo service.

The CData ODBC Driver is built for fast access to live HDFS data in Domo, thanks to optimized data processing in the driver itself. When you issue complex SQL queries from Domo to HDFS, the driver pushes supported SQL operations, like filters and aggregations, directly to HDFS and uses its embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. With built-in dynamic metadata querying, you can visualize and analyze HDFS data using native Domo data types.

Connect to HDFS as an ODBC Data Source

If you have not already done so, first specify connection properties in an ODBC DSN (data source name). Configuring the DSN is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.

In order to authenticate, set the following connection properties:

  • Host: Set this value to the host of your HDFS installation.
  • Port: Set this value to the port of your HDFS installation. The default port is 50070.

When you configure the DSN, you may also want to set the Max Rows connection property. This will limit the number of rows returned, which is especially helpful for improving performance when designing reports and visualizations.
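
Before opening Workbench, it can help to confirm that the DSN connects at all. The following is a minimal Python sketch, not part of the CData or Domo tooling. It assumes the third-party pyodbc package is installed, that the DSN uses the name CData HDFS Sys from the steps later in this article, and that the Max Rows property is spelled MaxRows in a connection string (check the driver's help documentation for the exact key). The helper names are illustrative.

```python
def dsn_connection_string(dsn, **overrides):
    """Compose an ODBC connection string: the DSN plus optional property overrides."""
    parts = [f"DSN={dsn}"]
    for key, value in overrides.items():
        parts.append(f"{key}={value}")
    return ";".join(parts)


def check_dsn(conn_str):
    """Open and close a connection; pyodbc raises pyodbc.Error if it cannot connect."""
    import pyodbc  # third-party; imported lazily so the helper above works without it

    conn = pyodbc.connect(conn_str)
    conn.close()


if __name__ == "__main__":
    # "MaxRows" is an assumed spelling of the Max Rows property described above.
    conn_str = dsn_connection_string("CData HDFS Sys", MaxRows=100)
    print(conn_str)
    # check_dsn(conn_str)  # uncomment on a machine where the driver and DSN exist
```

Properties passed as overrides apply only to that connection, so a row limit used while testing does not have to be saved in the DSN itself.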

After creating a DSN, you will need to create a dataset for HDFS in Domo Workbench using the HDFS DSN and build a visualization in the Domo service based on the dataset.

Build a Dataset for HDFS Data

You can follow the steps below to build a dataset based on a table in HDFS in Domo Workbench using the CData ODBC Driver for HDFS.

  1. Open Domo Workbench and, if you have not already, add your Domo service server to Workbench. In the Accounts submenu, click Add New, type in the server address (e.g., domain.domo.com), and click through the wizard to authenticate.
  2. In the DataSet Jobs submenu, click Add New.
  3. Name the dataset job (e.g., ODBC HDFS Files), select ODBC Connection Provider as the transport method, and click through the wizard.
  4. In the newly created DataSet Job, navigate to Source and click to configure the settings.
  5. Select System DSN for the Connection Type.
  6. Select the previously configured DSN (CData HDFS Sys) for the System DSN.
  7. Click to validate the configuration.
  8. Below the settings, set the Query to a SQL query: SELECT * FROM Files
     NOTE: Because you connect to HDFS data through an ODBC driver, you only need to know SQL to get your data. You do not need to know HDFS-specific APIs or protocols.
  9. Click preview.
  10. Check over the generated schema, add any transformations, then save and run the dataset job.
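
If the preview in Workbench does not look right, the same query can be run outside Workbench to inspect the columns the driver reports. This is a rough sketch under stated assumptions: the third-party pyodbc package is installed, the DSN is named CData HDFS Sys as in step 6, and the function names are illustrative rather than part of any CData or Domo API.

```python
def summarize_schema(description):
    """Turn a DB-API cursor.description into (column name, Python type name) pairs."""
    return [(col[0], getattr(col[1], "__name__", str(col[1]))) for col in description]


def preview(conn_str, query="SELECT * FROM Files", limit=5):
    """Run the query and return the reported schema plus up to `limit` rows."""
    import pyodbc  # third-party; imported lazily so summarize_schema works without it

    conn = pyodbc.connect(conn_str)
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        schema = summarize_schema(cursor.description)
        rows = cursor.fetchmany(limit)
        return schema, rows
    finally:
        conn.close()


if __name__ == "__main__":
    # Requires the driver and DSN to exist on this machine, so it is left commented out.
    # schema, rows = preview("DSN=CData HDFS Sys")
    # print(schema)
    pass
```

Comparing this column list with the schema Workbench generates in step 10 is a quick way to tell whether a surprise comes from the driver or from a Workbench transformation.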

With the dataset job run, the dataset will be accessible from the Domo service, allowing you to build visualizations, reports, and more based on HDFS data.

Create Data Visualizations

With the DataSet Job saved and run in Domo Workbench, we are ready to build visualizations of the HDFS data in the Domo service.

  1. Navigate to the Data Center.
  2. In the data warehouse, select the ODBC data source and drill down to our new dataset.
  3. With the dataset selected, choose to create a visualization.
  4. In the new card:
    • Drag a Dimension to the X Value.
    • Drag a Measure to the Y Value.
    • Choose a Visualization.

With the CData ODBC Driver for HDFS, you can build custom datasets based on HDFS data using only SQL in Domo Workbench and then build and share visualizations and reports through the Domo service.