Import HDFS Data into the Power BI Service for Visualizations



Use CData Connect Server to create a virtual SQL database for HDFS and build custom reports in the Power BI service.

Power BI turns your company's data into rich visuals that you can collect and organize, so you can focus on what matters to you. Paired with CData Connect Server, Power BI gets access to HDFS data for visualizations, dashboards, and more. This article shows how to use CData Connect Server to create a virtual SQL database for HDFS, import HDFS data into Power BI Desktop, and then create reports on that data in the Power BI service.

NOTE: You can also use the on-premises data gateway and the SQL interface in Connect Server to connect to HDFS data in real time (instead of importing the data). Read how in the related Knowledge Base article.

Create a Virtual SQL Database for HDFS Data

CData Connect Server uses a straightforward, point-and-click interface to connect to data sources and generate APIs.

  1. Log in to Connect Server and click Connections.
  2. Select "HDFS" from Available Data Sources.
  3. Enter the necessary authentication properties to connect to HDFS.

    To authenticate, set the following connection properties:

    • Host: Set this value to the host of your HDFS installation.
    • Port: Set this value to the port of your HDFS installation. The default port is 50070.
  4. Click Save Changes.
  5. Click Privileges -> Add and add the new user (or an existing user) with the appropriate permissions.
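
Before saving the connection, it can help to confirm that the Host and Port values actually answer HDFS requests. The following is a minimal sketch, assuming the cluster exposes the standard WebHDFS REST API over plain HTTP without Kerberos; the function names and the host name in the usage note are hypothetical and are not part of Connect Server.

```python
import json
import urllib.parse
import urllib.request


def webhdfs_url(host: str, port: int, path: str = "/", op: str = "LISTSTATUS") -> str:
    """Build a WebHDFS REST URL for a NameNode host, port, and HDFS path."""
    # WebHDFS paths are absolute, so add a leading slash if it is missing.
    absolute = path if path.startswith("/") else "/" + path
    quoted_path = urllib.parse.quote(absolute)
    query = urllib.parse.urlencode({"op": op})
    return f"http://{host}:{port}/webhdfs/v1{quoted_path}?{query}"


def list_directory(host: str, port: int, path: str = "/", timeout: float = 5.0) -> list[str]:
    """Return the entry names under `path`.

    Raises urllib.error.URLError if the host or port is unreachable.
    Assumes an unsecured cluster; a secured one needs extra authentication.
    """
    with urllib.request.urlopen(webhdfs_url(host, port, path), timeout=timeout) as resp:
        payload = json.load(resp)
    # LISTSTATUS responses nest the entries under FileStatuses -> FileStatus.
    return [entry["pathSuffix"] for entry in payload["FileStatuses"]["FileStatus"]]
```

Calling list_directory("namenode.example.com", 50070) with your own host in place of the hypothetical one should return the names at the HDFS root if the values are correct. Newer Hadoop releases may listen on a different port than 50070, so check your cluster configuration if the call fails.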

Connecting to Connect Server from Power BI

To import and visualize your HDFS data in the Power BI service, add a Connect Server user with access to the HDFS virtual database, then create a dataset in Power BI Desktop and publish it to the service.

Add a Connect Server User

Create a user to connect to HDFS from Power BI through Connect Server.

  1. Click Users -> Add.
  2. Configure the user.
  3. Click Save Changes and make note of the Authtoken for the new user.
  4. Click Database and select the HDFS virtual database.
  5. On the Privileges tab, add the newly created user (with at least SELECT permissions) and click Save Changes.

Publish a Dataset from Power BI Desktop

With the HDFS connection configured in Connect Server, you can create a dataset in Power BI Desktop using SQL Server connectivity and publish the dataset to the Power BI service.

  1. Open Power BI Desktop, click Get Data -> Database -> SQL Server database, and click "Connect".
  2. Set Server to the address and port of your Connect Server instance (localhost:8033 by default) and set Database to the name of the virtual database you created (for example, HDFS1).
  3. Select "Database" authentication, enter the credentials for a Connect Server user, and click "Connect".
  4. Select tables in the Navigator dialog.
  5. Click Load to import the data into Power BI.
  6. Define any relationships between the selected entities on the Relationships tab.
  7. Click Publish (from the Home menu) and select a Workspace.
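
If Power BI Desktop cannot connect in step 2, a quick check is whether anything is listening at the Server address at all. The sketch below uses only the Python standard library; parse_server and is_reachable are hypothetical helper names, and the check confirms only that a TCP connection can be opened, not that the credentials or database name are valid.

```python
import socket


def parse_server(address: str, default_port: int = 8033) -> tuple[str, int]:
    """Split a 'host:port' string into (host, port).

    Falls back to `default_port` (the default named in step 2) when no port
    is given. IPv6 literals are not handled in this sketch.
    """
    address = address.strip()
    host, sep, port = address.rpartition(":")
    if not sep:
        return address, default_port
    return host, int(port)


def is_reachable(address: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the address can be opened."""
    host, port = parse_server(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, is_reachable("localhost:8033") returning False suggests that Connect Server is not running or that a firewall is blocking the port, which is worth ruling out before troubleshooting inside Power BI.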

Build Reports and Dashboards on HDFS Data in the Power BI Service

Now that you have published a dataset to the Power BI service, you can create new reports and dashboards based on the published data:

  1. Log in to PowerBI.com.
  2. Click Workspaces and select a workspace.
  3. Click Create and select Report.
  4. Select the published dataset for the report.
  5. Choose fields and visualizations to add to your report.
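
To confirm that the dataset reached the workspace without opening the browser, the Power BI REST API can list the datasets in a workspace. This is a minimal sketch, assuming you already hold a valid Microsoft Entra ID access token with permission to read datasets; acquiring the token is out of scope here, and the helper names are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def datasets_request(workspace_id: str, access_token: str) -> urllib.request.Request:
    """Build the GET request that lists datasets in a workspace (group)."""
    url = f"{API_ROOT}/groups/{workspace_id}/datasets"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


def dataset_names(response_body: bytes) -> list[str]:
    """Extract dataset names from the JSON body returned by the API."""
    return [item["name"] for item in json.loads(response_body)["value"]]


def list_datasets(workspace_id: str, access_token: str, timeout: float = 10.0) -> list[str]:
    """Call the API and return the dataset names found in the workspace."""
    request = datasets_request(workspace_id, access_token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return dataset_names(resp.read())
```

The workspace ID is the identifier that appears in the workspace URL in the Power BI service. If the name of the dataset you published from Power BI Desktop appears in the returned list, it is available for the report steps above.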

SQL Access to HDFS Data from Applications

You have now imported HDFS data into the Power BI service through Connect Server. From here you can add more data sources, create new visualizations, and build reports on the imported data.

To get SQL data access to 200+ SaaS, Big Data, and NoSQL sources directly from your applications, see the CData Connect Server.
