Import HDFS Data into Microsoft Power Query

The CData Connect Server offers standards-based Web service endpoints, including OData feeds, that let a variety of applications connect to and query HDFS data. This article details how to import an OData feed of HDFS data into Microsoft Power Query.

Connect to HDFS from Power Query

To work with live HDFS data in Microsoft Power Query, we need to connect to HDFS from Connect Server, provide user access to the new virtual database, and create OData endpoints for the HDFS data.

Add a Connect Server User

Create a user that Microsoft Power Query will use to connect to HDFS through Connect Server.

  1. Click Users -> Add
  2. Configure the new user
  3. Click Save Changes and make note of the Authtoken for the new user
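The authtoken acts as the user's password. Later in this article, Power Query sends it with Basic authentication, which transmits the pair `username:authtoken` base64-encoded in an `Authorization` header. A minimal Python sketch of that header, for illustration only; `basic_auth_header` and the example values are hypothetical:

```python
import base64


def basic_auth_header(username: str, authtoken: str) -> dict:
    """Build an HTTP Basic Authorization header for a Connect Server user.

    The authtoken takes the place of a password (assumption based on the
    Power Query steps in this article).
    """
    raw = f"{username}:{authtoken}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Example with hypothetical credentials:
# basic_auth_header("powerquery_user", "<authtoken>")
```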

Connect to HDFS from Connect Server

CData Connect Server uses a straightforward, point-and-click interface to connect to data sources and generate APIs.

  1. Open Connect Server and click Connections
  2. Select "HDFS" from Available Data Sources
  3. Enter the connection properties required to connect to HDFS:

    • Host: Set this value to the host of your HDFS installation.
    • Port: Set this value to the port of your HDFS installation (default: 50070).
  4. Click Save Changes
  5. Click Privileges -> Add and add the new user (or an existing user) with the appropriate permissions (SELECT is all that Power Query requires).
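If the connection fails, a quick first check is whether the Host and Port values are reachable from the machine running Connect Server. The following Python sketch only confirms that a TCP connection can be opened; it does not verify that the service is HDFS. The function name and the `hdfs.example.com` host are hypothetical placeholders:

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that something is listening on that port; it does not
    check that it is HDFS or that Connect Server can use it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example with a placeholder host and the default port from the step above:
# port_is_reachable("hdfs.example.com", 50070)
```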

Add HDFS OData Endpoints in Connect Server

After connecting to HDFS, create OData endpoints for the tables you want to expose.

  1. Click OData -> Tables -> Add Tables
  2. Select the HDFS database
  3. Select the table(s) you wish to work with and click Next
  4. (Optional) Edit the resource to select specific fields and more
  5. Save the settings
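Each table you add becomes an OData resource. Clients typically address a resource by appending its name to the service root and narrow the results with standard OData query options such as `$select`, `$filter`, and `$top`. The Python sketch below builds such a URL. It assumes the service root used later in this article and a `<root>/<resource>` path layout; the resource name `Files` and the column names in the example are hypothetical, so confirm the actual names on the Connect Server OData page:

```python
from urllib.parse import quote, urlencode


def odata_table_url(base_url, resource, select=None, filter_expr=None, top=None):
    """Build an OData query URL for one resource (table).

    Assumes base_url is the service root, e.g. https://your-server:8032/api.rsc,
    and that resources live directly beneath it.
    """
    params = {}
    if select:
        params["$select"] = ",".join(select)
    if filter_expr:
        params["$filter"] = filter_expr
    if top is not None:
        params["$top"] = str(top)
    url = base_url.rstrip("/") + "/" + quote(resource)
    if params:
        # quote (not quote_plus) encodes spaces as %20; "$", "," and "'"
        # are left readable in option names, field lists, and literals.
        url += "?" + urlencode(params, quote_via=quote, safe="$,'")
    return url
```

For example, `odata_table_url("https://your-server:8032/api.rsc", "Files", select=["FileName", "Size"], top=10)` returns `https://your-server:8032/api.rsc/Files?$select=FileName,Size&$top=10`.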

(Optional) Configure Cross-Origin Resource Sharing (CORS)

If a browser-based application (for example, one that makes Ajax requests) on a different domain calls the Connect Server, the browser's same-origin policy can block the request. In that case, configure the CORS settings in OData -> Settings.

  • Enable cross-origin resource sharing (CORS): ON
  • Allow all domains without '*': ON
  • Access-Control-Allow-Methods: GET, PUT, POST, OPTIONS
  • Access-Control-Allow-Headers: Authorization

Save the changes to the settings.
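For context, before a cross-origin request that sends an `Authorization` header, a browser issues a preflight `OPTIONS` request with `Access-Control-Request-Method` and `Access-Control-Request-Headers`, and proceeds only if the server's allowed methods and headers cover them. The simplified Python sketch below models that check against the settings above. It is an illustration of the CORS rule, not Connect Server's implementation. It ignores the Fetch standard's safelisted methods and headers, and the function name is hypothetical:

```python
# Values mirroring the Access-Control-Allow-* settings above.
ALLOWED_METHODS = {"GET", "PUT", "POST", "OPTIONS"}
ALLOWED_HEADERS = {"authorization"}  # header names compare case-insensitively


def preflight_allows(request_method, request_headers,
                     allowed_methods=ALLOWED_METHODS,
                     allowed_headers=ALLOWED_HEADERS):
    """Return True if a preflight asking for request_method and the
    comma-separated header names in request_headers would be permitted."""
    if request_method not in allowed_methods:
        return False
    requested = {h.strip().lower() for h in request_headers.split(",") if h.strip()}
    # Every requested header must appear in the allowed set.
    return requested <= allowed_headers
```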

Connect to HDFS Data from Power Query

Follow the steps below to import tables that can be refreshed on demand:

  1. Configure the Connect Server to use a version of the OData protocol that is recognized by Power Query. In the Connect Server administration console, click Settings -> Server and change the value of the Default Version property to 3.0.
  2. From the ribbon in Excel, click Power Query -> From Other Data Sources -> From OData Feed, and enter the OData URL:

    https://your-server:8032/api.rsc
  3. Next, define authentication credentials and set privacy levels. Select Basic authentication and enter the credentials of a user authorized to make requests: enter the user name in the Username field and the user's authtoken in the Password field.

    To change the authentication scheme or privacy level later, click Power Query -> Data Source Settings, select the OData feed from the list, and click "Edit Permissions...". In that dialog, you can edit the credentials and select the privacy level from the menu.

  4. You can now access HDFS data in Power Query. In the Navigator, expand the node for the OData feed, right-click a table, and click Edit to open the Query Editor, which displays the table data.
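To check the same feed outside Excel, for example while troubleshooting credentials, you can request a table directly. The Python sketch below uses only the standard library. It assumes a resource URL of the form `<service root>/<table>`, that the server honors `Accept: application/json`, and the common OData JSON response shapes; verify all three against your Connect Server. `fetch_table` and `extract_rows` are hypothetical helpers. OData 1.0 to 3.0 services report their protocol version in the `DataServiceVersion` response header, and OData 4.0 uses `OData-Version`. After changing Default Version to 3.0, you would expect the returned version to reflect that:

```python
import base64
import json
import urllib.request


def extract_rows(payload):
    """Return the list of entities from common OData JSON shapes (assumed)."""
    if "value" in payload:        # JSON light: {"value": [...]}
        return payload["value"]
    d = payload.get("d", {})      # verbose JSON: {"d": [...]} or {"d": {"results": [...]}}
    if isinstance(d, list):
        return d
    return d.get("results", [])


def fetch_table(url, username, authtoken, timeout=30):
    """GET an OData resource using Basic authentication (user name + authtoken).

    Returns (protocol_version_header, rows).
    """
    token = base64.b64encode(f"{username}:{authtoken}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, headers={
        "Authorization": "Basic " + token,
        "Accept": "application/json",
    })
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        version = resp.headers.get("DataServiceVersion") or resp.headers.get("OData-Version")
        payload = json.load(resp)
    return version, extract_rows(payload)


# Example with a hypothetical resource and credentials:
# version, rows = fetch_table("https://your-server:8032/api.rsc/Files",
#                             "powerquery_user", "<authtoken>")
```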

Free Trial & More Information

If you are interested in connecting to your HDFS data (or data from any of our other supported data sources) from Power Query, sign up for a free trial of CData Connect Server today! For more information on Connect Server and to see what other data sources we support, refer to our CData Connect page.