Use the CData ODBC Driver for HDFS in Microsoft Power Query
The CData ODBC Driver for HDFS lets you link to HDFS data from Microsoft Power Query, so that refreshing a query picks up any updates. This article shows how to use the ODBC driver to import HDFS data into Microsoft Power Query.
Connect to HDFS as an ODBC Data Source
If you have not already, first specify connection properties in an ODBC DSN (data source name). This is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.
In order to authenticate, set the following connection properties:
- Host: Set this value to the host of your HDFS installation.
- Port: Set this value to the port of your HDFS installation. The default port is 50070.
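The DSN stores these properties, but it can help to see how they look as key/value pairs in a connection string. The following is a minimal Python sketch, not part of the driver: the helper name `build_connection_string` and the host `namenode.example.com` are hypothetical, and whether properties supplied alongside a DSN override the stored values is an assumption to check against the driver documentation.

```python
def build_connection_string(properties: dict) -> str:
    """Join key/value pairs into a semicolon-delimited ODBC connection string."""
    parts = []
    for key, value in properties.items():
        text = str(value)
        # This sketch does not implement ODBC value escaping, so reject
        # values that would break the key=value;key=value layout.
        if ";" in text:
            raise ValueError(f"{key} contains ';', which this sketch does not escape")
        parts.append(f"{key}={text}")
    return ";".join(parts) + ";"


# Hypothetical values: replace the host with the address of your own
# HDFS installation. The DSN name is the default one used in this article.
connection_string = build_connection_string(
    {
        "DSN": "CData HDFS Source",
        "Host": "namenode.example.com",
        "Port": 50070,
    }
)
print(connection_string)
# prints: DSN=CData HDFS Source;Host=namenode.example.com;Port=50070;
```

Keeping the properties in a dictionary makes it easy to swap the host or port per environment without editing the string by hand.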
Import HDFS Data
Follow the steps below to import HDFS data using standard SQL:
- From the ribbon in Excel, click Power Query -> From Other Data Sources -> From ODBC.
- Enter the ODBC connection string. Below is a connection string using the default DSN created when you install the driver:
  Provider=MSDASQL.1;Persist Security Info=False;DSN=CData HDFS Source
- Enter the SELECT statement to import data with. For example:
  SELECT FileId, ChildrenNum FROM Files WHERE FileId = '119116'
- Enter credentials, if required, and click Connect. The results of the query are displayed in the Query Editor Preview. You can combine queries from other data sources or refine the data with Power Query formulas. To load the query to the worksheet, click the Close and Load button.
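If the import fails, it can be useful to run the same query against the DSN outside Excel to separate driver problems from Power Query problems. The sketch below is one way to do that in Python, under stated assumptions: `preview_rows` is a hypothetical helper, it expects a DB-API style `connect` function (for example `pyodbc.connect` from the third-party pyodbc package), and it passes only the DSN because the `Provider=MSDASQL.1` portion of the string above applies to OLE DB consumers.

```python
from typing import Any, Callable, List, Tuple

# DSN name and query are taken from the steps above.
CONNECTION_STRING = "DSN=CData HDFS Source"
QUERY = "SELECT FileId, ChildrenNum FROM Files WHERE FileId = '119116'"


def preview_rows(connect: Callable[[str], Any], limit: int = 5) -> List[Tuple]:
    """Run QUERY through a DB-API style connect function and return up to `limit` rows."""
    connection = connect(CONNECTION_STRING)
    try:
        cursor = connection.cursor()
        cursor.execute(QUERY)
        # Convert driver row objects to plain tuples for easy printing.
        return [tuple(row) for row in cursor.fetchmany(limit)]
    finally:
        # Always release the ODBC connection, even if the query fails.
        connection.close()
```

With pyodbc installed, calling `preview_rows(pyodbc.connect)` would return the first few rows of the query, or raise the driver's error message if the DSN is misconfigured. Passing the connect function as an argument keeps the helper independent of any one ODBC package.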