How to pipe HDFS Data to CSV in PowerShell



Use standard PowerShell cmdlets to access HDFS tables.

The CData Cmdlets Module for HDFS is a standard PowerShell module offering straightforward integration with HDFS. Below, you will find examples of using our HDFS Cmdlets with native PowerShell cmdlets.

Creating a Connection to Your HDFS Data

To connect, set the following connection properties:

  • Host: Set this value to the host of your HDFS installation.
  • Port: Set this value to the port of your HDFS installation. Default port: 50070 (the NameNode HTTP port in Hadoop 2.x; Hadoop 3.x clusters typically use 9870).

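$conn = Connect-HDFS -Host "$HdfsHost" -Port "$Port" -Path "$Path" -User "$User"

Note that $Host is a read-only automatic variable in PowerShell, so the host name is kept in a differently named variable here ($HdfsHost).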
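As a minimal sketch, the connection properties can also be collected in a hashtable and passed to Connect-HDFS by splatting. The variable name $connParams and all four values are placeholders assumed for illustration; substitute the settings of your own installation. The guard around the call is only there so the snippet does not fail on a machine where the module is not installed.

```powershell
# Placeholder values (assumptions for illustration); replace with your own settings.
$connParams = @{
    Host = 'localhost'   # host of the HDFS installation
    Port = 50070         # port of the HDFS installation
    Path = '/'           # hypothetical path value
    User = 'hdfs'        # hypothetical user name
}

# Attempt the connection only when the Connect-HDFS cmdlet is available.
if (Get-Command -Name Connect-HDFS -ErrorAction SilentlyContinue) {
    # Splatting expands the hashtable into -Host, -Port, -Path, and -User.
    $conn = Connect-HDFS @connParams
} else {
    Write-Warning 'Connect-HDFS was not found; install and import the HDFS Cmdlets module first.'
}
```

Splatting keeps the values in one place and avoids the need for separate variables, including one that could collide with the built-in $Host.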

Selecting Data

Use the following command to retrieve data from the Files table and pipe the result into a CSV file:

Select-HDFS -Connection $conn -Table Files | Select -Property * -ExcludeProperty Connection,Table,Columns | Export-Csv -Path c:\myFilesData.csv -NoTypeInformation

You will notice that we piped the results from Select-HDFS into a Select-Object cmdlet and excluded some properties before piping them into an Export-Csv cmdlet. We do this because the CData Cmdlets append Connection, Table, and Columns information onto each "row" in the result set, and we do not necessarily want that information in our CSV file.

The Connection, Table, and Columns are appended to the results in order to facilitate piping results from one of the CData Cmdlets directly into another one.
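The effect of the exclusion step can be tried without an HDFS connection. The sketch below uses hand-built stand-in objects rather than real Select-HDFS output; the Name and Length columns, their values, and the temporary file name are hypothetical and chosen only for illustration.

```powershell
# Stand-in rows (hypothetical data) that mimic cmdlet output: two data columns
# plus the Connection, Table, and Columns helper properties.
$rows = @(
    [pscustomobject]@{ Name = 'a.txt'; Length = 120; Connection = 'conn'; Table = 'Files'; Columns = 'Name,Length' }
    [pscustomobject]@{ Name = 'b.txt'; Length = 345; Connection = 'conn'; Table = 'Files'; Columns = 'Name,Length' }
)

# Write to the temp directory so the demo does not touch c:\.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'myFilesData-demo.csv'

# Keep every property except the three helper properties, then export.
$rows |
    Select-Object -Property * -ExcludeProperty Connection, Table, Columns |
    Export-Csv -Path $csvPath -NoTypeInformation

# Show the resulting file; only the Name and Length columns remain.
Get-Content -Path $csvPath
```

The -Property * argument matters in Windows PowerShell 5.1, where -ExcludeProperty only takes effect when -Property is also given; from PowerShell 6 onward it can be used on its own.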
