How to pipe HDFS Data to CSV in PowerShell
Use standard PowerShell cmdlets to access HDFS tables.
The CData Cmdlets Module for HDFS is a standard PowerShell module offering straightforward integration with HDFS. Below, you will find examples of using our HDFS Cmdlets with native PowerShell cmdlets.
Creating a Connection to Your HDFS Data
In order to authenticate, set the following connection properties:
- Host: Set this value to the host of your HDFS installation.
- Port: Set this value to the port of your HDFS installation. Default port: 50070
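# $Host is a read-only automatic variable in PowerShell, so the host value is held in $HdfsHost instead.
$conn = Connect-HDFS -Host "$HdfsHost" -Port "$Port" -Path "$Path" -User "$User"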
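As a minimal sketch, the same connection can be written with a splatted hashtable, which keeps the values in one place. The host name, path, and user below are hypothetical placeholders, not values from this article. The parameter names mirror the Connect-HDFS call above. The Get-Command guard is an added assumption so that the snippet does nothing harmful when the module is not loaded.

```powershell
# Hypothetical values; replace them with the details of your own cluster.
$connectionParams = @{
    Host = 'namenode.example.com'   # assumed host name
    Port = 50070                    # default port noted above
    Path = '/user/data'             # assumed HDFS path
    User = 'hdfsuser'               # assumed user name
}

# Only attempt the connection when the cmdlet is available in this session.
if (Get-Command -Name Connect-HDFS -ErrorAction SilentlyContinue) {
    # Splatting passes each hashtable key as a named parameter.
    $conn = Connect-HDFS @connectionParams
}
else {
    Write-Warning 'Connect-HDFS was not found; load the CData Cmdlets Module for HDFS first.'
}
```

Splatting also avoids having to pick variable names for each value, which sidesteps any clash with built-in variables such as $Host.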
Selecting Data
Use the command below to retrieve data from the Files table and pipe the result into a CSV file:
Select-HDFS -Connection $conn -Table Files | Select -Property * -ExcludeProperty Connection,Table,Columns | Export-Csv -Path c:\myFilesData.csv -NoTypeInformation
Note that the results from Select-HDFS are piped through Select-Object (Select is its built-in alias), which excludes some properties before the rows reach Export-Csv. We do this because the CData Cmdlets append Connection, Table, and Columns information onto each "row" in the result set, and we do not necessarily want that information in our CSV file.
The Connection, Table, and Columns are appended to the results in order to facilitate piping results from one of the CData Cmdlets directly into another one.
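To see the effect of the exclusion without a live HDFS connection, here is a small sketch that uses stand-in objects. The FileId and PathSuffix column names and their values are hypothetical and only imitate the shape described above: each row carries extra Connection, Table, and Columns properties. The output path in the temporary folder is also an assumption.

```powershell
# Stand-in rows: hypothetical objects shaped like cmdlet output, each with
# the extra Connection, Table, and Columns properties attached.
$rows = @(
    [pscustomobject]@{ FileId = 1; PathSuffix = 'a.txt'; Connection = 'conn'; Table = 'Files'; Columns = 'FileId,PathSuffix' }
    [pscustomobject]@{ FileId = 2; PathSuffix = 'b.txt'; Connection = 'conn'; Table = 'Files'; Columns = 'FileId,PathSuffix' }
)

# Write to the temporary folder so the sketch does not depend on c:\ being writable.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'myFilesData-sample.csv'

# Same pipeline shape as above: drop the bookkeeping properties, then export.
$rows |
    Select-Object -Property * -ExcludeProperty Connection, Table, Columns |
    Export-Csv -Path $csvPath -NoTypeInformation

# Read the file back and collect the column names that survived.
$columnNames = (Import-Csv -Path $csvPath | Select-Object -First 1).PSObject.Properties.Name
$columnNames
```

Only the data columns should remain in the file. Running the same pipeline without -ExcludeProperty would add Connection, Table, and Columns as three extra CSV columns.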