Automate HDFS Integration Tasks from PowerShell

Are you looking for a quick and easy way to access HDFS data from PowerShell? This article shows how to use the CData Cmdlets for HDFS and the CData ADO.NET Provider for HDFS to connect to HDFS data from PowerShell and to automate tasks such as synchronization and downloads.

The CData Cmdlets for HDFS are standard PowerShell cmdlets that make it easy to accomplish data cleansing, normalization, backup, and other integration tasks by enabling real-time access to HDFS.

Cmdlets or ADO.NET?

The cmdlets provide both a PowerShell interface to the HDFS API and an SQL interface; this tutorial uses both to retrieve HDFS data. It also shows the equivalent ADO.NET code, which relies on the CData ADO.NET Provider for HDFS. To access HDFS data from other .NET applications, such as LINQPad, use the CData ADO.NET Provider for HDFS.

After obtaining the needed connection properties, accessing HDFS data in PowerShell consists of three basic steps.

To connect, set the following connection properties:

  • Host: Set this value to the host of your HDFS installation.
  • Port: Set this value to the port of your HDFS installation. Default port: 50070
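The ADO.NET example in this article passes its connection properties as a semicolon-delimited string. As a minimal sketch of how such a string can be assembled from named properties, the snippet below builds one from an ordered hashtable. The variable names `$props` and `$connectionString` are illustrative, and the values are the sample values from that ADO.NET example, to be replaced with those of your own installation.

```powershell
# Connection properties in the order they should appear in the string.
# Values are the sample values used in this article; replace them with your own.
$props = [ordered]@{
    Host = 'sandbox-hdp.hortonworks.com'
    Port = 50070
    Path = '/user/root'
    User = 'root'
}

# Turn each entry into a "Key=Value;" pair and concatenate the pairs.
$connectionString = ($props.GetEnumerator() |
    ForEach-Object { "$($_.Key)=$($_.Value);" }) -join ''

$connectionString
```

Keeping the properties in a hashtable keeps host-specific values in one place, so they can be changed without editing the rest of a script.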

PowerShell

  1. Install the module:

    Install-Module HDFSCmdlets
  2. Connect:

    # $Host is a read-only automatic variable in PowerShell, so the host name is kept in $HdfsHost
    $hdfs = Connect-HDFS -Host "$HdfsHost" -Port "$Port" -Path "$Path" -User "$User"
  3. Search for and retrieve data:

    $fileid = "119116"
    $files = Select-HDFS -Connection $hdfs -Table "Files" -Where "FileId = '$fileid'"
    $files

    You can also use the Invoke-HDFS cmdlet to execute SQL commands:

    $files = Invoke-HDFS -Connection $hdfs -Query 'SELECT * FROM Files WHERE FileId = @FileId' -Params @{'@FileId'='119116'}
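As one possible backup step, results can be written to a CSV file with the standard `Export-Csv` cmdlet. The sketch below assumes the cmdlets return objects whose properties are named after the table columns; so that it runs without a connection, `$files` is filled with stand-in rows whose values are hypothetical. The file name `hdfs-files.csv` is also illustrative.

```powershell
# Stand-in rows (hypothetical values). In practice $files would hold the
# objects returned by Select-HDFS or Invoke-HDFS.
$files = @(
    [pscustomobject]@{ FileId = '119116'; ChildrenNum = 0 }
    [pscustomobject]@{ FileId = '119117'; ChildrenNum = 3 }
)

# Write the chosen columns to a CSV file in the temp directory.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'hdfs-files.csv'
$files |
    Select-Object FileId, ChildrenNum |
    Export-Csv -Path $csvPath -NoTypeInformation

# Read the file back to check what was written.
$backup = Import-Csv -Path $csvPath
$backup.Count
```

Note that `Import-Csv` returns every value as a string, so numeric columns need converting if they are compared as numbers later.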

ADO.NET

  1. Load the provider's assembly:

    [Reflection.Assembly]::LoadFile("C:\Program Files\CData\CData ADO.NET Provider for HDFS\lib\System.Data.CData.HDFS.dll")
  2. Connect to HDFS:

    $conn = New-Object System.Data.CData.HDFS.HDFSConnection("Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;User=root;")
    $conn.Open()
  3. Instantiate the HDFSDataAdapter, execute an SQL query, and output the results:

    $sql = "SELECT FileId, ChildrenNum from Files"
    $da = New-Object System.Data.CData.HDFS.HDFSDataAdapter($sql, $conn)
    $dt = New-Object System.Data.DataTable
    $da.Fill($dt)
    $dt.Rows | foreach { Write-Host $_.fileid $_.childrennum }
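`Write-Host` prints text to the console only, so rows displayed that way cannot be filtered or piped further. A minimal sketch of an alternative is to emit the rows as objects. To stay runnable without the provider, it uses an in-memory `DataTable` with hypothetical rows in place of the table that `$da.Fill($dt)` would populate; the variable `$withChildren` is illustrative.

```powershell
# In-memory stand-in for the table that $da.Fill($dt) would populate.
# Row values are hypothetical.
$dt = New-Object System.Data.DataTable
[void]$dt.Columns.Add('FileId', [string])
[void]$dt.Columns.Add('ChildrenNum', [int])
[void]$dt.Rows.Add('119116', 0)
[void]$dt.Rows.Add('119117', 3)

# Emit objects instead of console text so the rows can be filtered and piped.
$withChildren = $dt.Rows |
    Where-Object { $_.ChildrenNum -gt 0 } |
    Select-Object FileId, ChildrenNum

$withChildren.FileId
```

The `[void]` casts discard the return values of the `Add` calls, which PowerShell would otherwise write to the output stream.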