Are you looking for a quick and easy way to access Azure Data Lake Storage data from PowerShell? We show how to use the Cmdlets for Azure Data Lake Storage and the CData ADO.NET Provider for Azure Data Lake Storage to connect to Azure Data Lake Storage data and synchronize, automate, download, and more.
The CData Cmdlets for Azure Data Lake Storage are standard PowerShell cmdlets that make it easy to accomplish data cleansing, normalization, backup, and other integration tasks by enabling real-time access to Azure Data Lake Storage.
Cmdlets or ADO.NET?
The cmdlets are not only a PowerShell interface to the Azure Data Lake Storage API, but also an SQL interface; this tutorial shows how to use both to retrieve Azure Data Lake Storage data. We also show examples of the ADO.NET equivalent, which is possible with the CData ADO.NET Provider for Azure Data Lake Storage. To access Azure Data Lake Storage data from other .NET applications, like LINQPad, use the CData ADO.NET Provider for Azure Data Lake Storage.
After obtaining the needed connection properties, accessing Azure Data Lake Storage data in PowerShell consists of three basic steps.
Authenticating to a Gen 1 DataLakeStore Account
Gen 1 uses OAuth 2.0 in Azure AD for authentication.
For this, you must first create an Active Directory web application.
To authenticate against a Gen 1 DataLakeStore account, the following properties are required:
- Schema: Set this to ADLSGen1.
- Account: Set this to the name of the account.
- OAuthClientId: Set this to the application Id of the app you created.
- OAuthClientSecret: Set this to the key generated for the app you created.
- TenantId: Set this to the tenant Id. See the TenantId property in the documentation for more information on how to acquire this.
- Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
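As a rough sketch, the Gen 1 properties above can be collected in an ordered hashtable and joined into a Key=Value; connection string of the kind the ADO.NET provider accepts. The helper name ConvertTo-ConnectionString and every value shown are hypothetical placeholders, not part of the CData product, and the sketch assumes no value contains a semicolon.

```powershell
# Hypothetical helper: joins property names and values into "Key=Value;..." form.
function ConvertTo-ConnectionString {
    param([System.Collections.IDictionary]$Properties)
    # Skip empty values so optional properties (e.g. Directory) can be left out.
    $pairs = foreach ($key in $Properties.Keys) {
        $value = $Properties[$key]
        if (-not [string]::IsNullOrEmpty($value)) { "$key=$value" }
    }
    $pairs -join ';'
}

# Placeholder values; replace them with your own account details.
$gen1 = [ordered]@{
    Schema            = 'ADLSGen1'
    Account           = 'myAccount'
    OAuthClientId     = 'myClientId'
    OAuthClientSecret = 'myClientSecret'
    TenantId          = 'myTenantId'
    Directory         = ''   # optional; the root directory is used when omitted
}

$gen1ConnectionString = ConvertTo-ConnectionString -Properties $gen1
$gen1ConnectionString
```

Keeping the properties in a hashtable rather than a hand-typed string makes it easier to see which required values are still missing.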
Authenticating to a Gen 2 DataLakeStore Account
To authenticate against a Gen 2 DataLakeStore account, the following properties are required:
- Schema: Set this to ADLSGen2.
- Account: Set this to the name of the account.
- FileSystem: Set this to the file system which will be used for this account.
- AccessKey: Set this to the access key which will be used to authenticate the calls to the API. See the AccessKey property in the documentation for more information on how to acquire this.
- Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
PowerShell
- Install the module:
Install-Module ADLSCmdlets
- Connect:
$adls = Connect-ADLS -Schema "$Schema" -Account "$Account" -FileSystem "$FileSystem" -AccessKey "$AccessKey"
- Search for and retrieve data:
$type = "FILE"
$resources = Select-ADLS -Connection $adls -Table "Resources" -Where "Type = `'$type`'"
$resources
You can also use the Invoke-ADLS cmdlet to execute SQL commands:
$resources = Invoke-ADLS -Connection $adls -Query 'SELECT * FROM Resources WHERE Type = @Type' -Params @{'@Type'='FILE'}
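The steps above can be combined into one script. The following is a minimal sketch, assuming the ADLSCmdlets module is installed and the placeholder account values are replaced with real ones. It passes the Gen 2 connection parameters by splatting and saves the results with the built-in Export-Csv cmdlet; the output file name is an arbitrary choice.

```powershell
# Placeholder Gen 2 connection values; replace them with your own.
$connectionParams = @{
    Schema     = 'ADLSGen2'
    Account    = 'myAccount'
    FileSystem = 'myFileSystem'
    AccessKey  = 'myAccessKey'
}

# Only attempt the connection when the cmdlets are actually available.
if (Get-Command Connect-ADLS -ErrorAction SilentlyContinue) {
    # Splatting expands the hashtable into -Schema, -Account, -FileSystem, -AccessKey.
    $adls = Connect-ADLS @connectionParams
    $resources = Select-ADLS -Connection $adls -Table 'Resources' -Where "Type = 'FILE'"
    $resources | Export-Csv -Path 'adls-resources.csv' -NoTypeInformation
} else {
    Write-Warning 'ADLSCmdlets is not installed; run Install-Module ADLSCmdlets first.'
}
```

The Get-Command guard is only there so the script fails gently on a machine without the module; drop it if the module is always present.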
ADO.NET
- Load the provider's assembly:
[Reflection.Assembly]::LoadFile("C:\Program Files\CData\CData ADO.NET Provider for Azure Data Lake Storage\lib\System.Data.CData.ADLS.dll")
- Connect to Azure Data Lake Storage:
$conn = New-Object System.Data.CData.ADLS.ADLSConnection("Schema=ADLSGen2;Account=myAccount;FileSystem=myFileSystem;AccessKey=myAccessKey;InitiateOAuth=GETANDREFRESH")
$conn.Open()
- Instantiate the ADLSDataAdapter, execute an SQL query, and output the results:
$sql = "SELECT FullPath, Permission FROM Resources"
$da = New-Object System.Data.CData.ADLS.ADLSDataAdapter($sql, $conn)
$dt = New-Object System.Data.DataTable
$da.Fill($dt)
$dt.Rows | foreach { Write-Host $_.fullpath $_.permission }
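To try the row loop without a live connection, the sketch below builds an in-memory System.Data.DataTable with the same two columns as the query. The rows are made-up sample values, not real output from Azure Data Lake Storage. The [void] casts suppress the objects that the Add methods return; the same cast can be applied to $da.Fill($dt) to hide the row count it returns.

```powershell
# Build an in-memory table shaped like the query result (values are invented).
$dt = New-Object System.Data.DataTable
[void]$dt.Columns.Add('FullPath', [string])
[void]$dt.Columns.Add('Permission', [string])
[void]$dt.Rows.Add('/docs/report.csv', 'rw-r-----')
[void]$dt.Rows.Add('/docs/archive', 'rwxr-x---')

# Same kind of loop as above, but collecting formatted lines instead of
# writing straight to the host, so the result can be reused or tested.
$lines = $dt.Rows | ForEach-Object { '{0} {1}' -f $_.FullPath, $_.Permission }
$lines
```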