Replicate Azure Data Lake Storage Data from PowerShell
Write a quick PowerShell script to query Azure Data Lake Storage data. Use connectivity to the live data to replicate Azure Data Lake Storage data to SQL Server.
The CData ODBC Driver for Azure Data Lake Storage works with Microsoft's built-in ODBC support, giving PowerShell direct connectivity to live Azure Data Lake Storage data.
You can use the .NET Framework Data Provider for ODBC (System.Data.Odbc), which PowerShell can load directly, to automate integration tasks such as replicating Azure Data Lake Storage data to other databases. This article shows how to replicate Azure Data Lake Storage data to SQL Server in a few lines of code.
You can also write PowerShell code to download Azure Data Lake Storage data. See the examples below.
Create an ODBC Data Source for Azure Data Lake Storage
If you have not already, first specify connection properties in an ODBC DSN (data source name). This is the last step of the driver installation. You can use the Microsoft ODBC Data Source Administrator to create and configure ODBC DSNs.
Authenticating to a Gen 1 DataLakeStore Account
Gen 1 uses OAuth 2.0 in Azure AD for authentication.
For this, an Active Directory web application is required. You can create one as follows:
To authenticate against a Gen 1 DataLakeStore account, the following properties are required:
- Schema: Set this to ADLSGen1.
- Account: Set this to the name of the account.
- OAuthClientId: Set this to the application Id of the app you created.
- OAuthClientSecret: Set this to the key generated for the app you created.
- TenantId: Set this to the tenant Id. See the TenantId property in the help documentation for how to find this value.
- Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
Authenticating to a Gen 2 DataLakeStore Account
To authenticate against a Gen 2 DataLakeStore account, the following properties are required:
- Schema: Set this to ADLSGen2.
- Account: Set this to the name of the account.
- FileSystem: Set this to the file system which will be used for this account.
- AccessKey: Set this to the access key used to authenticate calls to the API. See the AccessKey property in the help documentation for how to obtain it.
- Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
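If you prefer not to create a DSN, the properties listed above can also be passed in a DSN-less connection string. The sketch below uses a hypothetical helper, `ConvertTo-OdbcConnectionString`, for both account types. The driver name `CData ODBC Driver for Azure Data Lake Storage` is an assumption (check the Drivers tab of the ODBC Data Source Administrator for the exact name), and every account, key, and path value is a placeholder. Values containing characters such as `;` or `=` (for example, base64 access keys) are wrapped in braces, following the ODBC connection-string convention; the sketch assumes no value itself contains `}`.

```powershell
# Hypothetical helper: turns an ordered set of connection properties into an
# ODBC connection string of the form Key1=Value1;Key2=Value2;...
# ASSUMPTION: the driver name below matches the name registered by the installer.
function ConvertTo-OdbcConnectionString {
    param(
        [Parameter(Mandatory)]
        [System.Collections.Specialized.OrderedDictionary] $Properties
    )
    $pairs = foreach ($entry in $Properties.GetEnumerator()) {
        $value = [string]$entry.Value
        # Brace-quote the driver name and any value containing ODBC special
        # characters; values that themselves contain '}' are not handled.
        if ($entry.Key -eq 'Driver' -or $value -match '[\[\]{}(),;?*=!@]') {
            $value = '{' + $value + '}'
        }
        '{0}={1}' -f $entry.Key, $value
    }
    $pairs -join ';'
}

# Gen 1: OAuth application credentials (placeholder values)
$gen1 = ConvertTo-OdbcConnectionString -Properties ([ordered]@{
    Driver            = 'CData ODBC Driver for Azure Data Lake Storage'
    Schema            = 'ADLSGen1'
    Account           = 'myaccount'
    OAuthClientId     = 'my-app-id'
    OAuthClientSecret = 'my-app-key'
    TenantId          = 'my-tenant-id'
})

# Gen 2: access key authentication (placeholder values)
$gen2 = ConvertTo-OdbcConnectionString -Properties ([ordered]@{
    Driver     = 'CData ODBC Driver for Azure Data Lake Storage'
    Schema     = 'ADLSGen2'
    Account    = 'myaccount'
    FileSystem = 'myfilesystem'
    AccessKey  = 'abc123=='
    Directory  = '/replicated'
})
```

Either string could be assigned to `$conn.ConnectionString` in place of the DSN-based string shown in the next section. An [ordered] dictionary is used so the properties appear in a predictable order, which makes the resulting string easier to compare and log.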
Connect to Azure Data Lake Storage
The code below shows how to use the DSN to initialize the connection to Azure Data Lake Storage data in PowerShell:
$conn = New-Object System.Data.Odbc.OdbcConnection
$conn.ConnectionString = "DSN=CData ADLS Source x64"
Back Up Azure Data Lake Storage Data to SQL Server
To replicate data to SQL Server, first enable caching by setting the following connection properties, which configure the caching database:
- CacheProvider: The name of the ADO.NET provider. This can be found in the Machine.config for your version of .NET. For example, to configure SQL Server, enter System.Data.SqlClient.
- CacheConnection: The connection string for the caching database. For example, for SQL Server:
Server=localhost;Database=RSB;User Id=sqltest;Password=sqltest;
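The caching properties are normally stored in the DSN. As a sketch, they could instead be appended to the DSN connection string, assuming the driver accepts extra connection properties after `DSN=` (standard ODBC behavior) and honors ODBC brace-quoting for a value that contains semicolons, such as `CacheConnection`. If either assumption does not hold for your driver version, set the properties in the DSN.

```powershell
# Sketch: enable caching by appending the cache properties to the DSN string.
# ASSUMPTION: the driver reads properties given after DSN= and accepts the
# brace-quoted CacheConnection value, which itself contains semicolons.
$cacheConnection = 'Server=localhost;Database=RSB;User Id=sqltest;Password=sqltest;'
$connString = 'DSN=CData ADLS Source x64;' +
    'CacheProvider=System.Data.SqlClient;' +
    'CacheConnection={' + $cacheConnection + '}'
$connString
```

To use it, assign `$connString` to `$conn.ConnectionString` before calling `$conn.Open()`.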
Once caching is enabled, the code below refreshes the entire cached table, including its schema. Any existing cached copy of the table is deleted first.
$conn.Open()
# Create and execute the SQL query: rebuild the cached copy of the Resources table
$sql = "CACHE DROP EXISTING SELECT * FROM Resources"
$cmd = New-Object System.Data.Odbc.OdbcCommand($sql, $conn)
$count = $cmd.ExecuteNonQuery()
$conn.Close()
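To refresh more than one table, the same CACHE statement can be run in a loop. The sketch below is a hypothetical helper, `Invoke-AdlsCacheRefresh`; it assumes `$conn` is the connection created earlier and that each name passed in is a table the driver exposes (only `Resources` appears in this article). It opens the connection only if needed and closes it again only if it opened it.

```powershell
# Hypothetical helper: runs "CACHE DROP EXISTING SELECT * FROM <table>" for each
# table and returns what ExecuteNonQuery reported, keyed by table name.
function Invoke-AdlsCacheRefresh {
    param(
        [Parameter(Mandatory)] $Connection,          # e.g. an OdbcConnection
        [Parameter(Mandatory)] [string[]] $Tables
    )
    $results = [ordered]@{}
    $openedHere = $false
    if ($Connection.State -ne 'Open') {
        $Connection.Open()
        $openedHere = $true
    }
    try {
        foreach ($table in $Tables) {
            $cmd = $Connection.CreateCommand()
            try {
                $cmd.CommandText = "CACHE DROP EXISTING SELECT * FROM $table"
                $results[$table] = $cmd.ExecuteNonQuery()
            }
            finally {
                $cmd.Dispose()
            }
        }
    }
    finally {
        # Leave the connection in the state we found it
        if ($openedHere) { $Connection.Close() }
    }
    $results
}
```

Usage: `Invoke-AdlsCacheRefresh -Connection $conn -Tables 'Resources'`. The try/finally blocks make sure commands are disposed and the connection is closed even if one CACHE statement fails.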
The driver gives you complete control over the caching functionality. See the help documentation for more caching commands, usage examples, and steps to replicate to other databases.
Other Operations
To retrieve Azure Data Lake Storage data in PowerShell, call the Fill method of an OdbcDataAdapter object. To execute data manipulation commands, initialize an OdbcCommand object and then call ExecuteNonQuery. Below are more example commands for Azure Data Lake Storage using the .NET Framework Data Provider for ODBC:
Retrieve Azure Data Lake Storage Data
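$sql = "SELECT FullPath, Permission FROM Resources"
$da = New-Object System.Data.Odbc.OdbcDataAdapter($sql, $conn)
$dt = New-Object System.Data.DataTable
# Fill opens the connection if needed and returns the row count; discard it
[void]$da.Fill($dt)
# Print every column value of every row
$dt.Rows | ForEach-Object {
    $row = $_
    $dt.Columns | ForEach-Object {
        Write-Host $row[$_]
    }
}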
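To download the retrieved rows to a file instead of printing them, the DataTable can be passed to `Export-Csv`. The sketch below is a hypothetical helper, `Export-AdlsTable`; it selects only the DataTable's own columns so that DataRow bookkeeping properties (such as RowState) do not end up in the CSV. The output path is a placeholder.

```powershell
# Hypothetical helper: writes the rows of a DataTable to a CSV file,
# using the table's column names as the CSV header.
function Export-AdlsTable {
    param(
        [Parameter(Mandatory)] [System.Data.DataTable] $Table,
        [Parameter(Mandatory)] [string] $Path
    )
    # Column names, e.g. FullPath and Permission for the query above
    $columns = @($Table.Columns | ForEach-Object { $_.ColumnName })
    $Table.Rows |
        Select-Object -Property $columns |
        Export-Csv -Path $Path -NoTypeInformation
}
```

Usage with the query above: `Export-AdlsTable -Table $dt -Path .\resources.csv`.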