Using the CData ODBC Driver for Azure Data Lake Storage in PyCharm
Connect to Azure Data Lake Storage as an ODBC data source in PyCharm using the CData ODBC Driver for Azure Data Lake Storage.
The CData ODBC Drivers can be used in any environment that supports loading an ODBC driver. This tutorial shows how to use the CData ODBC Driver for Azure Data Lake Storage from within PyCharm. It covers adding the CData ODBC Driver as a data source, as well as basic Python code to query the data source and display the results.
This tutorial assumes that you have already installed both the CData ODBC Driver for Azure Data Lake Storage and PyCharm.
Add Pyodbc to the Project
Follow the steps below to add the pyodbc module to your project.
- Click File -> Settings to open the project settings window.
- Click Project Interpreter from the Project: YourProjectName menu.
- To add pyodbc, click the + button and enter pyodbc.
- Click Install Package to install pyodbc.
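Once the package is installed, a quick check from the PyCharm Python console can confirm that the project interpreter actually sees pyodbc. The following is a minimal sketch: the helper names `module_available` and `describe_pyodbc` are hypothetical, while `pyodbc.version` and `pyodbc.drivers()` come from the pyodbc module itself.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(name) is not None


def describe_pyodbc() -> str:
    """Summarize the pyodbc installation, or explain that it is missing."""
    if not module_available("pyodbc"):
        return "pyodbc is not installed in this interpreter"
    try:
        import pyodbc
    except ImportError as exc:
        # For example, the ODBC driver manager library may be missing.
        return f"pyodbc is installed but failed to load: {exc}"
    # pyodbc.drivers() lists the ODBC drivers registered on this machine.
    return f"pyodbc {pyodbc.version}; drivers: {pyodbc.drivers()}"


print(describe_pyodbc())
```

If the CData driver is installed, its name should appear in the printed driver list.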
Connect to Azure Data Lake Storage
You can now connect with an ODBC connection string or a DSN. See the Getting Started section in the CData driver documentation for a guide to creating a DSN on your OS.
Authenticating to a Gen 1 DataLakeStore Account
Gen 1 uses OAuth 2.0 in Azure AD for authentication.
For this, an Active Directory web application is required; register one in Azure AD before continuing.
To authenticate against a Gen 1 DataLakeStore account, the following properties are required:
- Schema: Set this to ADLSGen1.
- Account: Set this to the name of the account.
- OAuthClientId: Set this to the application Id of the app you created.
- OAuthClientSecret: Set this to the key generated for the app you created.
- TenantId: Set this to the tenant Id. See the TenantId property in the driver documentation for more information on how to acquire this.
- Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
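The Gen 1 properties above can be assembled into an ODBC connection string in code. This is a sketch under stated assumptions: the helper `build_connection_string`, the environment variable names, and the placeholder values are all hypothetical, and the driver name is taken from the DSN example in this tutorial, so it may need adjusting to match the name registered on your machine.

```python
import os

# Assumption: this must match the driver name registered on your system.
DRIVER_NAME = "CData ODBC Driver for Azure Data Lake Storage"


def build_connection_string(driver: str, props: dict) -> str:
    """Join a driver name and key/value pairs into an ODBC connection string."""
    parts = ["DRIVER={" + driver + "}"]
    for key, value in props.items():
        value = str(value)
        # ODBC convention: wrap values that contain special characters in
        # braces, doubling any closing brace inside the value.
        if any(ch in value for ch in ";{}"):
            value = "{" + value.replace("}", "}}") + "}"
        parts.append(f"{key}={value}")
    return ";".join(parts) + ";"


# Hypothetical environment variable names; reading secrets from the
# environment keeps them out of source control.
gen1_props = {
    "Schema": "ADLSGen1",
    "Account": "myAccount",
    "OAuthClientId": os.environ.get("ADLS_CLIENT_ID", "myClientId"),
    "OAuthClientSecret": os.environ.get("ADLS_CLIENT_SECRET", "myClientSecret"),
    "TenantId": os.environ.get("ADLS_TENANT_ID", "myTenantId"),
}

gen1_connection_string = build_connection_string(DRIVER_NAME, gen1_props)
```

The resulting string can be passed to `pyodbc.connect`. Directory is left out here because the root directory is used when it is not specified.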
Authenticating to a Gen 2 DataLakeStore Account
To authenticate against a Gen 2 DataLakeStore account, the following properties are required:
- Schema: Set this to ADLSGen2.
- Account: Set this to the name of the account.
- FileSystem: Set this to the file system which will be used for this account.
- AccessKey: Set this to the access key which will be used to authenticate the calls to the API. See the AccessKey property in the driver documentation for more information on how to acquire this.
- Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
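Because the required properties differ between the two schemas, it can help to validate a set of properties before trying to connect. The sketch below encodes the property lists from this tutorial; the names `REQUIRED_PROPERTIES` and `missing_properties` are hypothetical, and Directory is treated as optional because the root directory is used when it is omitted.

```python
# Required connection properties per schema, as listed in this tutorial.
REQUIRED_PROPERTIES = {
    "ADLSGen1": ["Account", "OAuthClientId", "OAuthClientSecret", "TenantId"],
    "ADLSGen2": ["Account", "FileSystem", "AccessKey"],
}


def missing_properties(props: dict) -> list:
    """Return the required property names that are absent or empty."""
    schema = props.get("Schema")
    if schema not in REQUIRED_PROPERTIES:
        raise ValueError(f"Schema must be one of {sorted(REQUIRED_PROPERTIES)}")
    return [name for name in REQUIRED_PROPERTIES[schema] if not props.get(name)]


gen2_props = {
    "Schema": "ADLSGen2",
    "Account": "myAccount",
    "FileSystem": "myFileSystem",
    "AccessKey": "myAccessKey",
}

missing = missing_properties(gen2_props)
if missing:
    print("Missing connection properties:", ", ".join(missing))
```

Checking up front gives a clearer message than a failed connection attempt when a property has simply been left out.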
Below is the syntax for a DSN:
[CData ADLS Source]
Driver = CData ODBC Driver for Azure Data Lake Storage
Description = My Description
Schema = ADLSGen2
Account = myAccount
FileSystem = myFileSystem
AccessKey = myAccessKey
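With a DSN like the one above in place, the connection string passed to pyodbc only needs to name the DSN. The following is a minimal sketch, assuming the DSN is named `CData ADLS Source` as in the example; the helper names are hypothetical, and `pyodbc.connect` is imported lazily so the string helper can be used without pyodbc installed.

```python
def dsn_connection_string(dsn_name: str) -> str:
    """Build a connection string that refers to a configured DSN by name."""
    return f"DSN={dsn_name};"


def connect_with_dsn(dsn_name: str):
    """Open a pyodbc connection through the named DSN."""
    import pyodbc  # lazy import: only needed when a connection is opened

    return pyodbc.connect(dsn_connection_string(dsn_name))


# Usage, assuming the DSN from this tutorial has been configured:
# cnxn = connect_with_dsn("CData ADLS Source")
```

Keeping the credentials in the DSN means they do not have to appear in the Python source.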
Execute SQL to Azure Data Lake Storage
Instantiate a Cursor and use the execute method of the Cursor class to execute any SQL statement.
import pyodbc

# The DRIVER value must match the name under which the driver is registered on your system.
cnxn = pyodbc.connect('DRIVER={CData ODBC Driver for ADLS};Schema=ADLSGen2;Account=myAccount;FileSystem=myFileSystem;AccessKey=myAccessKey;')
cursor = cnxn.cursor()

# Query the file resources and print each path with its permissions.
cursor.execute("SELECT FullPath, Permission FROM Resources WHERE Type = 'FILE'")
rows = cursor.fetchall()
for row in rows:
    print(row.FullPath, row.Permission)
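For queries whose filter values come from user input, a parameterized statement avoids building SQL by string concatenation. The sketch below wraps the query pattern in a hypothetical helper, `fetch_as_dicts`, which relies only on the standard Python DB-API cursor interface (`execute`, `description`, `fetchall`) that pyodbc implements. It assumes the driver accepts `?` parameter markers, which is the placeholder style pyodbc uses.

```python
def fetch_as_dicts(connection, sql: str, params: tuple = ()) -> list:
    """Run a query and return each row as a dict keyed by column name."""
    cursor = connection.cursor()
    try:
        if params:
            # Parameters are bound to the ? markers in order.
            cursor.execute(sql, params)
        else:
            cursor.execute(sql)
        # cursor.description holds one entry per column; item 0 is the name.
        columns = [column[0] for column in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        # Release the cursor even if the query fails.
        cursor.close()


# Usage, assuming cnxn is an open pyodbc connection:
# files = fetch_as_dicts(
#     cnxn,
#     "SELECT FullPath, Permission FROM Resources WHERE Type = ?",
#     ("FILE",),
# )
# for item in files:
#     print(item["FullPath"], item["Permission"])
```

Returning plain dicts makes the results easy to pass to other code after the cursor is closed. Call `cnxn.close()` once the connection is no longer needed.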
After connecting to Azure Data Lake Storage in PyCharm using the CData ODBC Driver, you will be able to build Python apps with access to Azure Data Lake Storage data as if it were a standard database. If you have any questions, comments, or feedback regarding this tutorial, please contact the CData support team.