Connect to Azure Data Lake Storage Data in Google Apps Script

Ready to get started?

Download for a free trial:

Download Now

Learn more:

Azure Data Lake Storage ODBC Driver

The Azure Data Lake Storage ODBC Driver is a powerful tool that allows you to connect with live data from Azure Data Lake Storage, directly from any applications that support ODBC connectivity.

Access Azure Data Lake Storage data like you would a database - read, write, and update Azure Data Lake Storage ADLSData, etc. through a standard ODBC Driver interface.



Use the ODBC Driver for Azure Data Lake Storage and the SQL Gateway to access Azure Data Lake Storage data from Google Apps Script.

Google Apps Script gives you the ability to create custom functionality within your Google documents, including Google Sheets, Google Docs, and more. With the CData SQL Gateway, you can create a MySQL interface for any ODBC driver, including the 200+ drivers by CData for sources like Azure Data Lake Storage. The MySQL protocol is natively supported through the JDBC service in Google Apps Script, so by utilizing the SQL Gateway, you gain access to live Azure Data Lake Storage data within your Google documents.

This article discusses connecting to the ODBC Driver for Azure Data Lake Storage from Google Apps Script, walking through the process of configuring the SQL Gateway and providing sample scripting for processing Azure Data Lake Storage data in a Google Spreadsheet.

Real-Time Connectivity Through SQL Gateway

With SQL Gateway, your local ODBC data sources can look and behave like a standard MySQL database. Simply create a new MySQL remoting service in the SQL Gateway for the ODBC Driver for Azure Data Lake Storage and ensure that the SQL Gateway is installed on a web-facing machine (or can connect to a hosted SSH server).

Connect to Azure Data Lake Storage Data

If you have not already done so, provide values for the required connection properties in the data source name (DSN). You can use the built-in Microsoft ODBC Data Source Administrator to configure the DSN. This is also the last step of the driver installation. See the "Getting Started" chapter in the help documentation for a guide to using the Microsoft ODBC Data Source Administrator to create and configure a DSN.

Authenticating to a Gen 1 DataLakeStore Account

Gen 1 uses OAuth 2.0 in Azure AD for authentication.

For this, an Active Directory web application is required. You can create one as follows:

  1. Sign in to your Azure Account through the .
  2. Select "Azure Active Directory".
  3. Select "App registrations".
  4. Select "New application registration".
  5. Provide a name and URL for the application. Select Web app for the type of application you want to create.
  6. Select "Required permissions" and change the required permissions for this app. At a minimum, "Azure Data Lake" and "Windows Azure Service Management API" are required.
  7. Select "Key" and generate a new key. Add a description, a duration, and take note of the generated key. You won't be able to see it again.

To authenticate against a Gen 1 DataLakeStore account, the following properties are required:

  • Schema: Set this to ADLSGen1.
  • Account: Set this to the name of the account.
  • OAuthClientId: Set this to the application Id of the app you created.
  • OAuthClientSecret: Set this to the key generated for the app you created.
  • TenantId: Set this to the tenant Id. See the property for more information on how to acquire this.
  • Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.

Authenticating to a Gen 2 DataLakeStore Account

To authenticate against a Gen 2 DataLakeStore account, the following properties are required:

  • Schema: Set this to ADLSGen2.
  • Account: Set this to the name of the account.
  • FileSystem: Set this to the file system which will be used for this account.
  • AccessKey: Set this to the access key which will be used to authenticate the calls to the API. See the property for more information on how to acquire this.
  • Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.

Create a MySQL Remoting Service for Azure Data Lake Storage Data

See the SQL Gateway Overview to set up connectivity to Azure Data Lake Storage data as a virtual MySQL database. You will configure a MySQL remoting service that listens for MySQL requests from clients. The service can be configured in the SQL Gateway UI.

Connecting to Azure Data Lake Storage Data with Apps Script

At this point, you should have configured the SQL Gateway for Azure Data Lake Storage data. All that is left now is to use Google Apps Script to access the MySQL remoting service and work with your Azure Data Lake Storage data in Google Sheets.

In this section, you will create a script (with a menu option to call the script) to populate a spreadsheet with Azure Data Lake Storage data. We have created a sample script and explained the different parts. You can view the raw script at the end of the article.

1. Create an Empty Script

To create a script for your Google Sheet, click Tools Script editor from the Google Sheets menu:

2. Declare Class Variables

Create a handful of class variables to be available for any functions created in the script.

//replace the variables in this block with real values as needed
var address = 'my.server.address:port';
var user = 'SQL_GATEWAY_USER';
var userPwd = 'SQL_GATEWAY_PASSWORD';
var db = 'CData ADLS Sys';

var dbUrl = 'jdbc:mysql://' + address + '/' + db;

3. Add a Menu Option

This function adds a menu option to your Google Sheet, allowing you to use the UI to call your function.

function onOpen() {
  var spreadsheet = SpreadsheetApp.getActive();
  var menuItems = [
    {name: 'Write data to a sheet', functionName: 'connectToADLSData'}
  ];
  spreadsheet.addMenu('Azure Data Lake Storage Data', menuItems);
} 

4. Write a Helper Function

This function is used to find the first empty row in a spreadsheet.

/*
 * Finds the first empty row in a spreadsheet by scanning an array of columns
 * @return The row number of the first empty row.
 */
function getFirstEmptyRowByColumnArray(spreadSheet, column) {
  var column = spreadSheet.getRange(column + ":" + column);
  var values = column.getValues(); // get all data in one call
  var ct = 0;
  while ( values[ct] && values[ct][0] != "" ) {
    ct++;
  }
  return (ct+1);
}

5. Write a Function to Write Azure Data Lake Storage Data to a Spreadsheet

The function below writes the Azure Data Lake Storage data, using the Google Apps Script JDBC functionality to connect to the MySQL remoting service, SELECT data, and populate a spreadsheet. When the script is run, two input boxes will appear:

The first one asking the user to input the name of a sheet to hold the data (if the spreadsheet does not exist, the function creates it)

and the second one asking the user to input the name of a Azure Data Lake Storage table to read. If an invalid table is chosen, an error message appears and the function is exited.

It is worth noting that, while the function is designed to be used as a menu option, it could be extended for use as a formula in a spreadsheet.

/*
 * Reads data from a specified Azure Data Lake Storage 'table' and writes it to the specified sheet.
 *    (If the specified sheet does not exist, it is created.)
 */
function connectToADLSData() {
  var thisWorkbook = SpreadsheetApp.getActive();

  //select a sheet and create it if it does not exist
  var selectedSheet = Browser.inputBox('Which sheet would you like the data to post to?',Browser.Buttons.OK_CANCEL);
  if (selectedSheet == 'cancel')
    return;

  if (thisWorkbook.getSheetByName(selectedSheet) == null)
    thisWorkbook.insertSheet(selectedSheet);
  var resultSheet = thisWorkbook.getSheetByName(selectedSheet);
  var rowNum = 2;

  //select a Azure Data Lake Storage 'table'
  var table = Browser.inputBox('Which table would you like to pull data from?',Browser.Buttons.OK_CANCEL);
  if (table == 'cancel')
    return;

  var conn = Jdbc.getConnection(dbUrl, user, userPwd);

  //confirm that var table is a valid table/view
  var dbMetaData = conn.getMetaData();
  var tableSet = dbMetaData.getTables(null, null, table, null);
  var validTable = false;
  while (tableSet.next()) {
    var tempTable = tableSet.getString(3);
    if (table.toUpperCase() == tempTable.toUpperCase()){
      table = tempTable;
      validTable = true;
      break;
    }
  } 
  tableSet.close();
  if (!validTable) {
    Browser.msgBox("Invalid table name: " + table, Browser.Buttons.OK);
    return;
  }

  var stmt = conn.createStatement();

  var results = stmt.executeQuery('SELECT * FROM ' + table);
  var rsmd = results.getMetaData();
  var numCols = rsmd.getColumnCount();

  //if the sheet is empty, populate the first row with the headers
  var firstEmptyRow = getFirstEmptyRowByColumnArray(resultSheet, "A");
  if (firstEmptyRow == 1) {
    //collect column names
    var headers = new Array(new Array(numCols));
    for (var col = 0; col < numCols; col++){
      headers[0][col] = rsmd.getColumnName(col+1);
    }
    resultSheet.getRange(1, 1, headers.length, headers[0].length).setValues(headers);
  } else {
    rowNum = firstEmptyRow;
  }

  //write rows of Azure Data Lake Storage data to the sheet
  var values = new Array(new Array(numCols));
  while (results.next()) {
    for (var col = 0; col < numCols; col++) {
      values[0][col] = results.getString(col + 1);
    }
    resultSheet.getRange(rowNum, 1, 1, numCols).setValues(values);
    rowNum++;
  }

  results.close();
  stmt.close();
}
  

When the function is completed, you have a spreadsheet populated with your Azure Data Lake Storage data and you can now leverage all of the calculating, graphing, and charting functionality of Google Sheets anywhere you have access to the Internet.


Complete Google Apps Script

//replace the variables in this block with real values as needed
var address = 'my.server.address:port';
var user = 'SQL_GATEWAY_USER';
var userPwd = 'SQL_GATEWAY_PASSWORD';
var db = 'CData ADLS Sys';

var dbUrl = 'jdbc:mysql://' + address + '/' + db;

function onOpen() {
  var spreadsheet = SpreadsheetApp.getActive();
  var menuItems = [
    {name: 'Write table data to a sheet', functionName: 'connectToADLSData'}
  ];
  spreadsheet.addMenu('Azure Data Lake Storage Data', menuItems);
}

/*
 * Finds the first empty row in a spreadsheet by scanning an array of columns
 * @return The row number of the first empty row.
 */
function getFirstEmptyRowByColumnArray(spreadSheet, column) {
  var column = spreadSheet.getRange(column + ":" + column);
  var values = column.getValues(); // get all data in one call
  var ct = 0;
  while ( values[ct] && values[ct][0] != "" ) {
    ct++;
  }
  return (ct+1);
}

/*
 * Reads data from a specified 'table' and writes it to the specified sheet.
 *    (If the specified sheet does not exist, it is created.)
 */
function connectToADLSData() {
  var thisWorkbook = SpreadsheetApp.getActive();

  //select a sheet and create it if it does not exist
  var selectedSheet = Browser.inputBox('Which sheet would you like the data to post to?',Browser.Buttons.OK_CANCEL);
  if (selectedSheet == 'cancel')
    return;

  if (thisWorkbook.getSheetByName(selectedSheet) == null)
    thisWorkbook.insertSheet(selectedSheet);
  var resultSheet = thisWorkbook.getSheetByName(selectedSheet);
  var rowNum = 2;

  //select a Azure Data Lake Storage 'table'
  var table = Browser.inputBox('Which table would you like to pull data from?',Browser.Buttons.OK_CANCEL);
  if (table == 'cancel')
    return;

  var conn = Jdbc.getConnection(dbUrl, user, userPwd);

  //confirm that var table is a valid table/view
  var dbMetaData = conn.getMetaData();
  var tableSet = dbMetaData.getTables(null, null, table, null);
  var validTable = false;
  while (tableSet.next()) {
    var tempTable = tableSet.getString(3);
    if (table.toUpperCase() == tempTable.toUpperCase()){
      table = tempTable;
      validTable = true;
      break;
    }
  } 
  tableSet.close();
  if (!validTable) {
    Browser.msgBox("Invalid table name: " + table, Browser.Buttons.OK);
    return;
  }

  var stmt = conn.createStatement();

  var results = stmt.executeQuery('SELECT * FROM ' + table);
  var rsmd = results.getMetaData();
  var numCols = rsmd.getColumnCount();

  //if the sheet is empty, populate the first row with the headers
  var firstEmptyRow = getFirstEmptyRowByColumnArray(resultSheet, "A");
  if (firstEmptyRow == 1) {
    //collect column names
    var headers = new Array(new Array(numCols));
    for (var col = 0; col < numCols; col++){
      headers[0][col] = rsmd.getColumnName(col+1);
    }
    resultSheet.getRange(1, 1, headers.length, headers[0].length).setValues(headers);
  } else {
    rowNum = firstEmptyRow;
  }

  //write rows of Azure Data Lake Storage data to the sheet
  var values = new Array(new Array(numCols));
  while (results.next()) {
    for (var col = 0; col < numCols; col++) {
      values[0][col] = results.getString(col + 1);
    }
    resultSheet.getRange(rowNum, 1, 1, numCols).setValues(values);
    rowNum++;
  }

  results.close();
  stmt.close();
}