How to work with Short.io Data in Apache Spark using SQL

Jerod Johnson
Director, Technology Evangelism
Access and process Short.io Data in Apache Spark using the CData JDBC Driver.

Apache Spark is a fast and general engine for large-scale data processing. When paired with the CData JDBC Driver for Short.io, Spark can work with live Short.io data. This article describes how to connect to and query Short.io data from a Spark shell.

The CData JDBC Driver is optimized for interacting with live Short.io data. When you issue complex SQL queries to Short.io, the driver pushes supported SQL operations, like filters and aggregations, directly to Short.io and uses its embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. With built-in dynamic metadata querying, you can work with and analyze Short.io data using native data types.

Install the CData JDBC Driver for Short.io

Download the CData JDBC Driver for Short.io installer, unzip the package, and run the JAR file to install the driver.

Start a Spark Shell and Connect to Short.io Data

  1. Open a terminal and start the Spark shell, passing the CData JDBC Driver for Short.io JAR file through the --jars parameter. Quote the path, since it contains spaces:
    $ spark-shell --jars "/CData/CData JDBC Driver for Short.io/lib/cdata.jdbc.api.jar"
    
  2. With the shell running, you can connect to Short.io with a JDBC URL and use the SQLContext reader's load() function to read a table.

    Using API Key Authentication

    Short.io uses API Key authentication. To obtain your API key:

    1. Log in to your Short.io account
    2. Navigate to Settings > Integrations & API > API
    3. Click Create API Key and copy your API key

    After obtaining the API key, you are ready to connect:

    • AuthScheme: Set this to APIKey.
    • APIKey: Set this to your Short.io API key obtained from Settings > Integrations & API > API.

    Example connection string:

    Profile=C:\profiles\ShortIo.apip;AuthScheme=APIKey;ProfileSettings='APIKey=your_api_key';
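The connection string above can also be assembled in Scala before it is handed to Spark. The following is a minimal sketch, not part of the driver: the helper name shortIoJdbcUrl and the environment variable SHORTIO_API_KEY are assumptions introduced here for illustration, and the jdbc:api: prefix follows the URL used later in this article.

```scala
// Hypothetical helper (not part of the CData driver): assembles the JDBC URL
// from its parts so the API key does not have to be typed into the shell.
def shortIoJdbcUrl(profilePath: String, apiKey: String): String =
  s"jdbc:api:Profile=$profilePath;AuthScheme=APIKey;ProfileSettings='APIKey=$apiKey';"

// SHORTIO_API_KEY is an assumed environment variable name, not a driver setting.
// If it is unset, the placeholder from the article is used instead.
val apiKey = sys.env.getOrElse("SHORTIO_API_KEY", "your_api_key")

// Backslashes must be doubled inside a regular Scala string literal.
val jdbcUrl = shortIoJdbcUrl("C:\\profiles\\ShortIo.apip", apiKey)
```

The resulting jdbcUrl value can be passed as the "url" option when reading a table in the Spark shell.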
    

    Available Tables

    The Short.io profile provides access to the following tables:

    • Domains - Short.io domains associated with the authenticated account
    • Links - Short links for a domain
    • LinkExpand - Expand a short link by domain and path
    • LinksByOriginalUrl - Retrieve multiple short links matching a given original destination URL
    • Folders - Link folders within a specific domain
    • LinkPermissions - Permission records for a specific link within a domain
    • CountryTargeting - Country-based redirect targeting rules for a specific short link
    • RegionTargeting - Region-based redirect targeting rules for a specific short link
    • Regions - List of available regions/states for a given country code
    • DomainStatistics - Aggregated click and traffic statistics for a Short.io domain
    • LinkStatistics - Aggregated click and traffic statistics for a specific Short.io link

    Built-in Connection String Designer

    For assistance in constructing the JDBC URL, use the connection string designer built into the Short.io JDBC Driver. Either double-click the JAR file or run it from the command line.

    java -jar cdata.jdbc.api.jar
    

    Fill in the connection properties and copy the connection string to the clipboard.

    Configure the connection to Short.io, using the connection string generated above.

    scala> val api_df = spark.sqlContext.read.format("jdbc").option("url", "jdbc:api:Profile=C:\\profiles\\ShortIo.apip;AuthScheme=APIKey;ProfileSettings='APIKey=your_api_key';").option("dbtable","Domains").option("driver","cdata.jdbc.api.APIDriver").load()
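When more than one of the tables listed earlier is needed, the reader options can be collected once per table instead of repeating the option calls. This is a sketch under assumptions: shortIoReadOptions is a hypothetical helper, and whether a given table (for example Links) needs additional filters such as a domain is not verified here.

```scala
// Hypothetical helper: gathers the JDBC reader options for one table.
def shortIoReadOptions(jdbcUrl: String, table: String): Map[String, String] =
  Map(
    "url"     -> jdbcUrl,
    "dbtable" -> table,
    "driver"  -> "cdata.jdbc.api.APIDriver"
  )

// Same connection string as in the article; backslashes are doubled for Scala.
val url = "jdbc:api:Profile=C:\\profiles\\ShortIo.apip;AuthScheme=APIKey;ProfileSettings='APIKey=your_api_key';"

// A few table names taken from the "Available Tables" list.
val tables = Seq("Domains", "Links", "Folders")
val optionsByTable: Map[String, Map[String, String]] =
  tables.map(t => t -> shortIoReadOptions(url, t)).toMap

// In the Spark shell, an option map can then be passed to the reader, e.g.:
//   val links_df = spark.read.format("jdbc").options(optionsByTable("Links")).load()
```

Keeping the options in a plain Map makes it easy to inspect them before any connection is opened.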
    
  3. Once you connect and the data is loaded, you will see the table schema displayed.
  4. Register the Short.io data as a temporary view:

    scala> api_df.createOrReplaceTempView("domains")
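  5. Perform custom SQL queries against the data using commands like the one below, replacing the placeholders in angle brackets with column names and a filter value from your table:

    scala> api_df.sqlContext.sql("SELECT <column1>, <column2> FROM Domains WHERE <column3> = '<value>'").collect.foreach(println)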

    The results are displayed in the console.
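Because the column names depend on the table, one way to write a query is to build it from a list of column names. The sketch below is an illustration only: buildSelect is a hypothetical helper, and column_a and column_b are placeholder names, not actual Short.io columns.

```scala
// Hypothetical helper: builds a SELECT over the given columns of a temporary view.
// Column names are wrapped in backticks so that names containing spaces or
// reserved words still parse in Spark SQL.
def buildSelect(columns: Seq[String], view: String, limit: Int): String = {
  require(columns.nonEmpty, "at least one column is required")
  val cols = columns.map(c => s"`$c`").mkString(", ")
  s"SELECT $cols FROM $view LIMIT $limit"
}

// column_a and column_b are placeholders for illustration only.
val query = buildSelect(Seq("column_a", "column_b"), "domains", 10)
println(query)

// In the Spark shell, with api_df loaded, real column names could be used:
//   api_df.sqlContext.sql(buildSelect(api_df.columns.take(2).toSeq, "domains", 10)).show()
```

Reading the names from api_df.columns avoids guessing at the schema, since the driver reports the columns when the table is loaded.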

Using the CData JDBC Driver for Short.io in Apache Spark, you can perform fast and complex analytics on Short.io data, combining the power and utility of Spark with your data. Download a free, 30-day trial of any of the hundreds of CData JDBC Drivers and get started today.
