Work with UPS Data in Apache Spark Using SQL

Access and process UPS Data in Apache Spark using the CData JDBC Driver.

Apache Spark is a fast and general engine for large-scale data processing. When paired with the CData JDBC Driver for UPS, Spark can work with live UPS data. This article describes how to connect to and query UPS data from a Spark shell.

The CData JDBC Driver offers unmatched performance for interacting with live UPS data due to optimized data processing built into the driver. When you issue complex SQL queries to UPS, the driver pushes supported SQL operations, like filters and aggregations, directly to UPS and utilizes the embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. With built-in dynamic metadata querying, you can work with and analyze UPS data using native data types.
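
For example, in a query like the first one below, the driver can push the WHERE filter directly to UPS, while a query that uses a SQL function the source does not support, like the second, is evaluated client-side by the embedded engine (illustrative queries against the Senders table used later in this article; UPPER stands in for any unsupported function):

    SELECT FirstName, Phone FROM Senders WHERE SenderID = 25
    SELECT UPPER(FirstName) FROM Senders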

Install the CData JDBC Driver for UPS

Download the CData JDBC Driver for UPS installer, unzip the package, and run the JAR file to install the driver.
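
For example (the installer JAR name below is hypothetical; use the file name from your download):

    $ java -jar setup.jar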

Start a Spark Shell and Connect to UPS Data

  1. Open a terminal and start the Spark shell with the CData JDBC Driver for UPS JAR file passed to the --jars parameter (quote the path, since it contains spaces):

    $ spark-shell --jars "/CData/CData JDBC Driver for UPS/lib/cdata.jdbc.ups.jar"
  2. With the shell running, you can connect to UPS with a JDBC URL and use the SQLContext load() function to read a table.

    The driver uses the following pieces of information to authenticate its actions with the UPS service.

    • Server: The URL where requests are sent. Common testing endpoints are https://wwwcie.ups.com/ups.app/xml and https://wwwcie.ups.com/webservices.
    • AccessKey: The identifier required to connect to a UPS server. UPS provides this value after registration.
    • UserId: The user ID you chose when registering for service with UPS.
    • Password: The password you chose when registering for service with UPS.
    • AccountNumber: A valid 6-digit or 10-digit UPS account number.
    • PrintLabelLocation: Required only if you intend to use the GenerateLabels or GenerateReturnLabels stored procedures. Set this to the folder where generated labels should be stored.
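
    Using placeholder values for these properties, a complete JDBC URL looks like the following (PrintLabelLocation is omitted here because this article does not call the label stored procedures):

    jdbc:ups:Server=https://wwwcie.ups.com/ups.app/xml;AccessKey=myAccessKey;UserId=myUserId;Password=myPassword;AccountNumber=myAccountNumber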

    Built-in Connection String Designer

    For assistance in constructing the JDBC URL, use the connection string designer built into the UPS JDBC Driver. Either double-click the JAR file or execute it from the command line:

    java -jar cdata.jdbc.ups.jar

    Fill in the connection properties and copy the connection string to the clipboard.

    scala> val ups_df = spark.sqlContext.read.format("jdbc").
         |   option("url", "jdbc:ups:Server=https://wwwcie.ups.com/ups.app/xml;AccessKey=myAccessKey;Password=myPassword;AccountNumber=myAccountNumber;UserId=myUserId").
         |   option("dbtable", "Senders").
         |   option("driver", "cdata.jdbc.ups.UPSDriver").
         |   load()
  3. Once you connect and the data is loaded, you will see the table schema displayed (see the sketch after these steps to print it again).
  4. Register the UPS data as a temporary view:

    scala> ups_df.createOrReplaceTempView("senders")
  5. Perform custom SQL queries against the data using commands like the one below:

    scala> ups_df.sqlContext.sql("SELECT FirstName, Phone FROM Senders WHERE SenderID = 25").collect.foreach(println)

    You will see the results displayed in the console.
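
As a quick sanity check, you can also inspect and query the loaded DataFrame with the regular Spark APIs; a minimal sketch using the columns referenced above:

    scala> // Print the schema Spark derived from the driver's metadata
    scala> ups_df.printSchema()

    scala> // DataFrame API equivalent of the SQL query above
    scala> ups_df.select("FirstName", "Phone").filter("SenderID = 25").show()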

Using the CData JDBC Driver for UPS in Apache Spark, you can perform fast and complex analytics on UPS data, combining the power and utility of Spark with your data. Download a free, 30-day trial of any of the 160+ CData JDBC Drivers and get started today.