Work with USPS Data in Apache Spark Using SQL

Access and process USPS Data in Apache Spark using the CData JDBC Driver.

Apache Spark is a fast and general engine for large-scale data processing. When paired with the CData JDBC Driver for USPS, Spark can work with live USPS data. This article describes how to connect to and query USPS data from a Spark shell.

The CData JDBC Driver offers unmatched performance for interacting with live USPS data due to optimized data processing built into the driver. When you issue complex SQL queries to USPS, the driver pushes supported SQL operations, like filters and aggregations, directly to USPS and utilizes the embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. With built-in dynamic metadata querying, you can work with and analyze USPS data using native data types.
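
For instance, once a connection is established (as shown in the walkthrough below), a simple filter like the following lets the driver hand the WHERE clause to USPS rather than pulling the entire table into Spark. This is a minimal sketch; Senders, SenderID, FirstName, and Phone are the sample table and columns used later in this article:

    scala> usps_df.filter("SenderID = 25").select("FirstName", "Phone").show()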

Install the CData JDBC Driver for USPS

Download the CData JDBC Driver for USPS installer, unzip the package, and run the JAR file to install the driver.

Start a Spark Shell and Connect to USPS Data

  1. Open a terminal and start the Spark shell with the CData JDBC Driver for USPS JAR file passed as the jars parameter (quote the path, since it contains spaces):

    $ spark-shell --jars "/CData/CData JDBC Driver for USPS/lib/cdata.jdbc.usps.jar"
  2. With the shell running, you can connect to USPS with a JDBC URL and use the SQL Context load() function to read a table.

    To authenticate with USPS, set the following connection properties.

    • PostageProvider: The postage provider to use to process requests. Available options are ENDICIA and STAMPS. If unspecified, this property will default to ENDICIA.
    • UseSandbox: This controls whether live or test requests are sent to the production or sandbox servers. If set to true, the Password, AccountNumber, and StampsUserId properties are ignored.
    • StampsUserId: This value is used to authenticate to the Stamps servers. This value is not applicable for Endicia and is optional if UseSandbox is true.
    • Password: This value is used for logging into Endicia and Stamps servers. If the postage provider is Endicia, this will be the pass phrase associated with your postage account. It is optional if UseSandbox is true.
    • AccountNumber: The shipper's account number. It is optional if UseSandbox is true.
    • PrintLabelLocation: This property is required to use the GenerateLabels or GenerateReturnLabels stored procedures. This should be set to the folder location where generated labels should be stored.
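
    For example, a JDBC URL assembled from these properties might look like the following (the credential values are placeholders, not real account data):

    jdbc:usps:PostageProvider=ENDICIA;UseSandbox=false;Password='my_pass_phrase';AccountNumber='12A3B4C';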

    The Cache Database

    Many of the useful tasks available from USPS require a large amount of data. To ensure this data is easy to input and recall later, use a cache database for your requests. Set the cache connection properties in order to use the cache:

    • CacheLocation: The path to the cache location, for which a connection will be configured with the default cache provider. For example, C:\users\username\documents\uspscache

    As an alternative to CacheLocation, set the combination of CacheConnection and CacheProvider to configure a cache connection using a provider separate from the default.
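
    As a minimal sketch, enabling the default cache can be as simple as appending CacheLocation to the connection string (the path is illustrative):

    jdbc:usps:PostageProvider=ENDICIA;Password='my_pass_phrase';AccountNumber='12A3B4C';CacheLocation='C:\users\username\documents\uspscache';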

    Built-in Connection String Designer

    For assistance in constructing the JDBC URL, use the connection string designer built into the USPS JDBC Driver. Either double-click the JAR file or execute it from the command line:

    java -jar cdata.jdbc.usps.jar

    Fill in the connection properties and copy the connection string to the clipboard.

    scala> val usps_df = spark.sqlContext.read.format("jdbc").option("url", "jdbc:usps:PostageProvider=ENDICIA; RequestId=12345; Password='abcdefghijklmnopqr'; AccountNumber='12A3B4C'").option("dbtable","Senders").option("driver","cdata.jdbc.usps.USPSDriver").load()
  3. Once you connect and the data is loaded, you will see the table schema displayed.
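
    To inspect the schema yourself at any point, you can also print it explicitly (printSchema is part of the standard Spark DataFrame API):

    scala> usps_df.printSchema()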
  4. Register the USPS data as a temporary view:

    scala> usps_df.createOrReplaceTempView("senders")
  5. Perform custom SQL queries against the data using commands like the one below:

    scala> usps_df.sqlContext.sql("SELECT FirstName, Phone FROM Senders WHERE SenderID = 25").collect.foreach(println)

    You will see the results displayed in the console.
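
    As a further sketch, aggregate queries work against the same temporary view; whether a given aggregate is pushed down to USPS or evaluated client-side depends on the supported operations described earlier:

    scala> usps_df.sqlContext.sql("SELECT COUNT(*) FROM Senders").collect.foreach(println)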

Using the CData JDBC Driver for USPS in Apache Spark, you can perform fast and complex analytics on USPS data, combining the power and utility of Spark with your data. Download a free, 30-day trial of any of the 160+ CData JDBC Drivers and get started today.