Work with Sage US Data in Apache Spark Using SQL

Access and process Sage US Data in Apache Spark using the CData JDBC Driver.

Apache Spark is a fast and general engine for large-scale data processing. When paired with the CData JDBC Driver for Sage US, Spark can work with live Sage US data. This article describes how to connect to and query Sage US data from a Spark shell.

The CData JDBC Driver is built for performance with live Sage US data. When you issue complex SQL queries to Sage US, the driver pushes supported SQL operations, like filters and aggregations, directly to Sage US and uses its embedded SQL engine to process unsupported operations (often SQL functions and JOIN operations) client-side. With built-in dynamic metadata querying, you can work with and analyze Sage US data using native data types.
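To make the split between pushed-down and client-side work concrete, here is a toy Scala model of predicate pushdown. It is a sketch, not the driver's implementation: the Predicate types and the rule that plain comparisons can be pushed while function calls stay client-side are illustrative assumptions, not CData's documented capabilities. Only the column names Name and LastInvoiceAmount come from this article.

```scala
// Toy model of predicate pushdown. The predicate types and the
// "plain comparisons are pushable, function calls are not" rule are
// illustrative assumptions, NOT the CData driver's actual capabilities.
sealed trait Predicate
final case class Eq(column: String, value: String) extends Predicate
final case class Gt(column: String, value: BigDecimal) extends Predicate
// fn(column) = value, e.g. UPPER(SomeColumn) = 'X'
final case class FnEq(fn: String, column: String, value: String) extends Predicate

object PushdownSketch {
  // Hypothetical capability check for the remote source.
  def isPushable(p: Predicate): Boolean = p match {
    case _: Eq | _: Gt => true
    case _: FnEq       => false
  }

  // Split predicates into (sent to the source, evaluated client-side).
  def split(preds: Seq[Predicate]): (Seq[Predicate], Seq[Predicate]) =
    preds.partition(isPushable)

  // Quote a string literal by doubling embedded single quotes.
  private def quote(s: String): String = "'" + s.replace("'", "''") + "'"

  def toSql(p: Predicate): String = p match {
    case Eq(c, v)      => s"$c = ${quote(v)}"
    case Gt(c, v)      => s"$c > $v"
    case FnEq(f, c, v) => s"$f($c) = ${quote(v)}"
  }

  // WHERE clause for the remote query; empty when nothing can be pushed.
  def whereClause(pushed: Seq[Predicate]): String =
    if (pushed.isEmpty) "" else pushed.map(toSql).mkString(" WHERE ", " AND ", "")

  // Client-side evaluation of a residual predicate over one row (column -> value).
  def evalLocally(p: Predicate, row: Map[String, String]): Boolean = p match {
    case FnEq("UPPER", c, v) => row.get(c).exists(_.toUpperCase == v)
    case FnEq(f, _, _)       => throw new UnsupportedOperationException(s"unknown function $f")
    case Eq(c, v)            => row.get(c).contains(v)
    case Gt(c, v)            => row.get(c).exists(x => BigDecimal(x) > v)
  }
}
```

For example, splitting Seq(Eq("Name", "ALDRED"), Gt("LastInvoiceAmount", BigDecimal(1000))) plus an UPPER(...) predicate sends the first two to the source as " WHERE Name = 'ALDRED' AND LastInvoiceAmount > 1000" and leaves the UPPER check to be applied row by row after the data arrives.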

Install the CData JDBC Driver for Sage US

Download the CData JDBC Driver for Sage US installer, unzip the package, and run the JAR file to install the driver.

Start a Spark Shell and Connect to Sage US Data

  1. Open a terminal and start the Spark shell, passing the CData JDBC Driver for Sage US JAR file to the --jars parameter. Quote the path, because it contains spaces:

    $ spark-shell --jars "/CData/CData JDBC Driver for Sage US/lib/cdata.jdbc.sage50us.jar"
  2. With the shell running, connect to Sage US with a JDBC URL and use the DataFrameReader load() function (spark.read) to read a table.

    The Application Id and Company Name connection string options are required to connect to Sage as a data source. You can obtain an Application Id by contacting Sage directly to request access to the Sage 50 SDK.

    Sage must be installed on the machine. The Sage.Peachtree.API.dll and Sage.Peachtree.API.Resolver.dll assemblies are required. These assemblies are installed with Sage in C:\Program Files\Sage\Peachtree\API\. Additionally, the Sage SDK requires .NET Framework 4.0 and is only compatible with 32-bit applications. To use the Sage SDK in Visual Studio, set the Platform Target property to "x86" in Project -> Properties -> Build.

    You must also authorize the application to access company data: restart the Sage application, open the company you want to access, and connect with your application. Sage then prompts you to set access permissions for the application in a dialog.

    While the compiled executable will require authorization only once, during development you may need to follow this process to reauthorize a new build. To avoid restarting the Sage application when developing with Visual Studio, click Build -> Configuration Manager and uncheck "Build" for your project.

    Built-in Connection String Designer

    For assistance in constructing the JDBC URL, use the connection string designer built into the Sage US JDBC Driver. Either double-click the JAR file or run it from the command line:

    java -jar cdata.jdbc.sage50us.jar

    Fill in the connection properties and copy the connection string to the clipboard.

    Configure the connection to Sage US using the generated connection string:

    scala> val sage50us_df = spark.read.format("jdbc").option("url", "jdbc:sage50us:ApplicationId=8dfafu4V4ODmh1fM0xx;CompanyName=Bellwether Garden Supply - Premium;").option("dbtable", "Customer").option("driver", "cdata.jdbc.sage50us.Sage50USDriver").load()
  3. Once you connect and the data is loaded, the shell displays the table schema.
  4. Register the Sage US data as a temporary view:

    scala> sage50us_df.createOrReplaceTempView("customer")
  5. Run custom SQL queries against the data using commands like the one below:

    scala> spark.sql("SELECT Name, LastInvoiceAmount FROM customer WHERE Name = 'ALDRED'").collect.foreach(println)

    The matching rows are printed to the console.
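
Rather than hand-editing the JDBC URL, you can assemble it from named connection properties. The Scala object below is a hypothetical helper, not part of the CData driver. It assumes the semicolon-separated Name=Value format seen in the example URL and rejects values containing ';' instead of guessing at the driver's quoting rules.

```scala
// Hypothetical helper (not part of the CData driver): assembles a URL of the form
//   jdbc:sage50us:Key1=Value1;Key2=Value2;
// Assumption: keys and values never contain ';' (how the driver expects such
// values to be quoted is not covered here, so they are rejected).
object SageJdbcUrl {
  val Prefix = "jdbc:sage50us:"

  def build(props: Seq[(String, String)]): String = {
    require(props.nonEmpty, "at least one connection property is required")
    props.foreach { case (k, v) =>
      require(k.nonEmpty, "property names must be non-empty")
      require(!k.contains(";") && !v.contains(";"), s"';' is not supported in property '$k'")
    }
    // Join Name=Value pairs with ';' and terminate with a trailing ';'.
    props.map { case (k, v) => s"$k=$v" }.mkString(Prefix, ";", ";")
  }
}
```

After pasting the object into the Spark shell (for example with :paste), its result can be passed as the url option, such as .option("url", SageJdbcUrl.build(Seq("ApplicationId" -> "...", "CompanyName" -> "..."))), in place of the literal string.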

Using the CData JDBC Driver for Sage US in Apache Spark, you can perform fast and complex analytics on Sage US data, combining the power and utility of Spark with your data. Download a free, 30-day trial of any of the 180+ CData JDBC Drivers and get started today.