A Comparison of JDBC & ODBC Drivers for Amazon Athena

The metrics in this article are from the most up-to-date drivers available as of August 2019.

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. The CData Drivers for Amazon Athena expose those queries through standards-based interfaces such as JDBC and ODBC. In this article, we compare the read performance and client-side resource usage of the CData Drivers and the Amazon-supported drivers. We find that the CData Drivers consistently retrieve large data sets nearly three times faster than the Amazon Drivers and make better use of client-side resources to drive that performance.

Preparation

This article will compare two sets of drivers:

The Amazon-supported ODBC Driver for Amazon Athena 1.0.5¹ to the CData Software ODBC Driver for Amazon Athena³
The Amazon-supported JDBC Driver for Amazon Athena 2.0.7² to the CData Software JDBC Driver for Amazon Athena⁴

To provide a reproducible comparison, we copied the Amazon Customer Reviews dataset⁵ to a bucket in our S3 instance and created a partitioned table in Athena (amazon_reviews_parquet) based on the S3 bucket.

Since the drivers are being compared side-by-side, the performance of the machine itself is relatively unimportant; what matters is how the drivers compare relative to one another.

Comparison

The relevant details for the table are below:

Table Size	Number of Rows	Number of Columns
2.74 GB	160,796,570	16

We compared the relative performance of the drivers by running the same queries with each driver. The queries are listed below:

SELECT * FROM amazon_reviews_parquet LIMIT 250000
SELECT * FROM amazon_reviews_parquet LIMIT 10000000

Results

For the JDBC drivers, we connected to Athena using the java.sql library in a basic Java application. For the ODBC drivers, we connected to Athena using a DSN from a sample C program. The results were read and stored for every field in each row. The times you see in the chart below are based on averages of multiple runs, which should serve to level out any outliers due to spikes in network traffic, etc.
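The JDBC side of the measurement described above can be sketched roughly as follows. This is a minimal illustration, not the test program used for the article: the class name `ReadBenchmark`, the helper names, and the choice to include connection setup in the timed region are all assumptions. The JDBC URL is taken from the command line because its format depends on the driver being tested, and the driver JAR is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical harness: names and structure are illustrative assumptions.
final class ReadBenchmark {
    // One of the queries listed in the article.
    static final String QUERY = "SELECT * FROM amazon_reviews_parquet LIMIT 250000";

    /** Reads and stores every field of every row; returns the number of rows read. */
    static long drain(ResultSet rs) throws SQLException {
        int columns = rs.getMetaData().getColumnCount();
        Object[] row = new Object[columns];
        long rows = 0;
        while (rs.next()) {
            for (int i = 0; i < columns; i++) {
                row[i] = rs.getObject(i + 1); // JDBC column indexes are 1-based
            }
            rows++;
        }
        return rows;
    }

    /** Runs the query once and returns elapsed wall-clock seconds (connection setup included). */
    static double timeQuery(String jdbcUrl, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            drain(rs);
        }
        return (System.nanoTime() - start) / 1e9;
    }

    /** Arithmetic mean of the per-run timings. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: java ReadBenchmark <jdbc-url> [runs]");
            return;
        }
        int runs = args.length > 1 ? Integer.parseInt(args[1]) : 3;
        double[] seconds = new double[runs];
        for (int i = 0; i < runs; i++) {
            seconds[i] = timeQuery(args[0], QUERY);
        }
        System.out.printf("average over %d runs: %.2f s%n", runs, mean(seconds));
    }
}
```

Running the same harness against each driver's JDBC URL keeps the query, the read loop, and the averaging identical, so differences in the reported average come from the driver and the network rather than from the test code.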

Query Times by Driver (in seconds)
Query	CData JDBC	Amazon JDBC	CData ODBC	Amazon ODBC
250,000 rows	15.43 (105.1% faster)	31.67	18.14 (161.8% faster)	47.50
10,000,000 rows	333.66 (179.1% faster)	931.23	353.03 (311.3% faster)	1,451.99

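As the results show, the CData Drivers significantly outperformed the Amazon Drivers when working with large result sets. For the 10-million-row query, the CData JDBC Driver finished in roughly a third of the Amazon JDBC Driver's time (333.66 vs. 931.23 seconds), and the CData ODBC Driver in roughly a quarter of the Amazon ODBC Driver's time (353.03 vs. 1,451.99 seconds).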
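The "% faster" figures in the table are consistent with the ratio of the slower time to the faster time, minus one. The sketch below reproduces that arithmetic from the published timings and also derives a rows-per-second figure. The class and method names are illustrative assumptions, and the formula is inferred from the table rather than stated in the article.

```java
// Hypothetical helper: reproduces the table's arithmetic from the published timings.
final class Speedup {
    /** Percent by which the faster run beats the slower one: (slow / fast - 1) * 100. */
    static double percentFaster(double fastSeconds, double slowSeconds) {
        return (slowSeconds / fastSeconds - 1.0) * 100.0;
    }

    /** Average rows retrieved per second for a run. */
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    public static void main(String[] args) {
        // Timings (in seconds) for the 10,000,000-row query, taken from the table.
        System.out.printf("JDBC: %.1f%% faster%n", percentFaster(333.66, 931.23));
        System.out.printf("ODBC: %.1f%% faster%n", percentFaster(353.03, 1451.99));
        System.out.printf("CData JDBC:  %.0f rows/s%n", rowsPerSecond(10_000_000L, 333.66));
        System.out.printf("Amazon JDBC: %.0f rows/s%n", rowsPerSecond(10_000_000L, 931.23));
    }
}
```

Note that "179.1% faster" corresponds to taking about 2.8 times as long, not 1.8 times, because the percentage is measured on top of the baseline.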

JDBC Driver Resource Usage

While testing the read performance of the JDBC drivers, we also measured client-side resource usage, looking specifically at memory and CPU usage. The charts below were produced by running a sample Java program and capturing its CPU and memory usage with Java VisualVM. We used Java version 8 update 211 with a maximum heap size of 4.27 GB.

For this comparison, we ran a query for 10 million rows: SELECT * FROM amazon_reviews_parquet LIMIT 10000000

CData Driver

Amazon Driver*

* Note the change in scale for the Heap graph.

Based on the graphs, we can infer that both drivers send a request to Amazon Athena and then wait for the response. Once Athena begins returning the results of the query, the drivers begin to process them.

The CData Driver retrieves and processes the bulk of the data quickly, averaging close to 600 MB of heap usage for the first minute and around 150 MB once the bulk of the data has been processed. The CPU usage graph shows the CData Driver using around 25% of the available CPU for most of the transaction before dropping to an average of around 15%, processing the data as fast as it is returned.

The Amazon driver also appears to start processing data as soon as Athena responds, but it uses less CPU and less heap throughout the transaction (averaging around 17% of CPU capacity and around 150 MB of heap). This lower CPU and heap usage results in a longer time to process the results.
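For readers who want a quick check of heap figures like the ones above without attaching a profiler, the JVM exposes its own heap statistics through the standard `java.lang.management` API. The sketch below is an assumption about how one might sample them; it is not the tool used for the article (that was Java VisualVM), and the class name `HeapProbe` and the sampling interval are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical probe: samples the current JVM's heap usage a few times.
final class HeapProbe {
    /** Snapshot of the heap as reported by the JVM's memory MXBean. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Converts a byte count to megabytes (1 MB = 1024 * 1024 bytes here). */
    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            MemoryUsage usage = heap();
            // getMax() returns -1 when the maximum heap size is undefined.
            System.out.printf("used=%.1f MB, max=%.1f MB%n",
                    toMegabytes(usage.getUsed()), toMegabytes(usage.getMax()));
            Thread.sleep(200);
        }
    }
}
```

In a real measurement the sampling loop would run on a background thread while the query is being read, so that the samples cover the same window as the driver's work.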

ODBC Driver Resource Usage

We also measured client-side resource usage while testing the ODBC drivers, looking specifically at network and CPU usage. The charts below were produced by running a sample C program and capturing its activity with Windows Resource Monitor.

For this comparison, we ran a query for 10 million rows: SELECT * FROM amazon_reviews_parquet LIMIT 10000000

CData Driver

Amazon Driver*

* Note the change in scale for the Network graph.

The graphs capture the same window of the sample C application for each driver: the period after Athena has started returning the results of the query. Based on the graphs, the CData ODBC Driver retrieves and processes the data simultaneously at around 200 Mbps. The Amazon driver also appears to retrieve and process the data simultaneously, but at only around 50 Mbps, approximately one quarter of the CData Driver's rate.
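The network rates above are in megabits per second, which is easy to misread as megabytes. The short sketch below converts the two approximate rates to megabytes per second and confirms the one-quarter ratio. The class and method names are illustrative assumptions; the only inputs are the 200 Mbps and 50 Mbps figures read from the graphs.

```java
// Hypothetical helper for the unit conversion and ratio discussed above.
final class Throughput {
    /** Converts megabits per second to megabytes per second (8 bits per byte). */
    static double megabitsToMegabytesPerSecond(double mbps) {
        return mbps / 8.0;
    }

    /** Ratio of one rate to another, e.g. 0.25 means one quarter. */
    static double ratio(double rate, double reference) {
        return rate / reference;
    }

    public static void main(String[] args) {
        double cdataMbps = 200.0;  // approximate rate read from the CData graph
        double amazonMbps = 50.0;  // approximate rate read from the Amazon graph
        System.out.println("CData:  " + megabitsToMegabytesPerSecond(cdataMbps) + " MB/s");
        System.out.println("Amazon: " + megabitsToMegabytesPerSecond(amazonMbps) + " MB/s");
        System.out.println("Amazon / CData: " + ratio(amazonMbps, cdataMbps));
    }
}
```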

Conclusion

The CData Drivers' performance far exceeds that of the Amazon-supported drivers. Our developers have optimized the processing of the results returned by Amazon Athena to the point that the drivers appear to be limited only by network traffic and server processing times. The difference is most pronounced when a driver is required to process large amounts of data.

References
