A Comparison of JDBC & ODBC Drivers for BigQuery

The metrics in this article were found using the most up-to-date drivers available as of May 2017. Find new performance metrics in our updated article.

BigQuery is "Google's fully managed, petabyte scale, low cost enterprise data warehouse for analytics" and provides a robust, widely used way to store and access your data. For many companies, Google BigQuery is the first choice for a cloud-based analytics platform. Because BigQuery is a fully managed cloud service, there is no infrastructure to maintain and no need for a database administrator.

Preparation

This article compares two pairs of drivers: the Google-supported ODBC Driver for Google BigQuery 2.0.6.1011¹ against the CData Software ODBC Driver for Google BigQuery 2016², and the Google-supported JDBC Driver for Google BigQuery 1.0.6.1008¹ against the CData Software JDBC Driver for Google BigQuery 2016³. To make the comparison reproducible, we copied the trips table from the yellow dataset in the public nyc-tlc project⁴ (table ID: nyc-tlc:yellow.trips) to a private dataset. For clarity, we renamed the test table to nyc_yellow_trips.

The test machine specifications are as follows:
Operating System: Windows 7 Ultimate, SP1
Processor: Intel® Core™ i3-2120 CPU @ 3.30GHz
Installed Memory (RAM): 8.00 GB
System type: 64-bit Operating System

Since the drivers are being compared side-by-side, the performance of the machine itself is relatively unimportant; what matters is how the drivers compare relative to one another.

Comparison

The relevant details for the table are below:

Table Size	Number of Rows	Number of Columns
130 GB	1,108,779,463	19

The main goal of this investigation was to compare the relative performance of the drivers. We did this by running the same queries with each driver. The queries are listed below:

SELECT * FROM nyc_yellow_trips LIMIT 100000
SELECT * FROM nyc_yellow_trips LIMIT 1000000
SELECT * FROM nyc_yellow_trips LIMIT 10000000

Results

For the ODBC drivers, we connected to BigQuery using a DSN from an ADO.NET console application and executed the above queries repeatedly. Each row of the result set was read and stored in a new string variable. The times in the chart below are averages over those repeated runs, which should smooth out outliers caused by spikes in network traffic and similar transient effects.
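The article does not describe the JDBC test harness, but a comparable measurement loop might look like the following Java sketch. It is an illustration under stated assumptions, not the code behind the published numbers: the class name `QueryTimer`, its method names, and the run count of 5 are hypothetical, and the JDBC connection string is left as a command-line argument because its format is specific to each driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical timing harness; not the code used to produce the published numbers.
class QueryTimer {

    // Runs the query once, reads every column of every row into a new String,
    // and returns the elapsed wall-clock time in seconds.
    static double timeQuerySeconds(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= columns; i++) {
                    row.append(rs.getString(i)).append('\t');
                }
                String line = row.toString(); // one new String per row, then discarded
            }
        }
        return (System.nanoTime() - start) / 1e9;
    }

    // Arithmetic mean of the recorded run times.
    static double average(double[] seconds) {
        if (seconds.length == 0) {
            throw new IllegalArgumentException("no runs recorded");
        }
        double sum = 0;
        for (double s : seconds) {
            sum += s;
        }
        return sum / seconds.length;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: java QueryTimer <jdbc-url>");
            return;
        }
        String jdbcUrl = args[0]; // driver-specific; see the driver's documentation
        String sql = "SELECT * FROM nyc_yellow_trips LIMIT 100000";
        int runs = 5;             // assumed; the article does not state the number of runs
        double[] times = new double[runs];
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            for (int i = 0; i < runs; i++) {
                times[i] = timeQuerySeconds(conn, sql);
            }
        }
        System.out.printf("average over %d runs: %.2f s%n", runs, average(times));
    }
}
```

Timing the whole read loop, rather than only the `executeQuery` call, matters here because most of the cost of a large result set lies in fetching and converting rows, which is the part of the work the driver controls.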

Query Times by Driver (in seconds)
Query	Google ODBC	CData ODBC	Google JDBC	CData JDBC
1 (100,000 rows)	34.07	16.68 (+104%)	71.63	18.99 (+277%)
2 (1,000,000 rows)	461.63	233.56 (+98%)	318.04	149.11 (+113%)
3 (10,000,000 rows)	*	1,748.94	**	1,771.29

* We were unable to retrieve 10 million rows using the Google ODBC Driver without receiving a System.StackOverflowException.
** We were unable to get consistent results for 10 million rows using the Google JDBC Driver as we regularly encountered errors like "Error fetching results from server." and "Error trying to obtain Google BigQuery object."
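The article does not define the percentages shown in parentheses, but they are consistent with reading them as how much faster the CData driver was than the corresponding Google driver: (Google time / CData time - 1) × 100, rounded to a whole percent. The short Java sketch below recomputes them from the table under that assumption; the class and method names are hypothetical.

```java
// Hypothetical helper; recomputes the parenthesized percentages from the table above,
// assuming they mean (Google time / CData time - 1) * 100.
class Speedup {

    // Percent by which the faster driver outpaces the baseline,
    // rounded to the nearest whole percent.
    static long percentFaster(double baselineSeconds, double fasterSeconds) {
        return Math.round((baselineSeconds / fasterSeconds - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        System.out.println(percentFaster(34.07, 16.68));   // ODBC, 100,000 rows   -> 104
        System.out.println(percentFaster(71.63, 18.99));   // JDBC, 100,000 rows   -> 277
        System.out.println(percentFaster(461.63, 233.56)); // ODBC, 1,000,000 rows -> 98
        System.out.println(percentFaster(318.04, 149.11)); // JDBC, 1,000,000 rows -> 113
    }
}
```

Under this reading, +98% corresponds to just under twice the speed and +277% to just under four times the speed.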

As the results show, the CData drivers significantly outperformed the Google drivers when working with large result sets, retrieving and processing results roughly two to nearly four times as fast. Notably, the CData drivers were consistently able to process 10 million rows, whereas the Google ODBC driver could not process a result set of that size and the Google JDBC driver could do so only intermittently.

The average runtime for each query is compared in the charts below:

[Chart: Results for 100,000 Rows]

[Chart: Results for 1,000,000 Rows]

Conclusion

In these tests, the CData drivers far outperformed the Google-supported drivers: they returned 100,000 and 1,000,000 rows roughly two to nearly four times as fast, and they completed the 10-million-row query that the Google drivers could not complete reliably. Our developers have spent considerable time optimizing how the drivers process the results returned by Google, to the point that the drivers appear to be limited mainly by network traffic and server processing times. This performance is particularly evident when a driver has to process large amounts of data.

References

Google BigQuery: Million Row Challenge - We take on the challenge of using the CData JDBC Driver to upload one million rows to Google BigQuery.

