by Eric Madariaga | October 31, 2017

Million Row Challenge: BigQuery & Redshift

We have already demonstrated that our drivers are unmatched when it comes to read performance, especially when working with large data sets. However, many of our customers use enterprise ETL or Data Warehousing solutions and require high-performance bulk data loading in addition to fast read access.

Our customers routinely move millions of records between on-premises and cloud data sources. So, we decided to put together a set of performance tests that would demonstrate how our drivers perform in these scenarios. We took on the challenge of inserting one million rows into two widely-used, cloud-based data stores (Google BigQuery and Amazon Redshift) using our JDBC drivers - a task that can take up to 6 hours using other transfer technologies. A minimal sketch of how such a timing test might be driven is shown below.
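This sketch uses standard JDBC batching (PreparedStatement with addBatch/executeBatch) and wall-clock timing. The connection URL, table name (loadtest), and column layout are placeholders for illustration, not the exact harness or schema we used; substitute your own driver and connection properties.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MillionRowLoad {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string and table - replace with your
        // driver's JDBC URL, credentials, and target schema.
        String url = "jdbc:amazonredshift:Server=myserver;Database=mydb;User=admin;Password=...;";
        String insert = "INSERT INTO loadtest (id, name, amount) VALUES (?, ?, ?)";
        int totalRows = 1_000_000;
        int batchSize = 10_000;

        long start = System.nanoTime();
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(insert)) {
            for (int i = 1; i <= totalRows; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row-" + i);
                ps.setDouble(3, i * 0.01);
                ps.addBatch();
                if (i % batchSize == 0) {
                    ps.executeBatch(); // flush one batch to the server
                }
            }
            ps.executeBatch(); // flush any remaining rows
        }
        long elapsedSec = (System.nanoTime() - start) / 1_000_000_000L;
        System.out.println("Loaded " + totalRows + " rows in " + elapsedSec + "s");
    }
}

How a driver executes those batches internally (row-by-row inserts, multi-row INSERT statements, or a staged bulk-load API) is what ultimately separates the load times in the table below.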

Given our industry-leading read performance, the bulk-load results were not surprising. We typically include a graph or chart to illustrate performance, but the differences between our drivers and the native, source-specific drivers were so staggering that a chart would not have been meaningful in any way. Instead, we present a simple table showing our results:

Time to Upload 1 Million Rows

Data Source        CData Driver             Native Driver
Amazon Redshift    1m 43s (~200x faster)    5h 53m 17s
Google BigQuery    21m                      N/A

As you can see, our JDBC Driver for Amazon Redshift is orders of magnitude faster than the native driver. For Google BigQuery, no comparison was possible: the native drivers available from Google do not support bulk-loading large data sets. While those drivers have some support for inserting and updating multiple rows, their current implementation quickly encounters API limits with data sets of this size.

We have published separate Knowledge Base articles outlining our tests and results for each data source.

But don't take our word for it: download a free, 30-day trial of any of our drivers and see why top vendors continue to choose CData Software to power their data connectivity.