A Comparison of Database Drivers for MySQL



The metrics in this article were found using the most up-to-date drivers available as of May 2020.

MySQL is "The world's most popular open source database" and provides a robust, widely-used way to store and access your data. In fact, many cloud and on-premise databases (like MemSQL and Google Cloud SQL) have adopted the MySQL interface. There are several native, open-source drivers available for connecting to your MySQL data from other applications.

This article compares the read and write performance of the native MySQL Connectors (JDBC and ODBC)1 and the CData Software JDBC and ODBC Drivers for MySQL2 when working with larger datasets. We show how the CData Driver is able to read large data sets approximately 20% faster than the native driver and write 1 million rows nearly 40% faster than the native drivers.


The Data



In order to provide a reproducible comparison for reading data, we copied the US Amazon book reviews from the Amazon Customer Reviews dataset3 into a MySQL database. The details for the amazon_book_reviews table are below:

Table Size     Table Number of Rows     Number of Columns
9.7 GB 10,534,179 15


JDBC Driver Read Performance



The main goal of this investigation was to compare the related performance of the JDBC drivers. We did this by running the same set of queries with each JDBC driver. We used LIMIT clauses to change the size of the dataset returned in each query, but we requested the same columns for each query.

Base Query

SELECT 
	marketplace, 
	customer_id,
	review_id, 
	product_id, 
	product_parent,
	product_title, 
	product_category, 
	star_rating,
	helpful_votes,
	total_votes,
	vine, 
	verified_purchase, 
	review_headline, 
	review_body, 
	review_date,
FROM 
  cdata.amazon_book_reviews;

Limit by Query

  1. LIMIT 1000000
  2. Full Dataset

To test the drivers, we connected to MySQL using a basic Java application and executed the above queries repeatedly. The results were read and stored in a new variable (based on the datatype) for each column in each row.

Query Times by Driver (in seconds)
Query CData JDBC Driver MySQL Connector/J
1 (1,000,000 rows) 14.02 (+26%) 17.67
2 (~10,000,000 rows) 171.09 (+20.7%) 206.42

As can be seen in the results, the CData Driver regularly outperformed the native MySQL Driver, largely due to the way the CData JDBC Driver uses client-side resources.

JDBC Driver Resource Usage

While testing the read performance of the JDBC drivers, we also measured client-side resource usage, looking specifically at memory. The charts below were found by running a sample Java program and using Java VisualVM to capture the memory usage. We used Java OpenJDK version 12.0.2 with a maximum heap size of 8 Gigabytes.

For this comparison, we ran the query for the full dataset.

CData Driver

Native Driver*

* Note the change in scale for the Heap graph.

The CData Driver averages approximately 150 MB of heap usage for the duration of the run. However, the native driver continues to use more and more client resources for the duration of the run. In fact, we had to set the max Heap size to 8 GB in order for the native MySQL Driver to complete its execution. Despite the fact that the native driver is using significantly more resources, it still takes longer to read the data than it takes the CData JDBC Driver.


ODBC Driver Read Performance



The main goal of this investigation was to compare the related performance of the ODBC Drivers. We did this by running the same set of queries with each ODBC Driver. We used LIMIT clauses to change the size of the dataset returned in each query, but we requested the same columns for each query.

Base Query

SELECT 
	marketplace, 
	customer_id,
	review_id, 
	product_id, 
	product_parent,
	product_title, 
	product_category, 
	star_rating,
	helpful_votes,
	total_votes,
	vine, 
	verified_purchase, 
	review_headline, 
	review_body, 
	review_date,
FROM 
  cdata.amazon_book_reviews;

Limit by Query

  1. LIMIT 1000000
  2. Full Dataset

To test the drivers, we connected to MySQL using a basic C++ application and executed the above queries repeatedly. The results were read and stored in a new variable (based on the datatype) for each column in each row.

Query Times by Driver (in seconds)
Query CData ODBC Driver MySQL Connector/ODBC
1 (1,000,000 rows) 14.95 (+96.7%) 29.40
2 (~10,000,000 rows) 177.51 (+72.3%) 305.77

As can be seen in the results, the CData ODBC Driver regularly outperformed the MySQL Connector/ODBC, largely due to better use of available client resources.

ODBC Driver Resource Usage

While testing the read performance of the ODBC drivers, we also measured client-side resource usage, looking specifically at processing capacity and network bandwidth. The charts below were found by executing the simple C++ application and using the Windows Resource Monitor.

For this comparison, we ran the query for the full dataset.

CData Driver

Native Driver

Both drivers appear to use 20 - 25% of the available processing capacity, with CData using slightly more. The larger difference is in the network bandwidth used, where the CData Driver uses around 250Mbps compared to the 100Mbps used by the native driver.


Million Row Challenge



In addition to measuring read performance, we also want to compare the write performance of the drivers. In short, the CData MySQL JDBC Driver is able to write 1 million rows nearly 40% faster than the native MySQL Connector.

We used a simple Java program to add the rows to a copy of the amazon_book_reviews table referenced above.3. For our testing, we inserted the data in 100 batches of 10,000 rows.

Sample Code

//one batch
Connection connection = DriverManager.getConnection("jdbc:mysql:server=" + myServer + ";port=" + myPort + ";user=" + myUser + ";password=" + myPassword + ";database=" + myDatabse + ";");
String cmd = "INSERT INTO cdata.amazon_book_reviews_insert (marketplace, customer_id, review_id, product_id, product_parent, product_title, product_category, star_rating, helpful_votes, total_votes, vine, verified_purchase, review_headline, review_body, review_date) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
PreparedStatement pstmt = connection.prepareStatement(cmd);
 
for (int row = 0; row < rows.length; row++){
	pstmt.setString(1, marketplace);
	pstmt.setLong(2, customer_id);
	pstmt.setString(3, review_id);
	pstmt.setLong(4, product_id);
	pstmt.setLong(5, product_parent);
	pstmt.setString(6, product_title);
	pstmt.setString(7, product_category);
	pstmt.setLong(8, star_rating);
	pstmt.setLong(9, helpful_votes);
	pstmt.setLong(10, total_votes);
	pstmt.setString(11, vine);
	pstmt.setString(12, verified_purchase);
	pstmt.setString(13, review_headline);
	pstmt.setString(14, review_body);
	pstmt.setDate(15, review_date);
	
  pstmt.addBatch();
}

int[] affected = pstmt.executeBatch();

Results

Time (in Seconds) to Insert 1m Rows
CData JDBC Driver MySQL Connector/J
77.6 127.9

Conclusion



The CData Driver offers better querying of large datasets over the native connector, processing the largest dataset at least 20% faster by making better use of the available client resources.

When it comes to inserting data, the CData Driver stands apart in its ability to rapidly insert large sets of data. The CData JDBC Driver is able to insert 1 million rows in under 80 seconds, nearly 40% faster than the native connector.

Our engineering team has optimized our SQL engine to point that connectivity is only observably hindered by wire traffic and server processing times.

References



  1. MySQL Connectors
  2. CData Drivers
  3. Amazon Customer Reviews Dataset