A Comparison of Database Drivers for SQL Server



The metrics in this article were found using the most up-to-date drivers available as of May 2020.

SQL Server is a relational database management system, or RDBMS, developed and marketed by Microsoft. This article compares the read performance of the native SQL Server JDBC Driver1 and the CData Software JDBC Driver for SQL Server2 when working with larger datasets.

We show how the CData Driver is able to read large data sets approximately 23% faster than the native driver and write 1 million rows slightly faster than the native driver.


The Data



To provide a reproducible comparison for reading data, we copied the US Amazon book reviews from the Amazon Customer Reviews dataset3 into a SQL Server database. The details for the amazon_book_reviews table are below:

Table Size     Table Number of Rows     Number of Columns
12.72 GB 10,237,177 15


JDBC Driver Read Performance



The main goal of this investigation was to compare the related performance of the JDBC drivers. We did this by running the same set of queries with each JDBC driver. We used LIMIT clauses to change the size of the dataset returned in each query, but we requested the same columns for each query.

Base Query

SELECT 
	marketplace, 
	customer_id,
	review_id, 
	product_id, 
	product_parent,
	product_title, 
	product_category, 
	star_rating,
	helpful_votes,
	total_votes,
	vine, 
	verified_purchase, 
	review_headline, 
	review_body, 
	review_date,
FROM 
  dbo.amazon_book_reviews;

Limit by Query

  1. LIMIT 1000000
  2. full dataset

To test the drivers, we connected to SQL Server using a basic Java application and executed the above queries repeatedly. The results were read and stored in a new variable (based on the datatype) for each column in each row.

Query Times by Driver (in seconds)
Query CData JDBC Driver SQL Server Connector/J
1 (1,000,000 rows) 4.14 (+30.8%) 5.42
2 (~10,000,000 rows) 52.46 (+22.9%) 64.49

As can be seen in the results, the CData Driver regularly outperformed Microsoft's SQL Server Connector, largely due to the way the CData Driver uses client-side resources.

JDBC Driver Resource Usage

While testing the read performance of the JDBC drivers, we also measured client-side resource usage, looking specifically at memory. The charts below were found by running a sample Java program and using Java VisualVM to capture the memory usage. We used Java OpenJDK version 12.0.2 with a maximum heap size of 8 Gigabytes.

For this comparison, we ran the query for the full dataset.

CData Driver

Native Driver

The CData Driver averages approximately 150 MB of heap usage for the duration of the run. The native driver, on the other hand, averages around 100 MB of heap usage. The CData Driver makes better use of the available client-side resources to provide faster read speeds than the native driver.


Million Row Challenge



In addition to measuring read performance, we also want to compare the write performance of the drivers. In short, the CData SQL Server JDBC Driver is able to write 1 million rows slightly faster than Microsoft's native SQL Server JDBC Connector.

We used a simple Java program to add the rows to a copy of the amazon_book_reviews table referenced above.3. For our testing, we inserted the data in 100 batches of 10,000 rows.

Sample Code

//one batch
Connection connection = DriverManager.getConnection("jdbc:sql:server=" + myServer + ";port=" + myPort + ";user=" + myUser + ";password=" + myPassword + ";database=" + myDatabse + ";");
connection.setAutoCommit(false);

String cmd = "INSERT INTO dbo.amazon_book_reviews_insert (marketplace, customer_id, review_id, product_id, product_parent, product_title, product_category, star_rating, helpful_votes, total_votes, vine, verified_purchase, review_headline, review_body, review_date) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
PreparedStatement pstmt = connection.prepareStatement(cmd);
 
for (int row = 0; row < rows.length; row++){
	pstmt.setString(1, marketplace);
	pstmt.setLong(2, customer_id);
	pstmt.setString(3, review_id);
	pstmt.setLong(4, product_id);
	pstmt.setLong(5, product_parent);
	pstmt.setString(6, product_title);
	pstmt.setString(7, product_category);
	pstmt.setLong(8, star_rating);
	pstmt.setLong(9, helpful_votes);
	pstmt.setLong(10, total_votes);
	pstmt.setString(11, vine);
	pstmt.setString(12, verified_purchase);
	pstmt.setString(13, review_headline);
	pstmt.setString(14, review_body);
	pstmt.setDate(15, review_date);
	
  pstmt.addBatch();
}

int[] affected = pstmt.executeBatch();
connection.commit();

Results

Time (in Seconds) to Insert 1m Rows
CData JDBC Driver SQL Server JDBC Driver
15.9 16.4

Conclusion



The CData Driver offers better querying of large datasets over the native driver, processing the largest dataset at least 20% faster by making better use of the available client resources to read and process data as quickly as possible.

When it comes to inserting data, the CData Driver is on par with the native driver in its ability to rapidly insert large sets of data. The CData JDBC Driver is able to insert 1 million rows in around 15.9 seconds, slightly faster than the native driver.

Our developers have labored to optimize the performance in processing the results returned by SQL Server to the point that the drivers are observably hindered only by wire traffic and server processing times.

References



  1. SQL Server Driver
  2. CData Driver
  3. Amazon Customer Reviews Dataset