A Comparison of Drivers for Elasticsearch



The metrics in this article are from the most up-to-date drivers available as of July 2019.

In this article, we compare the performance of the CData JDBC Driver for Elasticsearch to the equivalent native driver, looking at two measures.

First, we compare read performance, measuring the amount of time it takes to query an Elasticsearch instance for data and process the result set in some way. We find that the CData Driver is 2.5x faster than the native driver.

Next, we compare the client-side resource usage of each driver for read queries, focusing on CPU and memory usage. These measurements help explain the underlying cause of the CData Driver's better performance.

Since the drivers are being compared side-by-side, the performance of the machine itself is relatively unimportant; what matters is how the drivers compare relative to one another.

The Data



To provide a reproducible comparison, we copied the public trips_2015 dataset from Google. The relevant details for the table queried are below:

Table         Number of Rows
trips_2015    9,896,012

JDBC Driver Read Performance



First, we tested the relative performance of the drivers by running the same two queries with each driver:

  1. SELECT * FROM trips_2015 LIMIT 25000
  2. SELECT * FROM trips_2015

To simulate actual processing of the data from Elasticsearch, we read the values of every field in each row. The times required for each product to process the results are in the table below.
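The read test described above can be sketched roughly as follows. This is a minimal illustration, not the program used for the published numbers: the class name `ReadBenchmark` and its methods are hypothetical, and the JDBC URL is passed on the command line because the connection string depends on whichever driver is on the classpath (take it from that driver's documentation).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical harness: times a query while reading every field of every row.
class ReadBenchmark {

    /** Reads every column of every row and returns the number of rows seen. */
    static long readAllFields(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        long rows = 0;
        while (rs.next()) {
            for (int i = 1; i <= columns; i++) {
                rs.getObject(i); // touch the value so the driver has to materialize it
            }
            rows++;
        }
        return rows;
    }

    /** Runs one query and returns the elapsed wall-clock time in milliseconds. */
    static double timeQuery(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            readAllFields(rs);
        }
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: java ReadBenchmark <jdbc-url>");
            return;
        }
        // args[0]: JDBC URL for the driver under test (assumption: supplied by the user)
        try (Connection conn = DriverManager.getConnection(args[0])) {
            String[] queries = {
                "SELECT * FROM trips_2015 LIMIT 25000",
                "SELECT * FROM trips_2015"
            };
            for (String sql : queries) {
                System.out.printf("%s -> %.1f ms%n", sql, timeQuery(conn, sql));
            }
        }
    }
}
```

Running the same class once per driver, with only the classpath and URL changed, keeps the measured work identical for both. A single run is sensitive to JVM warm-up, so averaging several runs is a sensible refinement.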

JDBC Query Times by Company (in milliseconds)
Rows Queried     CData Software       Native
25,000           1,016.9 (+35%)       1,375.4
~10,000,000      199,577 (+155%)      508,338

As you can see in the results, the CData JDBC Driver handled large result sets significantly faster than the native driver did, processing the largest dataset 2.5x faster.
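The percentages in the table are consistent with dividing the native driver's time by the CData driver's time and subtracting one. The article does not state the formula, so treat that reading as an assumption; the short sketch below (class and method names are hypothetical) reproduces the figures under it.

```java
// Hypothetical helper: derives the speedup figures from the measured times.
class Speedup {

    /** How many times faster the quicker driver is: slower time / faster time. */
    static double ratio(double fasterMs, double slowerMs) {
        return slowerMs / fasterMs;
    }

    /** The same comparison as a rounded percentage, e.g. a ratio of 2.5 gives 150. */
    static long percentFaster(double fasterMs, double slowerMs) {
        return Math.round((ratio(fasterMs, slowerMs) - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        // Times in milliseconds, taken from the table above
        System.out.printf("25,000 rows: %.2fx (+%d%%)%n",
                ratio(1016.9, 1375.4), percentFaster(1016.9, 1375.4));
        System.out.printf("~10,000,000 rows: %.2fx (+%d%%)%n",
                ratio(199577, 508338), percentFaster(199577, 508338));
    }
}
```

Under this reading the ratios come out at roughly 1.35 and 2.55, matching the +35% and +155% entries and the 2.5x figure quoted for the full table scan.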

JDBC Driver Resource Usage



While testing the read performance, we also measured client-side resource usage, looking specifically at memory and CPU usage. The charts below were produced by running a sample Java program and using Java VisualVM to capture its CPU and memory usage. We used Java version 8 update 211 with a maximum heap size of 4.27 GB.

For this comparison, we ran a query for a large number of rows against our test Elasticsearch instance: SELECT * FROM trips_2015
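The charts in this section came from Java VisualVM. For readers who prefer to log similar numbers from inside the test program, the sketch below samples heap usage and process CPU load through the JVM's management beans. It is an alternative technique, not the one used for the charts: `ResourceSampler` is a hypothetical class, and `com.sun.management.OperatingSystemMXBean` is specific to HotSpot-based JDKs.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sampler: prints heap and CPU usage of the current JVM at a fixed interval.
class ResourceSampler {

    /** Heap currently in use, in megabytes. */
    static long heapUsedMb() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed() / (1024 * 1024);
    }

    /** CPU load of this JVM process as a percentage; negative if the platform cannot report it. */
    static double processCpuPercent() {
        com.sun.management.OperatingSystemMXBean os =
                ManagementFactory.getPlatformMXBean(com.sun.management.OperatingSystemMXBean.class);
        return os.getProcessCpuLoad() * 100.0;
    }

    /** Starts printing one sample every periodMs milliseconds on a daemon thread. */
    static ScheduledExecutorService start(long periodMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "resource-sampler");
            t.setDaemon(true); // do not keep the JVM alive just for sampling
            return t;
        });
        scheduler.scheduleAtFixedRate(
                () -> System.out.printf("heap=%d MB cpu=%.1f%%%n", heapUsedMb(), processCpuPercent()),
                0, periodMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService sampler = start(1000);
        // Placeholder workload; in a real run this would be the query-and-read loop
        Thread.sleep(3000);
        sampler.shutdownNow();
    }
}
```

Sampling in-process adds a small overhead of its own, so an external profiler remains the better choice when the goal is a chart rather than a log.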

CData Driver

Native Driver

Based on the graphs, the CData Driver maintains consistently high CPU and memory usage, using nearly 40% of the CPU and averaging close to 700 MB of heap. In contrast, the native driver uses less than 5% of the CPU and is inconsistent in its memory usage: for the first third of the query it uses around 400 MB of heap, then drops sharply to less than 100 MB.

By making fuller use of client-side resources, the CData Driver requests and processes data more than twice as fast as the native driver. Finishing the read faster not only saves time; it also means the query occupies resources on the Elasticsearch server for a shorter period.

Conclusion



The CData Software Driver regularly proves to be faster than the native driver, particularly when dealing with large data sets. Our developers have spent countless hours optimizing how the driver requests data and processes the results returned by the Elasticsearch instance, capitalizing on the client-side resources available to a system. This engineering means you get the best performance possible from the resources you have. Download a free, 30-day trial of any of our Elasticsearch drivers and experience the CData difference for yourself.
