The metrics in this article are from the most up-to-date drivers available as of July 2019.
In this article, we compare the performance of the CData JDBC Driver for Elasticsearch to the equivalent native driver, looking at two measures.
First, we compare read performance: the time it takes to query an Elasticsearch instance for data and then read every field of the result set. We find that the CData Driver is up to 2.5x faster than the native driver.
Next, we compare the resource usage of each driver for read queries, focusing on CPU and memory usage. These measurements help explain why the CData Driver performs better.
Since the drivers are being compared side-by-side, the performance of the machine itself is relatively unimportant; what matters is how the drivers compare relative to one another.
The Data
To provide a reproducible comparison, we copied the public trips_2015 dataset from Google into our test Elasticsearch instance. The relevant details for the queried table are below:

| Table | Number of Rows |
|---|---|
| trips_2015 | 9,896,012 |
JDBC Driver Read Performance
First, we tested the relative performance of the drivers by running the same two queries with each driver:
SELECT * FROM trips_2015 LIMIT 25000
SELECT * FROM trips_2015
To simulate actual processing of the data from Elasticsearch, we read the values of every field in each row. The times required for each product to process the results are in the table below.
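The article does not include its benchmark program. The Java class below is a minimal sketch of the kind of harness described above, assuming a standard JDBC setup: it runs a query, reads every column of every row with `getObject`, and reports the elapsed time. The class and method names (`ReadBenchmark`, `drain`, `timeQuery`) are hypothetical, and the JDBC URL is passed on the command line because neither driver's connection string is given here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Locale;

// Hypothetical read-benchmark sketch; not the program used for the published numbers.
class ReadBenchmark {

    // Reads every column of every row and returns the number of rows seen.
    static long drain(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        long rows = 0;
        while (rs.next()) {
            for (int i = 1; i <= columns; i++) {
                rs.getObject(i); // forces the driver to decode each field value
            }
            rows++;
        }
        return rows;
    }

    // Runs one query; times from executeQuery until the result set is fully read and closed.
    static double timeQuery(String jdbcUrl, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement()) {
            long start = System.nanoTime();
            long rows;
            try (ResultSet rs = stmt.executeQuery(sql)) {
                rows = drain(rs);
            }
            double ms = (System.nanoTime() - start) / 1_000_000.0;
            System.out.printf(Locale.ROOT, "%s -> %d rows in %.1f ms%n", sql, rows, ms);
            return ms;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 1) {
            System.err.println("usage: java ReadBenchmark <jdbc-url>");
            return;
        }
        // The two queries from the article, run against whichever driver the URL selects.
        timeQuery(args[0], "SELECT * FROM trips_2015 LIMIT 25000");
        timeQuery(args[0], "SELECT * FROM trips_2015");
    }
}
```

This sketch starts the clock after the connection is open, so connection setup is excluded; the article does not say whether its timings included connection time.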
JDBC Query Times by Company (in milliseconds)

| Rows Queried | CData Software | Native |
|---|---|---|
| 25,000 | 1,016.9 (35% faster) | 1,375.4 |
| 9,896,012 (full table) | 199,577 (155% faster) | 508,338 |
As you can see in the results, the CData JDBC Driver handled large result sets significantly faster than the native driver did, processing the largest dataset 2.5x faster.
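The relative figures in the table follow from the raw timings: the speedup is the native time divided by the CData time, and the percentage is that ratio minus one, times 100. A short Java sketch (class and method names are hypothetical) reproducing the arithmetic:

```java
import java.util.Locale;

// Recomputes the relative figures in the query-time table from the raw timings.
class Speedup {

    // How many times faster the candidate is than the baseline: baseline time / candidate time.
    static double speedup(double baselineMs, double candidateMs) {
        return baselineMs / candidateMs;
    }

    // Percentage improvement as shown in the table: (speedup - 1) * 100.
    static double percentFaster(double baselineMs, double candidateMs) {
        return (speedup(baselineMs, candidateMs) - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Native timings are the baseline; CData timings are the candidate.
        System.out.printf(Locale.ROOT, "25,000 rows:    %.2fx (+%.0f%%)%n",
                speedup(1375.4, 1016.9), percentFaster(1375.4, 1016.9));
        // prints "25,000 rows:    1.35x (+35%)"
        System.out.printf(Locale.ROOT, "9,896,012 rows: %.2fx (+%.0f%%)%n",
                speedup(508338, 199577), percentFaster(508338, 199577));
        // prints "9,896,012 rows: 2.55x (+155%)"
    }
}
```

The 2.55x ratio for the full table is the figure rounded to 2.5x in the text.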
JDBC Driver Resource Usage
While testing the read performance, we also measured client-side resource usage, looking specifically at memory and CPU usage. The charts below were generated by running a sample Java program and capturing its CPU and memory usage with Java VisualVM. We used Java 8 Update 211 with a maximum heap size of 4.27 GB.
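VisualVM samples these values from outside the process. If you want comparable numbers from inside a test program, the standard `java.lang.management` API exposes heap usage, and HotSpot-based JVMs (an assumption about your runtime) additionally expose process CPU load through `com.sun.management.OperatingSystemMXBean`. The sketch below, with a hypothetical class name, prints one sample per second on a daemon thread; it is an alternative to, not a reproduction of, the VisualVM setup used for these charts.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process sampler for heap and CPU usage.
class ResourceSampler {

    // Current heap usage in MB (1 MB = 1024 * 1024 bytes).
    static double heapUsedMb() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        return mem.getHeapMemoryUsage().getUsed() / (1024.0 * 1024.0);
    }

    // Maximum heap in MB (roughly the -Xmx limit), or -1 if the JVM reports it as undefined.
    static double heapMaxMb() {
        long max = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
        return max < 0 ? -1 : max / (1024.0 * 1024.0);
    }

    // Recent CPU load of this JVM process in [0, 1]; negative if unavailable on this JVM.
    static double processCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuLoad();
        }
        return -1.0;
    }

    // Prints one sample per second on a daemon thread; the caller shuts down the executor.
    static ScheduledExecutorService startSampling() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "resource-sampler");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> System.out.printf("cpu=%.1f%% heap=%.0f MB%n",
                processCpuLoad() * 100, heapUsedMb()), 0, 1, TimeUnit.SECONDS);
        return ses;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.printf("max heap=%.0f MB%n", heapMaxMb());
        ScheduledExecutorService sampler = startSampling();
        Thread.sleep(3000); // stand-in for the query being measured
        sampler.shutdown();
    }
}
```

The maximum heap is set with the JVM's `-Xmx` flag; the value reported by `heapMaxMb` can be slightly lower than the flag because of how the collector sizes its spaces.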
For this comparison, we ran a query that returns the full table (9,896,012 rows) against our test Elasticsearch instance: SELECT * FROM trips_2015
[Figure: Java VisualVM CPU and heap usage while running SELECT * FROM trips_2015. Panels: CData Driver; Native Driver]
Based on the charts, the CData Driver sustains high CPU and memory usage, using nearly 40% of the CPU and averaging about 700 MB of heap. In contrast, the native driver uses less than 5% of the CPU, and its memory usage is inconsistent: for the first third of the query it uses around 400 MB of heap, then drops sharply to less than 100 MB. By making better use of client-side resources, the CData Driver requests and processes data more than twice as fast as the native driver. Finishing the read sooner not only saves time; it also means you are making the best use of resources on the Elasticsearch server.
Conclusion
The CData JDBC Driver for Elasticsearch consistently proves faster than the native driver, particularly when dealing with large data sets. Our developers have spent countless hours optimizing how the driver requests data and processes the results returned by the Elasticsearch instance, making full use of the client-side resources available to it. This engineering means you get the best performance possible from your Elasticsearch deployment. Download a free, 30-day trial of any of our Elasticsearch drivers and experience the CData difference for yourself.