A Comparison of Drivers for Elasticsearch
The metrics in this article are from the most up-to-date drivers available as of July 2019.
In this article, we compare the performance of the CData JDBC Driver for Elasticsearch to the equivalent native driver, looking at two measures.
First, we compare read performance, measuring the amount of time it takes to query an Elasticsearch instance for data and process the result set. We find that the CData Driver reads the full data set roughly 2.5x faster than the native driver.
Next, we compare the client-side resource usage of each driver for read queries, focusing on CPU and memory usage. These measurements help explain the underlying cause of the CData Driver's better performance.
Since the drivers are being compared side-by-side, the performance of the machine itself is relatively unimportant; what matters is how the drivers compare relative to one another.
The Data
To provide a reproducible comparison, we copied the public trips_2015 dataset from Google. The relevant details for the table queried are below:
Table | Number of Rows |
---|---|
trips_2015 | 9,896,012 |
JDBC Driver Read Performance
First, we tested the read performance of the drivers by running the same two queries with each driver:
- SELECT * FROM trips_2015 LIMIT 25000
- SELECT * FROM trips_2015
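The article does not show its test program, so the following is only a sketch of how the two queries above might be timed over JDBC. The class name `ReadBenchmark`, the decision to read every column with `getObject`, and passing the JDBC URL as a command-line argument are all assumptions made for illustration; the URL format is driver-specific and is not given here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical harness: times how long it takes to run a query and read every row. */
final class ReadBenchmark {

    /** Outcome of one timed run. */
    static final class Result {
        final long rows;
        final double millis;

        Result(long rows, double millis) {
            this.rows = rows;
            this.millis = millis;
        }
    }

    /** Executes the query on an open connection and walks the full result set. */
    static Result timeQuery(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        long rows = 0;
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                for (int i = 1; i <= columns; i++) {
                    rs.getObject(i); // read every value so the driver has to process it
                }
                rows++;
            }
        }
        double millis = (System.nanoTime() - start) / 1_000_000.0;
        return new Result(rows, millis);
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: the driver-specific JDBC URL is supplied as the first argument.
        if (args.length < 1) {
            System.err.println("usage: java ReadBenchmark <jdbc-url>");
            return;
        }
        String[] queries = {
            "SELECT * FROM trips_2015 LIMIT 25000",
            "SELECT * FROM trips_2015"
        };
        try (Connection conn = DriverManager.getConnection(args[0])) {
            for (String sql : queries) {
                Result r = timeQuery(conn, sql);
                System.out.printf("%s -> %d rows in %.1f ms%n", sql, r.rows, r.millis);
            }
        }
    }
}
```

Running the same class once per driver, with only the JDBC URL and classpath changed, keeps the measured work identical for both.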
JDBC Query Times by Company (in milliseconds)
Rows Queried | CData Software | Native |
---|---|---|
25,000 | 1,016.9 (+35%) | 1,375.4 |
~10,000,000 | 199,577 (+155%) | 508,338 |
As you can see in the results, the CData JDBC Driver handled large result sets significantly faster than the native driver did, processing the largest dataset 2.5x faster.
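The percentages and the 2.5x figure appear consistent with dividing the native driver's time by the CData driver's time. The short sketch below recomputes them from the reported query times; the class and method names are hypothetical, and the formula is an assumption about how the figures were derived rather than something the article states.

```java
/** Recomputes the relative-speed figures from the reported query times. */
final class Speedup {

    /** How many times faster the quicker driver is: slower time divided by faster time. */
    static double factor(double fasterMillis, double slowerMillis) {
        return slowerMillis / fasterMillis;
    }

    /** The same comparison as a percentage gain, rounded to the nearest whole percent. */
    static long percentFaster(double fasterMillis, double slowerMillis) {
        return Math.round((factor(fasterMillis, slowerMillis) - 1.0) * 100.0);
    }

    public static void main(String[] args) {
        // Query times in milliseconds, taken from the results table.
        System.out.printf("25,000 rows: %.2fx (+%d%%)%n",
                factor(1016.9, 1375.4), percentFaster(1016.9, 1375.4));
        System.out.printf("~10,000,000 rows: %.2fx (+%d%%)%n",
                factor(199577, 508338), percentFaster(199577, 508338));
    }
}
```

Under this reading, the 25,000-row query works out to about 1.35x (+35%) and the full-table query to about 2.55x (+155%), matching the values in the table.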
JDBC Driver Resource Usage
While testing the read performance, we also measured client-side resource usage, looking specifically at memory and CPU usage. The charts below were produced by running a sample Java program and capturing its CPU and memory usage with Java VisualVM. We used Java 8 update 211 with a maximum heap size of 4.27 gigabytes.
For this comparison, we ran a query for a large number of rows against our test Elasticsearch instance: SELECT * FROM trips_2015
CData Driver
Native Driver
Based on the graphs, the CData Driver maintains high CPU and memory usage, using nearly 40% of the CPU and averaging close to 700 MB of heap. In contrast, the native driver uses less than 5% of the CPU and is inconsistent in its memory usage: for the first third of the query it uses around 400 MB of the heap, but it then drops sharply to less than 100 MB.
By making better use of client-side resources, the CData Driver requests and processes data more than twice as fast as the native driver. Finishing the read faster not only saves time, it also means that you are making the best use of resources on the Elasticsearch server.
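The measurements above were taken with Java VisualVM. As a rough in-process alternative, and not the method used for this article, heap usage can also be logged from inside the test program with the standard `java.lang.management` API. The sketch below is a minimal example under that assumption; the `HeapSampler` class and its method names are hypothetical, and it covers heap usage only, not CPU.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: periodically logs JVM heap usage while a query runs. */
final class HeapSampler {

    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Heap currently in use, in megabytes. */
    static double usedHeapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / BYTES_PER_MB;
    }

    /** Maximum heap the JVM will attempt to use (controlled by -Xmx), in megabytes. */
    static double maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / BYTES_PER_MB;
    }

    /** Starts a daemon thread that prints heap usage at the given interval until interrupted. */
    static Thread start(long intervalMillis) {
        Thread sampler = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    System.out.printf("heap used: %.0f MB of %.0f MB max%n",
                            usedHeapMb(), maxHeapMb());
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // Interrupted by the caller: stop sampling.
            }
        }, "heap-sampler");
        sampler.setDaemon(true);
        sampler.start();
        return sampler;
    }
}
```

A test program would call `HeapSampler.start(1000)` before executing the query and `interrupt()` the returned thread once the result set has been read.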
Conclusion
The CData JDBC Driver regularly proves to be faster than the native driver, particularly when dealing with large data sets. Our developers have spent countless hours optimizing how the driver requests data and processes the results returned by the Elasticsearch instance, capitalizing on the resources available to the system. This engineering means you get the best performance possible from the resources you have. Download a free, 30-day trial of any of our Elasticsearch drivers and experience the CData difference for yourself.
Related Articles
- Elasticsearch Driver Features & Differentiators - Compare the features and functionality of the CData Elasticsearch drivers to the native drivers.