by Jerod Johnson | December 20, 2023

How to Build a Dremio Connector: A Quick Guide with Examples


As the data ecosystem continues to grow rapidly, business teams are leaning more heavily on IT teams to build efficient, flexible data management systems that fuel business intelligence initiatives. Dremio, a pioneering data lakehouse platform, plays a crucial role in this landscape by enabling self-service analytics on data lakes.

This article explores Dremio ARP (Advanced Relational Pushdown) connectors and provides a guide to creating, installing, and using them.

What is Dremio?

Dremio redefines the data lakehouse, blending the best elements of data lakes and data warehouses in a novel architecture. Its platform enables direct, interactive queries across multiple data sources without requiring data copies or proprietary formats. Dremio is designed for scalability and performance, making it a leading option for organizations seeking agile data solutions.

What is a Dremio ARP Connector?

Dremio’s ARP framework simplifies the creation of new, Dremio-compatible connectors for any data source that has a JDBC driver. Traditionally, such connectivity has been limited to relational data stores, but CData JDBC Drivers extend it to hundreds of SaaS applications, big data stores, and NoSQL sources.

Connectors built with the framework also take advantage of Dremio's pushdown capabilities: operations the data source supports, such as filters, projections, and aggregations, run at the source rather than in Dremio. This reduces the amount of data sent over the network and improves overall query performance.
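To make the pushdown idea concrete, here is a toy Java model. It is not Dremio's actual planner. The Signature record, the SUPPORTED set, and the canPushDown and plan methods are hypothetical stand-ins for the function signatures an ARP file declares, and the type names are illustrative. A call is sent to the source only when its name and argument types match a declared signature; anything else is left for Dremio to evaluate itself.

```java
import java.util.List;
import java.util.Set;

// Toy model of ARP-style pushdown, NOT Dremio's planner: a function call is
// pushed to the source only if it matches a declared signature.
public class PushdownSketch {

    // Hypothetical signature: function name, argument types, return type.
    record Signature(String name, List<String> argTypes, String returnType) {}

    // Hypothetical declarations, loosely mirroring an ARP file's "operators" list.
    static final Set<Signature> SUPPORTED = Set.of(
            new Signature("lower", List.of("varchar"), "varchar"),
            new Signature("substr", List.of("varchar", "integer", "integer"), "varchar"),
            new Signature("sin", List.of("double"), "double"));

    // True if the name (case-insensitive) and the exact argument types match a declaration.
    static boolean canPushDown(String name, List<String> argTypes) {
        for (Signature s : SUPPORTED) {
            if (s.name().equalsIgnoreCase(name) && s.argTypes().equals(argTypes)) {
                return true;
            }
        }
        return false;
    }

    // Where the call would run under this toy model.
    static String plan(String name, List<String> argTypes) {
        return canPushDown(name, argTypes) ? "push down to source" : "evaluate in Dremio";
    }

    public static void main(String[] args) {
        System.out.println(plan("LOWER", List.of("varchar"))); // push down to source
        System.out.println(plan("LOWER", List.of("integer"))); // evaluate in Dremio
        System.out.println(plan("UPPER", List.of("varchar"))); // evaluate in Dremio
    }
}
```

In a real connector these declarations live in the ARP YAML file rather than in Java code, and Dremio's planner does the matching. The sketch only shows why declaring more signatures means more work runs at the source.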

How to build a Dremio connector

This article provides a lightweight guide on creating Dremio ARP connectors. Use the following steps to build your Dremio ARP connector:

  1. Understand the basics: Before diving into connector creation, familiarize yourself with Dremio's architecture, the ARP framework, and the specific requirements of the data source you wish to connect.
  2. Set up the development environment: Make sure your system can access your Dremio installation and that you have downloaded the necessary JDBC drivers. You also need a suitable IDE, plus a JDK and Maven for building the connector.
  3. Create a connector project: Start by creating a new project in your IDE and include dependencies for Dremio and the ARP framework. Notably, your project will need to include a variety of classes from the com.dremio package.
  4. Define the connector: Implement the classes and methods required by the Dremio SDK and ARP framework. This includes defining the connection properties, metadata retrieval methods, and query translation logic. This is the coding stage, where you create two source files:
    1. Storage plugin configuration: A Java file, named [data source]Conf.java, that contains the name of the plugin, the connection options, the user credentials, the name of the ARP file, the JDBC driver to use, and how to configure the connection string for the JDBC driver.

      For example, if you want to build an ARP connector from the CData JDBC Driver for Salesforce, create the Java class SalesforceConf and include variables for the username, password, and security token needed to authenticate with Salesforce. The annotations (@) control how the variables appear and are used in the Dremio connection GUI (graphical user interface).

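      package com.dremio.exec.store.jdbc.conf; // same package as AbstractArpConf

      import com.dremio.exec.catalog.conf.DisplayMetadata;
      import com.dremio.exec.catalog.conf.Secret;
      import com.dremio.exec.catalog.conf.SourceType;
      import io.protostuff.Tag;

      @SourceType(value = "SALESFORCE", label = "Salesforce")
      public class SalesforceConf extends AbstractArpConf<SalesforceConf> {
         @Tag(1) @DisplayMetadata(label = "Username")
         public String username;

         @Tag(2) @Secret @DisplayMetadata(label = "Password")
         public String password;

         @Tag(3) @Secret @DisplayMetadata(label = "Security Token")
         public String securityToken;

         // ARP file path, driver class, and required AbstractArpConf methods omitted.
      }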
    2. Plugin ARP YAML file: A YAML file, named [data source].arp.yaml, that declares which data types, operations, and functions the source supports and how Dremio should write the SQL queries it sends to the JDBC driver.

      The ARP file is broken down into several sections:
      • metadata – This section outlines high-level metadata about the plugin.
      • syntax – This section specifies general syntax items, such as the identifier quote character.
      • data_types – This section outlines which data types the plugin supports, their names as they appear in the JDBC driver, and how they map to Dremio types.
      • relational_algebra – This section is divided into several subsections:
        • aggregation – Specifies which aggregate functions, such as SUM and MAX, are supported and what signatures they have. You can also specify a rewrite to change the SQL issued for a function.
        • except/project/join/sort/union/union_all/values – These subsections indicate whether each operation is supported.
        • expressions – This subsection outlines the general operations that are supported. Its main parts are:
          • operators – Outlines which scalar functions, such as SIN, SUBSTR, and LOWER, are supported, along with their supported signatures. As with aggregation, you can specify a rewrite to change the SQL issued for a function.
          • variable_length_operators – The same as operators, but for functions that accept a variable number of arguments, such as AND and OR.
  5. Optimize for performance: Leverage the power of the ARP framework to push down as many operations as possible to the data source. This step is crucial for maximizing performance.
  6. Build the connector: Once you have written the necessary code, use Maven to build the connector by running mvn clean install. The result is a JAR file containing the ARP connector, ready to deploy to your Dremio instance.
  7. Test thoroughly: Rigorous testing is key. Ensure your connector handles a variety of queries and data types correctly. Test performance under different loads and use cases.
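  8. Deploy and document: Once tested, deploy the connector in your Dremio environment, typically by copying the connector JAR into <DREMIO_HOME>/jars and the JDBC driver JAR into <DREMIO_HOME>/jars/3rdparty, then restarting Dremio. Provide clear documentation for users, outlining installation steps, configuration options, and use cases.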
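The storage plugin configuration described in step 4 also determines how the JDBC connection string is built from the fields a user fills in. The minimal sketch below shows that idea in isolation, so it runs without Dremio on the classpath. The ConnectionStringSketch class and its toJdbcUrl helper are hypothetical, and the jdbc:salesforce: prefix and the User, Password, and Security Token property names are assumptions modeled on CData-style connection strings; confirm the exact format in your driver's documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: assembles a CData-style JDBC URL from connector fields.
// The "jdbc:salesforce:" prefix and property names are ASSUMPTIONS.
public class ConnectionStringSketch {

    // Builds "prefix:Key=Value;Key=Value;" from the non-blank fields.
    static String toJdbcUrl(String username, String password, String securityToken) {
        // LinkedHashMap keeps insertion order, so the URL is predictable.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("User", username);
        props.put("Password", password);
        props.put("Security Token", securityToken);

        StringBuilder url = new StringBuilder("jdbc:salesforce:");
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getValue() == null || e.getValue().isEmpty()) {
                continue; // skip fields the user left blank
            }
            // Note: values containing ';' are not escaped in this sketch.
            url.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJdbcUrl("alice@example.com", "secret", "abc123"));
        // jdbc:salesforce:User=alice@example.com;Password=secret;Security Token=abc123;
    }
}
```

Skipping blank fields keeps optional settings from producing empty properties. In an actual connector, this logic would sit inside the configuration class (SalesforceConf in the example) rather than in a standalone helper.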

4 Example applications for Dremio connectors

  1. Access Database: Integrate Microsoft Access databases, enabling users to query legacy Access data directly from Dremio.
  2. SAP Systems: Connect to SAP for real-time access to ERP data, enhancing business intelligence capabilities with critical business data.
  3. SQL Server: Integrate SQL Server seamlessly, offering direct querying of SQL Server databases from Dremio.
  4. Microsoft SharePoint: Incorporate data from SharePoint lists in your Dremio data lakehouse for a more comprehensive view of your data stack.

ARP connectors built with CData JDBC Drivers

Building ARP connectors facilitates direct data access from Dremio. Organizations that use Dremio and CData together modernize their data integration and adopt current best practices. Thanks to CData's extensive library of JDBC drivers, organizations can quickly deploy ARP connectors for their entire data stack.

With enterprise-grade engineering built directly into the drivers, ARP connectors built on CData Drivers get full pushdown functionality out of the box. Any business aiming to leverage its data across multiple platforms can simply build its connector using the CData template.

Final thoughts

Dremio ARP connectors are instrumental in unifying diverse data sources, paving the way for advanced analytics and informed business decisions. If the Dremio Hub doesn't already have an ARP Connector for the sources your organization is using, you can follow this guide to build efficient, high-performance connectors that enhance your organization's data capabilities.

Once you've connected to your data from Dremio, you can use Dremio's drivers to access your data lakehouse from your preferred BI and reporting tools.

Try CData JDBC Drivers for free

Download a free trial of any of the CData JDBC Drivers and build your own ARP Connector today. 
