by Tomas Restrepo | December 10, 2018

CData Architecture: Core Driver Services

This is part 2 in a series of blog posts on our driver architecture. I mentioned before (Part 1: Supporting Multiple Technologies) that our core codebase had a number of different services offered to provider implementations. Some of these are generic services used by every single provider, while others simplify the implementation of a certain class of provider.

This post focuses on those driver services.

Metadata Caching

Metadata is a critical part of every data access driver. Every provider implementation needs to support basic metadata operations, such as:

  • Querying the list of tables and views in the data source.
  • Querying the list of columns on a table/view.
  • Querying other objects such as stored procedures, indexes, and so forth.

We've discussed some challenges around metadata already in a previous post, but one of the most common challenges is that, for many technologies, obtaining metadata from the data source can be very expensive.

Our metadata layer is designed so that every driver provides metadata in a common format (which is also extensible if required), while transparently caching the discovered metadata to deliver significant performance advantages.

By default, metadata is cached in-memory and refreshed periodically, but our metadata caching service is also capable of caching metadata in an external repository, such as a Derby or SQLite database.
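To make the default behavior concrete, here is a minimal sketch of an in-memory metadata cache with time-based expiry. It is an illustration only, not our actual implementation: the MetadataCache class, the discover callback, and the ttl_seconds parameter are hypothetical names, and the sketch refreshes an entry lazily on the first lookup after it expires rather than on a background schedule.

```python
import time
from typing import Callable, Dict, List, Tuple


class MetadataCache:
    """Hypothetical in-memory cache that re-discovers entries older than ttl_seconds."""

    def __init__(
        self,
        discover: Callable[[str], List[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._discover = discover  # the expensive round trip to the data source
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so that expiry can be tested
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def columns(self, table: str) -> List[str]:
        """Return the column names for a table, hitting the source only when stale."""
        now = self._clock()
        cached = self._entries.get(table)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # still fresh: skip the data source entirely
        columns = self._discover(table)
        self._entries[table] = (now, columns)
        return columns
```

With a cache like this in front of the discovery call, repeated column lookups for the same table inside the expiry window cost a dictionary access instead of a request to the data source.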

One challenge we've faced in our in-memory Metadata Caching service implementation is supporting data sources with large, complex data models. Optimizing our internal metadata cache and metadata discovery process has been a priority.

SQL Normalization

The SQL Normalization engine is another key component of our driver model. When our provider receives a SQL query, it typically applies a set of normalization rules to the query before attempting to execute it.

What do we mean by SQL Normalization?

A SQL Normalization rule is merely a transformation of the query Abstract Syntax Tree (AST) that is meaning-preserving; that is, it doesn't change the meaning of the query. This transformation simplifies driver implementation by making it easier to interpret the query based on some particular assumptions about the shape of the query.

Why is this important? To support all of the most popular BI and Analytics tools, our drivers must support diverse query capabilities. These tools generate SQL statements automatically based on their internal rules. At first glance, these queries often seem very complex, yet they express simple concepts. For example, a tool might generate a nested SELECT statement that could be simplified into a single one.

We currently have over 20 different common normalization rules (and more that are provider-specific), so I'm not going to discuss every single one. However, let me mention a few examples to illustrate the point.

One simple normalization applies to column references in the WHERE clause. A query could contain something like "... WHERE 5000 < revenue", which we can normalize so that the column reference always appears on the left side of the expression: "... WHERE revenue > 5000".
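As a rough sketch of what such a rule could look like, the snippet below rewrites a single comparison node. The Column, Literal, and Comparison classes are a toy AST invented for this example, not our real query model. The point to notice is that the operator has to be mirrored when the operands are swapped, otherwise the rewrite would change the meaning of the query.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Comparison:
    left: Union[Column, Literal]
    op: str
    right: Union[Column, Literal]


# Swapping the operands requires mirroring the operator to preserve meaning.
MIRRORED = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "=": "=", "<>": "<>"}


def column_on_left(node: Comparison) -> Comparison:
    """Rewrite `literal op column` as `column mirrored-op literal`."""
    if isinstance(node.left, Literal) and isinstance(node.right, Column):
        return Comparison(node.right, MIRRORED[node.op], node.left)
    return node  # already in the expected shape, or not a literal/column pair
```

After this rule runs, provider code that inspects a comparison only needs to handle the case where the column is on the left.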

A more interesting example is criteria minimization. Reporting tools often use tricks such as including "... AND (1=1)" in the WHERE clause. Since this condition always evaluates to true, we can safely remove it from the query if the right conditions are present, so that the provider-specific code doesn't have to deal with it.
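A simplified sketch of this kind of minimization follows. Again, the Predicate, LiteralEquals, and And classes are hypothetical stand-ins for a real AST. The sketch only handles terms joined by AND, which is one of the "right conditions": under OR, an always-true term would make the whole expression true instead of being droppable. It also leaves NULL = NULL alone, since that comparison is not true in SQL.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Predicate:
    text: str        # an opaque condition such as "revenue > 5000"


@dataclass(frozen=True)
class LiteralEquals:
    left: object     # LiteralEquals(1, 1) models "1=1"; None models NULL
    right: object


@dataclass(frozen=True)
class And:
    terms: Tuple["Expr", ...]


Expr = Union[Predicate, LiteralEquals, And]


def is_always_true(expr: Expr) -> bool:
    """True for comparisons of two equal, non-NULL literals."""
    return (
        isinstance(expr, LiteralEquals)
        and expr.left is not None
        and expr.left == expr.right
    )


def minimize(expr: Expr) -> Optional[Expr]:
    """Drop always-true terms from AND chains; None means 'no filter left'."""
    if isinstance(expr, And):
        kept = [m for m in (minimize(term) for term in expr.terms) if m is not None]
        if not kept:
            return None
        return kept[0] if len(kept) == 1 else And(tuple(kept))
    return None if is_always_true(expr) else expr
```

For a WHERE clause equivalent to "revenue > 5000 AND (1=1)", minimize returns just the revenue predicate, so the provider never sees the redundant term.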

For each provider we build, we select the right set of normalizations to apply automatically during query processing. For some providers, we can also do additional transformations that take advantage of source-specific query capabilities to make queries run correctly or faster.

OData Support

A significant number of our drivers are for data sources that expose OData endpoints. We also have a generic OData Driver. Thus, it made sense to create a reusable, core OData implementation that simplified the implementation of additional drivers based on the standard.

This shared component takes care of tasks such as:

  • Reading and interpreting OData service and metadata documents.
  • Implementing the core OData data operations (both queries and data modification).
  • Supporting the Atom and JSON formats, and handling OData version differences.

For most OData-based drivers, this shared component is invoked directly from the provider implementation.
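To give a feel for the query side of this work, the sketch below builds an OData request URL from the parts of a simple query using the standard $select, $filter, and $top system query options. The odata_query_url function and the example.com service root are made up for illustration; our shared component does considerably more, including metadata handling and version differences.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode


def odata_query_url(
    service_root: str,
    entity_set: str,
    select: Sequence[str] = (),
    filter_expr: Optional[str] = None,
    top: Optional[int] = None,
) -> str:
    """Build a URL that queries one entity set of an OData service."""
    options = []
    if select:
        options.append(("$select", ",".join(select)))
    if filter_expr:
        options.append(("$filter", filter_expr))
    if top is not None:
        options.append(("$top", str(top)))

    url = f"{service_root.rstrip('/')}/{quote(entity_set)}"
    if options:
        # Keep "$" and "," readable; percent-encode spaces and other characters.
        url += "?" + urlencode(options, quote_via=quote, safe="$,")
    return url


# Roughly: SELECT Name, Revenue FROM Accounts WHERE Revenue > 5000 LIMIT 10
print(odata_query_url(
    "https://example.com/odata/",
    "Accounts",
    select=["Name", "Revenue"],
    filter_expr="Revenue gt 5000",
    top=10,
))
```

Note how the SQL comparison becomes the OData "gt" operator inside $filter. A normalized query shape makes that kind of translation much easier to write once and reuse.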

Other Services

There are other services in our driver core, including some not presented in the architecture diagram, such as:

  • A connection pool implementation to reduce the cost of opening and closing connections.
  • Bulk Row Manager, which implements bulk data update support.
  • RowScan, which is our shared implementation for supporting scanning data rows from tables to dynamically discover column metadata.
  • Page Providers, which support retrieving result sets in pages for drivers created using our internal framework.
  • Parallel Fetch, which supports fetching pages either serially or in parallel for increased performance.
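The difference between serial and parallel page fetching can be sketched in a few lines. This is a toy model, not our Page Provider or Parallel Fetch code: fetch_page is a hypothetical callback that returns the rows of one page, and the sketch assumes the number of pages is known up front and that pages can be requested by index, which is not true for sources that use continuation tokens.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]
Page = List[Row]


def fetch_serial(fetch_page: Callable[[int], Page], page_count: int) -> List[Row]:
    """Request pages one after another."""
    rows: List[Row] = []
    for number in range(page_count):
        rows.extend(fetch_page(number))
    return rows


def fetch_parallel(
    fetch_page: Callable[[int], Page],
    page_count: int,
    max_workers: int = 4,
) -> List[Row]:
    """Request several pages at once, returning rows in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even if pages finish out of order.
        pages = pool.map(fetch_page, range(page_count))
        return [row for page in pages for row in page]
```

Both functions return the same rows in the same order. The parallel version simply overlaps the waiting time of the individual page requests.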

Conclusion

In this second post in the series, we have covered some of the driver services implemented in our core driver code to simplify the development of new drivers.

In the next post, we'll discuss the core driver model, and how queries are executed.
