by Tomas Restrepo | May 12, 2019

Azure Cosmos DB Driver の各種比較

Azure Cosmos DB は、Microsoft 社がAzure クラウドプラットフォームの一部として提供するマルチモデルのNoSQL データベースです。 Cosmos DB は以下のモデルをサポートします:

  • SQL (DocumentDB)
  • MongoDB
  • Cassandra
  • Azure Tables
  • Gremlin (graph)


CData では、Gremlin API を除くすべてのテクノロジーをサポートしています。CData のMongoDBCassandra、およびAzure Storage Tables の各ドライバーがCosmos DB の対応するインターフェースで利用可能です。

もちろん、CData は、DocumentDB と呼ばれていたCosmos DB SQL API もサポートしています。

ユーザーからは、「マイクロソフトが提供する無償のCosmos DB ODBC driver とCData Drivers は何が違うの?」という質問をもらいます。この記事では、CData Cosmos DB がMicrosoft ODBC Driver と比べ優れている点を見ていきましょう。

Performance

Our drivers provide increased performance compared to the Microsoft ODBC driver. We supported advanced push-down query capabilities, allowing the Cosmos DB service to execute queries more efficiently.

The biggest performance gain in our provider is noticeable in running queries that return a large number of results. To showcase this, we prepared the following test case:

  • Cosmos DB collection located in West Europe region with 1.8 million documents in it
  • Collection throughput configured to 4000 RU/s
  • Ran a simple query that would fetch 3 columns from all documents in the collection ("SELECT _id,category, short_description FROM products").
  • Client environment: Intel 7700K CPU, 48GB RAM.
Scenario (1.8M Docs, 4000 RU/s) Time to fetch all rows
Microsoft Azure Cosmos DB Driver 38 minutes
CData Driver (default configuration) 7 minutes
CData Driver (high throughput) 3.5 minutes

As you can see, the results are clear: our driver was able to retrieve all documents in the collection in one fifth of the time of the Microsoft ODBC driver. These numbers will of course vary substantially depending on factors such as collection size, document size, and network latency.

Another big capability to improve performance is computing aggregations without a GROUP BY clause server-side. Cosmos DB only supports computing aggregated columns directly when doing a query over a single partition key or partition key range, so the official ODBC driver will compute these aggregations client-side.

Our driver can recognize such queries and can split them over multiple queries that can be executed server-side, then aggregate results correctly client-side.

SQL Features

In our testing, we've identified some limitations in the Microsoft ODBC driver around Cosmos DB features that are not supported fully supported.

Built-in Functions

The Microsoft ODBC driver does not support invoking several built-in Cosmos DB functions in queries. For example:

  • Mathematical functions (ROUND, TRUNC, SQUARE, ATN2)
  • Type functions (IS_ARRAY, IS_BOOL, IS_NUMBER, ...)
  • String functions (STARTSWITH, ENDSWITH, CONTAINS, ...)
  • Array functions (ARRAY_LENGTH, ARRAY_CONCAT, ARRAY_SLICE, ...)

By comparison, our Cosmos DB drivers support all the native functions directly.

User-Defined Functions

The Microsoft ODBC driver does not support invoking User-defined Functions (UDFs) in your Cosmos DB collection. Our driver supports invoking them using the udf. Prefix, just like you do in the Cosmos DB query language:

SELECT id, udf.tax(price)
FROM [products]

User-Defined Stored Procedures

Cosmos DB supports creating server-side stored procedures in JavaScript. With the CData driver, metadata about existing stored procedures can be obtained using the same sys_procedures and sys_procedureparameters metadata tables used in all our drivers.

For example, you can invoke a procedure in the products collection using JDBC like this:

PreparedStatement statement = connection.prepareCall("products_queryProducts partitionKey=?, prefix=?");
statement.setString(1, "Health");
statement.setString(2, "init");
ResultSet resultSet = statement.executeQuery();

Since specifying a partition key is required when calling any stored procedures in Cosmos DB, our drivers will add a partitionKey parameter to all procedures.

The Microsoft ODBC driver does not support calling Stored Procedures at this time.

JOIN ... IN Queries

Cosmos DB supports doing joins between documents in a collection, and nested arrays in documents using the JOIN ... IN syntax:

SELECT p.id, k
FROM products p
  JOIN k IN p.keywords
WHERE p.category = 'Health'

The Microsoft ODBC driver does not support this SQL syntax, while our driver does, allowing you to directly take advantage of the capabilities offered by the Cosmos DB platform.

Platform Support

The biggest limitation of the Microsoft ODBC driver is that it's only available for the Windows platform. In comparison, CData's ODBC driver for Cosmos DB supports Windows, Linux, and Mac.

Besides the ODBC driver, Microsoft also ships client libraries for Cosmos DB for various platforms (.NET, Java, etc.). However, it does not provide standard data access components on any other platform.

Do you need to connect to Cosmos DB from a Java application that already supports JDBC? CData has you covered.

What about SQL Server Integration Services? We provide native SSIS components for Cosmos DB. ADO.NET and other platforms are also supported.

Microsoft Power BI

Microsoft offers a built-in connector in Power BI Desktop to query Cosmos DB data sources. Our Power BI connector for Cosmos DB offers improved performance while fetching data in import mode, compared to the Microsoft offering.

Our connector also has support for Direct Query mode. This is of use in a limited subset of scenarios due to the Cosmos DB query capabilities, but this will change in the future as the service capabilities improve.

Try It Out!

You will find the latest version of our Cosmos DB drivers here. You will find lots of information to get started in our knowledge base, such as using our drivers in Tableau, Power BI, or SQL Server Integration Services (SSIS). If you have any questions, comments, or feedback about this or any other of our drivers, please feel free to reach out to me.



 
 
ダウンロード