A Feature Comparison of Drivers for NoSQL



In this article, we compare the way that standards-based drivers from different vendors handle various NoSQL use-cases (using MongoDB as the sample data source), from a simple request for all of the data in a given table to a complex query involving embedded arrays of data.

For the purposes of this comparison, each query is being sent to an instance of the restaurants dataset made available by MongoDB, Inc (downloadable here). Below you'll find a sample document which is representative of the kinds of documents found in the dataset.

Compared Use-Cases



Sample Document



{
  "_id" : ObjectId("5780046cd5a397806c3dab38"),
  "address" : {
    "building" : "1007",
    "coord" : [-73.856077, 40.848447],
    "street" : "Morris Park Ave",
    "zipcode" : "10462"
  },
  "borough" : "Bronx",
  "cuisine" : "Bakery",
  "grades" : [{
      "date" : ISODate("2014-03-03T00:00:00Z"),
      "grade" : "A",
      "score" : 2
    }, {
      "date" : ISODate("2013-09-11T00:00:00Z"),
      "grade" : "A",
      "score" : 6
    }, {
      "date" : ISODate("2013-01-24T00:00:00Z"),
      "grade" : "A",
      "score" : 10
    }, {
      "date" : ISODate("2011-11-23T00:00:00Z"),
      "grade" : "A",
      "score" : 9
    }, {
      "date" : ISODate("2011-03-10T00:00:00Z"),
      "grade" : "B",
      "score" : 14
    }],
  "name" : "Morris Park Bake Shop",
  "restaurant_id" : "30075445"
}


SELECT * FROM restaurants



When working with a traditional, relational database, the easiest way to retrieve all of the data in a given table is to simply submit a SELECT * FROM table query. MongoDB data is often anything but traditional, so it is important to understand how a given driver will complete such a query. Below, we display the results of a SELECT * ... query from each driver, using the default connection properties.

The drivers by Simba Technologies parse the grades array and the address object as separate tables when parsing the database. The drivers by Progress Software parse the grades and address.coord arrays as separate tables. While creating separate tables helps to distinguish the different types of data in a NoSQL database, it can have some drawbacks when it comes to querying data from both parent and child tables. The CData Software drivers expose the most data by default, flattening the address object and drilling down to retrieve the fields available in the elements in the grades array.

CData Software

_id address.building address.coord.0 address.coord.1 address.street address.zipcode borough cuisine name restaurant_id grades.0.date grades.0.grade grades.0.score
5780046cd5a397806c3dab38 1007 -73.856077 40.848447 Morris Park Ave 10462 Bronx Bakery Morris Park Bake Shop 30075445 2014-03-03T00:00:00.000Z A 2
5780046cd5a397806c3dab39 469 -73.961704 40.662942 Flatbush Avenue 11225 Brooklyn Hamburgers Wendy'S 30112340 2014-12-30T00:00:00.000Z A 8
5780046cd5a397806c3dab3a 351 -73.98513559999999 40.7676919 West 57 Street 10019 Manhattan Irish Dj Reynolds Pub And Restaurant 30191841 2014-09-06T00:00:00.000Z A 2
5780046cd5a397806c3dab3b 2780 -73.98241999999999 40.579505 Stillwell Avenue 11224 Brooklyn American Riviera Caterer 40356018 2014-06-10T00:00:00.000Z A 5

Simba Technologies

BOROUGH RESTAURANT_ID _ID CUISINE NAME
Bronx 30075445 5780046CD5A397806C3DAB38 Bakery Morris Park Bake Shop
Brooklyn 30112340 5780046CD5A397806C3DAB39 Hamburgers Wendy'S
Manhattan 30191841 5780046CD5A397806C3DAB3A Irish Dj Reynolds Pub And Restaurant
Brooklyn 40356018 5780046CD5A397806C3DAB3B American Riviera Caterer

Progress Software

_id address_building address_street address_zipcode borough cuisine name restaurant_id
5780046cd5a397806c3dab38 1007 Morris Park Ave 10462 Bronx Bakery Morris Park Bake Shop 30075445
5780046cd5a397806c3dab39 469 Flatbush Avenue 11225 Brooklyn Hamburgers Wendy'S 30112340
5780046cd5a397806c3dab3a 351 West 57 Street 10019 Manhattan Irish Dj Reynolds Pub And Restaurant 30191841
5780046cd5a397806c3dab3b 2780 Stillwell Avenue 11224 Brooklyn American Riviera Caterer 40356018


Embedded Arrays as Separate Tables



In our sample data, each document in the restaurants table contains an array of embedded documents in the grades element, representing the different grades a restaurant has received over time. According to the MongoDB documentation, "embedded documents capture the relationships between data by storing data in a single document structure." By default, the drivers by Simba Technologies and Progress Software create schema where the embedded documents are recognized only as separate tables (Simba calls the arrays Virtual Tables, Progress Software refers to them as child tables) and create a table schema where the grades table shares a foreign key relationship with the restaurants table. The CData drivers maintain the embedded documents as elements within the original document, but also allow you to treat the embedded values as separate tables as well.

Regardless of how the schema are defined, all of the drivers are able to perform JOIN queries to retrieve the related data from the two tables. In this section, we'll compare the queries required by each driver to return each grade, along with the name and ID of the restaurant, as a single row, along with the time required to retrieve the data for the query.

Desired Result Set

Below is the desired result set from each driver, populated with placeholder data.

restaurant_id date grade score P_id
30075445 2014-03-03T00:00:00.000Z A 2 568c37b748ddf53c5ed98932
30075445 2013-09-11T00:00:00.000Z A 6 568c37b748ddf53c5ed98932
30075445 2013-01-24T00:00:00.000Z A 10 568c37b748ddf53c5ed98932
30075445 2011-11-23T00:00:00.000Z A 9 568c37b748ddf53c5ed98932
30075445 2011-03-10T00:00:00.000Z B 14 568c37b748ddf53c5ed98932

Queries and Times by Driver

While the queries required by the drivers to retrieve each grade as a separate row are relatively similar, each using an implicit JOIN, it is worth noting that the drivers by Simba and Progress require the use of a WHERE clause to identify the relationship between the two tables. The CData drivers use vertical flattening (where child arrays are recognized as fields within the parent table, but can be treated as separate tables) to manage JOIN queries. The drivers by Simba and Progress treat the grades array as a separate table by default, meaning that data from both tables are pulled into memory and the drivers perform the JOIN client-side.

Driver Time (seconds) Query (returns all grades for approximately 10 million restaurants)
CData Software 252.9 (+35% - +59%) SELECT [restaurants].[restaurant_id], [restaurants.grades].* FROM [restaurants.grades] JOIN [restaurants]
Simba Technologies 341.5 SELECT restaurants_grades.*, restaurants.restaurant_id FROM restaurants_grades, restaurants WHERE restaurants._ID = restaurants_grades.restaurants_id
Progress Software 401.2 SELECT restaurants_grades.*, restaurants.restaurant_id FROM restaurants_grades, restaurants WHERE restaurants._ID = restaurants_grades._id


Embedded Arrays & Sub-Documents as Table Elements



MongoDB documents frequently contain embedded BSON/JSON objects and arrays as individual elements. In this section we will explore the queries required by each driver to retrieve a set of horizontally flattened data, namely the ID and first five grades of each restaurant in the table.

Desired Result Set

Below is the desired result set from each driver, populated with placeholder data.

restaurant_id grades.0.grade grades.1.grade grades.2.grade grades.3.grade grades.4.grade
123456780 A A A A A
123456781 B B B B B
123456782 C C C C C

CData Software

With the drivers by CData Software, you can simply submit a free-form query to request the data as described above. The drivers use dot-notation to interpret requests for individual array objects and fields within sub-documents:

SELECT
  [restaurant_id], [grades.0.grade], [grades.1.grade], [grades.2.grade], [grades.3.grade], [grades.4.grade]
FROM
  [restaurants]

Simba Technologies

The Simba drivers interpret arrays of documents as Virtual Tables, meaning users must perform JOIN queries in order to retrieve both restaurant and grades data. This interpretation of the data means that there is no simple way to retrieve the grades for a given restaurant in a single row.

Progress Software

The DataDirect (Progress) drivers also interpret arrays of documents as separate tables. As with the Simba drivers, this means users must perform JOIN queries in order to retrieve both restaurant and grades data, preventing a simple way to retrieve the grades for a given restaurant in a single row.

Related Articles



 
 
Downloads