各製品の資料を入手。
詳細はこちら →NoSQL 向けドライバーの機能比較
In this article, we compare the way that standards-based drivers from different vendors handle various NoSQL use-cases (using MongoDB as the sample data source), from a simple request for all of the data in a given table to a complex query involving embedded arrays of data.
For the purposes of this comparison, each query is being sent to an instance of the restaurants dataset made available by MongoDB, Inc (downloadable here). Below you'll find a sample document which is representative of the kinds of documents found in the dataset.
Compared Use-Cases
- SELECT * FROM restaurants: comparing the results of requesting all available data in the table
- Embedded Arrays as Separate Tables: comparing the results of sending JOIN queries to work with the table and the embedded array(s) of data.
- Embedded Arrays & Sub-Documents as Table Elements: comparing the results of working with embedded array(s) as elements within the table.
Sample Document
{ "_id" : ObjectId("5780046cd5a397806c3dab38"), "address" : { "building" : "1007", "coord" : [-73.856077, 40.848447], "street" : "Morris Park Ave", "zipcode" : "10462" }, "borough" : "Bronx", "cuisine" : "Bakery", "grades" : [{ "date" : ISODate("2014-03-03T00:00:00Z"), "grade" : "A", "score" : 2 }, { "date" : ISODate("2013-09-11T00:00:00Z"), "grade" : "A", "score" : 6 }, { "date" : ISODate("2013-01-24T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2011-11-23T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2011-03-10T00:00:00Z"), "grade" : "B", "score" : 14 }], "name" : "Morris Park Bake Shop", "restaurant_id" : "30075445" }
SELECT * FROM restaurants
When working with a traditional, relational database, the easiest way to retrieve all of the data in a given table is to simply submit a SELECT * FROM table query. MongoDB data is often anything but traditional, so it is important to understand how a given driver will complete such a query. Below, we display the results of a SELECT * ... query from each driver, using the default connection properties.
The drivers by Competitor 1 parse the grades array and the address object as separate tables when parsing the database. The drivers by Competitor 2 parse the grades and address.coord arrays as separate tables. While creating separate tables helps to distinguish the different types of data in a NoSQL database, it can have some drawbacks when it comes to querying data from both parent and child tables. The CData Software drivers expose the most data by default, flattening the address object and drilling down to retrieve the fields available in the elements in the grades array.
CData Software
_id | address.building | address.coord.0 | address.coord.1 | address.street | address.zipcode | borough | cuisine | name | restaurant_id | grades.0.date | grades.0.grade | grades.0.score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5780046cd5a397806c3dab38 | 1007 | -73.856077 | 40.848447 | Morris Park Ave | 10462 | Bronx | Bakery | Morris Park Bake Shop | 30075445 | 2014-03-03T00:00:00.000Z | A | 2 |
5780046cd5a397806c3dab39 | 469 | -73.961704 | 40.662942 | Flatbush Avenue | 11225 | Brooklyn | Hamburgers | Wendy'S | 30112340 | 2014-12-30T00:00:00.000Z | A | 8 |
5780046cd5a397806c3dab3a | 351 | -73.98513559999999 | 40.7676919 | West 57 Street | 10019 | Manhattan | Irish | Dj Reynolds Pub And Restaurant | 30191841 | 2014-09-06T00:00:00.000Z | A | 2 |
5780046cd5a397806c3dab3b | 2780 | -73.98241999999999 | 40.579505 | Stillwell Avenue | 11224 | Brooklyn | American | Riviera Caterer | 40356018 | 2014-06-10T00:00:00.000Z | A | 5 |
Competitor 1
BOROUGH | RESTAURANT_ID | _ID | CUISINE | NAME |
---|---|---|---|---|
Bronx | 30075445 | 5780046CD5A397806C3DAB38 | Bakery | Morris Park Bake Shop |
Brooklyn | 30112340 | 5780046CD5A397806C3DAB39 | Hamburgers | Wendy'S |
Manhattan | 30191841 | 5780046CD5A397806C3DAB3A | Irish | Dj Reynolds Pub And Restaurant |
Brooklyn | 40356018 | 5780046CD5A397806C3DAB3B | American | Riviera Caterer |
Competitor 2
_id | address_building | address_street | address_zipcode | borough | cuisine | name | restaurant_id |
---|---|---|---|---|---|---|---|
5780046cd5a397806c3dab38 | 1007 | Morris Park Ave | 10462 | Bronx | Bakery | Morris Park Bake Shop | 30075445 |
5780046cd5a397806c3dab39 | 469 | Flatbush Avenue | 11225 | Brooklyn | Hamburgers | Wendy'S | 30112340 |
5780046cd5a397806c3dab3a | 351 | West 57 Street | 10019 | Manhattan | Irish | Dj Reynolds Pub And Restaurant | 30191841 |
5780046cd5a397806c3dab3b | 2780 | Stillwell Avenue | 11224 | Brooklyn | American | Riviera Caterer | 40356018 |
Embedded Arrays as Separate Tables
In our sample data, each document in the restaurants table contains an array of embedded documents in the grades element, representing the different grades a restaurant has received over time. According to the MongoDB documentation, "embedded documents capture the relationships between data by storing data in a single document structure." By default, the drivers by Competitor 1 and Competitor 2 create schema where the embedded documents are recognized only as separate tables (Competitor 1 calls the arrays Virtual Tables, Competitor 2 refers to them as child tables) and create a table schema where the grades table shares a foreign key relationship with the restaurants table. The CData drivers maintain the embedded documents as elements within the original document, but also allow you to treat the embedded values as separate tables as well.
Regardless of how the schema are defined, all of the drivers are able to perform JOIN queries to retrieve the related data from the two tables. In this section, we'll compare the queries required by each driver to return each grade, along with the name and ID of the restaurant, as a single row, along with the time required to retrieve the data for the query.
Desired Result Set
Below is the desired result set from each driver, populated with placeholder data.
restaurant_id | date | grade | score | P_id |
---|---|---|---|---|
30075445 | 2014-03-03T00:00:00.000Z | A | 2 | 568c37b748ddf53c5ed98932 |
30075445 | 2013-09-11T00:00:00.000Z | A | 6 | 568c37b748ddf53c5ed98932 |
30075445 | 2013-01-24T00:00:00.000Z | A | 10 | 568c37b748ddf53c5ed98932 |
30075445 | 2011-11-23T00:00:00.000Z | A | 9 | 568c37b748ddf53c5ed98932 |
30075445 | 2011-03-10T00:00:00.000Z | B | 14 | 568c37b748ddf53c5ed98932 |
Queries and Times by Driver
While the queries required by the drivers to retrieve each grade as a separate row are relatively similar, each using an implicit JOIN, it is worth noting that the drivers by Competitor 1 and Competitor 2 require the use of a WHERE clause to identify the relationship between the two tables. The CData drivers use vertical flattening (where child arrays are recognized as fields within the parent table, but can be treated as separate tables) to manage JOIN queries. The drivers by Competitor 1 and Competitor 2 treat the grades array as a separate table by default, meaning that data from both tables are pulled into memory and the drivers perform the JOIN client-side.
Driver | Time (seconds) | Query (returns all grades for approximately 10 million restaurants) |
---|---|---|
CData Software | 252.9 (+35% - +59%) | SELECT [restaurants].[restaurant_id], [restaurants.grades].* FROM [restaurants.grades] JOIN [restaurants] |
Competitor 1 | 341.5 | SELECT restaurants_grades.*, restaurants.restaurant_id FROM restaurants_grades, restaurants WHERE restaurants._ID = restaurants_grades.restaurants_id |
Competitor 2 | 401.2 | SELECT restaurants_grades.*, restaurants.restaurant_id FROM restaurants_grades, restaurants WHERE restaurants._ID = restaurants_grades._id |
Embedded Arrays & Sub-Documents as Table Elements
MongoDB documents frequently contain embedded BSON/JSON objects and arrays as individual elements. In this section we will explore the queries required by each driver to retrieve a set of horizontally flattened data, namely the ID and first five grades of each restaurant in the table.
Desired Result Set
Below is the desired result set from each driver, populated with placeholder data.
restaurant_id | grades.0.grade | grades.1.grade | grades.2.grade | grades.3.grade | grades.4.grade |
---|---|---|---|---|---|
123456780 | A | A | A | A | A |
123456781 | B | B | B | B | B |
123456782 | C | C | C | C | C |
CData Software
With the drivers by CData Software, you can simply submit a free-form query to request the data as described above. The drivers use dot-notation to interpret requests for individual array objects and fields within sub-documents:
SELECT [restaurant_id], [grades.0.grade], [grades.1.grade], [grades.2.grade], [grades.3.grade], [grades.4.grade] FROM [restaurants]
Competitor 1
The Competitor 1 drivers interpret arrays of documents as Virtual Tables, meaning users must perform JOIN queries in order to retrieve both restaurant and grades data. This interpretation of the data means that there is no simple way to retrieve the grades for a given restaurant in a single row.
Competitor 2
The Competitor 2 drivers also interpret arrays of documents as separate tables. As with the Competitor 1 drivers, this means users must perform JOIN queries in order to retrieve both restaurant and grades data, preventing a simple way to retrieve the grades for a given restaurant in a single row.
Related Articles
- NoSQL Drivers: Performance Comparison - Compare how drivers from different vendors perform when querying and processing large datasets from NoSQL data sources.
- CData NoSQL Overview - Learn about the features that set our Drivers technologies apart when working with NoSQL.