We are proud to share our inclusion in the 2024 Gartner Magic Quadrant for Data Integration Tools. We believe this recognition reflects the differentiated business outcomes CData delivers to our customers.
Get the Report →JSON Drivers: Parsing Hierarchical Data
Modern services return or store data as JSON objects, arrays, or any combination thereof. While this is convenient for storing hierarchical data, it can be difficult to work with in common BI, reporting, and ETL tools. There are two prevailing techniques for dealing with nested JSON data:
- Using horizontal and vertical flattening to drill down into the nested arrays and objects
- Parsing the data structure and building a relational model based on the existing hierarchy
This article will describe the differences between the two techniques and how to configure the driver to use either of the techniques. In another Knowledge Base article, we explore more deeply how JSON data can be interpreted using horizontal and vertical flattening.
Document, Flattened Documents, and Relational Modeling
The CData Drivers for JSON manage NoSQL data by reporting full documents, implicitly JOINing flattened documents, or building a relational model. When reporting full documents, the driver will return nested object arrays as aggregated JSON objects. When working with flattened documents, the driver will interpret nested array objects as a single table, allowing you to implicitly perform a JOIN by running a SELECT query. When building a relational model, the driver will interpret the results as individual tables based on the discovered objects and structures, allowing you see a relational model of the NoSQL data and perform explicit SQL JOIN queries on the data. Follow along below for examples of each technique.
- Document: A top-level, document view of your NoSQL data, including aggregated arrays.
- Flattened Documents: Nested array objects are implicitly JOINed to parent objects.
- Relational Modeling: All of your NoSQL data viewed as individual tables.
For reference, these examples are based on the following sample JSON document.
Sample Document
The sample document includes entries for people, the cars they own, and various maintenance services performed on those cars.
people.json
{
"people": [
{
"personal": {
"age": 20,
"gender": "M",
"name": {
"first": "John",
"last": "Doe"
}
},
"vehicles": [
{
"type": "car",
"model": "Honda Civic",
"insurance": {
"company": "ABC Insurance",
"policy_num": "12345"
},
"maintenance": [
{
"date": "07-17-2017",
"desc": "oil change"
},
{
"date": "01-03-2018",
"desc": "new tires"
}
]
},
{
"type": "truck",
"model": "Dodge Ram",
"insurance": {
"company": "ABC Insurance",
"policy_num": "12345"
},
"maintenance": [
{
"date": "08-27-2017",
"desc": "new tires"
},
{
"date": "01-08-2018",
"desc": "oil change"
}
]
}
],
"source": "internet"
},
{
"personal": {
"age": 24,
"gender": "F",
"name": {
"first": "Jane",
"last": "Roberts"
}
},
"vehicles": [
{
"type": "car",
"model": "Toyota Camry",
"insurance": {
"company": "Car Insurance",
"policy_num": "98765"
},
"maintenance": [
{
"date": "05-11-2017",
"desc": "tires rotated"
},
{
"date": "11-03-2017",
"desc": "oil change"
}
]
},
{
"type": "car",
"model": "Honda Accord",
"insurance": {
"company": "Car Insurance",
"policy_num": "98765"
},
"maintenance": [
{
"date": "10-07-2017",
"desc": "new air filter"
},
{
"date": "01-13-2018",
"desc": "new brakes"
}
]
}
],
"source": "phone"
}
]
}
Document Modeling
Using a top-level document view of NoSQL data provides ready access to top-level fields and objects, forgoing the time and resources to process and parse hierarchical arrays. You can configure CData drivers to view a JSON store based on the top-most repeated element and treat all nested arrays as single columns. In this mode, the driver uses streaming to read the JSON data, only parsing the returned data once per query.
Below is a sample query and the results, based on the sample document above. The query results in a single "people" table based on the JSON path "$.people".
Connection String
Set the Data Model connection property to "Document" and set the JSON Path connection property to "$.people" to perform the query above and see the example result set.
DataModel=Document;JSONPath='$.people';
Metadata
The table below describes the metadata based on using the Document data model.
People | |
---|---|
Column | Data Type |
personal.age | Integer |
personal.gender | String |
personal.name.first | String |
personal.name.last | String |
source | String |
vehicles | String |
Query
In this query, we pull top-level object elements and an array into our results. The top-level object elements are available due to default object flattening. The array is returned as aggregated JSON.
SELECT [personal.age] AS age, [personal.gender] AS gender, [personal.name.first] AS name_first, [personal.name.last] AS name_last, [source], [vehicles] FROM [people]
Results
With a document view of the data, the "personal" object is flattened into 4 columns and the "source" and "vehicles" elements are returned as individual columns, resulting in a table with 6 columns.
age | gender | name_first | name_last | source | vehicles |
---|---|---|---|---|---|
20 | M | John | Doe | internet | [{"type":"car","model":"Honda Civic","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"07-17-2017","desc":"oil change"},{"date":"01-03-2018","desc":"new tires"}]},{"type":"truck","model":"Dodge Ram","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"08-27-2017","desc":"new tires"},{"date":"01-08-2018","desc":"oil change"}]}] |
24 | F | Jane | Roberts | phone | [{"type":"car","model":"Toyota Camry","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"05-11-2017","desc":"tires rotated"},{"date":"11-03-2017","desc":"oil change"}]},{"type":"car","model":"Honda Accord","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"10-07-2017","desc":"new air filter"},{"date":"01-13-2018","desc":"new brakes"}]}] |
Benefits & Considerations
With the document model, you are able to see all of the top-level data in a JSON store or service, along with the aggregated array data, in a single table. Your are able to submit simple queries to work with the top-level data. Any given query will result in a single request to read and parse the JSON data, which means faster performance and better compatibility with streaming functionality. One consideration is your need for any data stored in an array and the ability of your tool or application to process JSON arrays in a meaningful way.
Flattened Documents Modeling
For users who simply need access to the entirety of their JSON data, flattening the data into a single table is the best option. You can configure the driver to parse a single table from the JSON data, based on JSON paths in the data. In this mode, nested object arrays are treated as separate tables, but implicitly JOINed to the parent table. The driver uses streaming and only parses the JSON data once per query. With the flattened documents model, you can perform implicit JOIN statements on the data using dot notation to drill down into nested elements in the JSON data.
Below is a sample query and the results, based on the sample document above, parsing based on the JSON paths "$.people", "$.people.vehicles", and "$.people.vehicles.maintenance" (this implicitly JOINs the "people" collection with the "vehicles" collection and implicitly JOINs the "vehicles" collection with the "maintenance" collection).
Connection String
Set the Data Model connection property to "FlattenedDocuments" and set the JSON Path connection property to "$.people;$.people.vehicles;$.people.vehicles.maintenance;" to perform the query above and see the example result set.
DataModel=FlattenedDocuments;JSONPath='$.people;$.people.vehicles;$.people.vehicles.maintenance;'
Metadata
The table below describes the meta-data based on using the Flattened Documents data model.
People | |
---|---|
Column | Data Type |
people:_id | String |
personal.age | Integer |
personal.gender | String |
personal.name.first | String |
personal.name.last | String |
source | String |
vehicle:_id | String |
type | String |
model | String |
insurance.company | String |
insurance.policy_num | String |
maintenance:_id | String |
date | Date |
desc | String |
Query
In this query, we are able to drill into the nested elements in each "people" object. Since we included the "vehicles" collection as a JSON path, we can query the element of a "vehicle" explicitly.
SELECT [personal.age] AS age, [personal.gender] AS gender, [personal.name.first] AS name_first, [personal.name.last] AS name_last, [source], [type], [model], [insurance.company] AS ins_company, [insurance.policy_num] AS ins_policy_num, [date] AS maint_date, [desc] AS maint_desc FROM [people]
Results
With horizontal and vertical flattening based on the described paths, each "vehicle" object is implicitly JOINed to its parent "people" object and each "maintenance" object is implicitly JOINed to its parent "vehicle" object to produce a table with 8 rows (2 "maintenance" objects each for 2 "vehicles" and 2 "vehicles" each for 2 "people").
age | gender | first_name | last_name | source | type | model | ins_company | ins_policy_num | maint_date | maint_desc |
---|---|---|---|---|---|---|---|---|---|---|
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2017-07-17 | oil change |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2018-01-03 | new tires |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2017-08-27 | new tires |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2018-01-08 | oil change |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-05-11 | tires rotated |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-11-03 | oil change |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2017-10-07 | new air filter |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2018-01-13 | new brakes |
Benefits & Considerations
With flattened documents, you are able to see all of the data in a JSON store or service in a single table. You are able to submit simple queries to drill down into the hierarchical data. Any given query will result in a single request to read and parse the JSON data, which means faster performance and better compatibility with streaming functionality. When working with flattened documents, users need to consider whether the tool or application you are working with works better with discrete entities versus a single pre-JOINed dataset.
Relational Modeling
The CData drivers can be configured to create a relational model of the data in the JSON file or source, treating nested object arrays as individual tables, including relationships to parent tables. This is particularly useful if you need to work with your JSON data in existing BI, reporting, and ETL tools that expect a relational data model. The model interpreted is based on the JSON paths in the data for each object array you wish to see as a table. If you build a relational model, then any time you perform a JOIN query, the JSON file or source will be queried once for each "table" included in the query.
Below is a sample query and the results, based on the sample document above, using a relational model based on the JSON paths "$.people", "$.people.vehicles", and "$.people.vehicles.maintenance".
Connecting String
Set the Data Model connection property to "Relational" and set the JSON Path connection property to "$.people;$.people.vehicles;$.people.vehicles.maintenance;" to perform the query above and see the example result set.
DataModel=Relational;JSONPath='$.people;$.people.vehicles;$.people.vehicles.maintenance;'
Metadata
The tables below describe the metadata based on using the Relational data model.
People | |
---|---|
Column | Data Type |
_id | String |
personal.age | Integer |
personal.gender | String |
personal.name.first | String |
personal.name.last | String |
source | String |
Vehicles | |
---|---|
Column | Data Type |
_id | String |
insurance.company | String |
insurance.policy_num | String |
model | String |
type | String |
Maintenance | |
---|---|
Column | Data Type |
_id | String |
date | Date |
desc | String |
Query
In this query, we explicitly JOIN the "people", "vehicles", and "maintenance" tables.
SELECT [people].[personal.age] AS age, [people].[personal.gender] AS gender, [people].[personal.name.first] AS first_name, [people].[personal.name.last] AS last_name, [people].[source], [vehicles].[type], [vehicles].[model], [vehicles].[insurance.company] AS ins_company, [vehicles].[insurance.policy_num] AS ins_policy_num, [maintenance].[date] AS maint_date, [maintenance].[desc] AS maint_desc FROM [people] JOIN [vehicles] ON [people].[_id] = [vehicles].[people_id] JOIN [maintenance] ON [vehicles].[_id] = [maintenance].[vehicles_id]
Results
Using a relational model, any JOINs are controlled by the query. In this case, each "maintenance" object is JOINed to its parent "vehicle" object, which is JOINed to its parent "people" object to produce a table with 8 rows (2 "maintenance" entries for each of 2 "vehicles" each for 2 "people").
age | gender | first_name | last_name | source | type | model | ins_company | ins_policy_num | maint_date | maint_desc |
---|---|---|---|---|---|---|---|---|---|---|
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2017-07-17 | oil change |
20 | M | John | Doe | internet | car | Honda Civic | ABC Insurance | 12345 | 2018-01-03 | new tires |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2017-08-27 | new tires |
20 | M | John | Doe | internet | truck | Dodge Ram | ABC Insurance | 12345 | 2018-01-08 | oil change |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-05-11 | tires rotated |
24 | F | Jane | Roberts | phone | car | Toyota Camry | Car Insurance | 98765 | 2017-11-03 | oil change |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2017-10-07 | new air filter |
24 | F | Jane | Roberts | phone | car | Honda Accord | Car Insurance | 98765 | 2018-01-13 | new brakes |
Benefits & Considerations
With relational modeling, you are able to build a model of your data based on the discrete entities in the JSON data or service, which provides better compatibility with some BI, reporting, and ETL tools. However, in order to build and query a relational model, you will sacrifice some performance. If you are working with data from several tables in the relational model, the driver will query and parse data for each table in the query. For example, the query above requires three separate requests.
Related Articles
- CData NoSQL Overview - Learn about the features that set our Drivers technologies apart when working with NoSQL.
- NoSQL Drivers: Performance Comparison - Compare how drivers from different vendors perform when querying and processing large datasets from NoSQL data sources.
- NoSQL Drivers: Feature Comparison - Compare how drivers from different vendors handle complex queries and complex datasets from NoSQL data sources.