JSON Drivers: Parsing Hierarchical Data

Modern services return or store data as JSON objects, arrays, or any combination thereof. While this is convenient for storing hierarchical data, it can be difficult to work with in common BI, reporting, and ETL tools. There are two prevailing techniques for dealing with nested JSON data:

Using horizontal and vertical flattening to drill down into the nested arrays and objects
Parsing the data structure and building a relational model based on the existing hierarchy

This article will describe the differences between the two techniques and how to configure the driver to use either of the techniques. In another Knowledge Base article, we explore more deeply how JSON data can be interpreted using horizontal and vertical flattening.

Document, Flattened Documents, and Relational Modeling

The CData Drivers for JSON manage NoSQL data by reporting full documents, implicitly JOINing flattened documents, or building a relational model. When reporting full documents, the driver will return nested object arrays as aggregated JSON objects. When working with flattened documents, the driver will interpret nested array objects as a single table, allowing you to implicitly perform a JOIN by running a SELECT query. When building a relational model, the driver will interpret the results as individual tables based on the discovered objects and structures, allowing you see a relational model of the NoSQL data and perform explicit SQL JOIN queries on the data. Follow along below for examples of each technique.

Document: A top-level, document view of your NoSQL data, including aggregated arrays.
Flattened Documents: Nested array objects are implicitly JOINed to parent objects.
Relational Modeling: All of your NoSQL data viewed as individual tables.

For reference, these examples are based on the following sample JSON document.

Sample Document

The sample document includes entries for people, the cars they own, and various maintenance services performed on those cars.

people.json

{
  "people": [
    {
      "personal": {
        "age": 20,
        "gender": "M",
        "name": {
          "first": "John",
          "last": "Doe"
        }
      },
      "vehicles": [
        {
          "type": "car",
          "model": "Honda Civic",
          "insurance": {
            "company": "ABC Insurance",
            "policy_num": "12345"
          },
          "maintenance": [
            {
              "date": "07-17-2017",
              "desc": "oil change"
            },
            {
              "date": "01-03-2018",
              "desc": "new tires"
            }
          ]
        },
        {
          "type": "truck",
          "model": "Dodge Ram",
          "insurance": {
            "company": "ABC Insurance",
            "policy_num": "12345"
          },
          "maintenance": [
            {
              "date": "08-27-2017",
              "desc": "new tires"
            },
            {
              "date": "01-08-2018",
              "desc": "oil change"
            }
          ]
        }
      ],
      "source": "internet"
    },
    {
      "personal": {
        "age": 24,
        "gender": "F",
        "name": {
          "first": "Jane",
          "last": "Roberts"
        }
      },
      "vehicles": [
        {
          "type": "car",
          "model": "Toyota Camry",
          "insurance": {
            "company": "Car Insurance",
            "policy_num": "98765"
          },
          "maintenance": [
            {
              "date": "05-11-2017",
              "desc": "tires rotated"
            },
            {
              "date": "11-03-2017",
              "desc": "oil change"
            }
          ]
        },
        {
          "type": "car",
          "model": "Honda Accord",
          "insurance": {
            "company": "Car Insurance",
            "policy_num": "98765"
          },
          "maintenance": [
            {
              "date": "10-07-2017",
              "desc": "new air filter"
            },
            {
              "date": "01-13-2018",
              "desc": "new brakes"
            }
          ]
        }
      ],
      "source": "phone"
    }
  ]
}

Document Modeling

Using a top-level document view of NoSQL data provides ready access to top-level fields and objects, forgoing the time and resources to process and parse hierarchical arrays. You can configure CData drivers to view a JSON store based on the top-most repeated element and treat all nested arrays as single columns. In this mode, the driver uses streaming to read the JSON data, only parsing the returned data once per query.

Below is a sample query and the results, based on the sample document above. The query results in a single "people" table based on the JSON path "$.people".

Connection String

Set the Data Model connection property to "Document" and set the JSON Path connection property to "$.people" to perform the query above and see the example result set.

DataModel=Document;JSONPath='$.people';

Metadata

The table below describes the metadata based on using the Document data model.

People
Column	Data Type
personal.age	Integer
personal.gender	String
personal.name.first	String
personal.name.last	String
source	String
vehicles	String

Query

In this query, we pull top-level object elements and an array into our results. The top-level object elements are available due to default object flattening. The array is returned as aggregated JSON.

SELECT 
  [personal.age] AS age, 
  [personal.gender] AS gender, 
  [personal.name.first] AS name_first, 
  [personal.name.last] AS name_last, 
  [source], 
  [vehicles]
FROM 
  [people]

Results

With a document view of the data, the "personal" object is flattened into 4 columns and the "source" and "vehicles" elements are returned as individual columns, resulting in a table with 6 columns.

age	gender	name_first	name_last	source	vehicles
20	M	John	Doe	internet	[{"type":"car","model":"Honda Civic","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"07-17-2017","desc":"oil change"},{"date":"01-03-2018","desc":"new tires"}]},{"type":"truck","model":"Dodge Ram","insurance":{"company":"ABC Insurance","policy_num":"12345"},"maintenance":[{"date":"08-27-2017","desc":"new tires"},{"date":"01-08-2018","desc":"oil change"}]}]
24	F	Jane	Roberts	phone	[{"type":"car","model":"Toyota Camry","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"05-11-2017","desc":"tires rotated"},{"date":"11-03-2017","desc":"oil change"}]},{"type":"car","model":"Honda Accord","insurance":{"company":"Car Insurance","policy_num":"98765"},"maintenance":[{"date":"10-07-2017","desc":"new air filter"},{"date":"01-13-2018","desc":"new brakes"}]}]

Benefits & Considerations

With the document model, you are able to see all of the top-level data in a JSON store or service, along with the aggregated array data, in a single table. Your are able to submit simple queries to work with the top-level data. Any given query will result in a single request to read and parse the JSON data, which means faster performance and better compatibility with streaming functionality. One consideration is your need for any data stored in an array and the ability of your tool or application to process JSON arrays in a meaningful way.

Flattened Documents Modeling

For users who simply need access to the entirety of their JSON data, flattening the data into a single table is the best option. You can configure the driver to parse a single table from the JSON data, based on JSON paths in the data. In this mode, nested object arrays are treated as separate tables, but implicitly JOINed to the parent table. The driver uses streaming and only parses the JSON data once per query. With the flattened documents model, you can perform implicit JOIN statements on the data using dot notation to drill down into nested elements in the JSON data.

Below is a sample query and the results, based on the sample document above, parsing based on the JSON paths "$.people", "$.people.vehicles", and "$.people.vehicles.maintenance" (this implicitly JOINs the "people" collection with the "vehicles" collection and implicitly JOINs the "vehicles" collection with the "maintenance" collection).

Connection String

Set the Data Model connection property to "FlattenedDocuments" and set the JSON Path connection property to "$.people;$.people.vehicles;$.people.vehicles.maintenance;" to perform the query above and see the example result set.

DataModel=FlattenedDocuments;JSONPath='$.people;$.people.vehicles;$.people.vehicles.maintenance;'

Metadata

The table below describes the meta-data based on using the Flattened Documents data model.

People
Column	Data Type
people:_id	String
personal.age	Integer
personal.gender	String
personal.name.first	String
personal.name.last	String
source	String
vehicle:_id	String
type	String
model	String
insurance.company	String
insurance.policy_num	String
maintenance:_id	String
date	Date
desc	String

Query

In this query, we are able to drill into the nested elements in each "people" object. Since we included the "vehicles" collection as a JSON path, we can query the element of a "vehicle" explicitly.

SELECT 
  [personal.age] AS age, 
  [personal.gender] AS gender, 
  [personal.name.first] AS name_first, 
  [personal.name.last] AS name_last, 
  [source], 
  [type], 
  [model], 
  [insurance.company] AS ins_company, 
  [insurance.policy_num] AS ins_policy_num,
  [date] AS maint_date, 
  [desc] AS maint_desc
FROM 
  [people]

Results

With horizontal and vertical flattening based on the described paths, each "vehicle" object is implicitly JOINed to its parent "people" object and each "maintenance" object is implicitly JOINed to its parent "vehicle" object to produce a table with 8 rows (2 "maintenance" objects each for 2 "vehicles" and 2 "vehicles" each for 2 "people").

age	gender	first_name	last_name	source	type	model	ins_company	ins_policy_num	maint_date	maint_desc
20	M	John	Doe	internet	car	Honda Civic	ABC Insurance	12345	2017-07-17	oil change
20	M	John	Doe	internet	car	Honda Civic	ABC Insurance	12345	2018-01-03	new tires
20	M	John	Doe	internet	truck	Dodge Ram	ABC Insurance	12345	2017-08-27	new tires
20	M	John	Doe	internet	truck	Dodge Ram	ABC Insurance	12345	2018-01-08	oil change
24	F	Jane	Roberts	phone	car	Toyota Camry	Car Insurance	98765	2017-05-11	tires rotated
24	F	Jane	Roberts	phone	car	Toyota Camry	Car Insurance	98765	2017-11-03	oil change
24	F	Jane	Roberts	phone	car	Honda Accord	Car Insurance	98765	2017-10-07	new air filter
24	F	Jane	Roberts	phone	car	Honda Accord	Car Insurance	98765	2018-01-13	new brakes

Benefits & Considerations

With flattened documents, you are able to see all of the data in a JSON store or service in a single table. You are able to submit simple queries to drill down into the hierarchical data. Any given query will result in a single request to read and parse the JSON data, which means faster performance and better compatibility with streaming functionality. When working with flattened documents, users need to consider whether the tool or application you are working with works better with discrete entities versus a single pre-JOINed dataset.

Relational Modeling

The CData drivers can be configured to create a relational model of the data in the JSON file or source, treating nested object arrays as individual tables, including relationships to parent tables. This is particularly useful if you need to work with your JSON data in existing BI, reporting, and ETL tools that expect a relational data model. The model interpreted is based on the JSON paths in the data for each object array you wish to see as a table. If you build a relational model, then any time you perform a JOIN query, the JSON file or source will be queried once for each "table" included in the query.

Below is a sample query and the results, based on the sample document above, using a relational model based on the JSON paths "$.people", "$.people.vehicles", and "$.people.vehicles.maintenance".

Connecting String

Set the Data Model connection property to "Relational" and set the JSON Path connection property to "$.people;$.people.vehicles;$.people.vehicles.maintenance;" to perform the query above and see the example result set.

DataModel=Relational;JSONPath='$.people;$.people.vehicles;$.people.vehicles.maintenance;'

Metadata

The tables below describe the metadata based on using the Relational data model.

People
Column	Data Type
_id	String
personal.age	Integer
personal.gender	String
personal.name.first	String
personal.name.last	String
source	String

Vehicles
Column	Data Type
_id	String
insurance.company	String
insurance.policy_num	String
model	String
type	String

Maintenance
Column	Data Type
_id	String
date	Date
desc	String

Query

In this query, we explicitly JOIN the "people", "vehicles", and "maintenance" tables.

SELECT 
  [people].[personal.age] AS age, 
  [people].[personal.gender] AS gender, 
  [people].[personal.name.first] AS first_name, 
  [people].[personal.name.last] AS last_name, 
  [people].[source], 
  [vehicles].[type], 
  [vehicles].[model], 
  [vehicles].[insurance.company] AS ins_company, 
  [vehicles].[insurance.policy_num] AS ins_policy_num, 
  [maintenance].[date] AS maint_date, 
  [maintenance].[desc] AS maint_desc
FROM 
  [people]
JOIN 
  [vehicles] 
ON 
  [people].[_id] = [vehicles].[people_id]
JOIN 
  [maintenance] 
ON 
  [vehicles].[_id] = [maintenance].[vehicles_id]

Results

Using a relational model, any JOINs are controlled by the query. In this case, each "maintenance" object is JOINed to its parent "vehicle" object, which is JOINed to its parent "people" object to produce a table with 8 rows (2 "maintenance" entries for each of 2 "vehicles" each for 2 "people").

age	gender	first_name	last_name	source	type	model	ins_company	ins_policy_num	maint_date	maint_desc
20	M	John	Doe	internet	car	Honda Civic	ABC Insurance	12345	2017-07-17	oil change
20	M	John	Doe	internet	car	Honda Civic	ABC Insurance	12345	2018-01-03	new tires
20	M	John	Doe	internet	truck	Dodge Ram	ABC Insurance	12345	2017-08-27	new tires
20	M	John	Doe	internet	truck	Dodge Ram	ABC Insurance	12345	2018-01-08	oil change
24	F	Jane	Roberts	phone	car	Toyota Camry	Car Insurance	98765	2017-05-11	tires rotated
24	F	Jane	Roberts	phone	car	Toyota Camry	Car Insurance	98765	2017-11-03	oil change
24	F	Jane	Roberts	phone	car	Honda Accord	Car Insurance	98765	2017-10-07	new air filter
24	F	Jane	Roberts	phone	car	Honda Accord	Car Insurance	98765	2018-01-13	new brakes

Benefits & Considerations

With relational modeling, you are able to build a model of your data based on the discrete entities in the JSON data or service, which provides better compatibility with some BI, reporting, and ETL tools. However, in order to build and query a relational model, you will sacrifice some performance. If you are working with data from several tables in the relational model, the driver will query and parse data for each table in the query. For example, the query above requires three separate requests.

CData NoSQL Overview - Learn about the features that set our Drivers technologies apart when working with NoSQL.
NoSQL Drivers: Performance Comparison - Compare how drivers from different vendors perform when querying and processing large datasets from NoSQL data sources.
NoSQL Drivers: Feature Comparison - Compare how drivers from different vendors handle complex queries and complex datasets from NoSQL data sources.

CData Software is a leading provider of data access and connectivity solutions. Our standards-based connectors streamline data access and insulate customers from the complexities of integrating with on-premise or cloud databases, SaaS, APIs, NoSQL, and Big Data.

Connect With Us

Get Started

Data Connectors

ETL/ ELT Solutions

Cloud & API Connectivity

OEM & Custom Drivers

Connect With Us

Get Started

Data Visualization

Company

Resources

In this article

Related articles

JSON Drivers: Parsing Hierarchical Data

Document, Flattened Documents, and Relational Modeling

Sample Document

people.json

Document Modeling

Connection String

Metadata

Query

Results

Benefits & Considerations

Flattened Documents Modeling

Connection String

Metadata

Query

Results

Benefits & Considerations

Relational Modeling

Connecting String

Metadata

Query

Results

Benefits & Considerations

Related Articles