Using DirectQuery in Power BI with CData Power BI Connectors



Many companies are pursuing digital transformation (DX) and data utilization, and there are an increasing number of cases where data from various SaaS platforms, core systems, and databases is integrated into a data warehouse (DWH) to gain insights that drive decision-making and improve business performance. Some companies may also be leveraging data marts and data lakes.

However, once data is consolidated in a DWH, businesses can run into issues when importing large datasets into a BI tool, such as Power BI, for analysis. As data volume increases, the traditional import and transformation methods take too much time, making it difficult or impossible to use for analysis.

In this article we explore Power BI's DirectQuery connection mode and walkthrough examples using the CData Power BI Connector to eliminate the trouble of importing large amounts of data. We use Google BigQuery as the data source, but similar functionality is essential for Power BI integration with any DWH.

Use real-time queries

Most BI and analytics tools provide a way to integrate and query data in real-time from data sources without having to import the data locally, in Power BI that is supported by the "DirectQuery" connection mode. Real-time integration has the following advantages:

  • You can create reports, visualizations, and dashboards from large data sets where it would be difficult (or impossible) to import all of the data first.
  • Changes in reference data can be reflected in near real time. Re-importing data one by one would require a significant amount of time and resources.
  • You can work around tool-specific dataset size limits. For example, Power BI only allows you to import a dataset up to 1GB in size.

CData drivers support real-time access to data regardless of the tool, platform, or application. In particular, CData Power BI Connectors support real-time integration with any data source using DirectQuery.

Use DirectQuery with Google BigQuery in Power BI

Google BigQuery is a fully managed cloud-based data warehouse that can be used on Google Cloud (GCP). It provides cloud-based service that has high performance, pay-as-you-go pricing, and strong integration with other Google ecosystems such as Google Analytics and Google Ads.

With built-in support for DirectQuery, Power BI's real-time query capability, the CData Power BI Connector enables real-time connectivity to large BigQuery datasets from Power BI reports and dashboards.

Considering Power BI's 1GB dataset size limit and the resources required to import and process data, DirectQuery is the only option for dealing with large datasets. In this article, we demonstrate real-time integration from Power BI to BigQuery using a dataset from the New York City Taxi & Limousine Commission (NYC-TLC), specifically yellow taxi trip data.

Here are the details of the tables used:

Taxi travel table details

  • Table ID: nyc-tlc:yellow.trips
  • Table size: 139.29 GB
  • Row count: 1,108,779,463

As you can see, this table has over 1 billion rows of data, and the table size easily exceeds the 1GB import limit for Power BI. Importing the entire dataset into Power BI for analysis would be impossible and would be prohibitively costly in terms of time and resources in any other tool or platform. So let's connect to this data using the CData Power BI connector and DirectQuery.

To connect to the data, set the Project Id and Dataset Id in the CData Power BI Connector. The following is the Power BI Connector DSN setting screen:

Create a visualization

By connecting to data in real-time, you can create visualizations quickly even from large data sets because many of the complex operations like filters, aggregations, etc. can be performed on the original data source. By passing complex queries to the data source, you can leverage server-side computational resources, speeding up reporting while conserving client-side resources. You can create rich visualizations that update in real time as filters change, drill-downs, etc.

Number of trips by taxi company

First, we created a pie chart showing the number of trips for each taxi company in the dataset. When importing data, more than 1 billion rows of trip data would have to be imported, grouped, and counted on the client side. On the other hand, with DirectQuery, you can group and count on BigQuery and upload the results to Power BI in real time. Thanks to this feature, what was originally 1 billion rows can now be visualized on the Power BI side using only three rows of data (one row for each company) that have been processed and grouped in BigQuery.

Number of passengers and average total fare per taxi company

The next visualization groups each trip by number of passengers and company, and calculates the average total fare. It is more complicated because it filters for trips with 0-10 passengers. Again, rather than relying on local resources to handle 1 billion rows, Power BI can let BigQuery do the averaging, grouping, filtering, and ordering. Even with all this complexity, Power BI only needs to graph 33 rows of data (3 companies x 11 passengers).

Taxi company and drop-off situation by location

The final visualization shows the drop-off locations (based on latitude and longitude) for each trip, color-coded by taxi company. The visualization uses a very simple query, so the result set is quite large.

Full Dashboard

With live queries, you can filter all previous visualizations in real time whenever you drill down into the data on other visualizations. For example, you can apply the same filter to other visualizations by clicking on the bar representing the VTS trips with 5 passengers. Filters for each visualization are added to the query, and Power BI requests real-time data from BigQuery, with results returned within seconds.

Free Trial & More Information

CData Power BI Connectors allow you to use DirectQuery with popular data stores and data warehouses, including BigQuery. With DirectQuery support, you can run real-time reports using any Power BI visualization, from pie charts and bar charts to maps and scatter plots, regardless of the size of the original dataset.

Although this article uses BigQuery, the techniques and steps described here apply to any of the 250 SaaS, Big Data, or NoSQL data sources supported by CData.

All Power BI Connectors offer free 30-day trials, so download a trial today and get started with real-time integration with big data.