by CData Software | November 14, 2023

Performance Optimization in CData Virtuality

Performance is one of the main concerns enterprises raise when considering data virtualization. Because data virtualization connects to the underlying sources in real time, the concern is entirely justified.

To address this, CData Virtuality has integrated a three-level performance optimization mechanism, a step ahead of the commonly found two-level optimization in other data virtualization tools.

The three levels are:

  1. Distributed query optimization: This leverages advanced techniques like pushdowns and optimized join algorithms.
  2. Caching: This functions both in-memory and on-disk.
  3. Self-learning recommended optimization (materialization): This proposes the materialization of tables/views for enhanced optimization.

The following sections look at each level in turn and explain how it improves performance.

Distributed query optimization

All queries entering the CData Virtuality engine undergo transformations to enhance their performance using distributed query optimization. Here’s a breakdown of the primary processes involved:

  • Rewriting SQL: As a foundational step, queries undergo a refinement process to simplify expressions and criteria, ensuring that the base SQL is optimized for maximum efficiency.
  • Logical plan optimization: Once the SQL is rewritten, the query is converted into a logical plan. The CData Virtuality Server applies optimization rules that examine the query’s structure and the size of the data. It also uses detailed cost information to refine its decisions, for example when deciding where pushdowns can be applied.
  • Processing plan conversion: After logical plan optimization, the plan is converted into an executable processing plan. In this plan, nodes represent basic processing operations that drive the query’s execution across the distributed framework.
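
To make the effect of a pushdown concrete, here is a minimal sketch in Python. It is not CData Virtuality code: the in-memory `ORDERS` and `CUSTOMERS` lists stand in for two remote sources, and `scan`, `hash_join`, and `run` are hypothetical helpers. The sketch compares filtering at the source (pushdown) with filtering in the engine after the join, and counts how many rows each strategy has to transfer.

```python
# Hypothetical stand-ins for two remote data sources.
ORDERS = [
    {"order_id": 1, "customer_id": 10, "amount": 250},
    {"order_id": 2, "customer_id": 11, "amount": 40},
    {"order_id": 3, "customer_id": 10, "amount": 90},
]
CUSTOMERS = [
    {"customer_id": 10, "country": "DE"},
    {"customer_id": 11, "country": "US"},
]


def is_large(order):
    return order["amount"] > 100


def is_german(customer):
    return customer["country"] == "DE"


def scan(rows, predicate=None):
    """Simulate a source scan; a given predicate is evaluated at the source."""
    return [row for row in rows if predicate is None or predicate(row)]


def hash_join(left, right, key):
    """Join inside the engine: build a hash table on `right`, probe with `left`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def run(pushdown):
    """Return (result rows, number of rows transferred from the sources)."""
    if pushdown:
        # Filters travel to the sources, so only matching rows are transferred.
        orders = scan(ORDERS, is_large)
        customers = scan(CUSTOMERS, is_german)
        result = hash_join(orders, customers, "customer_id")
    else:
        # Everything is transferred first and filtered after the join.
        orders = scan(ORDERS)
        customers = scan(CUSTOMERS)
        joined = hash_join(orders, customers, "customer_id")
        result = [row for row in joined if is_large(row) and is_german(row)]
    return result, len(orders) + len(customers)
```

Both strategies return the same rows, but the pushdown variant moves fewer rows over the (simulated) network. A real optimizer would additionally use cost information to decide whether a given source can evaluate a given filter at all.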

Caching

Data virtualization has scalability limits, especially with large datasets or a high number of users, so CData Virtuality uses caching to improve query performance. Caching significantly boosts performance for small datasets, but its effectiveness diminishes rapidly as datasets grow, and it offers only limited control over how data is loaded and stored.
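
As an illustration of caching that works both in-memory and on-disk, the following Python sketch keeps a small number of recent query results in memory and writes every result to a file as well. The class `TwoTierCache`, the helper `cached_query`, and the choice of an LRU policy with pickle files are assumptions made for this example; they do not describe how CData Virtuality implements its cache.

```python
import hashlib
import pickle
from collections import OrderedDict
from pathlib import Path


class TwoTierCache:
    """Small in-memory LRU tier backed by a larger on-disk tier (illustrative)."""

    def __init__(self, memory_slots, directory):
        self.memory_slots = memory_slots
        self.memory = OrderedDict()          # SQL text -> rows, oldest first
        self.directory = Path(directory)

    def _path(self, sql):
        # One file per distinct SQL text, named by its hash.
        return self.directory / (hashlib.sha256(sql.encode()).hexdigest() + ".pkl")

    def _remember(self, sql, rows):
        self.memory[sql] = rows
        self.memory.move_to_end(sql)
        while len(self.memory) > self.memory_slots:
            self.memory.popitem(last=False)  # evict the least recently used entry

    def get(self, sql):
        if sql in self.memory:               # fastest path: memory hit
            self.memory.move_to_end(sql)
            return self.memory[sql]
        path = self._path(sql)
        if path.exists():                    # slower path: disk hit
            rows = pickle.loads(path.read_bytes())
            self._remember(sql, rows)
            return rows
        return None                          # miss in both tiers

    def put(self, sql, rows):
        self._path(sql).write_bytes(pickle.dumps(rows))
        self._remember(sql, rows)


def cached_query(cache, sql, execute):
    """Return cached rows if present; otherwise run `execute` and cache the result."""
    rows = cache.get(sql)
    if rows is None:
        rows = execute(sql)
        cache.put(sql, rows)
    return rows
```

The sketch also hints at why caching suits small results better than large ones: every cached result is stored whole, keyed by the exact query text, with no control over how or when the underlying data is loaded.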

Self-learning recommended optimization (materialization)

The distinctive part of CData Virtuality’s optimization engine is data materialization with self-learning capabilities. It learns from the query behavior of data consumers and addresses performance issues by autonomously creating and managing the physical data structures of either:

  1. The external data sources or
  2. The internal virtual views in user-defined analytical storage

The self-learning recommended optimization also suggests indexes for the materialized tables. Once data is physically stored in the analytical storage, any slow-performing parts of a query are transparently redirected to this optimized data, so reports do not need to be rewritten.
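
The idea of learning from query behavior and then redirecting queries can be sketched in a few lines of Python. Everything here is hypothetical: the `MaterializationAdvisor` class, its thresholds, and the `mat_` naming scheme are invented for illustration and say nothing about the heuristics CData Virtuality actually uses.

```python
from collections import defaultdict


class MaterializationAdvisor:
    """Toy advisor: recommends materializing views that are repeatedly slow."""

    def __init__(self, slow_seconds=2.0, min_slow_runs=3):
        self.slow_seconds = slow_seconds
        self.min_slow_runs = min_slow_runs
        self.slow_runs = defaultdict(int)  # view name -> slow queries observed
        self.materialized = {}             # view name -> table in analytical storage

    def record(self, view, seconds):
        """Feed one observed query runtime into the statistics."""
        if seconds >= self.slow_seconds:
            self.slow_runs[view] += 1

    def recommendations(self):
        """Views that were slow often enough and are not yet materialized."""
        return sorted(
            view
            for view, count in self.slow_runs.items()
            if count >= self.min_slow_runs and view not in self.materialized
        )

    def materialize(self, view):
        """Register a materialized copy; copying the data is out of scope here."""
        self.materialized[view] = "mat_" + view.replace(".", "_")

    def resolve(self, view):
        """Redirect to the materialized table if one exists, else keep the view."""
        return self.materialized.get(view, view)
```

Because redirection happens in `resolve`, a report keeps referring to the original view name and does not change when a materialized copy appears, which is the property the paragraph above describes.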

To keep the data in analytical storage up to date, materialization tasks run periodically. Incremental materialization, which captures only new or changed data, is also available and reduces the amount of data that has to be materialized.
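
One common way to capture only new or changed data is a watermark on a change-timestamp column. The Python sketch below shows that technique using the standard `sqlite3` module; the tables `orders` and `orders_mat`, the `updated_at` column, and the function `incremental_refresh` are assumptions for this example, not the mechanism CData Virtuality uses internally.

```python
import sqlite3


def incremental_refresh(source, target, watermark):
    """Copy rows changed after `watermark`; return (new watermark, rows copied)."""
    rows = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE overwrites the existing copy of a changed row.
    target.executemany(
        "INSERT OR REPLACE INTO orders_mat (id, amount, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    new_watermark = rows[-1][2] if rows else watermark
    return new_watermark, len(rows)


# Hypothetical source table and its materialized copy.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
)
source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 10.0, 100), (2, 20.0, 110)])

target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE orders_mat (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
)

# The first run copies everything; later runs copy only rows past the watermark.
watermark, copied = incremental_refresh(source, target, watermark=0)
```

Note that a plain watermark like this does not notice rows deleted at the source, so a periodic full materialization may still be needed alongside it.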

The advanced data virtualization experience

Data virtualization is a dynamic technology, and performance optimization is crucial for enterprises that want to use its full potential. CData Virtuality’s three-level approach to performance optimization addresses several aspects of the performance challenge at once. Whether you’re dealing with large datasets, numerous users, complex query structures, slow databases, or slow networks, the platform’s materialization capabilities and optimization features are designed to maximize efficiency.