The True Value of Data Virtualization: Beyond Buzzwords
In recent years, the data integration and management industry has seen growing interest in data virtualization. While proponents of the technology understand its true potential, many prospective adopters remain swayed by marketing buzzwords, which often skews their perception of its real benefits.
The marketing view vs. the practitioner’s perspective
Data virtualization, at its core, aims to provide businesses with a seamless way to access and manage data from various sources. Yet its marketing still largely emphasizes concepts that are easy for the business side to digest, such as real-time data access and the promise of not having to move or store additional data, thus saving on costs and complexity.
These benefits appeal to business and high-level decision-makers and certainly have their value, but they don’t capture the full picture for those deeply involved in implementing data integration.
For instance, while real-time data is undoubtedly essential, it represents only a small fraction of data integration use cases overall. Historical data analysis, which often doesn’t require real-time access, plays a pivotal role for many businesses.
Furthermore, the notion of “not moving data” may not resonate with experienced data practitioners. There are several legitimate scenarios where moving data is justified, whether for business purposes such as data cleansing or for technical reasons such as using a robust database for computation.
The under-appreciated advantage: Declarative approach
So, what then is the most central advantage of data virtualization?
It’s the declarative approach. Traditional data integration methods require specifying every step that data must go through to be integrated. This step-by-step approach is tedious and error-prone, particularly when requirements change, because a single change can force adaptations at multiple steps.
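To make the step-by-step style concrete, here is a minimal, hypothetical imperative pipeline in Python. The sample data, the "CRM" and "ERP" source names, and every function name are invented for illustration; real ETL tooling looks different, but the shape is the same: the developer writes each step and fixes their order by hand.

```python
# Hypothetical sample data standing in for two source systems (e.g., a CRM and an ERP).
CRM_CUSTOMERS = [
    {"id": 1, "name": "Acme", "country": "DE"},
    {"id": 2, "name": "Globex", "country": "US"},
]
ERP_ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 50.0},
]


# Step 1: copy customers out of the CRM.
def extract_customers():
    return list(CRM_CUSTOMERS)


# Step 2: copy orders out of the ERP.
def extract_orders():
    return list(ERP_ORDERS)


# Step 3: join orders to customers with a hand-built hash index.
def join(customers, orders):
    by_id = {c["id"]: c for c in customers}
    return [{**by_id[o["customer_id"]], **o} for o in orders if o["customer_id"] in by_id]


# Step 4: sum order amounts per customer name (the grouping key is hard-coded).
def aggregate(rows):
    totals = {}
    for row in rows:
        totals[row["name"]] = totals.get(row["name"], 0.0) + row["amount"]
    return totals


# The developer fixes the order of the steps by hand.
def run_pipeline():
    customers = extract_customers()
    orders = extract_orders()
    joined = join(customers, orders)
    return aggregate(joined)


print(run_pipeline())  # {'Acme': 200.0, 'Globex': 50.0}
```

If the requirement changes to "revenue per country, German customers only", the extract step needs a new filter and the aggregate step a new grouping key: one change in the desired result ripples through several hand-written steps.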
In contrast, data virtualization uses a declarative method, i.e., one that abstracts away the control flow the software needs to perform an action. Instead of defining each step, practitioners describe the desired result, and the software builds the steps to achieve it. If you want a different result, you change the description, and the intermediate steps adjust accordingly. It is primarily this approach that can make integration processes up to five times faster.
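The idea of "describe the result, let the software derive the steps" can be sketched in a few lines of Python. This is a toy illustration, not how any particular data virtualization product works: the spec format (`join`, `where`, `group_by`, `sum` keys), the `plan` and `run` functions, and the sample data are all hypothetical. The point is that the caller only edits the spec, and the planner decides which steps to run and in what order.

```python
from collections import defaultdict

# Hypothetical source data standing in for two systems (e.g., a CRM and an ERP).
SOURCES = {
    "customers": [
        {"id": 1, "name": "Acme", "country": "DE"},
        {"id": 2, "name": "Globex", "country": "US"},
    ],
    "orders": [
        {"customer_id": 1, "amount": 120.0},
        {"customer_id": 1, "amount": 80.0},
        {"customer_id": 2, "amount": 50.0},
    ],
}


def plan(spec):
    """Derive an ordered list of (description, step) pairs from a declarative spec."""
    left, left_key, right, right_key = spec["join"]
    steps = [(f"scan {left}", lambda _rows: list(SOURCES[left]))]

    # Only plan a filter step if the spec asks for one; apply it before the join.
    where = spec.get("where", {})
    if where:
        steps.append((
            f"filter {left} on {sorted(where)}",
            lambda rows: [r for r in rows if all(r[c] == v for c, v in where.items())],
        ))

    def join(rows):
        index = {r[left_key]: r for r in rows}
        return [{**index[o[right_key]], **o} for o in SOURCES[right] if o[right_key] in index]

    steps.append((f"join {right}", join))

    def aggregate(rows):
        totals = defaultdict(float)
        for r in rows:
            totals[r[spec["group_by"]]] += r[spec["sum"]]
        return dict(totals)

    steps.append((f"sum {spec['sum']} by {spec['group_by']}", aggregate))
    return steps


def run(spec):
    """Execute whatever steps the planner derived; the caller never lists them."""
    rows = None
    for _description, step in plan(spec):
        rows = step(rows)
    return rows


# Two descriptions of the desired result; the second differs only in its spec.
by_name = {"join": ("customers", "id", "orders", "customer_id"), "group_by": "name", "sum": "amount"}
de_by_country = {**by_name, "where": {"country": "DE"}, "group_by": "country"}

print(run(by_name))        # {'Acme': 200.0, 'Globex': 50.0}
print(run(de_by_country))  # {'DE': 200.0}
```

Changing the spec from `by_name` to `de_by_country` makes the planner insert a filter step and switch the grouping key on its own; compare this with the imperative version, where the same change means editing several functions.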
It’s akin to using SQL databases, where you describe the outcome and let the database handle the intricate process. Imagine a world without SQL databases, where every query demands manually programming each individual step of the query plan, and reprogramming those steps whenever the query requirements change. That sounds like a horror scenario to any database person, doesn’t it? Yet this is how things are still largely done in the ETL/ELT-centric world. This is the problem data virtualization solves, and it is the main difference data virtualization makes in data integration.
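The SQL side of the analogy can be seen with Python's built-in `sqlite3` module. Two separately attached in-memory databases, named `crm` and `erp` here purely for illustration, loosely stand in for two sources. This is plain SQL, not a data virtualization product, but it shows the same contract: the query states what is wanted, and the engine chooses how to scan, join, and aggregate.

```python
import sqlite3

# One connection with two separate in-memory databases attached; the schema
# names "crm" and "erp" are illustrative stand-ins for two source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS erp")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.execute("CREATE TABLE erp.orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?, ?)",
                 [(1, "Acme", "DE"), (2, "Globex", "US")])
conn.executemany("INSERT INTO erp.orders VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 50.0)])

# The query describes the result; SQLite decides scan order and join strategy.
REVENUE_BY_NAME = """
    SELECT c.name, SUM(o.amount)
    FROM crm.customers AS c JOIN erp.orders AS o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
"""

# A changed requirement is a changed description, not a rewritten pipeline.
DE_REVENUE_BY_COUNTRY = """
    SELECT c.country, SUM(o.amount)
    FROM crm.customers AS c JOIN erp.orders AS o ON o.customer_id = c.id
    WHERE c.country = 'DE'
    GROUP BY c.country
"""

print(conn.execute(REVENUE_BY_NAME).fetchall())        # [('Acme', 200.0), ('Globex', 50.0)]
print(conn.execute(DE_REVENUE_BY_COUNTRY).fetchall())  # [('DE', 200.0)]

# The plan SQLite derived can be inspected; its exact text varies by version.
for row in conn.execute("EXPLAIN QUERY PLAN " + DE_REVENUE_BY_COUNTRY):
    print(row)
```

Nowhere does the code say which table to scan first or how to perform the join; that is exactly the work the engine takes over.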
Conclusion
While buzzwords like “real-time” and “no data movement” might dominate the narrative around data virtualization, it’s vital to delve deeper to appreciate its full potential. By understanding its core benefits, especially the declarative approach, businesses can truly leverage the technology’s capabilities, resulting in more efficient and effective data integration strategies.