The challenge: Fragmented data landscape slowed enterprise-wide insight and cross-team access
A large U.S.-based steel manufacturer faced a fragmented data landscape in which each division operated its own systems: Oracle, SAP, Microsoft Dynamics, and various legacy platforms. This siloed setup made it difficult to access, compare, or analyze information across the enterprise. Business users in particular could not get to the data they needed for timely analysis and decision-making; instead, they relied heavily on central IT or manual data preparation, resulting in delays and limited agility. The lack of self-service capabilities significantly slowed insight generation across the organization.
Compounding the challenge, analytical and operational use cases were handled separately, with little to no integration between them. While local teams could build analytical outputs based on their own systems, corporate teams often had no visibility into those insights—or vice versa. As a result, cross-functional collaboration and data reuse were severely limited.
The impact was felt at every level:
- Corporate finance struggled to generate consolidated reports
- Sales teams lacked reliable access to historical performance data
- Bonus calculations for production staff were delayed
- AI-driven safety applications couldn’t be reliably fed with the right data
- Global leadership had limited visibility into cross-divisional performance
These problems were not only technical—they were cultural. Many teams remained deeply rooted in traditional ETL processes, and data virtualization was a new and often misunderstood concept. This added resistance to change and slowed the adoption of more modern, agile data approaches.
Driving change across business and IT
The Senior Data Architect led the internal shift, focusing on both mindset and architecture. The organization didn’t just need new technology—it needed a new way of thinking about data access.
Through hands-on whiteboarding sessions, collaborative design workshops, and persistent executive alignment, they reframed data virtualization as a strategic enabler, not just a technical solution. The message was simple and practical: Self-service access. Faster insights. One architecture for analytical and operational needs.
By translating the benefits of federation into business outcomes—agility, governance, and self-service—stakeholders across IT and business functions gradually aligned around the vision.
The solution: A modern data architecture built for speed, self-service, and flexibility
To modernize data access and reduce reliance on rigid pipelines, the organization adopted CData Virtuality as the backbone of its data federation strategy. Seamlessly integrated with Databricks Lakehouse Federation and Microsoft Fabric, the solution delivered a virtual semantic layer that replaced slow ETL processes with agile, on-demand data access, supporting both analytical and operational use cases.
Key architecture highlights:
- Virtual schemas integrated into Databricks Unity Catalog, making virtualized datasets instantly discoverable and queryable
- Metadata-driven pipelines eliminated manual spreadsheet-driven processes
- CDC-style micro-batching fed operational reporting and AI use cases in near real time (a sketch of this pattern follows the list)
- Delta Lake as a unified format, accessible via Microsoft Power BI and Fabric
- Enterprise data governance through integration with Unity Catalog and downstream Informatica tools
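To make the micro-batching highlight concrete, here is a minimal sketch of how a CDC-style increment can be merged into Delta Lake from a Databricks notebook. It is illustrative only, assuming the notebook's built-in `spark` session; the catalog, table, and column names (`virtuality.erp.sales_orders`, `ops.orders_delta`, `updated_at`, `order_id`) are hypothetical, not the manufacturer's actual objects.

```python
# Illustrative CDC-style micro-batch: pull rows changed since the last
# watermark from a federated (virtualized) view and upsert them into a
# Delta table that feeds operational reports. All names are hypothetical.
from delta.tables import DeltaTable

# High-water mark left by the previous micro-batch run.
last_ts = spark.sql(
    "SELECT coalesce(max(updated_at), timestamp'1970-01-01') FROM ops.orders_delta"
).first()[0]

# Only changed rows cross the wire; the source data stays in the ERP system.
changes = (
    spark.table("virtuality.erp.sales_orders")
         .where(f"updated_at > timestamp'{last_ts}'")
)

# Merge (upsert) the increment into Delta Lake.
(
    DeltaTable.forName(spark, "ops.orders_delta").alias("t")
    .merge(changes.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

Scheduled every few minutes, a job along these lines can deliver near-real-time feeds for reporting and AI features without a traditional ETL pipeline.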
Why CData Virtuality was the right fit
Choosing the right platform wasn’t just about ticking feature boxes—it was about finding a solution that matched both their technical demands and their way of working. Here’s why CData Virtuality stood out:
- Built for engineers
From flexible scripting to deep metadata control, CData Virtuality gave the data teams exactly what they needed: agility, transparency, and control over how data is accessed and used.
- A true partner, not just a vendor
It wasn’t just the tech that made a difference. The CData sales and support teams took the time to listen, understand the use cases, and adapt to evolving needs—building trust at every step.
- Seamless ecosystem integration
With native connectivity to Databricks, Microsoft Fabric, and a wide range of ERP systems, CData Virtuality fit seamlessly into the company’s hybrid environment, avoiding unnecessary complexity and accelerating time to value. A key enabler was the use of the PostgreSQL wire protocol, which allowed Databricks to interact with virtualized datasets as a single Unity Catalog schema, rather than fragmenting each dataset into its own catalog. This simplified data access, improved performance, and aligned with the governance practices already in place.
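For illustration, this wire-protocol integration looks roughly like standard Databricks Lakehouse Federation DDL. The sketch below registers a virtualization endpoint as a foreign catalog; the host, secret scope, and object names are placeholders, not the company's actual configuration.

```python
# Sketch: register the CData Virtuality endpoint, which speaks the PostgreSQL
# wire protocol, with Databricks Lakehouse Federation. Host, secret scope,
# and names below are placeholders.
spark.sql("""
    CREATE CONNECTION IF NOT EXISTS virtuality_pg TYPE postgresql
    OPTIONS (
      host 'dv.example.internal',
      port '5432',
      user secret('dv-scope', 'dv-user'),
      password secret('dv-scope', 'dv-password')
    )
""")

# Virtual schemas now surface together under one governed foreign catalog.
spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS virtuality
    USING CONNECTION virtuality_pg
    OPTIONS (database 'datavirtuality')
""")

spark.sql("SELECT * FROM virtuality.erp.sales_orders LIMIT 10").show()
```

From the consumer's point of view, the federated datasets then behave like any other Unity Catalog objects and inherit the governance already in place.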
Overcoming organizational complexity
CData Virtuality empowered teams to connect to new sources in minutes, not months—even to obscure or legacy systems. Engineers could control what data was presented and how, without waiting for central ETL cycles. The result: a distributed architecture that didn’t require moving everyone off their ERP systems but still made all data accessible.
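As a rough sketch of that self-service experience, assuming hypothetical schema names and credentials: because the virtual layer speaks the PostgreSQL wire protocol, any compatible client, here Python's psycopg2, can join two source systems in a single query with no pipeline in between.

```python
# Hypothetical sketch: one federated query spanning SAP and Oracle sources,
# issued through an ordinary PostgreSQL client. All names, hosts, and
# credentials are illustrative placeholders.
import psycopg2

conn = psycopg2.connect(
    host="dv.example.internal",
    port=5432,
    dbname="datavirtuality",
    user="analyst",
    password="<secret>",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT s.plant, s.order_id, o.invoice_total
        FROM sap_erp.sales_orders s
        JOIN oracle_fin.invoices o ON o.order_id = s.order_id
        WHERE s.order_date >= DATE '2024-01-01'
    """)
    for row in cur.fetchmany(10):
        print(row)
conn.close()
```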
The outcome: Real-time decision support, standardized access, and measurable business impact
The implementation of CData Virtuality, integrated with Databricks and Microsoft technologies, provided a unified virtual data layer across the organization’s fragmented systems. This architecture enabled self-service data access, real-time insight delivery, and automation of critical reporting workflows, all without disrupting existing infrastructure.
Key outcomes included:
- Standardized access to all ERP and legacy systems
- Operational reporting modernized, with production bonus metrics automatically calculated
- AI safety systems now run on real-time inputs
- Time-to-response reduced from weeks to minutes (e.g., OpenTrack connected in 5 minutes)
- Enterprise reporting unified: Corporate analytics now supports C-level P&L and ROI evaluations across all divisions
- Cross-divisional data sharing unlocked, empowering teams across U.S. locations to discover and instantly use new datasets
“With data virtualization, we don’t just federate—we accelerate. From months to minutes. It’s now a matter of plugging in and executing.”
— Senior Data Architect
Looking ahead
What started as a way to unify access to disconnected ERP systems has grown into something much larger—a scalable, future-ready data architecture that’s now central to how the organization enables operational efficiency, AI-driven innovation, and strategic decision-making at the executive level.
And this is just the beginning.
The firm is now actively exploring ways to extend the value of its virtualized data layer even further, including:
- Deeper integration with Databricks Unity Catalog streaming workflows for seamless real-time data consumption
- Support for Oracle CDC to enhance change data capture and enable faster ingestion patterns
- Expanding the virtual layer to support more divisions, more use cases, and more agility at scale
This isn’t just about technology adoption. It represents a foundational shift—from building pipelines to enabling a responsive, federated data fabric. Powered by data virtualization, not duplication, the organization is architecting for speed, adaptability, and long-term resilience.
CData Virtuality: Real-time data unification without disruption
CData Virtuality enables forward-thinking enterprises to standardize access across complex, distributed systems—without reengineering their entire infrastructure. By federating data from diverse ERPs into a unified semantic layer, it delivers real-time insights, accelerates operational reporting, and supports AI-driven innovation. With minimal disruption and maximum flexibility, CData Virtuality transforms data sprawl into strategic clarity. Discover how you can power agile decision-making and unlock the value hidden across your enterprise data landscape.