Pioneering a New Data Architecture for AI: CData Semantic Layer Meets Databricks Genie

by Sue Raiber | October 2, 2025

During the CData Foundations session “Pioneering a New Data Architecture for AI: CData Semantic Layer Meets Databricks Genie,” Andrew Chabot, Senior Manager of Data Engineering at FinThrive, and Eric Tome, Senior Solutions Architect at Databricks, delivered a powerful look at how CData and Databricks work together to enable organizations to drive their AI strategies forward.

The conversation centered on a simple but important point: Databricks provides the engine for modern data engineering and AI, while CData helps make that value accessible, democratized, and operational across the enterprise sooner.

Bridging the gap between business and technical users

Enterprises often face a persistent challenge: valuable data is locked in silos, governed in platforms like Databricks, but not easily consumable by business teams. While Databricks can help to process massive volumes of data at speed and power advanced analytics and AI, that data is still not readily available in a form business users can act on. Technical users can prepare it, but finance, operations, or marketing departments are often left waiting for extracts or custom pipelines.

As Andrew Chabot explained, this is where virtualization makes the difference. By creating a single logical endpoint, virtualization reduces ETL overhead and hydrates the Databricks lakehouse faster, without additional pipeline work. It also accelerates prototyping, allowing new data to be tested and iterated on quickly, without waiting for engineering cycles.

CData’s semantic layer, powered by virtualization, builds on this by exposing those governed datasets directly in the everyday tools business teams already use. The result: instant access, less dependence on IT bottlenecks, and a stronger connection between engineering and business needs.

Accelerating insight and experimentation

Eric Tome emphasized that pairing Databricks Genie with CData’s semantic layer isn’t just about access — it’s about reducing the time it takes for business users to benefit from Databricks’ speed and power.

Databricks can process massive volumes of data and drive advanced analytics and AI. But without a way to virtualize and expose that data broadly, the cycle from preparation to insight can still take weeks. CData eliminates that bottleneck by making governed Databricks datasets instantly available in the tools business teams already use.

For example:

A data scientist working in Databricks can publish a cleaned dataset. Within minutes, a business analyst can access the same dataset live in Tableau. No custom export or pipeline needed.
A finance team can model revenue scenarios in Excel, pulling directly from Databricks without manual refreshes.
A marketing analyst can combine Databricks-prepared data with SaaS sources like Salesforce, creating a cross-functional dataset that supports rapid experimentation with customer journeys or campaign attribution.

By bridging Databricks’ compute power with CData’s virtualization layer, organizations can run more experiments faster and turn successful pilots into production-ready outcomes without months of additional engineering.

Bringing it to life: A real-world demo

To ground the conversation in reality, the session featured a live demo of a real-world use case. Andrew and Eric walked through how data flows seamlessly when CData and Databricks are combined:

Data consolidation in Databricks: Multiple operational systems feed raw data into Databricks Genie, where it can be cleaned, governed, and transformed into trusted, AI-ready datasets.
CData Semantic Layer connects the dots: CData’s semantic layer, powered by virtualization, unifies disparate sources such as ERP systems, manufacturing databases, spreadsheets, or SaaS apps and presents them as standardized assets that Databricks can consume.
Instant access in familiar tools: Once published, these datasets (for example, assets, locations, and sensors) appear in Databricks as if they were native tables. As Andrew noted, “Databricks doesn’t care what’s behind… it just knows assets, locations, and sensors.” That’s the power of CData: preparing the data in a way that Databricks can understand and act upon.
Smarter, faster outcomes: From there, curated datasets can be pushed into BI tools like Power BI, Tableau, or Excel in real time, so business users can analyze and act without building custom pipelines. The demo showed how this architecture shortens the path from raw data to AI-driven insights, letting teams experiment quickly, automate workflows, and scale production-ready solutions.
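
The idea behind the demo, that a consumer only sees standardized names such as assets, locations, and sensors regardless of what sits behind them, can be illustrated with a small sketch. This is not CData’s or Databricks’ implementation: it uses Python’s built-in sqlite3 module, with two attached in-memory databases standing in for an ERP system and a plant database. All source table names, column names, and sample rows (erp.equipment, erp.sites, plant.readings, and so on) are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_endpoint() -> sqlite3.Connection:
    """Return one connection that exposes assets, locations, and sensors."""
    conn = sqlite3.connect(":memory:")
    # Two attached in-memory databases stand in for separate source systems.
    conn.execute("ATTACH DATABASE ':memory:' AS erp")
    conn.execute("ATTACH DATABASE ':memory:' AS plant")

    # Hypothetical source schemas, each with its own naming conventions.
    conn.execute(
        "CREATE TABLE erp.equipment (equip_id INTEGER, equip_name TEXT, site_code TEXT)"
    )
    conn.execute("CREATE TABLE erp.sites (site_code TEXT, city TEXT)")
    conn.execute("CREATE TABLE plant.readings (device INTEGER, metric TEXT, val REAL)")
    conn.executemany(
        "INSERT INTO erp.equipment VALUES (?, ?, ?)",
        [(1, "Pump A", "S1"), (2, "Fan B", "S2")],
    )
    conn.executemany(
        "INSERT INTO erp.sites VALUES (?, ?)",
        [("S1", "Austin"), ("S2", "Boston")],
    )
    conn.executemany(
        "INSERT INTO plant.readings VALUES (?, ?, ?)",
        [(1, "temperature", 70.0), (1, "temperature", 74.0), (2, "temperature", 60.0)],
    )
    conn.commit()

    # Views copy no data; they only rename and standardize what the sources hold.
    # TEMP views are used because SQLite allows them to span attached databases.
    conn.execute(
        "CREATE TEMP VIEW assets AS "
        "SELECT equip_id AS asset_id, equip_name AS name, site_code AS location_id "
        "FROM erp.equipment"
    )
    conn.execute(
        "CREATE TEMP VIEW locations AS "
        "SELECT site_code AS location_id, city FROM erp.sites"
    )
    conn.execute(
        "CREATE TEMP VIEW sensors AS "
        "SELECT device AS asset_id, metric, val AS reading FROM plant.readings"
    )
    return conn


def avg_temperature_by_city(conn: sqlite3.Connection) -> dict:
    """Query only the standardized names, never the underlying source tables."""
    rows = conn.execute(
        """
        SELECT l.city, AVG(s.reading)
        FROM sensors AS s
        JOIN assets AS a ON a.asset_id = s.asset_id
        JOIN locations AS l ON l.location_id = a.location_id
        WHERE s.metric = 'temperature'
        GROUP BY l.city
        ORDER BY l.city
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(avg_temperature_by_city(build_endpoint()))
```

The consumer query refers only to assets, locations, and sensors. If a source system were swapped out, only the view definitions would need to change, which is the property the speakers attributed to the virtualized semantic layer.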

Driving smarter automation

Chabot also highlighted how this architecture supports automation at scale. Enterprises often juggle dozens of operational systems, each producing valuable data in different formats.

Databricks Genie consolidates and governs that data. CData’s semantic layer then virtualizes it back into the tools and workflows business teams already rely on. The result is a closed feedback loop:

Engineers govern and optimize data in Databricks.
Business users act on that data in familiar tools via CData.
New insights feed back into Databricks for further refinement.

This creates smarter, repeatable processes across the business. Reporting cycles shorten, IT bottlenecks shrink, and teams gain the agility to move from experimentation to production without reinventing the wheel.

A complementary partnership

Both speakers underscored a critical point: CData does not compete with Databricks; it complements it. Databricks serves as the powerhouse for advanced data engineering and AI. CData hydrates the lakehouse as quickly as possible by virtualizing disparate data from multiple systems, and it ensures the resulting insights reach the right people and applications across the enterprise without delay.

Andrew Chabot illustrated this during the session when he demonstrated how Databricks consumes curated tables labeled assets, locations, and sensors.

Together, the two platforms complete the puzzle:

Databricks delivers scalable data engineering, ML, and AI.
CData delivers the semantic layer that virtualizes and integrates data sources, ensuring those insights can flow into BI platforms, SaaS tools, and operational workflows as quickly as possible.

Conclusion

This session highlighted how a modern data architecture for AI depends on both power and accessibility. Databricks delivers the power. CData delivers rapid accessibility. Together, they enable organizations to drive AI strategies forward, faster and more effectively.

As enterprises look ahead, this combination provides a scalable blueprint: harness Databricks for advanced analytics and AI and rely on CData’s semantic layer to ensure those insights flow seamlessly into every corner of the business.

