What to See (and Who to Meet) at Databricks Data + AI Summit 2026

by Stanley Liu | June 4, 2026

Databricks picked "Build Apps and Agents That Work" as the theme for Data + AI Summit 2026, a candid acknowledgment that most agents today don't yet work reliably in production. Data access is where they break. Agents query exports that are hours old, reach for enterprise systems that never made it into the lakehouse, or run into source-side permissions nobody configured for agent use. If that problem sounds familiar, come find CData at booth #126 at Moscone Center in San Francisco, June 15–18.

Why AI and analytics projects stall at the data layer

Databricks is often the center of the modern data and AI architecture. But the enterprise data estate around it is rarely clean, complete, or standardized.

Most enterprise data estates include systems that haven't been replicated into Databricks and may not be anytime soon: on-premises databases, ERP platforms, SaaS applications, and legacy APIs, many of which predate the lakehouse era. When an agent needs a live account record from Salesforce, a recent purchase order from SAP, or current inventory from an on-premises Oracle instance, there's often no native path from Databricks to that data. The agent either can't answer, answers from a stale export, or waits for an engineer to build and maintain a custom connector.

The summit's own session catalog names this problem directly. "ADBC and the Future of Universal Data Connectivity for Agentic Systems" opens with the observation that agents inherit 1990s-era connectivity: slow, row-oriented APIs and brittle, system-specific connectors that weren't built for the fast, repeated tool calls agents make. Nationwide's session on modernizing its ServiceNow platform with Lakeflow Connect describes the downstream effect: legacy data extracted via APIs into siloed stores, governance gaps, slow refresh cycles, and data quality problems that hold back every analytical and AI workload built on top of them.

What CData brings to Data + AI Summit 2026

Enterprise data teams at this summit are mostly working on two problems at once: getting more data into the lakehouse so analytics and AI workloads have more to work with, and giving agents direct access to operational systems that aren't in the lakehouse and won't be anytime soon. CData makes products for both, built largely on the same source catalog.

CData Connect AI is a managed Model Context Protocol (MCP) platform that sits between enterprise data sources and AI assistants, agents, automations, and agentic platforms. Instead of building a custom tool integration for every source, teams give AI systems live access to enterprise data through a single governed interface. Connect AI supports identity passthrough, OAuth 2.1, PKCE, SSO, least-privilege controls, and audit logging, and it reaches both cloud and on-premises sources.

CData Sync moves enterprise data into analytics platforms through automated ETL/ELT and CDC pipelines. For Databricks teams, that matters when source data needs to land in the lakehouse for analytics, reporting, model training, or downstream transformation. Sync supports Delta Lake and Iceberg, and its connection-based pricing model avoids row-volume pricing surprises.

CData Drivers give developers and data teams JDBC, ODBC, Python, MCP, and ADO.NET connectivity to hundreds of enterprise sources. They are useful when teams need direct connectivity inside pipeline development, BI tools, notebooks, embedded applications, or custom integration workflows.

Sync moves data into the lakehouse. Connect AI governs how agents reach live enterprise systems, and Drivers handle direct connectivity for developers and BI tools. Most Databricks architectures need all three.

What's on the agenda and where CData fits

Data + AI Summit 2026 runs June 15–18 at Moscone Center in San Francisco, with 800-plus sessions across tracks covering AI agents, data engineering, governance, analytics, and data warehousing. Tens of thousands of data engineers, architects, ML engineers, and data leaders from more than 160 countries are expected in person and virtually.

The keynote lineup is revealing. Databricks co-founders Ali Ghodsi and Matei Zaharia are joined by Greg Brockman from OpenAI, Harrison Chase from LangChain, Jerry Liu from LlamaIndex, and João Moura from crewAI, alongside enterprise practitioners from PepsiCo, Mercedes-Benz, and AstraZeneca. When a conference pairs the framework creators with the people running production systems at global enterprises, the conversation has moved past "should we build agents" into "why do ours keep breaking."

The session catalog spans Artificial Intelligence & Agents, Governance & Security, Data Engineering & Streaming, Application Development, Analytics & BI, Data Warehousing, and Cybersecurity. Several sessions point directly at the data access problem behind production agents and apps. The session "'Make Me a Map': Building a GIS Agent with Agent Bricks, MCP, and Lakebase" shows an agent that takes a Slack message, queries Lakebase, calls an MCP server, connects to Felt, and returns a working map application. The session description highlights MCP tools for live connections and a pattern for agents that produce working applications, not just answers.

CData fits into those conversations because source access is where many production AI and analytics projects slow down. Databricks can provide the lakehouse, AI/BI, governance framework, and agent platform. CData helps teams connect the enterprise systems that sit outside the center of that architecture.

Why agents need live data, not exports

Enterprise AI agents make fast, repeated tool calls. "What is the current pipeline for this account?" and "What does the most recent purchase order show?" are only useful if the answers reflect the state of the source system right now.

The standard workaround is to export data into the lakehouse before agents can access it. The problem is that an export from last night is still last night's data. A model reasoning over yesterday's Salesforce export will confidently answer questions about accounts and contacts as they existed at export time.

Connect AI routes agent tool calls directly to source systems. The agent queries live data through MCP, the source system enforces access controls at retrieval time, and the response reflects the current record.

For data that needs to land in the lakehouse for model training, analytics, or AI/BI reporting, CData Sync handles the ingestion side with CDC pipelines that stay current as source systems change.

Getting enterprise data into the Databricks lakehouse

Many teams at this summit are consolidating data from legacy systems and SaaS applications into Databricks to support analytics and AI workloads. The practical constraint is the set of sources Lakeflow Connect doesn't natively cover: on-premises databases, specialized industry applications, high-volume transactional systems, and legacy APIs that require custom handling.

CData Sync covers those gaps with JDBC, ODBC, and API-based connectors for hundreds of enterprise sources, automated incremental and full-load pipelines, CDC support, and native Delta Lake and Iceberg output. Pricing by connection rather than by row volume matters when CDC pipelines run against high-frequency transactional tables, where the number of changed rows can climb quickly.
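To make the CDC pattern concrete, here is a minimal Python sketch of the general technique, not CData Sync's implementation: an ordered stream of row-level change events (insert, update, delete) is applied to a keyed target, and a high-water log sequence number (LSN) records progress. The `ChangeEvent` shape, the `apply_changes` helper, and the in-memory dict standing in for a Delta Lake or Iceberg table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC event: one row-level change captured from a source log."""
    lsn: int                       # log sequence number; events are applied in this order
    op: str                        # "insert", "update", or "delete"
    key: Any                       # primary key of the changed row
    row: Optional[Dict[str, Any]]  # new row image (None for deletes)


def apply_changes(target: Dict[Any, Dict[str, Any]],
                  events: Iterable[ChangeEvent],
                  last_applied_lsn: int) -> int:
    """Apply change events to `target` in LSN order, skipping already-applied ones.

    Returns the new high-water LSN so the next run resumes where this one stopped.
    """
    high_water = last_applied_lsn
    for ev in sorted(events, key=lambda e: e.lsn):
        if ev.lsn <= last_applied_lsn:
            continue  # applied in an earlier run; skipping makes replays idempotent
        if ev.op in ("insert", "update"):
            if ev.row is None:
                raise ValueError(f"{ev.op} at LSN {ev.lsn} has no row image")
            target[ev.key] = dict(ev.row)  # upsert: latest image wins per key
        elif ev.op == "delete":
            target.pop(ev.key, None)       # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown op: {ev.op}")
        high_water = ev.lsn
    return high_water
```

Because events at or below the stored LSN are skipped, replaying a batch after a failure is harmless. A production pipeline would persist that high-water mark alongside the target table and write through a MERGE rather than into a dict.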

CData Drivers give engineering teams the JDBC, ODBC, and Python connector layer for custom pipeline development and Spark job integration. If you need to read from a ServiceNow table, a legacy SOAP-based ERP, or a less-common cloud source, Drivers provide the access layer without a custom connector build.

Who should meet CData at Databricks Data + AI Summit 2026

Data engineers and pipeline builders: If your backlog includes SaaS applications, on-premises databases, SAP, Oracle, DB2, or other sources outside your current Databricks ingestion path, CData Sync is the conversation to have. The team can show how automated replication, CDC, and open table format support fit into Databricks-centered data engineering work.

ML engineers and agent builders: Meet CData if you are building agents that need to reason over live business systems, not just staged or replicated data. CData Connect AI gives agents governed access to hundreds of enterprise sources through a managed MCP layer, with identity passthrough, audit logging, and source-aware connectivity.

Data architects and platform engineers: For the most part, Sync and Connect AI share the same source catalog. Sync handles replication; Connect AI handles live agent access. If you're designing a connectivity layer that serves both pipelines and agents, you can standardize on one vendor's source catalog instead of evaluating two.

IT and security leaders: Agent authorization is the question most security teams haven't gotten to yet. Connect AI handles it at the source using identity passthrough, OAuth 2.1, PKCE, and SSO, so what the agent can retrieve is governed by existing enterprise permissions. The model operates on what the authorized identity can see.

Frequently asked questions

What does CData do for Databricks teams?

CData helps Databricks teams connect to enterprise systems that sit outside the lakehouse. CData Sync supports automated ETL/ELT and CDC pipelines into analytics environments. CData Drivers provide direct connectivity through interfaces such as JDBC, ODBC, Python, MCP, and ADO.NET. CData Connect AI gives AI agents governed access to live enterprise systems through a managed MCP platform. Together, they help teams move data into Databricks, query operational systems directly, and govern AI access to enterprise data.

What is MCP, and why does it matter for enterprise AI?

Model Context Protocol (MCP) is a standard way for AI agents and tools to connect to external systems and data sources. It matters because agentic AI depends on more than prompts and model output. Agents need to retrieve context, call tools, and act against systems of record. Without a standard connection pattern, teams end up building one-off integrations between every agent and every enterprise system. CData Connect AI uses MCP as the access pattern for connecting AI assistants and agents to hundreds of enterprise systems through a managed platform, with identity, governance, and audit controls added for production enterprise use.
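For a sense of what that standard connection pattern looks like on the wire: MCP messages are JSON-RPC 2.0, and an agent invokes a server-side tool with a `tools/call` request. The Python sketch below only builds that request and reads the text content out of a result; it omits the transport (stdio or HTTP) and the initialization handshake a real MCP client performs. The tool name `query_crm` and its `sql` argument are hypothetical placeholders.

```python
import json
from itertools import count
from typing import Any, Dict

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def make_tool_call(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


def extract_text(response_json: str) -> str:
    """Return the concatenated text content of a `tools/call` result.

    Raises RuntimeError for protocol-level errors (`error`) and for tool
    failures reported inside the result (`isError`).
    """
    response = json.loads(response_json)
    if "error" in response:
        raise RuntimeError(f"JSON-RPC error: {response['error'].get('message', 'unknown')}")
    result = response["result"]
    text = "".join(
        part.get("text", "")
        for part in result.get("content", [])
        if part.get("type") == "text"
    )
    if result.get("isError"):
        raise RuntimeError(f"tool error: {text}")
    return text
```

Keeping tool failures (`isError` inside the result) separate from protocol errors follows the MCP specification, which reports a failing tool in the result so the model can see the failure and react to it.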

Who should meet CData at Data + AI Summit 2026?

Data engineers, AI/ML practitioners, analytics leaders, data architects, and IT teams whose Databricks or AI roadmap depends on enterprise data outside the lakehouse. CData will be at booth #126 during the June 15–18 event in San Francisco. The most relevant use cases are moving data from SaaS apps, ERPs, databases, and on-premises systems into Databricks; giving AI agents governed access to live enterprise sources; and using JDBC, ODBC, Python, MCP, or ADO.NET connectivity in developer and BI workflows.

How do I govern AI access to enterprise data?

Enforce permissions before the agent retrieves data, not only after the model generates an answer. CData Connect AI sits between AI tools and enterprise systems, honoring existing source permissions through identity passthrough and supporting OAuth 2.1, PKCE, SSO, least-privilege access, and audit logging. That lets AI agents query approved systems while preserving the access rules already defined in CRMs, ERPs, databases, and other enterprise sources.
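The ordering is the point: authorization happens before any record enters the agent's context. The Python sketch below illustrates that ordering with a hypothetical region-based rule over an in-memory table. With identity passthrough, the source system itself would apply the filter; `Identity`, `ACCOUNTS`, and `fetch_accounts_for_agent` are illustrative names, not part of any CData API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Identity:
    """Hypothetical end-user identity carried through from the agent's session."""
    user: str
    regions: FrozenSet[str]  # regions this user may read under source-side rules


# Stand-in for a CRM table; a real source system would enforce these rules itself.
ACCOUNTS: List[Dict[str, str]] = [
    {"id": "A-1", "name": "Acme", "region": "west"},
    {"id": "A-2", "name": "Globex", "region": "east"},
    {"id": "A-3", "name": "Initech", "region": "west"},
]


def fetch_accounts_for_agent(identity: Identity, audit_log: List[str]) -> List[Dict[str, str]]:
    """Return only the rows this identity may see, and record the access.

    Filtering happens here, at retrieval time, so rows the user is not
    authorized to read never reach the model's prompt.
    """
    visible = [row for row in ACCOUNTS if row["region"] in identity.regions]
    audit_log.append(f"user={identity.user} table=accounts rows={len(visible)}")
    return visible
```

Asking the model to ignore restricted rows after retrieval would still place them in its context; filtering at the source avoids that.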

How do I reduce hallucinations in enterprise AI agents?

Enterprise AI agent accuracy improves when agents retrieve current, consistent, and authorized data. Many agent failures come from stale exports, mismatched field definitions, incomplete retrieval scope, or access to data the user should not see. CData Connect AI addresses that at the connectivity layer by giving agents live access to enterprise systems, normalizing access across sources, and enforcing permissions at query time.

Come find us at Databricks Data + AI Summit 2026

CData will be at booth #126, June 15–18 at Moscone Center in San Francisco.

Bring the source systems slowing down your Databricks roadmap, the AI agent use case you're trying to govern, or the pipeline backlog your team is trying to reduce. The team can walk through how CData Connect AI, CData Sync, and CData Drivers fit your current architecture.

Your enterprise data, finally AI-ready.

Connect AI gives your AI assistants and agents live, governed access to hundreds of enterprise systems — so they can reason over your actual business data, not just what they were trained on.


CData is the data layer that makes AI work in production—live connectivity and replication across hundreds of the most critical enterprise sources, semantic context, and built-in governance. Powering AI for Databricks, Microsoft, Google, Palantir, and 10,000+ customers worldwide.
