AWS at CData Foundations 2025: Laying the Data Foundation for Enterprise AI

by Arun Anand | October 16, 2025

One of the standout sessions at CData Foundations 2025 featured Harshit Kohli, Senior Technical Account Manager at AWS. In his session, titled “AI Needs Data,” Kohli drove home a point few AI roadmaps emphasize strongly enough: AI models are advancing quickly, but most data infrastructure isn’t ready to support them.

As Harshit states in his talk, “AI is actually more ready than the data is... so organizations need to fast-track their data. They need to massage their data. They need to make sure that the data is meaningful so that the models can produce the best results they want.”

This theme resonates deeply with what we’re building at CData. For AWS customers across all levels of AI deployment maturity, the foundation of AI adoption isn’t just compute or the models themselves. Rather, meaningful enterprise AI adoption is dependent on the models having up-to-date, governed access to data across all systems.

Here we present the top three insights from Harshit’s talk, and dive deeper into how AWS and CData partner to provide our customers a solid infrastructure and data foundation for enterprise AI initiatives.

Harshit’s keys for enterprise AI adoption

1. The gap isn’t models – it’s data access and readiness

Data access and readiness was perhaps the central theme of Harshit’s talk. Most organizations aren’t failing to achieve AI ROI because their models lack sophistication; they’re failing because they lack access to governed, high-quality data. He described a common scenario where businesses, eager to deploy generative agents and copilots, quickly realize that their data is unstructured, scattered, and inaccessible in real time.

Harshit pointed to leading companies like Uber, Netflix, and Airbnb that deal with billions of daily events and are now investing heavily in data infrastructure to match the growing intelligence of their AI systems. He explained how smaller businesses are following suit, realizing that AI's performance is only as good as the data it can work with.

“We are already seeing these companies scaling their data infrastructure, and they have realized this concept that they need to fast-track their data.”

2. Infrastructure must support data velocity, variety, and volume

Harshit also explained that modern data environments must handle not only large volumes of data but also a variety of formats (structured, semi-structured, and unstructured) while keeping that data accessible in real time. He warned that relying on traditional, static data warehouses could hinder AI development.

He noted that organizations are struggling with how to balance the pace of incoming data against the processing power of their models, creating bottlenecks in model performance and insight generation.

3. Success comes from strategic, staged implementation

Harshit praised organizations that avoid the temptation to rush AI deployment. He described a common best practice among successful AWS customers: starting with a clear long-term vision and then breaking it down into incremental, ROI-driven initiatives.

He stated, “they're not jumping the gun... they're dividing those missions or goals into very good fragmented amount of sub goals, and then they are allocating their expert teams to go and attain those sub goals.” This methodical approach enables companies to test, measure, and expand AI systems without overspending or overengineering. Kohli encouraged enterprises to focus first on building robust data infrastructure aligned to their use case goals.

How CData + AWS deliver the data foundation AI needs

CData's deep integration with AWS services allows our mutual customers to operationalize AI projects faster and more reliably by establishing a strong data foundation. Here are four ways CData integrates with AWS services to achieve this:

Accelerate AI readiness by simplifying data movement into Amazon S3 and Redshift

CData makes it easy to ingest high-volume, diverse datasets into Amazon S3 and Amazon Redshift, eliminating manual pipeline development and minimizing ETL overhead. With CData, users can easily create scalable, low-maintenance pipelines from any operational system directly into their AWS lakehouse architecture – ready for analytics, machine learning, or injection into LLMs as enterprise context.

This helps organizations automate the preparation of rich, AI-ready datasets for training and inference.
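Pipelines of this kind usually avoid re-copying a whole table on every run. The following is a minimal, generic sketch of watermark-based incremental loading, not CData's implementation: two in-memory SQLite databases stand in for an operational source and an analytics target, and the `orders` table, its columns, and the `incremental_sync` helper are all hypothetical names chosen for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `orders` table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
    )
    return db


def incremental_sync(source, target, watermark):
    """Copy rows changed after `watermark`; return (rows_copied, new_watermark)."""
    rows = source.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE upserts on the primary key, so updated rows overwrite old copies.
    target.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    target.commit()
    new_watermark = rows[-1][2] if rows else watermark
    return len(rows), new_watermark


source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", "2025-10-01T09:00:00"), (2, "new", "2025-10-01T10:00:00")],
)

# First run: empty watermark, so every row is copied.
first_count, watermark = incremental_sync(source, target, "")

# Later, one row changes and one row is added in the source.
source.execute(
    "UPDATE orders SET status = 'shipped', updated_at = '2025-10-02T08:00:00' WHERE id = 1"
)
source.execute("INSERT INTO orders VALUES (3, 'new', '2025-10-02T09:00:00')")

# Second run: only the two rows newer than the watermark move.
second_count, watermark = incremental_sync(source, target, watermark)
```

The sketch assumes every source row carries a reliable last-modified timestamp; sources without one would need a different change-detection strategy.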

Enable real-time AI with live connectivity for Amazon Bedrock models

For use cases requiring fresh, federated access to operational systems, CData Connect AI enables real-time connectivity into Bedrock-based AI models. Enterprises can connect live to data in Salesforce, SAP, ServiceNow, and more without replication, ETL, or staging. This data is exposed to AWS-hosted AI copilots and agents via the Model Context Protocol (MCP).

Live connectivity empowers Bedrock models to tap into current-state data (e.g., latest customer orders or support tickets), enhancing their value and reducing the risk of hallucination or drift.
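To make the MCP step concrete, here is a rough sketch of the kind of message an agent sends when it asks an MCP server to run a tool. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method. The tool name `query_data`, its `query` argument, and the SQL text are hypothetical placeholders; a real server advertises its own tool names and argument schemas through `tools/list`, and nothing here is taken from the Connect AI product.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument, used only for illustration.
request = build_tool_call(
    1,
    "query_data",
    {"query": "SELECT Id, Status FROM Orders WHERE Status = 'Open'"},
)

# Serialized form that would travel over the MCP transport.
payload = json.dumps(request)
```

Because the agent asks for data at the moment it needs it, the answer reflects the source system's current state and not the state at the last batch load.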

Align with AWS-native services to build scalable, AI-ready architectures

CData works natively with AWS tools:

Amazon S3 and Redshift: Load-ready connectors for fast, scalable sync
AWS Glue: Automatically expose external data to Glue jobs using SQL-based connectors
Amazon Athena: Query live data sources using standard SQL
Amazon SageMaker: Feed training data from disparate sources into ML pipelines without custom extractors

These integrations make it easier to build hybrid architectures that combine governed, historical datasets with real-time operational data — one of the core architectural shifts Kohli underscored.

Eliminate pipeline friction and control costs to scale AI initiatives efficiently

CData’s design philosophy centers on efficiency of data activation:

Push-down queries reduce cloud compute costs
Caching, incremental sync and live data access minimize data movement
Volume-independent pricing ensures cost predictability as workloads scale

This results in lower TCO, less time spent managing pipelines, and faster time to value.
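The cost effect of query push-down can be illustrated with a small, generic sketch. An in-memory SQLite database stands in for a remote source, and the `tickets` table and both helper functions are hypothetical. The first function pulls every row and filters on the client; the second sends the filter to the source so that only matching rows are transferred.

```python
import sqlite3

# SQLite stands in for a remote system; 1 in 10 tickets is open.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT)")
db.executemany(
    "INSERT INTO tickets (status) VALUES (?)",
    [("open",) if i % 10 == 0 else ("closed",) for i in range(1000)],
)


def open_tickets_client_side(conn):
    """Fetch every row, then filter locally. Returns (rows_transferred, result)."""
    rows = conn.execute("SELECT id, status FROM tickets ORDER BY id").fetchall()
    return len(rows), [row for row in rows if row[1] == "open"]


def open_tickets_pushed_down(conn):
    """Let the source apply the filter. Returns (rows_transferred, result)."""
    rows = conn.execute(
        "SELECT id, status FROM tickets WHERE status = ? ORDER BY id", ("open",)
    ).fetchall()
    return len(rows), rows


moved_naive, result_naive = open_tickets_client_side(db)
moved_pushed, result_pushed = open_tickets_pushed_down(db)

print(f"client-side filter moved {moved_naive} rows")
print(f"pushed-down filter moved {moved_pushed} rows")
```

Both functions return the same tickets, but the pushed-down version transfers a tenth of the rows in this toy data set. Actual savings depend on how selective the filter is and on what the source system can evaluate itself.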

The takeaway from Kohli’s examples and the experience of CData and AWS’ mutual customers is clear: AI thrives when the data is fresh, governed, and accessible. Together, CData and AWS enable that transformation.

Ready to see CData in action?

Start building faster, federated, AI-ready pipelines today.

Explore CData + AWS integration
Try our 14-day evaluation of CData Connect AI, which securely connects AI models to enterprise data sources