Developer Guide - Build AI Agents with LlamaIndex and CData Connect AI
Build intelligent data agents using LlamaIndex's ReAct agent framework with CData Connect AI to enable conversational access to your data. This guide walks you through creating a Python application that combines LlamaIndex's powerful agent capabilities with live data from 350+ sources.
NOTE: While this guide uses Google Sheets as the data source, the same principles apply to any of the 350+ data sources CData Connect AI supports.
By the end of this guide, you'll have a working Python application that can:
- Connect to any of 350+ data sources through CData Connect AI
- Use LlamaIndex's ReAct agent for intelligent tool orchestration
- Execute SQL queries using natural language
- Maintain multi-turn conversations with context
- Support both OpenAI and Anthropic LLM providers
- Stream responses in real-time
Why LlamaIndex?
LlamaIndex provides several key advantages for building AI agents:
- ReAct Agent Framework: Intelligent reasoning and action loop for complex multi-step tasks
- Multiple LLM Support: Works with OpenAI (GPT-4) and Anthropic (Claude) out of the box
- Streaming Support: Real-time token streaming for interactive applications
- Context Management: Built-in conversation context for multi-turn interactions
- Extensible Tools: Easy integration with external tools and data sources
- Production Ready: Battle-tested framework used in production applications
Architecture Overview
The application uses the Model Context Protocol (MCP) to bridge LlamaIndex agents with your data sources:
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ │ │ │ │ │
│ Your Python │---->│ CData Connect │---->│ Data Sources │
│ Application │ │ AI MCP Server │ │ (350+ types) │
│ │<----│ │<----│ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
| |
| Tool Discovery |
| & Execution |
v |
┌─────────────────┐ |
│ │ |
│ LlamaIndex │--------------┘
│ ReAct Agent │ Natural Language
│ (OpenAI/Claude)│ to SQL Translation
└─────────────────┘
How it works:
- Your Python application initializes the LlamaIndex agent with MCP tools
- The agent connects to CData Connect AI's MCP server over HTTP
- MCP tools are wrapped as LlamaIndex functions for the ReAct agent
- The ReAct agent reasons about which tools to use and executes them
- Results are returned and interpreted by the LLM
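The flow above can be sketched as the raw JSON-RPC 2.0 payloads the client POSTs to the MCP server. The endpoint URL and method names match the client code shown later in this guide; the helper function names here are illustrative.

```python
import json

MCP_SERVER_URL = "https://mcp.cloud.cdata.com/mcp"

def tools_list_request(request_id: int = 1) -> dict:
    """Payload the client POSTs to discover the available tools."""
    return {"jsonrpc": "2.0", "method": "tools/list", "id": request_id}

def tools_call_request(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Payload for executing a single tool, e.g. queryData."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": request_id,
    }

payload = tools_call_request("queryData", {"query": "SELECT 1"})
print(json.dumps(payload))
```

Every tool interaction in the rest of this guide reduces to one of these two request shapes.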
Prerequisites
This guide requires the following:
- Python 3.9+ installed on your system
- An OpenAI API key or Anthropic API key
- A CData Connect AI account (free trial available)
- A Google account for the sample Google Sheets data
Getting Started
Overview
Here's a quick overview of the steps:
- Set up sample data in Google Sheets
- Configure CData Connect AI and create a Personal Access Token
- Set up the Python project and install dependencies
- Build and run the agent chatbot
STEP 1: Set Up Sample Data in Google Sheets
We'll use a sample Google Sheet containing customer data to demonstrate the capabilities. This dataset includes accounts, sales opportunities, support tickets, and usage metrics.
- Navigate to the sample customer health spreadsheet
- Click File > Make a copy to save it to your Google Drive
- Give it a memorable name (e.g., "demo_organization") - you'll need this later
The spreadsheet contains four sheets:
- account: Company information (name, industry, revenue, employees)
- opportunity: Sales pipeline data (stage, amount, probability)
- tickets: Support tickets (priority, status, description)
- usage: Product usage metrics (job runs, records processed)
STEP 2: Configure CData Connect AI
2.1 Sign Up or Log In
- Navigate to https://www.cdata.com/ai/signup/ to create a new account, or https://cloud.cdata.com/ to log in
- Complete the registration process if creating a new account
2.2 Add a Google Sheets Connection
- Once logged in, click Sources in the left navigation menu and click Add Connection
- Select Google Sheets from the Add Connection panel
- Configure the connection:
  - Set the Spreadsheet property to the name of your copied sheet (e.g., "demo_organization")
  - Click Sign in to authenticate with Google OAuth
- After authentication, navigate to the Permissions tab and verify your user has access
2.3 Create a Personal Access Token
Your Python application will use a Personal Access Token (PAT) to authenticate with Connect AI.
- Click the Gear icon in the top right to open Settings
- Go to the Access Tokens section
- Click Create PAT
- Give the token a name (e.g., "LlamaIndex Agent") and click Create
- Important: Copy the token immediately - it's only shown once!
STEP 3: Set Up the Python Project
3.1 Install via pip (Recommended)
Install the package directly from PyPI:
pip install connectai-llamaindex-agent
3.2 Alternative: Clone from GitHub
Clone the complete project with all examples:
git clone https://github.com/CDataSoftware/connectai-llamaindex-agent.git
cd connectai-llamaindex-agent
pip install -e .
3.3 Alternative: Create from Scratch
Create a new project directory and install dependencies:
mkdir connectai-llamaindex-app
cd connectai-llamaindex-app
pip install connectai-llamaindex-agent python-dotenv
3.4 Configure Environment Variables
Create a .env file in your project root:
# CData Connect AI Configuration (required)
[email protected]
CDATA_PAT=your-personal-access-token-here
# OpenAI Configuration (default)
OPENAI_API_KEY=sk-your-openai-api-key-here
# Or use Anthropic instead
# LLM_PROVIDER=anthropic
# ANTHROPIC_API_KEY=your-anthropic-api-key-here
Replace the placeholder values with your actual credentials.
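Only CDATA_EMAIL and CDATA_PAT are strictly required. A small fail-fast check like the following (the helper name is illustrative, not part of the package) catches a missing variable before the agent starts:

```python
import os

def missing_required(env=None) -> list:
    """Return the required Connect AI variables absent from the environment."""
    env = os.environ if env is None else env
    required = ("CDATA_EMAIL", "CDATA_PAT")
    return [name for name in required if not env.get(name)]

# Example: only the email is set, so the PAT is reported missing.
print(missing_required({"CDATA_EMAIL": "[email protected]"}))  # → ['CDATA_PAT']
```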
STEP 4: Understanding the Code Architecture
The package consists of three main components:
4.1 Config Class
Handles configuration and credential management with support for multiple LLM providers:
from dataclasses import dataclass
import os
import base64
from typing import Optional

@dataclass
class Config:
    """Configuration for the Connect AI LlamaIndex Agent."""

    cdata_email: str
    cdata_pat: str
    mcp_server_url: str = "https://mcp.cloud.cdata.com/mcp"
    llm_provider: str = "openai"
    openai_api_key: Optional[str] = None
    openai_model: str = "gpt-4o"
    anthropic_api_key: Optional[str] = None
    anthropic_model: str = "claude-sonnet-4-20250514"

    @classmethod
    def from_env(cls) -> "Config":
        """Create configuration from environment variables."""
        return cls(
            cdata_email=os.getenv("CDATA_EMAIL"),
            cdata_pat=os.getenv("CDATA_PAT"),
            llm_provider=os.getenv("LLM_PROVIDER", "openai").lower(),
            openai_api_key=os.getenv("OPENAI_API_KEY"),
            openai_model=os.getenv("OPENAI_MODEL", "gpt-4o"),
            anthropic_api_key=os.getenv("ANTHROPIC_API_KEY"),
            anthropic_model=os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),
        )

    def get_auth_header(self) -> str:
        """Generate the Base64-encoded Basic authentication credential."""
        credentials = f"{self.cdata_email}:{self.cdata_pat}"
        return base64.b64encode(credentials.encode()).decode()
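The Basic-auth credential that get_auth_header() produces can be verified standalone; the email and PAT below are placeholders:

```python
import base64

email = "[email protected]"
pat = "example-pat"
token = base64.b64encode(f"{email}:{pat}".encode()).decode()
header = f"Basic {token}"
# Decoding the token recovers the original "email:pat" pair.
assert base64.b64decode(token).decode() == f"{email}:{pat}"
print(header)
```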
4.2 MCPClient Class
Low-level client for direct HTTP communication with the CData Connect AI MCP server:
import json
import httpx

class MCPClient:
    """HTTP client for the CData Connect AI MCP server."""

    def __init__(self, config: Config):
        self.config = config
        self._client = httpx.Client(
            headers={
                "Authorization": f"Basic {config.get_auth_header()}",
                "Content-Type": "application/json",
                "Accept": "application/json, text/event-stream",
                "User-Agent": "connectai-llamaindex-agent/1.0.0",
            },
            timeout=60.0,
        )

    def _parse_sse_response(self, response: httpx.Response) -> dict:
        """Parse a Server-Sent Events response body."""
        for line in response.text.split("\n"):
            if line.startswith("data: "):
                return json.loads(line[6:])
        return {}

    def list_tools(self) -> list:
        """Discover available tools from the MCP server."""
        response = self._client.post(
            self.config.mcp_server_url,
            json={"jsonrpc": "2.0", "method": "tools/list", "id": 1},
        )
        result = self._parse_sse_response(response)
        return result.get("result", {}).get("tools", [])

    def call_tool(self, name: str, arguments: dict = None) -> str:
        """Execute a tool on the MCP server."""
        response = self._client.post(
            self.config.mcp_server_url,
            json={
                "jsonrpc": "2.0",
                "method": "tools/call",
                "params": {"name": name, "arguments": arguments or {}},
                "id": 1,
            },
        )
        result = self._parse_sse_response(response)
        content = result.get("result", {}).get("content", [])
        if content:
            texts = [item.get("text", "") for item in content if item.get("type") == "text"]
            return "\n".join(texts)
        return json.dumps(result)
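The SSE parsing step is worth seeing in isolation: the MCP server replies as text/event-stream, and the JSON payload sits on the "data: " line. A standalone version of the same logic, run against a sample response body:

```python
import json

def parse_sse(body: str) -> dict:
    """Extract the first JSON payload from a text/event-stream body."""
    for line in body.split("\n"):
        if line.startswith("data: "):
            return json.loads(line[6:])
    return {}

sample = 'event: message\ndata: {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}\n\n'
print(parse_sse(sample))
```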
4.3 MCPAgent Class
The LlamaIndex ReAct agent that combines LLM reasoning with MCP tools:
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context

class MCPAgent:
    """LlamaIndex-based agent for conversational data access."""

    def __init__(self, config: Config, system_prompt: str = None, verbose: bool = False):
        self.config = config
        self._mcp_client = MCPClient(config)
        # Initialize the LLM based on the configured provider
        self._llm = self._create_llm()
        # Wrap MCP tools as LlamaIndex tool functions
        self._tool_functions = self._create_tool_functions()
        # Create the ReAct agent
        self._agent = ReActAgent(
            tools=self._tool_functions,
            llm=self._llm,
            system_prompt=system_prompt or DEFAULT_SYSTEM_PROMPT,
        )
        # Persistent context for multi-turn conversations
        self._context = Context(self._agent)

    def _create_llm(self):
        """Create the LLM instance based on configuration."""
        if self.config.llm_provider == "anthropic":
            from llama_index.llms.anthropic import Anthropic
            return Anthropic(
                api_key=self.config.anthropic_api_key,
                model=self.config.anthropic_model,
            )
        else:
            from llama_index.llms.openai import OpenAI
            return OpenAI(
                api_key=self.config.openai_api_key,
                model=self.config.openai_model,
            )

    def chat(self, message: str) -> str:
        """Send a message and get a response with multi-turn context."""
        import asyncio
        return asyncio.run(self._achat(message))

    async def _achat(self, message: str) -> str:
        """Async implementation of chat."""
        handler = self._agent.run(message, ctx=self._context)
        response = await handler
        return str(response)
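The _create_tool_functions method is referenced above but not shown. The following is a hypothetical sketch of the underlying idea: each MCP tool becomes a plain callable that forwards to the MCP server, and LlamaIndex can then wrap such callables as agent tools. The stub client stands in for a real MCPClient.

```python
def make_tool_callable(client, tool: dict):
    """Turn one MCP tool description into a callable the agent can use."""
    name = tool["name"]

    def _call(**arguments) -> str:
        # Every invocation becomes a tools/call request to the MCP server.
        return client.call_tool(name, arguments)

    _call.__name__ = name
    _call.__doc__ = tool.get("description", "")
    return _call

class StubClient:
    """Stand-in for MCPClient; echoes the call instead of hitting the server."""
    def call_tool(self, name, arguments):
        return f"{name}({arguments})"

fn = make_tool_callable(StubClient(), {"name": "getCatalogs", "description": "List catalogs"})
print(fn())  # → getCatalogs({})
```

The function name and docstring are copied from the tool metadata so the LLM sees a meaningful description when deciding which tool to call.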
STEP 5: Build the Agent Chatbot
Create an interactive agent chatbot:
#!/usr/bin/env python3
"""Interactive chat application for querying data with AI."""
from dotenv import load_dotenv
from connectai_llamaindex import MCPAgent, Config

load_dotenv()

def main():
    print("=" * 60)
    print("CData Connect AI - LlamaIndex Chat Assistant")
    print("=" * 60)

    # Initialize configuration and agent
    config = Config.from_env()
    with MCPAgent(config, verbose=False) as agent:
        print("\nConnected! Available tools:", ", ".join(agent.get_available_tools()))
        print("\nChat with your data! Type 'quit' to exit.\n")

        while True:
            user_input = input("You: ").strip()
            if user_input.lower() in ("quit", "exit"):
                print("Goodbye!")
                break
            if user_input.lower() == "clear":
                agent.clear_history()
                print("Conversation history cleared.\n")
                continue
            response = agent.chat(user_input)
            print(f"\nAssistant: {response}\n")

if __name__ == "__main__":
    main()
STEP 6: Run Your Application
With everything configured, run your agent chatbot:
python basic_chat.py
You should see output like:
============================================================
CData Connect AI - LlamaIndex Chat Assistant
============================================================

Connected! Available tools: getCatalogs, getSchemas, getTables, getColumns, queryData, getProcedures, getProcedureParameters, executeProcedure, getInstructions

Chat with your data! Type 'quit' to exit.

You: What data sources do I have?
STEP 7: Example Queries
Here are some example prompts to try with your agent chatbot:
Data Discovery
- "What data sources do I have connected?"
- "Show me all the tables in demo_organization"
- "What columns are in the account table?"
Basic Queries
- "Query the top 5 accounts by annual_revenue"
- "How many support tickets are there by priority?"
- "Show me all open opportunities"
Multi-Turn Conversations
The ReAct agent maintains context across turns, enabling natural follow-up questions:
- You: "Show me the accounts table"
- You: "Which one has the highest revenue?"
- You: "Tell me more about that company"
Analysis
- "Which accounts have the most critical support tickets?"
- "Summarize the health of Aurora Healthcare Systems"
- "Find accounts with high revenue but low product usage"
STEP 8: Available MCP Tools
Your AI agent has access to these CData Connect AI tools:
| Tool | Description |
|---|---|
| getCatalogs | List available data source connections |
| getSchemas | Get schemas for a specific catalog |
| getTables | Get tables in a schema |
| getColumns | Get column metadata for a table |
| queryData | Execute SQL queries |
| getProcedures | List stored procedures |
| getProcedureParameters | Get procedure parameter details |
| executeProcedure | Execute stored procedures |
| getInstructions | Get driver-specific guidance for a data source |
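A typical sequence in which the ReAct agent chains these tools to answer a question like "top 5 accounts by revenue" can be sketched as a plan of (tool, arguments) pairs. The argument names below are illustrative; the agent reads the real parameter schemas from the tools/list response.

```python
plan = [
    ("getCatalogs", {}),
    ("getTables", {"catalog": "demo_organization", "schema": "GoogleSheets"}),
    ("getColumns", {"catalog": "demo_organization", "schema": "GoogleSheets", "table": "account"}),
    ("queryData", {"query": (
        "SELECT [name], [annual_revenue] "
        "FROM [demo_organization].[GoogleSheets].[account] "
        "ORDER BY [annual_revenue] DESC LIMIT 5"
    )}),
]
for name, args in plan:
    print(name, args)
```

Metadata tools run first so the final queryData call uses verified catalog, table, and column names.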
STEP 9: Advanced Features
Streaming Responses
Get real-time token streaming for interactive applications:
with MCPAgent(config) as agent:
    tokens = agent.stream_chat("Analyze my sales data")
    for token in tokens:
        print(token, end="", flush=True)
Low-Level MCP Client
Use the MCPClient directly for programmatic access without AI:
from connectai_llamaindex import MCPClient, Config

config = Config.from_env()
with MCPClient(config) as client:
    # List available tools
    tools = client.list_tools()

    # Get catalogs
    catalogs = client.get_catalogs()
    print(catalogs)

    # Execute a query
    results = client.query_data(
        "SELECT * FROM [demo_organization].[GoogleSheets].[account] LIMIT 10"
    )
    print(results)
Custom System Prompt
Customize the agent's behavior with a domain-specific system prompt:
custom_prompt = """You are a financial analyst assistant.
When analyzing data:
1. Always calculate key financial metrics
2. Identify trends and anomalies
3. Provide actionable insights
"""
agent = MCPAgent(config, system_prompt=custom_prompt)
Using Anthropic Claude
Switch to Anthropic's Claude model by updating your environment variables:
# In your .env file
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your-anthropic-api-key
ANTHROPIC_MODEL=claude-sonnet-4-20250514
STEP 10: SQL Query Format
When the AI generates SQL queries, it uses fully qualified table names:
SELECT [column1], [column2]
FROM [CatalogName].[SchemaName].[TableName]
WHERE [column1] = 'value'
ORDER BY [column2]
LIMIT 100
For example, to query the account table from your Google Sheets:
SELECT [name], [annual_revenue], [industry]
FROM [demo_organization].[GoogleSheets].[account]
ORDER BY [annual_revenue] DESC
LIMIT 10
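When building such queries programmatically, a small convenience helper (not part of the package) keeps the bracket quoting consistent:

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build the bracket-quoted, fully qualified name Connect AI expects."""
    return ".".join(f"[{part}]" for part in (catalog, schema, table))

print(qualified_name("demo_organization", "GoogleSheets", "account"))
# → [demo_organization].[GoogleSheets].[account]
```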
Troubleshooting
Authentication Errors
- Verify your CData email and PAT are correct in .env
- Ensure the PAT has not expired
- Check that your Connect AI account is active
No Tools Available
- Confirm you have at least one data source connected in Connect AI
- Check that your user has permissions to access the connection
LLM Provider Errors
- Verify your OpenAI or Anthropic API key is set correctly
- Check that LLM_PROVIDER matches the API key you have configured
- Ensure the model name is valid for your chosen provider
Query Errors
- Use fully qualified table names: [Catalog].[Schema].[Table]
- Verify column names exist using the getColumns tool
- Check SQL syntax (Connect AI uses SQL-92 standard)
What's Next?
Now that you have a working LlamaIndex AI agent, you can:
- Connect more data sources: Add Salesforce, Snowflake, or any of 350+ supported sources to expand your data access.
- Customize the agent: Modify the system prompt for your specific use case and domain.
- Build production applications: Integrate the agent into web apps, Slack bots, or other interfaces.
- Add streaming: Use stream_chat() for real-time response streaming.
- Explore the examples: Check out the additional examples in the GitHub repository for data analysis workflows, multi-source queries, and more.
Resources
- GitHub Repository - Complete source code and examples
- CData Connect AI Documentation
- CData Prompt Library - Example prompts for various use cases
- LlamaIndex Documentation
- Model Context Protocol
Get Started with CData Connect AI
Ready to build AI-powered data applications? CData Connect AI provides live data access to 350+ SaaS, Big Data, and NoSQL sources directly from your AI applications.
Sign up for a free trial and start building intelligent data assistants today!