Advanced Developer Guide - Build Customer Health Agents with LangGraph
Build intelligent customer health analysis agents that autonomously discover data schemas, query live enterprise data, and generate executive health briefs. This guide walks through creating a Python application that combines LangGraph multi-agent workflows with CData Connect AI to provide autonomous data discovery and analysis through an extensible 3-node agent pipeline.
NOTE: While this guide uses Google Sheets as the data source, the same principles apply to any of the 350+ data sources CData Connect AI supports.
By the end of this guide, you'll have a working Python application that can:
- Connect a LangGraph ReAct agent to 350+ enterprise data sources through CData Connect AI
- Build a 3-node agent pipeline with autonomous schema discovery, LLM-powered analysis, and HTML rendering
- Support multiple LLM providers (OpenAI, Anthropic, Google, Ollama) via a configurable factory
- Cache discovered schemas to speed up subsequent runs
- Generate HTML briefs with structured health scores, signals, recommendations, and risks
- Extend the agent pipeline with custom analysis nodes
Architecture Overview
The application uses the Model Context Protocol (MCP) to bridge LangGraph with your data sources:
┌────────────────────────────────────────────────┐
│              ReAct Gatherer Agent              │
│                                                │
│            LLM decides next action             │
│                       |                        │
│                       v                        │
│  MCP Tools (5 tools)          CData Connect    │
│  get_catalogs, get_schemas ──> AI MCP Server   │
│  get_tables, get_columns       (350+ sources)  │
│  query_data                                    │
│          |                                     │
│          v  loop until enough data gathered    │
└───────────────────┬────────────────────────────┘
                    |
                    v
┌───────────────────────────────────────────────┐
│               Analyst Node (LLM)              │
│  Structured JSON: health_score, signals,      │
│  recommendations, risks, opportunities        │
└───────────────────┬───────────────────────────┘
                    |
                    v
┌───────────────────────────────────────────────┐
│         Renderer Node (Deterministic)         │
│  Jinja2 template -> HTML brief                │
│  No LLM call, pure template rendering         │
└───────────────────┬───────────────────────────┘
                    |
                    v
              output/*.html
How it works:
- A ReAct agent autonomously discovers schemas via CData Connect AI's MCP tools and gathers data through iterative tool calls
- An Analyst node makes a single LLM call to produce a structured JSON health assessment (score, signals, recommendations, risks)
- A Renderer node fills a Jinja2 template with the analysis and saves a styled HTML brief (no LLM, deterministic)
- Each node can be upgraded to a full agent by adding tools -- the pipeline is multi-agent-ready
- Schema caching avoids redundant discovery on subsequent runs (24h TTL by default)
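The renderer step above can be sketched in a few lines. This uses the standard library's string.Template as a stand-in for the project's Jinja2 template, and the field names (account, health_score) are illustrative, not the project's actual template variables:

```python
from string import Template

# Stand-in for the Jinja2 brief template (field names are illustrative)
BRIEF_TEMPLATE = Template(
    "<h1>$account</h1>\n<p>Health score: $health_score / 100</p>"
)

def render_brief(analysis: dict) -> str:
    """Deterministic render step: no LLM call, just template substitution."""
    return BRIEF_TEMPLATE.substitute(analysis)

html = render_brief({"account": "Premium Auto Group Europe", "health_score": 72})
```

Because this step never calls an LLM, the brief's layout is fully reproducible run to run; only the analysis values change.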
Prerequisites
This guide requires the following:
- Python 3.8+ installed on your system (Download Python)
- pip package installer (included with Python 3.4+). Verify with pip --version
- An OpenAI API key (requires a paid account), or a key for another supported LLM provider (Anthropic, Google), or a local Ollama install
- A CData Connect AI account (free trial here)
- A Google account for the sample Google Sheets data
Getting Started
Overview
Here's a quick overview of the steps:
- Set up sample data in Google Sheets
- Configure CData Connect AI and create a Personal Access Token
- Set up the Python project and install dependencies
- Understand the code architecture
- Run the agent
STEP 1: Set Up Sample Data in Google Sheets
We'll use a sample Google Sheet containing customer data to demonstrate the capabilities. This dataset includes accounts, sales opportunities, support tickets, and usage metrics.
- Navigate to the sample customer health spreadsheet
- Click File > Make a copy to save it to your Google Drive
- Give it a memorable name (e.g., "demo_organization") - you'll need this later
The spreadsheet contains four sheets:
- account: Company information (name, industry, revenue, employees)
- opportunity: Sales pipeline data (stage, amount, probability)
- tickets: Support tickets (priority, status, description)
- usage: Product usage metrics (job runs, records processed)
STEP 2: Configure CData Connect AI
2.1 Sign Up or Log In
- Navigate to https://www.cdata.com/ai/signup/ to create a new account, or https://cloud.cdata.com/ to log in
- Complete the registration process if creating a new account
2.2 Add a Google Sheets Connection
- Once logged in, click Sources in the left navigation menu and click Add Connection
- Select Google Sheets from the Add Connection panel
- Configure the connection:
  - Set the Spreadsheet property to the name of your copied sheet (e.g., "demo_organization")
  - Click Sign in to authenticate with Google OAuth
- After authentication, navigate to the Permissions tab and verify your user has access
2.3 Create a Personal Access Token
Your Python application will use a Personal Access Token (PAT) to authenticate with Connect AI.
- Click the Gear icon in the top right to open Settings
- Go to the Access Tokens section
- Click Create PAT
- Give the token a name (e.g., "LangGraph Customer Health") and click Create
- Important: Copy the token immediately - it's only shown once!
STEP 3: Set Up the Python Project
3.1 Clone from GitHub (Recommended)
Clone the complete project with all source files:
git clone https://github.com/CDataSoftware/langgraph-customer-health-agent.git
cd langgraph-customer-health-agent
pip install -r requirements.txt
python run.py
The interactive runner (run.py) will guide you through credential setup and your first analysis run.
3.2 Alternative: Create from Scratch
Create a new project directory and install dependencies:
mkdir langgraph-customer-health-agent
cd langgraph-customer-health-agent
pip install langgraph langchain-core langchain-openai requests python-dotenv jinja2 rich
Then create the source files described in Steps 4 and 5.
3.3 Configure Environment Variables
Option A: Run python run.py and select the setup wizard (option 1) to configure credentials interactively.
Option B: Create a .env file manually in your project root:
cp .env.example .env
Open .env in a text editor and fill in the credentials:
# CData Connect AI
[email protected]
CDATA_PAT=your-personal-access-token
# LLM Provider (openai, anthropic, google, ollama)
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o
OPENAI_API_KEY=sk-proj-...
# Optional: force a specific catalog for demos
# CDATA_CATALOG=MCP_Apps_Demo
Replace the placeholder values with your actual credentials.
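Missing credentials are easier to diagnose up front than mid-run. A small startup check (a sketch; the variable names match the .env file above) reports anything unset:

```python
import os

# Required variables from the .env file above
REQUIRED_VARS = ("CDATA_EMAIL", "CDATA_PAT", "OPENAI_API_KEY")

def missing_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# Example: only CDATA_EMAIL is set, so the other two are reported
gaps = missing_vars({"CDATA_EMAIL": "you@example.com"})
```

Calling missing_vars() with no arguments checks the real environment, so it can run right after load_dotenv() at startup.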
STEP 4: Understanding the Code Architecture
The project consists of a multi-agent pipeline with specialized modules:
4.1 Configuration & LLM Factory
The config.py module loads environment variables and provides a get_llm() factory that supports multiple LLM providers:
"""Configuration management for the customer health agent."""
import os
from dotenv import load_dotenv
load_dotenv(override=True)
# CData Connect AI
CDATA_EMAIL = os.getenv("CDATA_EMAIL")
CDATA_PAT = os.getenv("CDATA_PAT")
MCP_ENDPOINT = "https://mcp.cloud.cdata.com/mcp"
# Optional: force a specific catalog for demos
CDATA_CATALOG = os.getenv("CDATA_CATALOG")
# LLM configuration
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai")
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4o")
# Schema cache TTL in seconds (24 hours by default; imported by schema_cache.py)
SCHEMA_CACHE_TTL = int(os.getenv("SCHEMA_CACHE_TTL", "86400"))
def get_llm(temperature=0, model_override=None):
    """Factory function to create an LLM instance based on LLM_PROVIDER."""
    provider = LLM_PROVIDER.lower()
    model = model_override or LLM_MODEL
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model, temperature=temperature)
    elif provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model=model, temperature=temperature)
    elif provider == "google":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model=model, temperature=temperature)
    elif provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model=model, temperature=temperature)
    raise ValueError(f"Unsupported LLM_PROVIDER: {provider}")
The get_llm() factory uses lazy imports so you only need the package for the provider you use. Set LLM_PROVIDER and LLM_MODEL in your .env file to switch between providers.
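For example, switching to a local model is just an .env change and a pip install of the matching langchain package (the model name below is an assumption; use whichever model you have pulled in Ollama):

```
LLM_PROVIDER=ollama
LLM_MODEL=llama3.1
```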
4.2 MCP Tools
Five @tool-decorated functions wrap CData Connect AI's MCP endpoint. A shared requests.Session handles authentication:
import base64
import json
import requests
from langchain_core.tools import tool
from config import CDATA_EMAIL, CDATA_PAT, MCP_ENDPOINT, CDATA_CATALOG
# Shared session with Basic Auth
_session = requests.Session()
_credentials = f"{CDATA_EMAIL}:{CDATA_PAT}"
_encoded = base64.b64encode(_credentials.encode()).decode()
_session.headers.update({
    "Authorization": f"Basic {_encoded}",
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
})
def _call_mcp(method, params):
    """Send a JSON-RPC 2.0 request to the MCP endpoint."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    resp = _session.post(MCP_ENDPOINT, json=payload, timeout=60, stream=True)
    # Parse SSE response (data: {...})
    for line in resp.text.splitlines():
        if line.startswith("data: "):
            return json.loads(line[6:]).get("result", {})

def _extract_text(result):
    """Minimal helper: pull the plain-text blocks out of an MCP tool result."""
    content = result.get("content", []) if result else []
    return "\n".join(c.get("text", "") for c in content if c.get("type") == "text")

@tool
def get_catalogs() -> str:
    """List all available data source connections (catalogs)."""
    if CDATA_CATALOG:
        return f"Available catalogs:\n- {CDATA_CATALOG}"
    result = _call_mcp("tools/call", {"name": "getCatalogs", "arguments": {}})
    return _extract_text(result)

@tool
def get_tables(catalog_name: str, schema_name: str) -> str:
    """List tables in a catalog and schema."""
    result = _call_mcp("tools/call", {
        "name": "getTables",
        "arguments": {"catalogName": catalog_name, "schemaName": schema_name}
    })
    return _extract_text(result)

@tool
def query_data(sql_query: str) -> str:
    """Execute a SQL SELECT query. Use [Catalog].[Schema].[Table] format."""
    result = _call_mcp("tools/call", {"name": "queryData", "arguments": {"query": sql_query}})
    return _extract_text(result)

# get_schemas and get_columns follow the same pattern (omitted for brevity)
The ReAct agent calls these tools autonomously. The CDATA_CATALOG environment variable lets you skip catalog discovery for demos by returning a single known catalog name.
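The SSE parsing inside _call_mcp can be exercised in isolation. Below is a self-contained sketch; the sample payload is invented, but it follows the data: line format the endpoint streams back:

```python
import json

def parse_sse_result(body: str):
    """Extract the JSON-RPC 'result' object from an SSE body ('data: {...}' lines)."""
    for line in body.splitlines():
        if line.startswith("data: "):
            return json.loads(line[len("data: "):]).get("result", {})
    return {}

# Invented sample response in the shape the MCP endpoint streams back
sample = (
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 1, '
    '"result": {"content": [{"type": "text", "text": "catalog list"}]}}'
)
result = parse_sse_result(sample)
```

Only the first data: line is consumed; anything after it (keep-alives, further events) is ignored, which matches the single request/response pattern used here.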
4.3 Schema Cache
The schema_cache.py module caches discovered schema metadata to ~/.cache/langgraph-health/schema.json with a configurable TTL (default 24 hours):
import json, time
from pathlib import Path
from config import SCHEMA_CACHE_TTL
CACHE_FILE = Path.home() / ".cache" / "langgraph-health" / "schema.json"
def is_valid():
    """Check if cache exists and is within TTL."""
    if not CACHE_FILE.exists():
        return False
    return (time.time() - CACHE_FILE.stat().st_mtime) < SCHEMA_CACHE_TTL

def load():
    """Load cached schema data."""
    return json.loads(CACHE_FILE.read_text())

def save(schema_data):
    """Save schema to cache."""
    CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(json.dumps(schema_data, indent=2))
When the cache is valid, the gatherer agent injects the cached schema into its system prompt so it can skip discovery and start querying immediately. Use --refresh-schema to force re-discovery.
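The cache-or-discover flow reduces to a small pattern. Here is a self-contained sketch using a temporary directory, where discover stands in for the agent's schema discovery:

```python
import json
import time
import tempfile
from pathlib import Path

def load_or_discover(cache_file: Path, ttl_seconds: float, discover):
    """Return the cached schema if fresh, otherwise run discover() and cache the result."""
    if cache_file.exists() and (time.time() - cache_file.stat().st_mtime) < ttl_seconds:
        return json.loads(cache_file.read_text())
    data = discover()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "schema.json"
    first = load_or_discover(cache, 60, lambda: {"tables": ["account", "tickets"]})  # discovery runs
    second = load_or_discover(cache, 60, lambda: {"tables": ["stale"]})              # cache hit
```

Deleting the cache file (what --refresh-schema does) forces the discover branch on the next run.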
4.4 LangGraph Workflow
The agent uses a 3-node pipeline built with LangGraph's StateGraph:
gather (ReAct Agent) ──> analyze (LLM Node) ──> render (Deterministic Node)
        |                        |                        |
  Discover schema,        Produce structured       Fill HTML template,
  query data via          JSON assessment          save styled brief
  MCP tool calls          (score, signals,         to output/
                          recommendations)
from langgraph.graph import StateGraph
from state import AgentState
from agents.gatherer import gather_node
from agents.analyst import analyze_node
from agents.renderer import render_node
# Extensible pipeline -- add your agents to this list
PIPELINE = [
    ("gather", gather_node),
    ("analyze", analyze_node),
    ("render", render_node),
]

def build_graph():
    """Build and compile the LangGraph workflow."""
    graph = StateGraph(AgentState)
    for name, func in PIPELINE:
        graph.add_node(name, func)
    graph.set_entry_point(PIPELINE[0][0])
    for i in range(len(PIPELINE) - 1):
        graph.add_edge(PIPELINE[i][0], PIPELINE[i + 1][0])
    return graph.compile()
The PIPELINE list makes it easy to add, remove, or reorder agents. Each node reads from and writes to the shared AgentState dictionary.
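Conceptually, the compiled graph just threads the shared state through each node in order, merging each node's partial update. A dependency-free sketch of that behavior (the node functions and state keys below are illustrative, not the project's real nodes):

```python
def run_pipeline(pipeline, state):
    """Minimal stand-in for the compiled graph: each node returns a partial state update."""
    for _name, node in pipeline:
        state = {**state, **node(state)}
    return state

# Illustrative nodes with the same shape as gather_node / analyze_node
demo_pipeline = [
    ("gather", lambda s: {"gathered_data": f"rows for {s['user_prompt']}"}),
    ("analyze", lambda s: {"analysis": {"health_score": 72}}),
]
final = run_pipeline(demo_pipeline, {"user_prompt": "Premium Auto Group Europe"})
```

Because each node only returns the keys it changed, inserting a new node between two existing ones cannot clobber unrelated state.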
The gatherer node uses LangGraph's create_react_agent to create a ReAct loop:
import json

from langgraph.prebuilt import create_react_agent
from config import get_llm, CDATA_CATALOG
from mcp_tools import get_catalogs, get_schemas, get_tables, get_columns, query_data
import schema_cache

TOOLS = [get_catalogs, get_schemas, get_tables, get_columns, query_data]

def gather_node(state):
    """ReAct data gatherer -- discovers schemas and queries data."""
    sys_prompt = "You are a data gathering agent..."
    # Inject cached schema if available
    if schema_cache.is_valid():
        sys_prompt += f"\nCached schema:\n{json.dumps(schema_cache.load())}"
    llm = get_llm()
    agent = create_react_agent(llm, TOOLS, prompt=sys_prompt)
    result = agent.invoke({"messages": [("user", state["user_prompt"])]})
    return {"gathered_data": result["messages"][-1].content}
The ReAct agent decides which tools to call and in what order, adapting to whatever data source is connected.
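Under the hood, a ReAct loop alternates decide and act until the model stops requesting tools. A minimal dependency-free sketch of that control flow (not the create_react_agent internals; the scripted decide function stands in for the LLM):

```python
def react_loop(decide, tools, max_iterations=15):
    """Run a ReAct-style loop: decide() returns (tool_name, args) or None to stop."""
    observations = []
    for _ in range(max_iterations):
        action = decide(observations)
        if action is None:  # the model has gathered enough data
            break
        name, args = action
        observations.append(tools[name](*args))
    return observations

# Scripted 'model': list tables first, then query, then stop
def decide(observations):
    plan = [("get_tables", ("Sheets", "demo")), ("query_data", ("SELECT 1",))]
    return plan[len(observations)] if len(observations) < len(plan) else None

tools = {
    "get_tables": lambda cat, sch: f"tables in {cat}.{sch}",
    "query_data": lambda sql: f"ran: {sql}",
}
trace = react_loop(decide, tools)
```

The max_iterations cap is the same safety valve as the project's MAX_ITERATIONS setting mentioned in Troubleshooting: it bounds how many tool calls a runaway agent can make.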
4.5 Logger
The logger.py module provides a lightweight logger with custom formatting and run statistics. Use --verbose to see detailed output including MCP calls and timing:
[gatherer] 14:32:01 Schema cache hit
[analyst]  14:32:05 Analyzing gathered data
[renderer] 14:32:08 Brief saved to output/20260224_143208_premium_auto_health_brief.html
[summary]  14:32:08 --- Run Summary ---
[summary]  14:32:08 LLM calls: 4
[summary]  14:32:08 MCP calls: 12
[summary]  14:32:08 Total time: 7.23s
STEP 5: Run the Agent
5.1 Interactive Runner (Recommended First Time)
The easiest way to get started is the interactive runner. It handles credential setup, LLM provider selection, and running the agent through a menu-driven interface:
python run.py
The runner provides five options:
- Setup wizard — configure CData credentials and choose an LLM provider (OpenAI, Gemini, or DeepSeek) with model selection
- Run health analysis — analyze a specific account (with sample account suggestions)
- Run open-ended query — ask any question about your data (with sample query suggestions)
- Refresh schema cache — clear cached schemas for re-discovery
- Check setup — verify credentials, test MCP connection, and check dependencies
The rich library is auto-installed on first run if not already present.
5.2 Direct CLI: Account Health Analysis
Alternatively, run the agent directly from the command line:
python src/main.py --account "Premium Auto Group Europe"
Expected output:
- The ReAct agent discovers schemas, queries accounts, opportunities, and tickets
- The analyst node produces a health score with signals and recommendations
- An HTML brief is saved to output/TIMESTAMP_AccountName_health_brief.html
5.3 Direct CLI: Open-Ended Query
Ask any question in plain English:
python src/main.py "Show me the top 10 customers by revenue"
The ReAct agent figures out which tables and queries to run. You can ask complex questions that span multiple tables:
python src/main.py "Which industries have the most high-priority open tickets?" --verbose
5.4 Verbose Mode
Add --verbose to see detailed agent output including tool calls and timing:
python src/main.py --account "Premium Auto Group Europe" --verbose
Here is a sample health brief generated by the agent:
STEP 6: Query Examples
Here are some example queries to explore the data:
| Category | Query |
|---|---|
| Revenue | python src/main.py "Show me the top 10 customers by annual revenue" |
| Industry | python src/main.py "All customers in the energy sector" |
| Pipeline | python src/main.py "How many open opportunities do we have and total value" |
| Support | python src/main.py "Show all high priority open tickets" |
| Segmentation | python src/main.py "Customer count by industry" |
| Account Health | python src/main.py --account "Premium Auto Group Europe" |
STEP 7: Available MCP Tools
Your AI agent has access to these CData Connect AI tools:
| Tool | Description |
|---|---|
| getCatalogs | List available data source connections |
| getSchemas | Get schemas for a specific catalog |
| getTables | Get tables in a schema |
| getColumns | Get column metadata for a table |
| queryData | Execute SQL queries |
| getProcedures | List stored procedures |
| getProcedureParameters | Get procedure parameter details |
| executeProcedure | Execute stored procedures |
Troubleshooting
Query Returned No Results
- Verify the connection name in CData Connect AI is correct
- Check that the table and column names exist using the Connect AI data explorer
- Try a simpler query first: python src/main.py "Show me all customers"
- Use --verbose to see the SQL queries the agent generates
LLM API Errors
- Verify the OPENAI_API_KEY (or equivalent) is valid and has available credits
- The agent works best with GPT-4o or Claude Sonnet. Set LLM_MODEL in your .env
- For custom API endpoints, set OPENAI_API_BASE in your .env
Authentication Errors
- Verify your CData email and PAT are correct in .env
- Ensure the PAT has not expired
- Check that your Connect AI account is active
Agent Loops Too Many Times
- Set CDATA_CATALOG in .env to skip catalog discovery and narrow the agent's scope
- Reduce MAX_ITERATIONS (default: 15) to limit tool-call loops
- Use --verbose to see what the agent is doing at each step
Tool Calling Failures
- Ensure the CData Connect AI instance has at least one active data source connected
- Use fully qualified table names: [Catalog].[Schema].[Table]
- Verify column names exist using the Connect AI data explorer
What's Next?
Now that you have a working customer health agent, you can:
- Extend the agent pipeline: Add custom agents to the PIPELINE list in graph.py. For example, add a competitive analysis node, a churn prediction agent, or a financial modeling step between the analyst and renderer.
- Connect more data sources: Add Salesforce, HubSpot, Snowflake, or any of 350+ supported sources through the CData Connect AI dashboard. The ReAct agent discovers schemas automatically.
- Switch LLM providers: Change LLM_PROVIDER and LLM_MODEL in your .env to use Anthropic Claude, Google Gemini, or local models via Ollama.
- Add scheduling: Run health analysis automatically on a schedule for proactive customer monitoring.
- Add human-in-the-loop: LangGraph supports interrupt points where a human can review and approve the agent's actions before proceeding.
- Explore advanced patterns: The LangGraph documentation covers cycles, branches, parallel execution, and multi-agent collaboration.
Resources
- GitHub Repository - Complete source code
- LangGraph Documentation - Advanced workflow patterns and state management
- CData Connect AI Documentation - Connect more data sources and configure governed access
- CData Prompt Library - Example prompts for various use cases
- OpenAI API Documentation - OpenAI models and API reference
- Model Context Protocol - MCP specification and documentation
Get Started with CData Connect AI
Ready to build AI-powered data applications? CData Connect AI provides governed, secure access to 350+ enterprise data sources for AI applications. LangGraph agents can query live business data from Salesforce, Snowflake, HubSpot, Google Sheets, databases, and more through a single MCP interface.
Sign up for a free trial and start building intelligent customer health agents today!