Advanced Developer Guide - Build Customer Health Agents with LangGraph
Build intelligent customer health analysis agents that autonomously discover data schemas, query live enterprise data, and generate executive health briefs. This guide walks through creating a Python application that combines LangGraph multi-agent workflows with CData Connect AI to provide autonomous data discovery and analysis through an extensible 3-node agent pipeline.
NOTE: While this guide uses Google Sheets as the data source, the same principles apply to any of the 350+ data sources CData Connect AI supports.
By the end of this guide, you'll have a working Python application that can:
- Connect a LangGraph ReAct agent to 350+ enterprise data sources through CData Connect AI
- Build a 3-node agent pipeline with autonomous schema discovery, LLM-powered analysis, and HTML rendering
- Support multiple LLM providers (OpenAI, Anthropic, Google, Ollama) via a configurable factory
- Cache discovered schemas to speed up subsequent runs
- Generate HTML briefs with structured health scores, signals, recommendations, and risks
- Extend the agent pipeline with custom analysis nodes
Architecture Overview
The application uses the Model Context Protocol (MCP) to bridge LangGraph with your data sources:
┌────────────────────────────────────────────────┐
│              ReAct Gatherer Agent              │
│                                                │
│            LLM decides next action             │
│                       |                        │
│                       v                        │
│  MCP Tools (5 tools)          CData Connect    │
│  get_catalogs, get_schemas ──> AI MCP Server   │
│  get_tables, get_columns       (350+ sources)  │
│  query_data                                    │
│          |                                     │
│          v  loop until enough data gathered    │
└───────────────────┬────────────────────────────┘
                    |
                    v
┌───────────────────────────────────────────────┐
│               Analyst Node (LLM)              │
│  Structured JSON: health_score, signals,      │
│  recommendations, risks, opportunities        │
└───────────────────┬───────────────────────────┘
                    |
                    v
┌───────────────────────────────────────────────┐
│         Renderer Node (Deterministic)         │
│  Jinja2 template -> HTML brief                │
│  No LLM call, pure template rendering         │
└───────────────────┬───────────────────────────┘
                    |
                    v
              output/*.html
How it works:
- A ReAct agent autonomously discovers schemas via CData Connect AI's MCP tools and gathers data through iterative tool calls
- An Analyst node makes a single LLM call to produce a structured JSON health assessment (score, signals, recommendations, risks)
- A Renderer node fills a Jinja2 template with the analysis and saves a styled HTML brief (no LLM, deterministic)
- Each node can be upgraded to a full agent by adding tools -- the pipeline is multi-agent-ready
- Schema caching avoids redundant discovery on subsequent runs (24h TTL by default)
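The renderer step above can be sketched in a few lines. This uses the standard library's string.Template as a stand-in for the project's Jinja2 template, and the field names (account, health_score) are illustrative, not the project's actual template variables:

```python
from string import Template

# Stand-in for the Jinja2 brief template (field names are illustrative)
BRIEF_TEMPLATE = Template(
    "<h1>$account</h1>\n<p>Health score: $health_score / 100</p>"
)

def render_brief(analysis: dict) -> str:
    """Deterministic render step: no LLM call, just template substitution."""
    return BRIEF_TEMPLATE.substitute(analysis)

html = render_brief({"account": "Premium Auto Group Europe", "health_score": 72})
```

Because this step never calls an LLM, the brief's layout is fully reproducible run to run; only the analysis values change.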
Prerequisites
This guide requires the following:
- Python 3.8+ installed on your system (Download Python)
- pip package installer (included with Python 3.4+). Verify with pip --version
- An OpenAI API key (requires a paid account), or a key for another supported LLM provider (Anthropic, Google), or a local Ollama install
- A CData Connect AI account (free trial here)
- A Google account for the sample Google Sheets data
Getting Started
Overview
Here's a quick overview of the steps:
- Set up sample data in Google Sheets
- Configure CData Connect AI and create a Personal Access Token
- Set up the Python project and install dependencies
- Understand the code architecture
- Run the agent
STEP 1: Set Up Sample Data in Google Sheets
We'll use a sample Google Sheet containing customer data to demonstrate the capabilities. This dataset includes accounts, sales opportunities, support tickets, and usage metrics.
- Navigate to the sample customer health spreadsheet
- Click File > Make a copy to save it to your Google Drive
- Give it a memorable name (e.g., "demo_organization") - you'll need this later
The spreadsheet contains four sheets:
- account: Company information (name, industry, revenue, employees)
- opportunity: Sales pipeline data (stage, amount, probability)
- tickets: Support tickets (priority, status, description)
- usage: Product usage metrics (job runs, records processed)
STEP 2: Configure CData Connect AI
2.1 Sign Up or Log In
- Navigate to https://www.cdata.com/ai/signup/ to create a new account, or https://cloud.cdata.com/ to log in
- Complete the registration process if creating a new account
2.2 Add a Google Sheets Connection
- Once logged in, click Sources in the left navigation menu and click Add Connection
- Select Google Sheets from the Add Connection panel
- Configure the connection:
  - Set the Spreadsheet property to the name of your copied sheet (e.g., "demo_organization")
  - Click Sign in to authenticate with Google OAuth
- After authentication, navigate to the Permissions tab and verify your user has access
2.3 Create a Personal Access Token
Your Python application will use a Personal Access Token (PAT) to authenticate with Connect AI.
- Click the Gear icon in the top right to open Settings
- Go to the Access Tokens section
- Click Create PAT
- Give the token a name (e.g., "LangGraph Customer Health") and click Create
- Important: Copy the token immediately - it's only shown once!
STEP 3: Set Up the Python Project
3.1 Clone from GitHub (Recommended)
Clone the complete project with all source files:
git clone https://github.com/CDataSoftware/langgraph-customer-health-agent.git
cd langgraph-customer-health-agent
pip install -r requirements.txt
python run.py
The interactive runner (run.py) will guide you through credential setup and your first analysis run.
3.2 Alternative: Create from Scratch
Create a new project directory and install dependencies:
mkdir langgraph-customer-health-agent
cd langgraph-customer-health-agent
pip install langgraph langchain-core langchain-openai requests python-dotenv jinja2 rich
Then create the source files described in Steps 4 and 5.
3.3 Configure Environment Variables
Option A: Run python run.py and select the setup wizard (option 1) to configure credentials interactively.
Option B: Create a .env file manually in your project root:
cp .env.example .env
Open .env in a text editor and fill in the credentials:
# CData Connect AI
[email protected]
CDATA_PAT=your-personal-access-token
# LLM Provider (openai, anthropic, google, ollama)
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o
OPENAI_API_KEY=sk-proj-...
# Optional: force a specific catalog for demos
# CDATA_CATALOG=MCP_Apps_Demo
Replace the placeholder values with your actual credentials.
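Missing credentials are easier to diagnose up front than mid-run. A small startup check (a sketch; the variable names match the .env file above) reports anything unset:

```python
import os

# Required variables from the .env file above
REQUIRED_VARS = ("CDATA_EMAIL", "CDATA_PAT", "OPENAI_API_KEY")

def missing_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# Example: only CDATA_EMAIL is set, so the other two are reported
gaps = missing_vars({"CDATA_EMAIL": "you@example.com"})
```

Calling missing_vars() with no arguments checks the real environment, so it can run right after load_dotenv() at startup.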
STEP 4: Understanding the Code Architecture
The project consists of a multi-agent pipeline with specialized modules:
4.1 Configuration & LLM Factory
The config.py module loads environment variables and provides a get_llm() factory that supports multiple LLM providers:
"""Configuration management for the customer health agent."""
import os
from dotenv import load_dotenv
load_dotenv(override=True)
# CData Connect AI
CDATA_EMAIL = os.getenv("CDATA_EMAIL")
CDATA_PAT = os.getenv("CDATA_PAT")
MCP_ENDPOINT = "https://mcp.cloud.cdata.com/mcp"
# Optional: force a specific catalog for demos
CDATA_CATALOG = os.getenv("CDATA_CATALOG")
# LLM configuration
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai")
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4o")
# Schema cache TTL in seconds (24 hours by default; imported by schema_cache.py)
SCHEMA_CACHE_TTL = int(os.getenv("SCHEMA_CACHE_TTL", "86400"))
def get_llm(temperature=0, model_override=None):
    """Factory function to create an LLM instance based on LLM_PROVIDER."""
    provider = LLM_PROVIDER.lower()
    model = model_override or LLM_MODEL
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model, temperature=temperature)
    elif provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model=model, temperature=temperature)
    elif provider == "google":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model=model, temperature=temperature)
    elif provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model=model, temperature=temperature)
    raise ValueError(f"Unsupported LLM_PROVIDER: {provider}")
The get_llm() factory uses lazy imports so you only need the package for the provider you use. Set LLM_PROVIDER and LLM_MODEL in your .env file to switch between providers.
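For example, switching to a local model is just an .env change and a pip install of the matching langchain package (the model name below is an assumption; use whichever model you have pulled in Ollama):

```
LLM_PROVIDER=ollama
LLM_MODEL=llama3.1
```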
4.2 MCP Tools
Five @tool-decorated functions wrap CData Connect AI's MCP endpoint. A shared requests.Session handles authentication:
import base64
import json
import requests
from langchain_core.tools import tool
from config import CDATA_EMAIL, CDATA_PAT, MCP_ENDPOINT, CDATA_CATALOG
# Shared session with Basic Auth
_session = requests.Session()
_credentials = f"{CDATA_EMAIL}:{CDATA_PAT}"
_encoded = base64.b64encode(_credentials.encode()).decode()
_session.headers.update({
    "Authorization": f"Basic {_encoded}",
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
})
def _call_mcp(method, params):
    """Send a JSON-RPC 2.0 request to the MCP endpoint."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    resp = _session.post(MCP_ENDPOINT, json=payload, timeout=60, stream=True)
    # Parse SSE response (data: {...})
    for line in resp.text.splitlines():
        if line.startswith("data: "):
            return json.loads(line[6:]).get("result", {})

def _extract_text(result):
    """Minimal helper: pull the plain-text blocks out of an MCP tool result."""
    content = result.get("content", []) if result else []
    return "\n".join(c.get("text", "") for c in content if c.get("type") == "text")

@tool
def get_catalogs() -> str:
    """List all available data source connections (catalogs)."""
    if CDATA_CATALOG:
        return f"Available catalogs:\n- {CDATA_CATALOG}"
    result = _call_mcp("tools/call", {"name": "getCatalogs", "arguments": {}})
    return _extract_text(result)

@tool
def get_tables(catalog_name: str, schema_name: str) -> str:
    """List tables in a catalog and schema."""
    result = _call_mcp("tools/call", {
        "name": "getTables",
        "arguments": {"catalogName": catalog_name, "schemaName": schema_name}
    })
    return _extract_text(result)

@tool
def query_data(sql_query: str) -> str:
    """Execute a SQL SELECT query. Use [Catalog].[Schema].[Table] format."""
    result = _call_mcp("tools/call", {"name": "queryData", "arguments": {"query": sql_query}})
    return _extract_text(result)

# get_schemas and get_columns follow the same pattern (omitted for brevity)
The ReAct agent calls these tools autonomously. The CDATA_CATALOG environment variable lets you skip catalog discovery for demos by returning a single known catalog name.
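The SSE parsing inside _call_mcp can be exercised in isolation. Below is a self-contained sketch; the sample payload is invented, but it follows the data: line format the endpoint streams back:

```python
import json

def parse_sse_result(body: str):
    """Extract the JSON-RPC 'result' object from an SSE body ('data: {...}' lines)."""
    for line in body.splitlines():
        if line.startswith("data: "):
            return json.loads(line[len("data: "):]).get("result", {})
    return {}

# Invented sample response in the shape the MCP endpoint streams back
sample = (
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 1, '
    '"result": {"content": [{"type": "text", "text": "catalog list"}]}}'
)
result = parse_sse_result(sample)
```

Only the first data: line is consumed; anything after it (keep-alives, further events) is ignored, which matches the single request/response pattern used here.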
4.3 Schema Cache
The schema_cache.py module caches discovered schema metadata to ~/.cache/langgraph-health/schema.json with a configurable TTL (default 24 hours):
import json, time
from pathlib import Path
from config import SCHEMA_CACHE_TTL
CACHE_FILE = Path.home() / ".cache" / "langgraph-health" / "schema.json"
def is_valid():
    """Check if cache exists and is within TTL."""
    if not CACHE_FILE.exists():
        return False
    return (time.time() - CACHE_FILE.stat().st_mtime) < SCHEMA_CACHE_TTL

def load():
    """Load cached schema data."""
    return json.loads(CACHE_FILE.read_text())

def save(schema_data):
    """Save schema to cache."""
    CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(json.dumps(schema_data, indent=2))
When the cache is valid, the gatherer agent injects the cached schema into its system prompt so it can skip discovery and start querying immediately. Use --refresh-schema to force re-discovery.
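The cache-or-discover flow reduces to a small pattern. Here is a self-contained sketch using a temporary directory, where discover stands in for the agent's schema discovery:

```python
import json
import time
import tempfile
from pathlib import Path

def load_or_discover(cache_file: Path, ttl_seconds: float, discover):
    """Return the cached schema if fresh, otherwise run discover() and cache the result."""
    if cache_file.exists() and (time.time() - cache_file.stat().st_mtime) < ttl_seconds:
        return json.loads(cache_file.read_text())
    data = discover()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "schema.json"
    first = load_or_discover(cache, 60, lambda: {"tables": ["account", "tickets"]})  # discovery runs
    second = load_or_discover(cache, 60, lambda: {"tables": ["stale"]})              # cache hit
```

Deleting the cache file (what --refresh-schema does) forces the discover branch on the next run.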
4.4 LangGraph Workflow
The agent uses a 3-node pipeline built with LangGraph's StateGraph:
gather (ReAct Agent) ──> analyze (LLM Node) ──> render (Deterministic Node)
        |                        |                        |
  Discover schema,        Produce structured       Fill HTML template,
  query data via          JSON assessment          save styled brief
  MCP tool calls          (score, signals,         to output/
                          recommendations)
from langgraph.graph import StateGraph
from state import AgentState
from agents.gatherer import gather_node
from agents.analyst import analyze_node
from agents.renderer import render_node
# Extensible pipeline -- add your agents to this list
PIPELINE = [
    ("gather", gather_node),
    ("analyze", analyze_node),
    ("render", render_node),
]

def build_graph():
    """Build and compile the LangGraph workflow."""
    graph = StateGraph(AgentState)
    for name, func in PIPELINE:
        graph.add_node(name, func)
    graph.set_entry_point(PIPELINE[0][0])
    for i in range(len(PIPELINE) - 1):
        graph.add_edge(PIPELINE[i][0], PIPELINE[i + 1][0])
    return graph.compile()
The PIPELINE list makes it easy to add, remove, or reorder agents. Each node reads from and writes to the shared AgentState dictionary.
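Conceptually, the compiled graph just threads the shared state through each node in order, merging each node's partial update. A dependency-free sketch of that behavior (the node functions and state keys below are illustrative, not the project's real nodes):

```python
def run_pipeline(pipeline, state):
    """Minimal stand-in for the compiled graph: each node returns a partial state update."""
    for _name, node in pipeline:
        state = {**state, **node(state)}
    return state

# Illustrative nodes with the same shape as gather_node / analyze_node
demo_pipeline = [
    ("gather", lambda s: {"gathered_data": f"rows for {s['user_prompt']}"}),
    ("analyze", lambda s: {"analysis": {"health_score": 72}}),
]
final = run_pipeline(demo_pipeline, {"user_prompt": "Premium Auto Group Europe"})
```

Because each node only returns the keys it changed, inserting a new node between two existing ones cannot clobber unrelated state.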
The gatherer node uses LangGraph's create_react_agent to create a ReAct loop:
import json

from langgraph.prebuilt import create_react_agent
from config import get_llm, CDATA_CATALOG
from mcp_tools import get_catalogs, get_schemas, get_tables, get_columns, query_data
import schema_cache

TOOLS = [get_catalogs, get_schemas, get_tables, get_columns, query_data]

def gather_node(state):
    """ReAct data gatherer -- discovers schemas and queries data."""
    sys_prompt = "You are a data gathering agent..."
    # Inject cached schema if available
    if schema_cache.is_valid():
        sys_prompt += f"\nCached schema:\n{json.dumps(schema_cache.load())}"
    llm = get_llm()
    agent = create_react_agent(llm, TOOLS, prompt=sys_prompt)
    result = agent.invoke({"messages": [("user", state["user_prompt"])]})
    return {"gathered_data": result["messages"][-1].content}
The ReAct agent decides which tools to call and in what order, adapting to whatever data source is connected.
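Under the hood, a ReAct loop alternates decide and act until the model stops requesting tools. A minimal dependency-free sketch of that control flow (not the create_react_agent internals; the scripted decide function stands in for the LLM):

```python
def react_loop(decide, tools, max_iterations=15):
    """Run a ReAct-style loop: decide() returns (tool_name, args) or None to stop."""
    observations = []
    for _ in range(max_iterations):
        action = decide(observations)
        if action is None:  # the model has gathered enough data
            break
        name, args = action
        observations.append(tools[name](*args))
    return observations

# Scripted 'model': list tables first, then query, then stop
def decide(observations):
    plan = [("get_tables", ("Sheets", "demo")), ("query_data", ("SELECT 1",))]
    return plan[len(observations)] if len(observations) < len(plan) else None

tools = {
    "get_tables": lambda cat, sch: f"tables in {cat}.{sch}",
    "query_data": lambda sql: f"ran: {sql}",
}
trace = react_loop(decide, tools)
```

The max_iterations cap is the same safety valve as the project's MAX_ITERATIONS setting mentioned in Troubleshooting: it bounds how many tool calls a runaway agent can make.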
4.5 Logger
The logger.py module provides a lightweight logger with custom formatting and run statistics. Use --verbose to see detailed output including MCP calls and timing:
[gatherer] 14:32:01 Schema cache hit
[analyst]  14:32:05 Analyzing gathered data
[renderer] 14:32:08 Brief saved to output/20260224_143208_premium_auto_health_brief.html
[summary]  14:32:08 --- Run Summary ---
[summary]  14:32:08 LLM calls: 4
[summary]  14:32:08 MCP calls: 12
[summary]  14:32:08 Total time: 7.23s
STEP 5: Run the Agent
5.1 Interactive Runner (Recommended First Time)
The easiest way to get started is the interactive runner. It handles credential setup, LLM provider selection, and running the agent through a menu-driven interface:
python run.py
The runner provides five options:
- Setup wizard — configure CData credentials and choose an LLM provider (OpenAI, Gemini, or DeepSeek) with model selection
- Run health analysis — analyze a specific account (with sample account suggestions)
- Run open-ended query — ask any question about your data (with sample query suggestions)
- Refresh schema cache — clear cached schemas for re-discovery
- Check setup — verify credentials, test MCP connection, and check dependencies
The rich library is auto-installed on first run if not already present.
5.2 Direct CLI: Account Health Analysis
Alternatively, run the agent directly from the command line:
python src/main.py --account "Premium Auto Group Europe"
Expected output:
- The ReAct agent discovers schemas, queries accounts, opportunities, and tickets
- The analyst node produces a health score with signals and recommendations
- An HTML brief is saved to output/TIMESTAMP_AccountName_health_brief.html
5.3 Direct CLI: Open-Ended Query
Ask any question in plain English:
python src/main.py "Show me the top 10 customers by revenue"
The ReAct agent figures out which tables and queries to run. You can ask complex questions that span multiple tables:
python src/main.py "Which industries have the most high-priority open tickets?" --verbose
5.4 Verbose Mode
Add --verbose to see detailed agent output including tool calls and timing:
python src/main.py --account "Premium Auto Group Europe" --verbose
Here is a sample health brief generated by the agent:
STEP 6: Query Examples
Here are some example queries to explore the data:
| Category | Query |
|---|---|
| Revenue | python src/main.py "Show me the top 10 customers by annual revenue" |
| Industry | python src/main.py "All customers in the energy sector" |
| Pipeline | python src/main.py "How many open opportunities do we have and total value" |
| Support | python src/main.py "Show all high priority open tickets" |
| Segmentation | python src/main.py "Customer count by industry" |
| Account Health | python src/main.py --account "Premium Auto Group Europe" |
STEP 7: Available MCP Tools
Your AI agent has access to these CData Connect AI tools:
| Tool | Description |
|---|---|
| getCatalogs | List available data source connections |
| getSchemas | Get schemas for a specific catalog |
| getTables | Get tables in a schema |
| getColumns | Get column metadata for a table |
| queryData | Execute SQL queries |
| getProcedures | List stored procedures |
| getProcedureParameters | Get procedure parameter details |
| executeProcedure | Execute stored procedures |
Troubleshooting
Query Returned No Results
- Verify the connection name in CData Connect AI is correct
- Check that the table and column names exist using the Connect AI data explorer
- Try a simpler query first: python src/main.py "Show me all customers"
- Use --verbose to see the SQL queries the agent generates
LLM API Errors
- Verify the OPENAI_API_KEY (or equivalent) is valid and has available credits
- The agent works best with GPT-4o or Claude Sonnet. Set LLM_MODEL in your .env
- For custom API endpoints, set OPENAI_API_BASE in your .env
Authentication Errors
- Verify your CData email and PAT are correct in .env
- Ensure the PAT has not expired
- Check that your Connect AI account is active
Agent Loops Too Many Times
- Set CDATA_CATALOG in .env to skip catalog discovery and narrow the agent's scope
- Reduce MAX_ITERATIONS (default: 15) to limit tool-call loops
- Use --verbose to see what the agent is doing at each step
Tool Calling Failures
- Ensure the CData Connect AI instance has at least one active data source connected
- Use fully qualified table names: [Catalog].[Schema].[Table]
- Verify column names exist using the Connect AI data explorer
What's Next?
Now that you have a working customer health agent, you can:
- Extend the agent pipeline: Add custom agents to the PIPELINE list in graph.py. For example, add a competitive analysis node, a churn prediction agent, or a financial modeling step between the analyst and renderer.
- Connect more data sources: Add Salesforce, HubSpot, Snowflake, or any of 350+ supported sources through the CData Connect AI dashboard. The ReAct agent discovers schemas automatically.
- Switch LLM providers: Change LLM_PROVIDER and LLM_MODEL in your .env to use Anthropic Claude, Google Gemini, or local models via Ollama.
- Add scheduling: Run health analysis automatically on a schedule for proactive customer monitoring.
- Add human-in-the-loop: LangGraph supports interrupt points where a human can review and approve the agent's actions before proceeding.
- Explore advanced patterns: The LangGraph documentation covers cycles, branches, parallel execution, and multi-agent collaboration.
Resources
- GitHub Repository - Complete source code
- LangGraph Documentation - Advanced workflow patterns and state management
- CData Connect AI Documentation - Connect more data sources and configure governed access
- CData Prompt Library - Example prompts for various use cases
- OpenAI API Documentation - OpenAI models and API reference
- Model Context Protocol - MCP specification and documentation
Get Started with CData Connect AI
Ready to build AI-powered data applications? CData Connect AI provides governed, secure access to 350+ enterprise data sources for AI applications. LangGraph agents can query live business data from Salesforce, Snowflake, HubSpot, Google Sheets, databases, and more through a single MCP interface.
Sign up for a free trial and start building intelligent customer health agents today!