Advanced Developer Guide - Build Customer Health Agents with LangGraph

Build intelligent customer health analysis agents that autonomously discover data schemas, query live enterprise data, and generate executive health briefs. This guide walks through creating a Python application that combines LangGraph multi-agent workflows with CData Connect AI to provide autonomous data discovery and analysis through an extensible 3-node agent pipeline.

NOTE: While this guide uses Google Sheets as the data source, the same principles apply to any of the 350+ data sources CData Connect AI supports.

By the end of this guide, you'll have a working Python application that can:

  • Connect a LangGraph ReAct agent to 350+ enterprise data sources through CData Connect AI
  • Build a 3-node agent pipeline with autonomous schema discovery, LLM-powered analysis, and HTML rendering
  • Support multiple LLM providers (OpenAI, Anthropic, Google, Ollama) via a configurable factory
  • Cache discovered schemas to speed up subsequent runs
  • Generate HTML briefs with structured health scores, signals, recommendations, and risks
  • Extend the agent pipeline with custom analysis nodes

Architecture Overview

The application uses the Model Context Protocol (MCP) to bridge LangGraph with your data sources:

┌────────────────────────────────────────────────┐
│              ReAct Gatherer Agent              │
│                                                │
│   LLM decides next action                      │
│      |                                         │
│      v                                         │
│   MCP Tools (5 tools)          CData Connect   │
│   get_catalogs, get_schemas ──> AI MCP Server  │
│   get_tables, get_columns      (350+ sources)  │
│   query_data                                   │
│      |                                         │
│      v  loop until enough data gathered        │
└───────────────────┬────────────────────────────┘
                    |
                    v
┌────────────────────────────────────────────────┐
│              Analyst Node (LLM)                │
│   Structured JSON: health_score, signals,      │
│   recommendations, risks, opportunities        │
└───────────────────┬────────────────────────────┘
                    |
                    v
┌────────────────────────────────────────────────┐
│          Renderer Node (Deterministic)         │
│   Jinja2 template -> HTML brief                │
│   No LLM call, pure template rendering         │
└───────────────────┬────────────────────────────┘
                    |
                    v
              output/*.html

How it works:

  1. A ReAct agent autonomously discovers schemas via CData Connect AI's MCP tools and gathers data through iterative tool calls
  2. An Analyst node makes a single LLM call to produce a structured JSON health assessment (score, signals, recommendations, risks)
  3. A Renderer node fills a Jinja2 template with the analysis and saves a styled HTML brief (no LLM, deterministic)
  4. Each node can be upgraded to a full agent by adding tools -- the pipeline is multi-agent-ready
  5. Schema caching avoids redundant discovery on subsequent runs (24h TTL by default)
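The state that flows between these nodes can be sketched as a simple typed dictionary. The user_prompt and gathered_data fields appear in the code later in this guide; the remaining field names are illustrative assumptions, not the project's exact schema:

```python
from typing import TypedDict

class AgentState(TypedDict, total=False):
    user_prompt: str      # the question or account name to analyze
    gathered_data: str    # raw data collected by the ReAct gatherer
    analysis: dict        # structured JSON produced by the analyst node
    output_path: str      # where the renderer saved the HTML brief
```

Each node receives the current state and returns only the keys it updates, which is what makes the pipeline easy to extend.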

Prerequisites

This guide requires the following:

  • Python 3.8+ installed on your system (Download Python)
  • pip package installer (included with Python 3.4+). Verify with pip --version
  • An OpenAI API key (requires a paid account), or any supported LLM provider (Anthropic, Google, Ollama)
  • A CData Connect AI account (free trial here)
  • A Google account for the sample Google Sheets data


Getting Started

Overview

Here's a quick overview of the steps:

  1. Set up sample data in Google Sheets
  2. Configure CData Connect AI and create a Personal Access Token
  3. Set up the Python project and install dependencies
  4. Understand the code architecture
  5. Run the agent

STEP 1: Set Up Sample Data in Google Sheets

We'll use a sample Google Sheet containing customer data to demonstrate the capabilities. This dataset includes accounts, sales opportunities, support tickets, and usage metrics.

  1. Navigate to the sample customer health spreadsheet
  2. Click File > Make a copy to save it to your Google Drive
  3. Give it a memorable name (e.g., "demo_organization") - you'll need this later

The spreadsheet contains four sheets:

  • account: Company information (name, industry, revenue, employees)
  • opportunity: Sales pipeline data (stage, amount, probability)
  • tickets: Support tickets (priority, status, description)
  • usage: Product usage metrics (job runs, records processed)

STEP 2: Configure CData Connect AI

2.1 Sign Up or Log In

  1. Navigate to https://www.cdata.com/ai/signup/ to create a new account, or https://cloud.cdata.com/ to log in
  2. Complete the registration process if creating a new account

2.2 Add a Google Sheets Connection

  1. Once logged in, click Sources in the left navigation menu and click Add Connection
  2. Select Google Sheets from the Add Connection panel
  3. Configure the connection:
    • Set the Spreadsheet property to the name of your copied sheet (e.g., "demo_organization")
    • Click Sign in to authenticate with Google OAuth
  4. After authentication, navigate to the Permissions tab and verify your user has access

2.3 Create a Personal Access Token

Your Python application will use a Personal Access Token (PAT) to authenticate with Connect AI.

  1. Click the Gear icon in the top right to open Settings
  2. Go to the Access Tokens section
  3. Click Create PAT
  4. Give the token a name (e.g., "LangGraph Customer Health") and click Create
  5. Important: Copy the token immediately - it's only shown once!

STEP 3: Set Up the Python Project

3.1 Clone from GitHub (Recommended)

Clone the complete project with all source files:

git clone https://github.com/CDataSoftware/langgraph-customer-health-agent.git
cd langgraph-customer-health-agent
pip install -r requirements.txt
python run.py

The interactive runner (run.py) will guide you through credential setup and your first analysis run.

3.2 Alternative: Create from Scratch

Create a new project directory and install dependencies:

mkdir langgraph-customer-health-agent
cd langgraph-customer-health-agent
pip install langgraph langchain-core langchain-openai requests python-dotenv jinja2 rich

Then create the source files described in Steps 4 and 5.

3.3 Configure Environment Variables

Option A: Run python run.py and select the setup wizard (option 1) to configure credentials interactively.

Option B: Create a .env file manually in your project root:

cp .env.example .env

Open .env in a text editor and fill in the credentials:

# CData Connect AI
CDATA_EMAIL=your-email@example.com
CDATA_PAT=your-personal-access-token

# LLM Provider (openai, anthropic, google, ollama)
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o
OPENAI_API_KEY=sk-proj-...

# Optional: force a specific catalog for demos
# CDATA_CATALOG=MCP_Apps_Demo

Replace the placeholder values with your actual credentials.


STEP 4: Understanding the Code Architecture

The project consists of a multi-agent pipeline with specialized modules:

4.1 Configuration & LLM Factory

The config.py module loads environment variables and provides a get_llm() factory that supports multiple LLM providers:

"""Configuration management for the customer health agent."""
import os

from dotenv import load_dotenv

load_dotenv(override=True)

# CData Connect AI
CDATA_EMAIL = os.getenv("CDATA_EMAIL")
CDATA_PAT = os.getenv("CDATA_PAT")
MCP_ENDPOINT = "https://mcp.cloud.cdata.com/mcp"

# Optional: force a specific catalog for demos
CDATA_CATALOG = os.getenv("CDATA_CATALOG")

# LLM configuration
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai")
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4o")

# Schema cache TTL in seconds (default 24 hours)
SCHEMA_CACHE_TTL = int(os.getenv("SCHEMA_CACHE_TTL", 24 * 60 * 60))

def get_llm(temperature=0, model_override=None):
    """Factory function to create an LLM instance based on LLM_PROVIDER."""
    provider = LLM_PROVIDER.lower()
    model = model_override or LLM_MODEL
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model, temperature=temperature)
    elif provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model=model, temperature=temperature)
    elif provider == "google":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model=model, temperature=temperature)
    elif provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model=model, temperature=temperature)
    raise ValueError(f"Unsupported LLM_PROVIDER: {provider}")

The get_llm() factory uses lazy imports so you only need the package for the provider you use. Set LLM_PROVIDER and LLM_MODEL in your .env file to switch between providers.

4.2 MCP Tools

Five @tool-decorated functions wrap CData Connect AI's MCP endpoint (three are shown below; get_schemas and get_columns follow the same pattern). A shared requests.Session handles authentication:

import base64
import json

import requests
from langchain_core.tools import tool

from config import CDATA_EMAIL, CDATA_PAT, MCP_ENDPOINT, CDATA_CATALOG

# Shared session with Basic Auth
_session = requests.Session()
_credentials = f"{CDATA_EMAIL}:{CDATA_PAT}"
_encoded = base64.b64encode(_credentials.encode()).decode()
_session.headers.update({
    "Authorization": f"Basic {_encoded}",
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
})

def _call_mcp(method, params):
    """Send a JSON-RPC 2.0 request to the MCP endpoint."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    resp = _session.post(MCP_ENDPOINT, json=payload, timeout=60, stream=True)
    # Parse SSE response (data: {...})
    for line in resp.text.split("\n"):
        if line.startswith("data: "):
            return json.loads(line[6:]).get("result", {})

def _extract_text(result):
    """Pull the text items out of an MCP tool result's content list."""
    return "\n".join(item.get("text", "") for item in result.get("content", []))

@tool
def get_catalogs() -> str:
    """List all available data source connections (catalogs)."""
    if CDATA_CATALOG:
        return f"Available catalogs:\n- {CDATA_CATALOG}"
    result = _call_mcp("tools/call", {"name": "getCatalogs", "arguments": {}})
    return _extract_text(result)

@tool
def get_tables(catalog_name: str, schema_name: str) -> str:
    """List tables in a catalog and schema."""
    result = _call_mcp("tools/call", {
        "name": "getTables",
        "arguments": {"catalogName": catalog_name, "schemaName": schema_name}
    })
    return _extract_text(result)

@tool
def query_data(sql_query: str) -> str:
    """Execute a SQL SELECT query. Use [Catalog].[Schema].[Table] format."""
    result = _call_mcp("tools/call", {"name": "queryData", "arguments": {"query": sql_query}})
    return _extract_text(result)

The ReAct agent calls these tools autonomously. The CDATA_CATALOG environment variable lets you skip catalog discovery for demos by returning a single known catalog name.

4.3 Schema Cache

The schema_cache.py module caches discovered schema metadata to ~/.cache/langgraph-health/schema.json with a configurable TTL (default 24 hours):

import json
import time
from pathlib import Path

from config import SCHEMA_CACHE_TTL

CACHE_FILE = Path.home() / ".cache" / "langgraph-health" / "schema.json"

def is_valid():
    """Check if cache exists and is within TTL."""
    if not CACHE_FILE.exists():
        return False
    return (time.time() - CACHE_FILE.stat().st_mtime) < SCHEMA_CACHE_TTL

def load():
    """Load cached schema data."""
    return json.loads(CACHE_FILE.read_text())

def save(schema_data):
    """Save schema to cache."""
    CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(json.dumps(schema_data, indent=2))

When the cache is valid, the gatherer agent injects the cached schema into its system prompt so it can skip discovery and start querying immediately. Use --refresh-schema to force re-discovery.
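The freshness check in is_valid() boils down to an mtime comparison, which you can verify in isolation. This is a self-contained sketch of the same logic (the is_fresh name and temp-file setup are illustrative, not part of the project):

```python
import tempfile
import time
from pathlib import Path

TTL = 24 * 60 * 60  # seconds, matching the default 24-hour cache window

def is_fresh(path: Path, ttl: int = TTL) -> bool:
    """True when the file exists and was modified within the TTL window."""
    return path.exists() and (time.time() - path.stat().st_mtime) < ttl

cache = Path(tempfile.mkdtemp()) / "schema.json"
print(is_fresh(cache))   # False: the file does not exist yet
cache.write_text("{}")
print(is_fresh(cache))   # True: just written, well inside the TTL
```

Passing ttl=0 (or deleting the file) is an easy way to simulate an expired cache when testing --refresh-schema behavior.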

4.4 LangGraph Workflow

The agent uses a 3-node pipeline built with LangGraph's StateGraph:

  gather (ReAct Agent) ──> analyze (LLM Node) ──> render (Deterministic Node)
           |                        |                          |
  Discover schema,          Produce structured          Fill HTML template,
  query data via            JSON assessment             save styled brief
  MCP tool calls            (score, signals,            to output/
                             recommendations)
from langgraph.graph import StateGraph

from state import AgentState
from agents.gatherer import gather_node
from agents.analyst import analyze_node
from agents.renderer import render_node

# Extensible pipeline -- add your agents to this list
PIPELINE = [
    ("gather", gather_node),
    ("analyze", analyze_node),
    ("render", render_node),
]

def build_graph():
    """Build and compile the LangGraph workflow."""
    graph = StateGraph(AgentState)
    for name, func in PIPELINE:
        graph.add_node(name, func)
    graph.set_entry_point(PIPELINE[0][0])
    for i in range(len(PIPELINE) - 1):
        graph.add_edge(PIPELINE[i][0], PIPELINE[i + 1][0])
    return graph.compile()

The PIPELINE list makes it easy to add, remove, or reorder agents. Each node reads from and writes to the shared AgentState dictionary.
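For example, a new step only needs to be a function that reads from and writes to the shared state. This is a hypothetical sentiment node (the keyword list and logic are illustrative, not part of the project):

```python
def sentiment_node(state):
    """Tag the gathered data with a naive sentiment label (illustrative only)."""
    text = state.get("gathered_data", "").lower()
    # Count occurrences of a few hypothetical risk keywords
    hits = sum(text.count(word) for word in ("churn", "escalation", "outage"))
    return {"sentiment": "negative" if hits else "neutral"}

# In graph.py, slot the new node into the pipeline between analyze and render:
# PIPELINE = [
#     ("gather", gather_node),
#     ("analyze", analyze_node),
#     ("sentiment", sentiment_node),  # new step
#     ("render", render_node),
# ]
```

Because build_graph() wires edges from the list order, no other code changes are needed.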

The gatherer node uses LangGraph's create_react_agent to create a ReAct loop:

import json

from langgraph.prebuilt import create_react_agent

from config import get_llm, CDATA_CATALOG
from mcp_tools import get_catalogs, get_schemas, get_tables, get_columns, query_data
import schema_cache

TOOLS = [get_catalogs, get_schemas, get_tables, get_columns, query_data]

def gather_node(state):
    """ReAct data gatherer -- discovers schemas and queries data."""
    sys_prompt = "You are a data gathering agent..."
    # Inject cached schema if available
    if schema_cache.is_valid():
        sys_prompt += f" Cached schema: {json.dumps(schema_cache.load())}"
    llm = get_llm()
    agent = create_react_agent(llm, TOOLS, prompt=sys_prompt)
    result = agent.invoke({"messages": [("user", state["user_prompt"])]})
    return {"gathered_data": result["messages"][-1].content}

The ReAct agent decides which tools to call and in what order, adapting to whatever data source is connected.

4.5 Logger

The logger.py module provides a lightweight logger with custom formatting and run statistics. Use --verbose to see detailed output including MCP calls and timing:

[gatherer] 14:32:01 Schema cache hit
[analyst] 14:32:05 Analyzing gathered data
[renderer] 14:32:08 Brief saved to output/20260224_143208_premium_auto_health_brief.html
[summary] 14:32:08 --- Run Summary ---
[summary] 14:32:08 LLM calls: 4
[summary] 14:32:08 MCP calls: 12
[summary] 14:32:08 Total time: 7.23s

STEP 5: Run the Agent

5.1 Interactive Runner (Recommended First Time)

The easiest way to get started is the interactive runner. It handles credential setup, LLM provider selection, and running the agent through a menu-driven interface:

python run.py

The runner provides five options:

  1. Setup wizard — configure CData credentials and choose an LLM provider (OpenAI, Gemini, or DeepSeek) with model selection
  2. Run health analysis — analyze a specific account (with sample account suggestions)
  3. Run open-ended query — ask any question about your data (with sample query suggestions)
  4. Refresh schema cache — clear cached schemas for re-discovery
  5. Check setup — verify credentials, test MCP connection, and check dependencies

The rich library is auto-installed on first run if not already present.

5.2 Direct CLI: Account Health Analysis

Alternatively, run the agent directly from the command line:

python src/main.py --account "Premium Auto Group Europe"

Expected output:

  • The ReAct agent discovers schemas, queries accounts, opportunities, and tickets
  • The analyst node produces a health score with signals and recommendations
  • An HTML brief is saved to output/TIMESTAMP_AccountName_health_brief.html
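The renderer step that produces this file is a plain Jinja2 fill with no LLM call. A minimal sketch using a toy inline template (the real project loads a full styled template from disk, and these field values are sample data):

```python
from jinja2 import Template

# Toy inline template; the project's actual template is a styled HTML file
TEMPLATE = Template(
    "<h1>{{ account }}</h1>"
    "<p>Health score: {{ health_score }}</p>"
    "<ul>{% for s in signals %}<li>{{ s }}</li>{% endfor %}</ul>"
)

html = TEMPLATE.render(
    account="Premium Auto Group Europe",
    health_score=72,
    signals=["Open high-priority tickets", "Usage trending up"],
)
print(html)
```

Because rendering is deterministic, the same analysis JSON always produces the same brief, which makes this node trivial to test.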

5.3 Direct CLI: Open-Ended Query

Ask any question in plain English:

python src/main.py "Show me the top 10 customers by revenue"

The ReAct agent figures out which tables and queries to run. You can ask complex questions that span multiple tables:

python src/main.py "Which industries have the most high-priority open tickets?" --verbose

5.4 Verbose Mode

Add --verbose to see detailed agent output including tool calls and timing:

python src/main.py --account "Premium Auto Group Europe" --verbose

Here is a sample health brief generated by the agent:


STEP 6: Query Examples

Here are some example queries to explore the data:

Category         Query
Revenue          python src/main.py "Show me the top 10 customers by annual revenue"
Industry         python src/main.py "All customers in the energy sector"
Pipeline         python src/main.py "How many open opportunities do we have and total value"
Support          python src/main.py "Show all high priority open tickets"
Segmentation     python src/main.py "Customer count by industry"
Account Health   python src/main.py --account "Premium Auto Group Europe"

STEP 7: Available MCP Tools

Your AI agent has access to these CData Connect AI tools:

Tool                     Description
getCatalogs              List available data source connections
getSchemas               Get schemas for a specific catalog
getTables                Get tables in a schema
getColumns               Get column metadata for a table
queryData                Execute SQL queries
getProcedures            List stored procedures
getProcedureParameters   Get procedure parameter details
executeProcedure         Execute stored procedures
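Under the hood, each of these tools maps to a JSON-RPC 2.0 tools/call request like the one _call_mcp() builds in Step 4.2. For example, a queryData call carries this payload shape (the catalog, schema, table, and column names below are hypothetical):

```python
import json

# Shape of the JSON-RPC 2.0 request sent for a queryData tool call.
# [GoogleSheets1].[demo_organization].[account] is a hypothetical
# fully qualified table name; substitute your own connection's names.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "queryData",
        "arguments": {
            "query": "SELECT * FROM [GoogleSheets1].[demo_organization].[account]"
        },
    },
}
print(json.dumps(payload, indent=2))
```

The other tools differ only in the "name" field and the keys inside "arguments" (for example, getTables takes catalogName and schemaName, as shown in Step 4.2).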

Troubleshooting

Query Returned No Results

  • Verify the connection name in CData Connect AI is correct
  • Check that the table and column names exist using the Connect AI data explorer
  • Try a simpler query first: python src/main.py "Show me all customers"
  • Use --verbose to see the SQL queries the agent generates

LLM API Errors

  • Verify the OPENAI_API_KEY (or equivalent) is valid and has available credits
  • The agent works best with GPT-4o or Claude Sonnet. Set LLM_MODEL in your .env
  • For custom API endpoints, set OPENAI_API_BASE in your .env

Authentication Errors

  • Verify your CData email and PAT are correct in .env
  • Ensure the PAT has not expired
  • Check that your Connect AI account is active

Agent Loops Too Many Times

  • Set CDATA_CATALOG in .env to skip catalog discovery and narrow the agent's scope
  • Reduce MAX_ITERATIONS (default: 15) to limit tool-call loops
  • Use --verbose to see what the agent is doing at each step

Tool Calling Failures

  • Ensure the CData Connect AI instance has at least one active data source connected
  • Use fully qualified table names: [Catalog].[Schema].[Table]
  • Verify column names exist using the Connect AI data explorer

What's Next?

Now that you have a working customer health agent, you can:

  • Extend the agent pipeline: Add custom agents to the PIPELINE list in graph.py. For example, add a competitive analysis node, a churn prediction agent, or a financial modeling step between the analyst and renderer.
  • Connect more data sources: Add Salesforce, HubSpot, Snowflake, or any of 350+ supported sources through the CData Connect AI dashboard. The ReAct agent discovers schemas automatically.
  • Switch LLM providers: Change LLM_PROVIDER and LLM_MODEL in your .env to use Anthropic Claude, Google Gemini, or local models via Ollama.
  • Add scheduling: Run health analysis automatically on a schedule for proactive customer monitoring.
  • Add human-in-the-loop: LangGraph supports interrupt points where a human can review and approve the agent's actions before proceeding.
  • Explore advanced patterns: The LangGraph documentation covers cycles, branches, parallel execution, and multi-agent collaboration.

Get Started with CData Connect AI

Ready to build AI-powered data applications? CData Connect AI provides governed, secure access to 350+ enterprise data sources for AI applications. LangGraph agents can query live business data from Salesforce, Snowflake, HubSpot, Google Sheets, databases, and more through a single MCP interface.

Sign up for a free trial and start building intelligent customer health agents today!