The Architecture Behind Token-Efficient Enterprise Claude Workflows

by Mohammed Mohsin Turki | June 1, 2026

Token usage reduction with Connect AI. The cost of enterprise AI scales with every query. As more teams and departments run workflows against live data, token overhead compounds at every layer: tool definitions, discovery chains, and multi-source round-trips. A single federated query across multiple enterprise systems can consume thousands of tokens before returning the answer. The right architecture choice determines whether these token costs remain predictable as adoption grows or outpace the value AI delivers.

CData Connect AI is built to handle enterprise AI at every level of determinism. When schemas are unknown, queries are undefined, and data sources are still being evaluated, the Universal MCP tools give Claude the freedom to discover connections, inspect data and data models, and reason across sources to return answers nobody pre-specified.

Once workflows harden and patterns stabilize, Connect AI optimization features take over: scoping tool definitions, pre-joining data, and caching results so the same query runs at a fraction of the token cost. Organizations move from exploration to production without switching platforms.

We ran an internal benchmark to measure exactly how much that architectural choice matters in practice. The query: a federated enterprise prompt spanning Salesforce, Snowflake, and ServiceNow through CData Connect AI. The result: a 97.6% reduction in token usage. From 183,541 tokens at $0.596 per query down to 4,427 tokens at $0.027. Validated across 56 independent runs using Claude Sonnet 4.6 at temperature=0.

If you want to try the benchmark yourself, read the how-to guide in our Knowledge Base.

In this article, we break down how best to optimize tokens across your Enterprise AI workflows, which Connect AI features configurations drive that reduction, and how to replicate the same in your environment.

Why token usage is an enterprise cost driver

Per Anthropic’s documentation, a token is the atomic unit of text an LLM consumes and produces. A token in Claude is roughly three to four characters of text, or about three-quarters of a word. Every character in a system prompt, tool definition, query, and response counts toward the bill. Multi-source queries amplify this overhead at every layer.

Three forces inflate token cost per call:

Tool definition overhead. Each connected source adds its tool schema to the request. Anthropic’s engineering team measured five-server MCP setups consuming around 55,000 tokens before a conversation begins. Their Tool Search feature reduced that footprint by up to 85% by deferring most tool loading.
Round-trip accumulation. Each tool call adds a tool_use block plus a tool_result block. By turn five of a multi-source discovery chain, the history has grown by tens of thousands of tokens, all re-sent on every subsequent turn.
Schema bloat. A default Salesforce Account search tool exposes more than 70 fields. Most queries need a handful. Unused fields still consume tokens on every call.

The right architecture choice decides whether the bill grows linearly with adoption or stays manageable.

The autonomy spectrum: where your workflow sits determines your architecture

We have established that AI agent workflows aren’t all the same, but they do all fall along a spectrum of autonomy. Some are on the exploratory, non-deterministic end, where the token overhead is worth the cost to gain flexibility. The features benchmarked in this article are for the other, deterministic end of the spectrum: workflows where the pattern is established, the sources are known, and the same query runs on a schedule. These are the kinds of workflows that will generally be run repeatedly, with little or no human involvement, so token conservation is even more important. If you'd like to learn more about the spectrum, we covered it in our blog: The Autonomy Spectrum: What Agents Need at Every Level.

The benchmark scenario

The test query mirrors a realistic enterprise prompt: “Show me open support tickets, the related Salesforce accounts, and the product telemetry data from Snowflake for enterprise customers. Return up to 50 rows.”

This is a non-deterministic, federated query spanning three independent enterprise systems in a single agent turn — an ITSM tool, a CRM, and a data warehouse. Each scenario hits POST /v1/messages/count_tokens on the Anthropic Messages API; token counts are read straight off the response usage object, per call, per turn.

Parameter	Value
Data sources	Salesforce (CRM), Snowflake (warehouse), ServiceNow (ITSM)
Connected via	CData Connect AI managed MCP
CData MCP Tools	`getInstructions`, `getCatalogs`, `getSchemas`, `getTables`, `getColumns`, `getProcedures`, `getProcedureParameters`, `queryData`, `executeProcedure`, `execute_insert`, `execute_update` (11 universal tools)
Model	claude-sonnet-4-6
Pricing reference	$3.00 / MTok input, $15.00 / MTok output (Anthropic list price)
Token measurement	Captured from `usage.input_tokens` and `usage.output_tokens` on the Anthropic Messages API response, per call, per turn
Run methodology	Real multi-turn execution against live MCP — Claude plans, executes tool calls, ingests results, repeats until `end_turn`
Run counts	4 to 16 independent runs per scenario at `temperature=0`
Aggregation	Median for high-variance scenarios (Raw baseline, Derived Views), mean for deterministic scenarios

The diagram below contrasts the two paths. The baseline routes Claude through a discovery chain — getCatalogs → getInstructions → getSchemas → getTables → getColumns, three separate queryData calls before a final synthesis turn. This is the universal tool path: fully exploratory, appropriate for unknown schemas, and deliberately open-ended. The optimized path collapses everything into a single Custom Tool invocation, with no discovery overhead. It’s the right architecture once the workflow has stabilized.

For workflows that are exploratory but recurring, Workspaces and Derived Views offer a middle path — scoping what Claude sees and pre-joining sources without locking the workflow into a fixed pattern.

Claude token reduction benchmark diagram.

Token flow comparison: raw baseline vs optimized path.

The numbers: up to 97.6% token reduction using Connect AI features

Each Connect AI feature was measured independently against the same exploratory baseline — the universal tool path — to show what becomes possible when a workflow moves toward the deterministic end of the spectrum. The summary table:

Baseline: 183,541 tokens - 22 tool calls - $0.596 per query

Feature	Tokens	Tool Calls	Cost / Query	Savings
Derived Views	40,983	4	$0.146	- $0.45
Workspaces	11,713	3	$0.049	- $0.55
Jobs / Caching	19,778	2	$0.075	- $0.52
Custom Tools	4,427	1	$0.027	- $0.57
Toolkits	16,384	3	$0.063	- $0.53
Combined (all features)	11,791	3	$0.049	- $0.55

What this means for your organization:

Custom Tools cut tokens by 97.6%. A scoped tool definition cuts per-query cost and keeps spend predictable as workloads scale.
Workspaces, Toolkits, Caching, and the Combined stack each deliver 89–94% reductions. Pick the feature that matches your workflow — the cost outcome is similar.
Derived Views land at 77.7%. Pre-joining cross-source data eliminates orchestration work for every agent that runs the same query.
Per-query cost drops from $0.596 to $0.027 — about $57,000 in monthly token spend removed at 100,000 queries.

Estimated cost per query (USD) per feature

Two further multipliers compound on top:

Batch processing discounts API spend by an additional 50% for asynchronous workloads.
Prompt caching reduces cached-input cost by up to 90% for identical context across calls.

How each Connect AI feature reduces the tokens

Each feature removes a specific category of overhead before the request reaches Claude. The mechanics differ; the underlying lever is the same — once the workflow is known, configure it once and deploy Claude faster. These features are designed for workflows that have moved past the exploratory phase and are running on a defined, repeatable pattern.

Derived Views: pre-join multi-source data

Derived Views are reusable virtual tables defined with a defined SQL statement. They encapsulate joins, filters, and transformations across one or more sources so the AI only sees the finished result. The benchmark view pre-joins ServiceNow incidents with Salesforce accounts and Snowflake telemetry.

Measured: 40,983 tokens (median of 16 runs), 77.7% reduction. Three source schemas collapse into one, and the multi-call orchestration disappears. Wall-clock drops from 242.8s to roughly 50s per query.

Workspaces: scope the data catalog Claude sees

Workspaces act as a data catalog inside Connect AI. They organize tables, views, and Derived Views into named groups and generate dedicated endpoints (REST, MCP, OData, OpenAPI), so each AI agent sees only the assets the workspace owner published. The benchmark workspace exposes three relevant Custom Tools and nothing else.

Measured: 11,713 tokens (mean of 6 runs, 0.1% variance), 93.6% reduction. Scoping the tool list also prevents Claude from exploring unrelated sources — a governance win alongside the cost win.

Jobs and Caching: pre-fetch and reuse results

Connect AI Jobs let admins select tables to cache to a managed PostgreSQL store on a recurring schedule. Once cached, queries hit the local copy instead of the live source, so Claude gets fast responses without paying the discovery + live-fetch round-trip.

Measured: 19,778 tokens (mean of 4 runs, 0.0% variance), 89.2% reduction. Best fit for high-frequency recurring queries where hourly or daily freshness is acceptable.

Custom Tools: expose only the schema Claude needs

Custom SQL Tools are admin-defined parameterized SQL templates inside a Toolkit. Each one exposes a precisely-scoped query — only the fields, filters, and parameters the workflow needs — as a named tool the AI can call directly. The benchmark Custom Tool replaces the default 73-column Salesforce Account schema with a six-column scoped definition.

Measured: 4,427 tokens (mean of 6 runs, 0.8% variance), 97.6% reduction — the strongest single-feature optimization in the benchmark. Wall-clock per query: 19s. One tool call, right-shaped data, answer.

Toolkits: curated tool bundles per workflow

Toolkits bundle Custom Tools and connection tools into a named container with its own MCP endpoint. Where Workspaces scope at the data catalog level, Toolkits scope at the workflow level — sales toolkit, finance toolkit, support toolkit.

Measured: 16,384 tokens (mean of 6 runs, 0.0% variance), 91.1% reduction. The benchmark Toolkit bundles three source connections and three Custom Tools into 16 tools total. Claude consistently picks the named Custom Tools over universal discovery tools.

How to replicate this in your enterprise AI workflows

The right starting point depends on where your workflow sits on the autonomy spectrum. If you’re still in the exploratory phase — evaluating data sources, prototyping queries, or handling ad-hoc analysis — the universal tool path is appropriate. Let Claude discover, reason, and iterate. The token overhead is the cost of genuine flexibility.

Once a workflow has stabilized into a repeatable pattern, the features below layer naturally. The fastest path from exploratory to production-ready:

Start with Workspaces. Scope the data catalog to what each Claude session actually needs — expose three tools instead of three hundred.
Apply Custom Tools to high-use connectors. Trim each schema to the fields the workflow reads. The Salesforce Account example shrinks the tool definition by roughly 80%.
Build Derived Views for recurring multi-source joins. Pre-join server-side instead of asking Claude to orchestrate three calls every time.
Schedule Jobs for non-real-time data. Any query where hourly freshness is acceptable belongs in the cache.
Group these into Toolkits per workflow. Sales, finance, and support each get their own named Toolkit with a dedicated MCP endpoint — so each Claude session only connects to the tools it needs.

The benchmark numbers above show what's available when the data layer is configured to support it. Scoped catalogs, pre-joined views, cached results, and parameterized tools don't just reduce token costs; they make enterprise AI predictable enough to scale. For organizations running AI at volume, the right data layer is where token efficiency and cost control are actually won.

Total tokens per query: raw baseline versus each Connect AI feature.

Frequently asked questions

What causes high token usage when Claude queries enterprise data?

Three drivers dominate: tool definition lists for many connected sources, multi-step chains where Claude issues sequential discovery and queries, and verbose raw API responses on each tool result. Architectural choices have a larger impact than prompt wording.

Does token optimization affect the quality of Claude's answers?

When applied well, it improves answer quality by reducing noise. Scoped tool lists and pre-joined Derived Views give Claude less irrelevant information to reason over.

How do I measure token usage in my current Claude setup?

Use the Anthropic token counting endpoint (POST /v1/messages/count_tokens) to size requests before sending them. Output tokens are reported in the usage field of every response.

Which Connect AI feature delivers the largest saving?

Custom Tools produced the largest single-feature reduction at 97.6%. The right answer depends on workflow: teams with many connected sources gain most from Workspaces and Toolkits; teams running frequent cross-source joins gain most from Derived Views; teams with stable repeat workflows gain most from Caching and AI Skills.

When should I keep using universal tools instead of optimizing?

Universal tools are the right choice when workflows are genuinely exploratory — schema discovery, ad-hoc analysis, prototyping, or any task where the query shape isn't known in advance. The optimization features in this benchmark are designed for workflows that have stabilized into repeatable patterns. If the pattern isn't fixed yet, let Claude explore freely and optimize once it is. For more on how to assess where a workflow falls, see The Autonomy Spectrum: What Agents Need at Every Level.

Do these patterns apply to other LLMs?

The architectural principles apply to any token-priced LLM. Connect AI's MCP design means the same features work with any MCP-compatible client — Claude, Microsoft Copilot, Cursor, n8n, and others.

Token reduction at every query with CData Connect AI

Architecture decides whether enterprise AI scales economically — and the right architecture depends on the workflow. Universal tools give agents the freedom to explore. Workspaces, Toolkits, Custom Tools, and Derived Views give production workflows the efficiency to scale. CData Connect AI compresses multi-source integration into a single governed MCP endpoint — hundreds of connectors, identity-first security, and the features above, all configured once and reused across every Claude workflow.

Start a free trial to run the same benchmark against your enterprise data.

Your enterprise data, finally AI-ready.

Connect AI gives your AI assistants and agents live, governed access to hundreds of enterprise systems — so they can reason over your actual business data, not just what they were trained on.

Get The Trial

Solutions & Use Cases CData Connect AI

CData is the data layer that makes AI work in production—live connectivity and replication across hundreds of the most critical enterprise sources, semantic context, and built-in governance. Powering AI for Databricks, Microsoft, Google, Palantir, and 10,000+ customers worldwide.

Blog