Integrating Dataiku with Paylocity Data via CData Connect AI
Dataiku is a collaborative data science and AI platform that enables teams to design, deploy, and manage machine learning and generative AI projects within a governed environment. It's Agent and GenAI framework allows users to build intelligent agents that can analyze, generate, and act on data through custom workflows and model orchestration.
By integrating Dataiku with CData Connect AI through the built-in MCP (Model Context Protocol) Server, these agents gain secure, real-time access to live Paylocity data. The integration bridges Dataiku's agent execution environment with CData's governed enterprise connectivity layer, allowing every query or instruction to run safely against authorized data sources without manual exports or staging.
This article demonstrates how to configure Paylocity connectivity in Connect AI, prepare a Python code environment in Dataiku with MCP support, and create an agent that queries and interacts with live Paylocity data directly from within Dataiku.
Step 1: Configure Paylocity Connectivity for Dataiku
Connectivity to Paylocity from Dataiku is made possible through CData Connect AI's Remote MCP Server. To interact with Paylocity data from Dataiku, you start by creating and configuring a Paylocity connection in CData Connect AI.
- Log into Connect AI, click Sources, then click Add Connection
- Select "Paylocity" from the Add Connection panel
-
Enter the necessary authentication properties to connect to Paylocity.
Set the following to establish a connection to Paylocity:
- RSAPublicKey: Set this to the RSA Key associated with your Paylocity, if the RSA Encryption is enabled in the Paylocity account.
This property is required for executing Insert and Update statements, and it is not required if the feature is disabled.
- UseSandbox: Set to true if you are using sandbox account.
- CustomFieldsCategory: Set this to the Customfields category. This is required when IncludeCustomFields is set to true. The default value for this property is PayrollAndHR.
- Key: The AES symmetric key(base 64 encoded) encrypted with the Paylocity Public Key. It is the key used to encrypt the content.
Paylocity will decrypt the AES key using RSA decryption.
It is an optional property if the IV value not provided, The driver will generate a key internally. - IV: The AES IV (base 64 encoded) used when encrypting the content. It is an optional property if the Key value not provided, The driver will generate an IV internally.
Connect Using OAuth Authentication
You must use OAuth to authenticate with Paylocity. OAuth requires the authenticating user to interact with Paylocity using the browser. For more information, refer to the OAuth section in the Help documentation.
The Pay Entry API
The Pay Entry API is completely separate from the rest of the Paylocity API. It uses a separate Client ID and Secret, and must be explicitly requested from Paylocity for access to be granted for an account. The Pay Entry API allows you to automatically submit payroll information for individual employees, and little else. Due to the extremely limited nature of what is offered by the Pay Entry API, we have elected not to give it a separate schema, but it may be enabled via the UsePayEntryAPI connection property.
Please be aware that when setting UsePayEntryAPI to true, you may only use the CreatePayEntryImportBatch & MergePayEntryImportBatchgtable stored procedures, the InputTimeEntry table, and the OAuth stored procedures. Attempts to use other features of the product will result in an error. You must also store your OAuthAccessToken separately, which often means setting a different OAuthSettingsLocation when using this connection property.
- RSAPublicKey: Set this to the RSA Key associated with your Paylocity, if the RSA Encryption is enabled in the Paylocity account.
- Click Save & Test
- Open the Permissions tab and set user-based permissions
Add a Personal Access Token
A Personal Access Token (PAT) is used to authenticate the connection to Connect AI from Dataiku. It is best practice to create a separate PAT for each integration to maintain granular access control
- Click the gear icon () at the top right of the Connect AI app to open Settings
- On the Settings page, go to the Access Tokens section and click Create PAT
- Give the PAT a descriptive name and click Create
- Copy the token when displayed and store it securely. It will not be shown again
With the Paylocity connection configured and a PAT generated, Dataiku can now connect to Paylocity data through the CData MCP Server.
Step 2: Prepare Dataiku and the Code Environment
A dedicated python code environment in Dataiku provides the runtime support needed for MCP-based communication. To enable Dataiku Agents to connect to CData Connect AI, create a Python environment and install the MCP client dependencies required for agent-to-server interaction.
- In Dataiku Cloud, open Code Envs
- Click Add a code env to open the DSS settings window
- In DSS, click New Python env. Name it (for example, MCP_Package) and choose Python 3.10 (3.10 to 3.13 supported)
- Open Packages to install and add the following pip packages:
- httpx
- anyio
- langchain-mcp-adapters
- Open Containerized execution and under Container runtime additions select Agent tool MCP servers support
- Check Rebuild env and click Save and update to install packages
- Back in Dataiku Cloud, open Overview and click Open instance
- Click + New project and select Blank project. Name the project
Step 3: Create a Dataiku Agent and connect to the MCP server
The Dataiku Agent serves as the bridge between the Dataiku workspace and the CData MCP Server. To enable this connection, create a custom code-based agent, assign it the configured Python environment, and embed your Connect AI credentials to allow the agent to query and interact with live Paylocity data.
- Go to Agents & GenAI Models and click Create your first agent
- Choose Code agent, name it, and for Agent version select Asynchronous agent without streaming
- From the tab above select Settings. In Code env selection set Default Python code env to the environment you created (for example, MCP_Package)
- Return to the Agent Design tab and paste the following code. Replace EMAIL, and PAT with your values
- Open Quick Test on the right side panel
- Paste the JSON code and click Run test
import os
import base64
from typing import Dict, Any, List
from dataiku.llm.python import BaseLLM
from langchain_mcp_adapters.client import MultiServerMCPClient
# ---------- Persistent MCP client (cached between calls) ----------
_MCP_CLIENT = None
def _get_mcp_client() -> MultiServerMCPClient:
"""Create (or reuse) a MultiServerMCPClient to CData Cloud MCP."""
global _MCP_CLIENT
if _MCP_CLIENT is not None:
return _MCP_CLIENT
# Set creds via env/project variables ideally
EMAIL = os.getenv("CDATA_EMAIL", "YOUR_EMAIL")
PAT = os.getenv("CDATA_PAT", "YOUR_PAT")
BASE_URL = "https://mcp.cloud.cdata.com/mcp"
if not EMAIL or PAT == "YOUR_PAT":
raise ValueError("Set CDATA_EMAIL and CDATA_PAT as env variables or inline in the code.")
token = base64.b64encode(f"{EMAIL}:{PAT}".encode()).decode()
headers = {"Authorization": f"Basic {token}"}
_MCP_CLIENT = MultiServerMCPClient(
connections={
"cdata": {
"transport": "streamable_http",
"url": BASE_URL,
"headers": headers,
}
}
)
return _MCP_CLIENT
def _pick_tool(tools, names: List[str]):
L = [n.lower() for n in names]
return next((t for t in tools if t.name.lower() in L), None)
async def _route(prompt: str) -> str:
"""
Simple intent router:
- 'list connections' / 'list catalogs' -> getCatalogs
- 'sql: ...' or 'query: ...' -> queryData
- otherwise -> help text
"""
client = _get_mcp_client()
tools = await client.get_tools()
p = prompt.strip()
low = p.lower()
# 1) List connections (catalogs)
if "list connections" in low or "list catalogs" in low:
t = _pick_tool(tools, ["getCatalogs", "listCatalogs"])
if not t:
return "No 'getCatalogs' tool found on the MCP server."
res = await t.ainvoke({})
return str(res)[:4000]
# 2) Run SQL
if low.startswith("sql:") or low.startswith("query:"):
sql = p.split(":", 1)[1].strip()
t = _pick_tool(tools, ["queryData", "sqlQuery", "runQuery", "query"])
if not t:
return "No query-capable tool (queryData/sqlQuery) found on the MCP server."
try:
res = await t.ainvoke({"query": sql})
return str(res)[:4000]
except Exception as e:
return f"Query failed: {e}"
# 3) Help
return (
"Connected to CData MCP
"
"Say **'list connections'** to view available sources, or run a SQL like:
"
" sql: SELECT * FROM [Salesforce1].[SYS].[Connections] LIMIT 5
"
"Remember to use bracket quoting for catalog/schema/table names."
)
class MyLLM(BaseLLM):
async def aprocess(self, query: Dict[str, Any], settings: Dict[str, Any], trace: Any):
# Extract last user message from the Quick Test payload
prompt = ""
try:
prompt = (query.get("messages") or [])[-1].get("content", "")
except Exception:
prompt = ""
try:
reply = await _route(prompt)
except Exception as e:
reply = f"Error: {e}"
# The template expects a dict with a 'text' key
return {"text": reply}
Run a Quick Test
{
"messages": [
{
"role": "user",
"content": "list connections"
}
],
"context": {}
}
Chat with your Agent
Switch to the Chat tab and try prompting like, "List all connections". The chat output will show a list of connection catalogs.
Get CData Connect AI
To access 300+ SaaS, Big Data, and NoSQL sources from your AI agents, try CData Connect AI today.