How to Connect to Live Azure Data Lake Storage Data from Sourcegraph Amp (via CData Connect AI)

Somya Sharma
Technical Marketing Engineer
Integrate Sourcegraph Amp with CData Connect AI to query and manage live Azure Data Lake Storage data securely in real time.

Sourcegraph Amp is a modern AI agent environment designed for building intelligent, production-ready assistants capable of stateful reasoning, automatic context management, and native MCP (Model Context Protocol) integration. When combined with CData Connect AI, you can leverage Amp to create agents that interact with your Azure Data Lake Storage data in real time using natural language or SQL-based queries.

CData Connect AI provides a secure, cloud-to-cloud interface for accessing Azure Data Lake Storage data. Through the Connect AI Remote MCP Server, Amp connects directly to Azure Data Lake Storage, enabling live data queries and operations without replication. With optimized pushdown capabilities, CData Connect AI executes SQL operations including filters, aggregations, and joins directly in Azure Data Lake Storage for fast, real-time performance.

In this article, we demonstrate how to configure the Amp agent to conversationally explore your Azure Data Lake Storage data using natural language or SQL. With Connect AI, you can easily build agents that have secure, live access to Azure Data Lake Storage along with hundreds of other enterprise data sources.

Prerequisites

  1. An active CData Connect AI account
  2. The Sourcegraph Amp VS Code extension or Amp CLI installed
  3. Node.js v20 or higher installed
  4. Access to Azure Data Lake Storage

Step 1: Configure Azure Data Lake Storage Connectivity for Sourcegraph Amp

Connectivity to Azure Data Lake Storage from Amp is made possible through the CData Connect AI Remote MCP Server. To interact with Azure Data Lake Storage data from Amp, we start by creating and configuring an Azure Data Lake Storage connection in CData Connect AI.

  1. Log into Connect AI, click Sources, and then click Add Connection
  2. Select "Azure Data Lake Storage" from the Add Connection panel
  3. Enter the necessary authentication properties to connect to Azure Data Lake Storage.

    Authenticating to a Gen 1 DataLakeStore Account

    Gen 1 uses OAuth 2.0 in Entra ID (formerly Azure AD) for authentication.

    For this, an Active Directory web application is required. You can create one as follows:

    1. Sign in to your Azure Account through the Azure portal.
    2. Select "Entra ID" (formerly Azure AD).
    3. Select "App registrations".
    4. Select "New application registration".
    5. Provide a name and URL for the application. Select Web app for the type of application you want to create.
    6. Select "Required permissions" and change the required permissions for this app. At a minimum, "Azure Data Lake" and "Windows Azure Service Management API" are required.
    7. Select "Key" and generate a new key. Add a description, a duration, and take note of the generated key. You won't be able to see it again.

    To authenticate against a Gen 1 DataLakeStore account, the following properties are required:

    • Schema: Set this to ADLSGen1.
    • Account: Set this to the name of the account.
    • OAuthClientId: Set this to the application Id of the app you created.
    • OAuthClientSecret: Set this to the key generated for the app you created.
    • TenantId: Set this to the tenant Id. See the property for more information on how to acquire this.
    • Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.

    Authenticating to a Gen 2 DataLakeStore Account

    To authenticate against a Gen 2 DataLakeStore account, the following properties are required:

    • Schema: Set this to ADLSGen2.
    • Account: Set this to the name of the account.
    • FileSystem: Set this to the file system which will be used for this account.
    • AccessKey: Set this to the access key which will be used to authenticate the calls to the API. See the property for more information on how to acquire this.
    • Directory: Set this to the path which will be used to store the replicated file. If not specified, the root directory will be used.
  4. Click Save & Test
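
Before clicking Save & Test, it can help to confirm that every required property for the chosen schema has a value. The following is a minimal sketch that mirrors the two property lists above; the function name and the dictionary layout are illustrative assumptions and are not part of Connect AI.

```python
# Required connection properties per schema, as listed in the steps above.
# Directory is optional (the root directory is used when it is omitted).
REQUIRED_PROPERTIES = {
    "ADLSGen1": ["Account", "OAuthClientId", "OAuthClientSecret", "TenantId"],
    "ADLSGen2": ["Account", "FileSystem", "AccessKey"],
}


def missing_properties(props: dict) -> list:
    """Return the required property names that are absent or empty.

    `props` is a hypothetical dict keyed by the property names in the article.
    """
    schema = props.get("Schema")
    if schema not in REQUIRED_PROPERTIES:
        raise ValueError(f"Schema must be one of {sorted(REQUIRED_PROPERTIES)}")
    return [name for name in REQUIRED_PROPERTIES[schema] if not props.get(name)]


# Placeholder values for a Gen 2 account with no access key supplied yet.
gen2 = {"Schema": "ADLSGen2", "Account": "myaccount", "FileSystem": "myfs"}
print(missing_properties(gen2))
```

An empty list means all required properties for that schema are present; the check says nothing about whether the values are valid.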

Step 2: Set Up Amp for CData Connect AI

Copy the MCP Endpoint

Amp communicates with Connect AI through the hosted MCP endpoint:

https://mcp.cloud.cdata.com/mcp

This endpoint provides secure, cloud-to-cloud communication between Amp and your Connect AI workspace.

Generate Base64 Credentials

To authenticate Amp with Connect AI, generate your Base64-encoded credentials. For example, in PowerShell:

[Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes("[email protected]:yourPAT"))

Replace [email protected] with your Connect AI email and yourPAT with your Personal Access Token.
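
If PowerShell is not available, the same string can be produced with Python's standard library. This is a minimal sketch of the equivalent encoding; the function name and the sample values are placeholders.

```python
import base64


def basic_credentials(email: str, pat: str) -> str:
    """Base64-encode "email:PAT" for use after "Basic " in the Authorization header."""
    # ASCII matches the PowerShell example; non-ASCII input raises UnicodeEncodeError.
    return base64.b64encode(f"{email}:{pat}".encode("ascii")).decode("ascii")


# Placeholder values; substitute your Connect AI email and Personal Access Token.
print(basic_credentials("user@example.com", "yourPAT"))
```

Treat the output like a password: Base64 is an encoding, not encryption, and anyone with the string can recover the PAT.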

Register the MCP Server in Amp

Once you have your Base64 string, register the CData Connect AI MCP server with Amp using the following command, placing the string immediately after "Basic " in the Authorization header value:

amp mcp add cdata-connect-ai -- npx -y mcp-remote@latest https://mcp.cloud.cdata.com/mcp --header "Authorization: Basic "

This adds your Connect AI configuration to Amp's settings file, enabling communication with CData Connect AI.
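
When scripting this step, building the command as an argument list avoids shell-quoting mistakes around the header value, which contains a space. The sketch below only assembles and prints the command from the article; the helper name is an assumption, and YOUR_BASE64_STRING is a placeholder.

```python
import shlex

MCP_ENDPOINT = "https://mcp.cloud.cdata.com/mcp"


def amp_mcp_add_args(base64_credentials: str, name: str = "cdata-connect-ai") -> list:
    """Build the argument list for the `amp mcp add` command shown above."""
    return [
        "amp", "mcp", "add", name, "--",
        "npx", "-y", "mcp-remote@latest", MCP_ENDPOINT,
        "--header", f"Authorization: Basic {base64_credentials}",
    ]


args = amp_mcp_add_args("YOUR_BASE64_STRING")  # placeholder, not a real credential
# shlex.join applies POSIX-shell quoting so the header stays a single argument.
print(shlex.join(args))
```

To run the command rather than print it, the same list can be passed to subprocess.run, assuming the amp and npx executables are on your PATH.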

Verify Your Connection and Explore Data

  1. Create a new thread. Start a new Amp session to begin interacting with your data:

    amp thread new

  2. Enter the interactive chat. Connect to the new thread using:

    amp

  3. Verify MCP servers. Inside the Amp shell, check your registered MCP servers:

    list mcp

  4. Confirm your data source. Confirm that your connected Azure Data Lake Storage data appears as a catalog by running:

    getCatalogs

Step 3: Build Intelligent Agents with Live Azure Data Lake Storage Data Access

With your Amp application configured and connected to CData Connect AI, you can now build sophisticated agents that interact with your Azure Data Lake Storage data using natural language. The MCP integration provides your agents with powerful data access capabilities.

Available MCP Tools for your Agent

Your Amp application has access to the following CData Connect AI MCP tools:

  • getCatalogs: Lists all data source catalogs (e.g., ADLS1)
  • getSchemas: Returns database schemas within the connected catalog
  • getTables: Lists all tables and views available under a given schema
  • getColumns: Returns column definitions for a specific table or view
  • queryData: Executes SQL queries (SELECT, INSERT, UPDATE, DELETE)
  • getProcedures: Lists stored procedures or API endpoints
  • getProcedureParameters: Returns metadata for stored procedure parameters
  • executeProcedure: Invokes stored procedures (e.g., Azure Data Lake Storage actions)
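
Amp issues these tool calls for you, but seeing the shape of a request can help when debugging. Under the Model Context Protocol, a tool invocation is a JSON-RPC 2.0 request with the method "tools/call". The sketch below builds two such payloads without sending them; the argument key "query" for queryData is an assumption, and the server's own tool listing describes the actual input schema.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # JSON-RPC request ids, unique per request


def tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP "tools/call" JSON-RPC 2.0 request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Discovery first, then a query. The "query" key is an assumed parameter name.
requests = [
    tool_call("getCatalogs"),
    tool_call("queryData", {"query": "SELECT 1"}),
]
print(json.dumps(requests, indent=2))
```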

Key Features of Amp

Amp provides several production-ready capabilities that make it ideal for building intelligent, data-aware AI agents:

  • Automatic Context Management: Amp maintains and recalls conversational context automatically, enabling seamless multi-turn interactions without manual state tracking.
  • Stateful Conversations: Preserve context and memory across multiple queries to create natural, human-like conversations.
  • Native MCP Integration: Amp natively supports the Model Context Protocol (MCP), allowing secure, real-time access to live data from CData Connect AI and other MCP-compatible servers.
  • Tool-Oriented Architecture: Tools are treated as first-class components with managed invocation, input validation, and error handling.
  • Efficient Context Handling: Amp optimizes prompts dynamically, ensuring relevant information is preserved even when approaching model token limits.
  • Cross-Source Querying: Combine and query multiple connected data sources within a single conversational workflow.
  • Fine-Grained Permission Controls: Define and enforce tool access levels to maintain data governance and secure integrations.
  • Developer-Friendly CLI and SDK: Manage MCP connections, configure agents, and test workflows easily from the Amp CLI or VS Code extension.

Example Use Cases

Here are some examples of what your Amp agents can do with live data access through CData Connect AI:

  • Data Analysis Agent: Identify trends and anomalies in Azure Data Lake Storage data.
  • Report Generation Agent: Generate reports from natural language prompts.
  • Interactive Chatbot: Explain insights conversationally using live data.
  • Data Quality Agent: Monitor and flag real-time data inconsistencies.
  • Automated Workflow Agent: Trigger alerts based on defined data conditions.

Testing Your Agent

Once your agent is running, you can interact with it through natural language queries. For example:

  • "Show me all new leads from the past 30 days."
  • "What are the top-performing campaigns this quarter?"
  • "Analyze revenue growth and highlight anomalies."
  • "Generate a summary report of current opportunities."
  • "Find all records where status is pending approval."
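
Behind a prompt like the last one, the agent would typically pass a SQL statement to the queryData tool. The sketch below shows what such a statement might look like. Every name in it is hypothetical (the catalog, schema, table, and Status column), and the bracketed three-part naming is an assumption about how Connect AI qualifies tables; use getCatalogs, getSchemas, getTables, and getColumns to find the real names.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Join three identifiers as [catalog].[schema].[table] (assumed format)."""
    for part in (catalog, schema, table):
        if "[" in part or "]" in part:
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"[{catalog}].[{schema}].[{table}]"


def pending_records_sql(table_name: str) -> str:
    """SQL an agent might issue for "Find all records where status is pending approval"."""
    return f"SELECT * FROM {table_name} WHERE [Status] = 'pending approval'"


# Hypothetical catalog, schema, and table names for illustration only.
sql = pending_records_sql(qualified_name("ADLS1", "ADLSGen2", "Resources"))
print(sql)
```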

Get CData Connect AI

To get live data access to 300+ SaaS, Big Data, and NoSQL sources directly from your Amp agent environment, try CData Connect AI today!

Ready to get started?

Learn more about CData Connect AI or sign up for free trial access:

Free Trial