Access Live Data in AWS Batch Using CData JDBC Drivers



AWS Batch is a fully managed service that runs containerized workloads at scale—without provisioning or managing the underlying compute infrastructure. For data teams, it’s a reliable way to schedule recurring extractions, run nightly sync jobs, or trigger data pipelines on demand.

The CData JDBC Driver for Salesforce gives Java applications standard SQL access to live Salesforce objects. Instead of writing against the Salesforce REST API directly—handling authentication, pagination, and field mapping yourself—you connect with a JDBC URL and query Accounts, Opportunities, Cases, or any other object using familiar SQL syntax.

In this article, we walk through writing a Java program that queries Salesforce data using the CData JDBC Driver for Salesforce, packaging it into a Docker container, pushing the image to Amazon Elastic Container Registry (ECR), and running it as a scheduled batch job on AWS Batch.

Prerequisites

You need the following before getting started:

  1. CData JDBC Driver for Salesforce (includes cdata.jdbc.salesforce.jar and cdata.jdbc.salesforce.lic)
  2. Java Development Kit (JDK) 17 or later
  3. Docker Desktop or Docker Engine, installed and running
  4. AWS CLI, installed and configured with IAM credentials
  5. An AWS account with permissions to create ECR repositories, Batch compute environments, job queues, and job definitions
  6. A Salesforce account with API access and a valid Security Token

Overview

Here’s a quick overview of the steps:

  1. Write a Java program that connects to Salesforce via JDBC and exports results to CSV
  2. Compile the program and build a Docker image
  3. Push the image to Amazon ECR
  4. Configure AWS Batch and submit the job

Step 1: Write the Java program

The program connects to Salesforce using the CData JDBC Driver, runs a SQL query, and writes the results to a CSV file. Both cdata.jdbc.salesforce.jar and cdata.jdbc.salesforce.lic must be in the same working directory as the compiled class.

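import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SalesforceBatchJob {

    public static void main(String[] args) {

        // JDBC URL: replace the placeholder values with your own credentials.
        String url = "jdbc:salesforce:"
            + "SecurityToken=your_security_token;"
            + "User=your_username;"
            + "Password=your_password;"
            + "APIVersion=64.0;"
            + "AuthScheme=Basic;"
            + "UseSandbox=false;"
            + "RTK=your_rtk_key;";

        String query = "SELECT Id, Name FROM Account ORDER BY Name LIMIT 10";

        System.out.println("Connecting to Salesforce...");

        // try-with-resources closes the writer, result set, statement, and
        // connection even if the job fails partway through.
        // output.csv is written to the working directory; inside a container
        // it is discarded when the container exits unless copied elsewhere.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query);
             FileWriter writer = new FileWriter("output.csv")) {

            System.out.println("Connection successful.");
            writer.write("Id,Name\n");

            while (rs.next()) {
                String id = rs.getString("Id");
                String name = rs.getString("Name");
                // Quote the name so commas or quotes in it do not break the CSV.
                writer.write(id + "," + quote(name) + "\n");
                System.out.println(id + " - " + name);
            }

            System.out.println("Job completed successfully.");

        } catch (Exception e) {
            e.printStackTrace();
            // Exit with a non-zero code so the job is reported as failed.
            System.exit(1);
        }
    }

    // Wraps a value in double quotes and doubles any embedded quotes.
    private static String quote(String value) {
        if (value == null) {
            return "";
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }
}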

Key connection properties used in the JDBC URL:

Property        Description
--------------  ------------------------------------------------------------
SecurityToken   Your Salesforce security token, used alongside your password for API authentication.
User            Your Salesforce login email address.
Password        Your Salesforce account password.
APIVersion      The Salesforce API version to target (e.g., 64.0).
AuthScheme      Set to Basic for username + password + security token authentication.
UseSandbox      Set to false for production, true for sandbox environments.
RTK             Your CData runtime key, included with your driver license. See your license documentation for details.
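
Hardcoding these values in the JDBC URL bakes them into the compiled class and the container image. As a minimal sketch of one alternative, the helper below assembles the same URL from environment variables. The variable names (SF_USER, SF_PASSWORD, SF_SECURITY_TOKEN, CDATA_RTK) and the class name JdbcUrlFromEnv are assumptions made for this illustration, not names defined by the driver or by AWS Batch; the property names are the ones listed above.

```java
import java.util.Map;

public class JdbcUrlFromEnv {

    // Builds the Salesforce JDBC URL from a map of settings (normally System.getenv()).
    // The keys SF_USER, SF_PASSWORD, SF_SECURITY_TOKEN, and CDATA_RTK are hypothetical.
    public static String buildUrl(Map<String, String> env) {
        return "jdbc:salesforce:"
            + "SecurityToken=" + require(env, "SF_SECURITY_TOKEN") + ";"
            + "User=" + require(env, "SF_USER") + ";"
            + "Password=" + require(env, "SF_PASSWORD") + ";"
            + "APIVersion=64.0;"
            + "AuthScheme=Basic;"
            + "UseSandbox=false;"
            + "RTK=" + require(env, "CDATA_RTK") + ";";
    }

    // Fails fast with a clear message when a setting is missing or empty.
    public static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing required setting: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            String url = buildUrl(System.getenv());
            // Print only the length so credentials never reach the job log.
            System.out.println("JDBC URL built (" + url.length() + " characters).");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }
}
```

With this approach, the values would be supplied at run time, for example through the environment settings of the AWS Batch job definition, rather than at compile time.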

Step 2: Compile the program

Compile the Java source file with the CData JDBC JAR on the classpath. Run the following from the directory containing both the source file and the JAR:

# Compile the Java program
javac -cp cdata.jdbc.salesforce.jar SalesforceBatchJob.java

# To target a specific Java version (e.g., Java 17)
javac --release 17 -cp cdata.jdbc.salesforce.jar SalesforceBatchJob.java

This produces SalesforceBatchJob.class in the same directory.


Step 3: Create the Dockerfile

Create a file named Dockerfile (no extension) in the same directory as the compiled class, the JAR, and the license file. The Dockerfile packages the application into a container image based on Eclipse Temurin JDK 17.

FROM eclipse-temurin:17-jdk-jammy

WORKDIR /app

COPY SalesforceBatchJob.class .
COPY cdata.jdbc.salesforce.jar .
COPY cdata.jdbc.salesforce.lic .

CMD ["java", "-cp", ".:cdata.jdbc.salesforce.jar", "SalesforceBatchJob"]

NOTE: The cdata.jdbc.salesforce.lic file must be copied into the container. Without it, the driver won’t initialize. Make sure all three files are in the same directory before building.


Step 4: Build and test the Docker image locally

Build the image and run a local test before pushing to ECR.

# Build the Docker image
docker build -t salesforce-batch-job .

# Test the image locally
docker run salesforce-batch-job

If the container prints Salesforce Account records to the console and exits cleanly, the image is ready to deploy. Fix any connection errors before proceeding.

Figure: Local docker run output showing a successful Salesforce connection and Account records.

Step 5: Create an Amazon ECR repository

  1. Open the Amazon ECR console.
  2. Click Create repository.
  3. Enter a repository name—for example, salesforce-batch-job.
  4. Leave the default settings and click Create repository.
  5. Copy the Repository URI from the confirmation page. It follows this format: your-account-id.dkr.ecr.your-region.amazonaws.com/salesforce-batch-job.

Figure: The salesforce-batch-job repository in the Amazon ECR private registry.
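
The repository URI is built from three values: the AWS account ID, the region, and the repository name. The same registry host then reappears in the login, tag, and push commands. As a small illustration, the sketch below composes both strings. The class name EcrImageUri is hypothetical, the account ID and region in main are placeholder values, and the format assumes a standard commercial AWS region.

```java
public class EcrImageUri {

    // Registry host used for docker login, e.g. <account>.dkr.ecr.<region>.amazonaws.com
    public static String registryHost(String accountId, String region) {
        return accountId + ".dkr.ecr." + region + ".amazonaws.com";
    }

    // Full image reference used for docker tag, docker push, and the Batch job definition.
    public static String imageUri(String accountId, String region,
                                  String repository, String tag) {
        return registryHost(accountId, region) + "/" + repository + ":" + tag;
    }

    public static void main(String[] args) {
        // Placeholder account ID and region; substitute your own values.
        String accountId = "123456789012";
        String region = "us-east-1";

        System.out.println(registryHost(accountId, region));
        System.out.println(imageUri(accountId, region, "salesforce-batch-job", "latest"));
    }
}
```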

Step 6: Push the Docker image to ECR

Authenticate your local Docker client to your ECR registry, then tag and push the image. Replace the placeholder values with your AWS account ID and region. The commands below use bash line continuations (\); in PowerShell, replace each trailing \ with a backtick (`).

# Authenticate Docker to your ECR registry
aws ecr get-login-password --region your-region |
  docker login --username AWS \
    --password-stdin your-account-id.dkr.ecr.your-region.amazonaws.com

# Tag the image for ECR
docker tag salesforce-batch-job:latest \
  your-account-id.dkr.ecr.your-region.amazonaws.com/salesforce-batch-job:latest

# Push the image to ECR
docker push \
  your-account-id.dkr.ecr.your-region.amazonaws.com/salesforce-batch-job:latest

You’ll need an AWS Access Key ID and Secret Access Key with ECR push permissions before running these commands.


Step 7: Configure AWS Batch

AWS Batch uses three resources to run a containerized job: a compute environment, a job queue, and a job definition. Create them in that order.

Create a compute environment

  1. In the AWS Batch console, navigate to Compute environments and click Create.
  2. Select Managed as the compute environment type.
  3. Configure the vCPU limits for your workload. For an EC2-based environment, also choose the allowed instance types; a Fargate environment does not use instance types.
  4. Click Create compute environment.

Figure: Reviewing the managed Fargate compute environment configuration before creating it.

Create a job queue

  1. Navigate to Job queues and click Create.
  2. Give the queue a name and associate it with the compute environment you just created.
  3. Click Create.

Create a job definition

  1. Navigate to Job definitions and click Create.
  2. Set the Image field to your ECR Repository URI—for example, your-account-id.dkr.ecr.your-region.amazonaws.com/salesforce-batch-job.
  3. Set vCPU and memory limits appropriate for the Salesforce query workload.
  4. Click Create.

Figure: Setting the ECR image URI in the AWS Batch job definition container configuration.

Submit the job

  1. Navigate to Jobs and click Submit new job.
  2. Select the job definition and job queue you created.
  3. Click Submit.

AWS Batch provisions compute capacity, pulls the image from ECR, runs the container, and releases the capacity when the job completes. Check the job status in the console to confirm a successful run: a job whose container exits with a non-zero code is reported as Failed rather than Succeeded.

Figure: The SalesforceJobBatch job showing a Succeeded status in the AWS Batch console.

Schedule recurring jobs with Amazon EventBridge

A one-time submission is useful for testing, but most teams need this to run on a schedule—nightly extractions, hourly syncs, or end-of-day reporting jobs. Amazon EventBridge Scheduler lets you trigger AWS Batch jobs on a cron schedule without any additional infrastructure.

  1. Open the Amazon EventBridge Scheduler console and click Create schedule.
  2. Choose Recurring schedule and set your cron expression, for example, cron(0 2 * * ? *) to run daily at 2 a.m. in the time zone you select for the schedule (such as UTC).
  3. For the target, select AWS Batch SubmitJob.
  4. Specify the job definition and job queue you created in Step 7.
  5. Assign an IAM role that grants EventBridge permission to submit Batch jobs, then save.
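
The cron expression in step 2 has six fields, which differs from the five-field cron syntax common on Linux. As a rough illustration, the sketch below splits an expression into named fields in the order AWS documents them: minutes, hours, day-of-month, month, day-of-week, year. The class and method names are hypothetical, and the code only labels the fields; it does not validate their values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CronFields {

    // Field order for EventBridge cron expressions.
    private static final String[] NAMES = {
        "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    // Splits "cron(a b c d e f)" into an ordered map of field name to value.
    public static Map<String, String> describe(String expression) {
        String trimmed = expression.trim();
        if (!trimmed.startsWith("cron(") || !trimmed.endsWith(")")) {
            throw new IllegalArgumentException("Expected cron(...): " + expression);
        }
        String body = trimmed.substring(5, trimmed.length() - 1).trim();
        String[] parts = body.split("\\s+");
        if (parts.length != NAMES.length) {
            throw new IllegalArgumentException("Expected 6 fields, found " + parts.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        describe("cron(0 2 * * ? *)")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

For cron(0 2 * * ? *), this labels 0 as the minutes field and 2 as the hours field. The ? in the day-of-week position is there because day-of-month is already set to *; the two fields cannot both be specified in the same expression.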

Once active, EventBridge fires the Batch job on your defined schedule. No servers to keep running between executions, and no manual triggers required.


Start querying live Salesforce data

The CData JDBC Driver for Salesforce turns any Java application—batch jobs, ETL pipelines, or microservices—into a live, SQL-based client for your Salesforce data. No REST calls to maintain, no pagination logic, no manual field mapping.

Download a free trial of the CData JDBC Driver for Salesforce and start querying live data in minutes.


Questions or running into issues? Reach out to the CData support team, or explore the full library of CData JDBC Drivers to connect your Java applications to hundreds of data sources using the same approach.