Data teams have made enormous progress in how they move data. Connecting systems, replicating changes, and landing data in warehouses and lakehouses is no longer the hard part.
CData Sync has long handled that core job, providing a reliable foundation for moving data across systems and keeping analytics and operations aligned.
As those data movement processes become more central to an organization, coordination matters alongside connectivity. Ingestion feeds reporting, transformations shape metrics, and curated results flow back into operational systems.
There’s growing value in treating those steps as a single workflow. That shift exposes a gap between data replication and full-scale orchestration.
A market gap between replication and orchestration
The data integration market has long offered two extremes.
On one end are replication-focused tools that intentionally stop at ingestion. They assume orchestration happens elsewhere. On the other end are full-featured orchestration platforms that provide enormous flexibility, along with additional infrastructure, engineering effort, and operational overhead.
For many data teams, especially those operating hybrid or on-prem environments, neither option is ideal. They don’t need a general-purpose workflow engine. They need a reliable way to coordinate data movement workflows without introducing another platform to deploy, secure, and maintain.
We filled this gap with Pipelines.
Turning jobs into a single workflow
Pipelines lets teams define a single workflow that coordinates jobs, transformations, reverse ETL, and events.
A pipeline defines the order in which steps run, how failures and skips are handled, and when downstream work should execute. Each pipeline run is tracked as one execution, with a clear start time, outcome, and step-level detail.
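To make those semantics concrete, here is a minimal Python sketch of the kind of structure a pipeline describes: ordered steps, failure handling, and downstream work gated on upstream changes. The names (Step, Pipeline, run_if_upstream_changed) are illustrative only, not Sync’s actual API; pipelines in Sync are configured in the product itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class StepStatus(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class Step:
    name: str
    run: Callable[[], bool]                # returns True when the step changed data
    run_if_upstream_changed: bool = False  # gate this step on upstream changes


@dataclass
class Pipeline:
    name: str
    steps: list[Step] = field(default_factory=list)

    def execute(self) -> dict[str, StepStatus]:
        """Run steps in order, skipping change-gated steps when nothing
        upstream changed and halting downstream work on failure."""
        results: dict[str, StepStatus] = {}
        upstream_changed = False
        for step in self.steps:
            if step.run_if_upstream_changed and not upstream_changed:
                results[step.name] = StepStatus.SKIPPED
                continue
            try:
                if step.run():
                    upstream_changed = True
                results[step.name] = StepStatus.SUCCEEDED
            except Exception:
                results[step.name] = StepStatus.FAILED
                break
        return results
```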
This scope is deliberate. Pipelines isn’t a general-purpose orchestration platform, and it doesn’t replace tools like Airflow or Dagster. It brings orchestration for data movement workflows directly into Sync.
Real-world example: Sales operations without external orchestration
Consider a sales operations team responsible for keeping revenue metrics current throughout the day.
Their data flows:
Salesforce provides CRM data via incremental replication
SQL Server provides orders and invoices via CDC
A cloud data warehouse stores reporting and analytics tables
Salesforce receives curated metrics for sales and management views
Before Pipelines, each of these steps existed in Sync, but coordinating them required external scheduling and glue code.
With Pipelines, the team defines a single sales operations pipeline.
The pipeline starts with incremental replication from Salesforce and CDC ingestion from SQL Server. As soon as those steps complete, a transformation runs to calculate revenue and pipeline metrics. That transformation is configured to run only when upstream data has changed, avoiding unnecessary downstream work.
Once metrics are updated, a reverse ETL step pushes the latest values back into Salesforce. Finally, a lightweight event records pipeline completion and surfaces failures if needed.
All as part of a single pipeline execution.
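Under the same assumptions, the team’s workflow might wire together like this. This sketch reuses the hypothetical Pipeline and Step classes from earlier; every function name below is a stand-in for a job configured in Sync, not real product API.

```python
# Wiring the sales operations workflow with the hypothetical Pipeline/Step
# sketch from earlier. Each stub returns True when it changed data, so
# change-gated steps downstream know whether to run.

def replicate_salesforce() -> bool:
    """Incremental replication of CRM data from Salesforce."""
    return True  # stub: pretend new rows landed

def ingest_orders_cdc() -> bool:
    """CDC ingestion of orders and invoices from SQL Server."""
    return True  # stub

def compute_revenue_metrics() -> bool:
    """Transformation recalculating revenue and pipeline metrics."""
    return True  # stub

def push_metrics_to_salesforce() -> bool:
    """Reverse ETL step writing curated metrics back to Salesforce."""
    return False  # stub: nothing downstream depends on this change

def record_completion() -> bool:
    """Lightweight event recording pipeline completion."""
    return False  # stub

sales_ops = Pipeline(
    name="sales-operations",
    steps=[
        Step("salesforce-replication", replicate_salesforce),
        Step("sqlserver-cdc", ingest_orders_cdc),
        Step("revenue-metrics", compute_revenue_metrics,
             run_if_upstream_changed=True),
        Step("reverse-etl", push_metrics_to_salesforce,
             run_if_upstream_changed=True),
        Step("completion-event", record_completion),
    ],
)

print(sales_ops.execute())  # one execution record for the whole workflow
```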
See exactly what ran, end to end
The most noticeable change shows up in day-to-day operations.
Instead of monitoring individual jobs, data teams can ask one question: did the pipeline succeed?
Each pipeline run has a clear start time, execution order, and outcome. Teams can see exactly which steps ran, which were skipped, and where a failure occurred, which makes troubleshooting faster.
Because downstream steps run only when data actually changes, warehouse compute and API calls are reduced. And because each step starts as soon as its upstream work completes, end-to-end pipelines finish sooner.
Most importantly, orchestration logic now lives in the same environment as data movement. There’s no additional infrastructure to manage, no separate system to secure, and no split ownership between tools.
Unified execution visibility
Pipelines introduces a single execution context for data workflows in Sync.
Every pipeline run captures:
The start time and overall outcome of the run
The order in which steps executed
Step-level detail, including which steps succeeded, failed, or were skipped
This unified run history gives data teams a clear, end-to-end view of what happened. It also simplifies auditing and operational reviews by providing one authoritative record for each workflow run.
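As a rough illustration of what such a run record might contain, here is a minimal sketch; the field names are hypothetical, not Sync’s actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative shape of a pipeline run record: one authoritative entry
# per workflow execution, with step-level detail. Hypothetical fields.

@dataclass
class StepResult:
    name: str
    status: str  # "succeeded" | "failed" | "skipped"

@dataclass
class PipelineRun:
    pipeline: str
    started_at: datetime
    steps: list[StepResult] = field(default_factory=list)

    @property
    def outcome(self) -> str:
        """Overall outcome: failed if any step failed, else succeeded."""
        if any(s.status == "failed" for s in self.steps):
            return "failed"
        return "succeeded"

run = PipelineRun(
    pipeline="sales-operations",
    started_at=datetime.now(timezone.utc),
    steps=[
        StepResult("salesforce-replication", "succeeded"),
        StepResult("sqlserver-cdc", "succeeded"),
        StepResult("revenue-metrics", "skipped"),
    ],
)
print(run.outcome)  # "succeeded"
```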
Bringing coordination into Sync
Pipelines brings lightweight workflow coordination directly into CData Sync, allowing teams to build and operate end-to-end data pipelines with fewer moving parts and clearer operational visibility.
Pipelines is available as part of the CData Sync Q1 2026 release. To see how Pipelines fits alongside other recent enhancements, read the Sync release overview blog.