CData Sync
Transform Data In-Flight for Cleaner, Analytics-Ready Data
Shape and standardize data as it moves between systems to reduce downstream modeling and cloud compute. CData Sync applies business logic and SQL-based transformations consistently across Snapshot, Delta Snapshot, Change Data Capture (CDC), and reverse ETL jobs, giving teams full control over how data is delivered and stored.
Understanding transformations
Transformations in Sync reduce the need for separate data-preparation pipelines and lower cloud compute by applying business logic, filtering, and standardization as data moves into analytics and operational systems. These transformations are applied consistently across replication types, including Snapshot, Delta Snapshot, and CDC.
Why teams use transformations
Normalize schemas across ERP, CRM, and operational systems so downstream analytics start with consistent, usable tables
Filter high-volume data at the source to reduce storage and compute consumption in cloud data warehouses
Prepare analytics-ready datasets during ingestion rather than relying on downstream modeling tools
Enrich reverse ETL syncs with calculated fields that improve CRM and ERP process automation
Apply governance rules, such as masking or hashing, before data reaches cloud platforms to reduce exposure risk
Use the same transformation logic across Snapshot, Delta Snapshot, CDC, and reverse ETL jobs to reduce maintenance and avoid duplicated SQL
How transformations work
Column expressions
Create cleaner, more usable data during ingestion by applying SQL logic directly within the replication process.
Apply SQL expressions during replication, as in the sketch that follows this list:
- Create derived metrics (for example, total_cost = qty * unit_price)
- Normalize date and time-zone adjustments
- Implement conditional logic using CASE expressions (for example, CASE WHEN…)
- Employ hashing or masking for governance of personally identifiable information (PII)
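A minimal sketch of how these expressions might look in a job's transformation SQL, assuming a hypothetical orders table; the exact hashing and date functions depend on the SQL dialect of the platform involved:
    SELECT
        order_id,
        qty,
        unit_price,
        qty * unit_price AS total_cost,                    -- derived metric
        CAST(order_ts AS DATE) AS order_date,              -- normalized date
        CASE WHEN status = 'O' THEN 'Open'
             WHEN status = 'C' THEN 'Closed'
             ELSE 'Unknown' END AS status_label,           -- conditional logic
        SHA2(customer_email, 256) AS customer_email_hash   -- hash PII before it lands downstream
    FROM orders;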
Row and column filtering
Reduce storage, compute, and processing overhead by filtering data early in the replication pipeline.
Keep only the data that matters, as in the example that follows this list:
- Exclude inactive rows or archived history
- Reduce wide source tables (700+ columns) to only the curated fields that are needed downstream
- Limit high-volume CDC or reverse ETL workloads to the specific data slices that are required
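For example, a filter might keep only active, recently updated rows and a curated set of columns (hypothetical accounts table; date-literal syntax varies by dialect):
    SELECT
        account_id,
        account_name,
        region,
        updated_at
    FROM accounts
    WHERE is_active = 1                      -- drop inactive or archived rows
      AND updated_at >= DATE '2024-01-01';   -- limit to the slice needed downstream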
Joins, lookups, and enrichment
Eliminate downstream modeling steps by enriching datasets before they land in your warehouse or operational systems. Sync supports upstream enrichment by joining reference tables that already exist in the source system.
Common examples (a sketch of the retail case follows this list):
- Finance: Join cost-center tables during ERP warehouse loads
- Retail: Merge location metadata into point-of-sale feeds
- Manufacturing / Energy: Attach asset metadata to high-frequency equipment readings
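A sketch of the retail case, assuming hypothetical pos_sales and store_locations tables in the same source:
    SELECT
        s.sale_id,
        s.store_id,
        s.sale_amount,
        l.store_name,                        -- enrichment columns from the reference table
        l.region
    FROM pos_sales AS s
    JOIN store_locations AS l
        ON l.store_id = s.store_id;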
Schema remapping
Standardize and clean inbound data so that pipelines land in a predictable, analytics-ready form without manual cleanup.
Sync can perform these mapping functions, illustrated after this list:
- Rename columns
- Reorder fields
- Standardize naming conventions, such as snake_case or camelCase
- Convert datatypes for warehouse or SaaS compatibility
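A small illustration, assuming a hypothetical SourceOrders table whose columns are renamed to snake_case and cast for the destination:
    SELECT
        CustomerID                          AS customer_id,    -- renamed and reordered
        CAST(OrderDate AS DATE)             AS order_date,     -- datatype conversion
        CAST(TotalAmount AS DECIMAL(18, 2)) AS total_amount
    FROM SourceOrders;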
Where transformations apply
Apply a single transformation definition across all replication styles to reduce pipeline sprawl, eliminate duplicative SQL, and ensure consistent logic from ingestion through operational syncs.
Snapshot and Delta Snapshot replication
Transformations run as part of Sync's SQL-based change-detection engine, which uses the EXCEPT and MINUS SQL set operators for reverse ETL and warehouse loads.
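Conceptually, EXCEPT-style set comparison finds rows that differ between a source extract and the destination. The following is a simplified illustration with hypothetical tables, not Sync's internal SQL; MINUS is the equivalent operator on platforms such as Oracle:
    SELECT customer_id, customer_name, region FROM source_customers
    EXCEPT                    -- rows present in the source but not the destination are new or changed
    SELECT customer_id, customer_name, region FROM destination_customers;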
CDC jobs
Transformations are applied as changes stream from transaction logs, enabling consistent modeling across inserts, updates, and deletes.
Reverse ETL
Transformations create CRM- or ERP-ready fields before upserts, including external IDs, status indicators, and normalized attributes.
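As a sketch, a reverse ETL transformation might shape warehouse rows into CRM-ready fields like these (hypothetical warehouse_accounts table and field names):
    SELECT
        account_id                 AS external_id,     -- stable key used to match existing CRM records
        UPPER(TRIM(account_name))  AS account_name,    -- normalized attribute
        CASE WHEN open_tickets > 5 THEN 'At Risk'
             ELSE 'Healthy' END    AS health_status    -- status indicator derived during the sync
    FROM warehouse_accounts;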
Business use cases by industry
Organizations across industries use Sync transformations to standardize diverse data sources, reduce downstream modeling work, and prepare analytics- and operations-ready outputs during ingestion.
Energy and utilities
- Normalize Supervisory Control and Data Acquisition (SCADA) or operational logs into analytic structures
- Enrich asset telemetry with equipment metadata
- Downsample or filter high-frequency sensor data
Financial services
- Standardize transaction formats across banking systems
- Mask PII before data ingestion into Snowflake and Databricks
- Calculate derived regulatory metrics during ingestion
Manufacturing
- Build consistent production datasets across factories
- Enrich machine logs with asset master data
- Create feature sets that are ready for predictive maintenance
Retail/CPG
- Normalize point-of-sale, loyalty, and product-catalog data
- Join attribute tables to simplify merchandising analytics
- Prepare marketing-ready insights for reverse ETL