CData Sync

High-Volume Replication That Stays Accurate and Stable at Enterprise Scale

Sync maintains consistent throughput for wide tables, large schemas, and continuous Change Data Capture (CDC) streams through a staging architecture and SQL-driven change detection that is designed to protect sources and keep downstream systems current.

Handling high-volume data in Sync

High-volume pipelines require consistent throughput, source protection, and accurate CDC across large tables and schemas. Sync maintains stability under load by combining log-based CDC, controlled staging, and SQL-driven delta processing that keeps destinations current without stressing operational systems.



Protect source systems during large and continuous ingestion

  • Log-based streaming with minimal source impact
  • Automatic resumption from log positions (log sequence number [LSN], system change number [SCN], and offset), as sketched below
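
A minimal sketch of this resumption pattern, using SQL Server's built-in CDC functions as the example source. The replication_checkpoint table, the dbo.Orders table, and its dbo_Orders capture instance are hypothetical names, and the sketch illustrates the general technique rather than Sync's internal implementation:

  -- Hypothetical checkpoint store: one row per replicated table.
  CREATE TABLE replication_checkpoint (
      table_name SYSNAME PRIMARY KEY,
      last_lsn   BINARY(10) NOT NULL
  );

  DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);

  -- Resume just past the last committed LSN, or from the earliest
  -- available change if no checkpoint exists yet.
  SELECT @from_lsn = COALESCE(sys.fn_cdc_increment_lsn(MAX(last_lsn)),
                              sys.fn_cdc_get_min_lsn('dbo_Orders'))
  FROM replication_checkpoint
  WHERE table_name = 'dbo.Orders';

  SET @to_lsn = sys.fn_cdc_get_max_lsn();

  -- Read only the changes recorded after the checkpoint.
  SELECT *
  FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, 'all');

  -- After the batch applies downstream, advance the checkpoint so the
  -- next run resumes exactly where this one stopped (a real pipeline
  -- would insert the row on its first run).
  UPDATE replication_checkpoint
  SET last_lsn = @to_lsn
  WHERE table_name = 'dbo.Orders';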

Maintain consistent throughput across wide tables and large schemas

  • CDC staging engine for high-volume sources, including Oracle, SQL Server, PostgreSQL, and DB2
  • Delta Snapshot concurrency with efficient change detection using the SQL set operators EXCEPT and MINUS (sketched after this list)
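
As a minimal sketch of that set-operator approach (table names are hypothetical; Oracle uses MINUS where SQL Server and PostgreSQL use EXCEPT), comparing a staged snapshot against the destination yields the delta in both directions:

  -- Rows that are new or changed at the source.
  SELECT * FROM staged_orders
  EXCEPT
  SELECT * FROM warehouse_orders;

  -- Rows that no longer exist at the source (candidate deletes).
  SELECT * FROM warehouse_orders
  EXCEPT
  SELECT * FROM staged_orders;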

Ensure CDC accuracy during initial loads and continuous changes

  • Unified processing for historical backfill and ongoing CDC
  • Delta logic for the clean application of incremental changes during table buildout, as illustrated below
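
As an illustration of that delta logic (not Sync's internal code; table and column names are hypothetical), changes that accumulate while a backfill runs can be collapsed to the latest change per key before they are applied, so each row converges to a single consistent state:

  -- Keep only the most recent change per key from the CDC stream that
  -- accumulated during the historical backfill.
  SELECT order_id, op, status, amount
  FROM (
      SELECT c.*,
             ROW_NUMBER() OVER (PARTITION BY order_id
                                ORDER BY change_lsn DESC) AS rn
      FROM cdc_changes_during_backfill AS c
  ) AS ranked
  WHERE rn = 1;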

Control downstream processing and reduce warehouse compute

  • Staging architecture for efficient processing of large batches before warehouse or lake loads
  • SQL-driven diffing to reduce unnecessary merges and downstream compute demand (see the sketch after this list)
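
A sketch of that pattern, assuming delta_orders holds the output of a set-operator diff like the one shown earlier (all names are hypothetical): only changed keys reach the warehouse MERGE, so unchanged rows consume no merge compute:

  -- Merge only the precomputed delta instead of the full staged batch.
  MERGE INTO warehouse_orders AS t
  USING delta_orders AS s
    ON t.order_id = s.order_id
  WHEN MATCHED THEN
    UPDATE SET t.status = s.status, t.amount = s.amount
  WHEN NOT MATCHED THEN
    INSERT (order_id, status, amount)
    VALUES (s.order_id, s.status, s.amount);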

High-volume architecture

Stream changes efficiently with a CDC engine

Sync's CDC engine supports consistent throughput under heavy load and prevents the bottlenecks common in polling-heavy or single-threaded ingestion models:

  • Streaming from binary, redo, write-ahead, and journal logs
  • Multi-gigabyte (multi-GB) staging with controlled file sizes (stage.file.max.rows, stagemaxsize)
  • Automatic pausing and resumption for continuous CDC ingestion
  • Continuous replay into cloud warehouses or lakes

Maintain stability during large loads with controlled staging

Staging allows Sync to buffer data in predictable batches, which reduces strain on both the source and the destination. This controlled approach creates a stable path for high-volume ingestion without overwhelming cloud resources.

  • Multi-GB batch handling without exceeding cluster limits
  • Controlled file sizes that prevent oversized merges or long-running load operations
  • Safe checkpointing and recovery mechanisms, as sketched below
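
One way to picture the checkpointing side (a hypothetical batch ledger, not Sync's actual schema): each staged batch is recorded before loading and marked committed only after the load succeeds, so recovery restarts at the first uncommitted batch instead of reprocessing everything:

  -- Hypothetical ledger of staged batches.
  CREATE TABLE stage_batches (
      batch_id  BIGINT       PRIMARY KEY,
      file_path VARCHAR(400) NOT NULL,
      row_count BIGINT       NOT NULL,
      status    VARCHAR(20)  NOT NULL DEFAULT 'PENDING'  -- PENDING | COMMITTED
  );

  -- On restart, resume at the earliest batch that never committed.
  SELECT MIN(batch_id) AS resume_from
  FROM stage_batches
  WHERE status <> 'COMMITTED';

  -- Mark a batch committed only after its warehouse load succeeds.
  UPDATE stage_batches
  SET status = 'COMMITTED'
  WHERE batch_id = 42;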

Accelerate throughput with parallel task execution

Sync increases throughput by splitting large schemas across parallel tasks. Each task runs independently with its own checkpoints, which prevents serial bottlenecks and supports large enterprise schema footprints.

  • Independent checkpoints for each parallel task
  • Parallelization across large schemas for faster overall execution

Optimize performance for extremely wide tables

Wide tables commonly degrade ingestion performance in other ETL tools because of excessive diffing or normalization. Sync's wide-table optimizations maintain speed and accuracy even when tables span hundreds of columns.

  • Efficient diffing of row state (one common approach is sketched after this list)
  • Intelligent column-level change tracking where supported
  • Optional column pruning through transformations to reduce early-stage footprint
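
One common way to diff very wide rows cheaply (an illustrative technique, not necessarily Sync's internal method; table names, column names, and the hash function are stand-ins) is to compare a single per-row hash instead of hundreds of individual columns:

  -- Compare one hash per row; only rows whose state changed survive.
  SELECT s.order_id
  FROM (
      SELECT order_id,
             MD5(CONCAT_WS('|', col_001, col_002, col_003 /* ... col_700 */)) AS row_hash
      FROM staged_orders
  ) AS s
  JOIN (
      SELECT order_id,
             MD5(CONCAT_WS('|', col_001, col_002, col_003 /* ... col_700 */)) AS row_hash
      FROM warehouse_orders
  ) AS t
    ON s.order_id = t.order_id
  WHERE s.row_hash <> t.row_hash;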

Enterprise example

High-volume Oracle migration for a Fortune 500 energy company

A Fortune 500 energy enterprise needed to migrate operational tables with more than 700 columns from Oracle to Databricks and keep them current by using continuous CDC. The company required a large historical backfill and ongoing change capture under strict SLAs. However, its previous ingestion platform could not complete the migration because of wide-table limits, performance issues, and CDC drift at scale.

How Sync helped

  • Oracle CDC supported both the initial backfill and continuous log-based streaming
  • Staged ingestion handled multi-GB batches without overloading Databricks clusters
  • Unified full-load and incremental processing kept downstream tables accurate throughout the migration
  • Transformations reduced unnecessary columns early to improve performance

Industry use cases

Sync supports high-volume replication needs across industries where wide tables, continuous CDC, and large historical datasets are common.

Energy and utilities

  • Replicate wide operational tables from Oracle and DB2
  • Stream Supervisory Control and Data Acquisition (SCADA) and telemetry updates with low source impact

Telecom

  • Process millions of daily call events with consistent CDC throughput
  • Maintain intraday freshness across very large event tables

Manufacturing

  • Move machine and Manufacturing Execution System (MES) data with extremely wide schemas
  • Run continuous CDC from on-premises databases without disrupting plant systems

Finance

  • Replicate large transaction tables with multi-terabyte (multi-TB) history
  • Combine full-load and incremental processes for intraday reporting

See how Sync handles your largest workloads