CData Sync

High-Volume Replication That Stays Accurate and Stable at Enterprise Scale

Sync maintains consistent throughput for wide tables, large schemas, and continuous Change Data Capture (CDC) streams through a staging architecture and SQL-driven change detection that is designed to protect sources and keep downstream systems current.

Handling high-volume data in Sync

High-volume pipelines require consistent throughput, source protection, and accurate CDC across large tables and schemas. Sync maintains stability under load by combining log-based CDC, controlled staging, and SQL-driven delta processing that keeps destinations current without stressing operational systems.



Protect source systems during large and continuous ingestion

  • Log-based streaming with minimal source impact
  • Automatic resumption from log positions (log sequence number [LSN], system change number [SCN], and offset), as sketched below
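
A minimal sketch of this resumption pattern, using SQL Server's built-in CDC functions as the example source. The replication_checkpoint table, the dbo.Orders table, and its dbo_Orders capture instance are hypothetical names, and the sketch illustrates the general technique rather than Sync's internal implementation:

  -- Hypothetical checkpoint store: one row per replicated table.
  CREATE TABLE replication_checkpoint (
      table_name SYSNAME PRIMARY KEY,
      last_lsn   BINARY(10) NOT NULL
  );

  DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);

  -- Resume just past the last committed LSN, or from the earliest
  -- available change if no checkpoint exists yet.
  SELECT @from_lsn = COALESCE(sys.fn_cdc_increment_lsn(MAX(last_lsn)),
                              sys.fn_cdc_get_min_lsn('dbo_Orders'))
  FROM replication_checkpoint
  WHERE table_name = 'dbo.Orders';

  SET @to_lsn = sys.fn_cdc_get_max_lsn();

  -- Read only the changes recorded after the checkpoint.
  SELECT *
  FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, 'all');

  -- After the batch applies downstream, advance the checkpoint so the
  -- next run resumes exactly where this one stopped (a real pipeline
  -- would insert the row on its first run).
  UPDATE replication_checkpoint
  SET last_lsn = @to_lsn
  WHERE table_name = 'dbo.Orders';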

Maintain consistent throughput across wide tables and large schemas

  • CDC staging engine for high-volume sources, including Oracle, SQL Server, PostgreSQL, and DB2
  • Delta Snapshot concurrency with efficient change detection using the SQL set operators EXCEPT and MINUS (sketched after this list)
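
As a minimal sketch of that set-operator approach (table names are hypothetical; Oracle uses MINUS where SQL Server and PostgreSQL use EXCEPT), comparing a staged snapshot against the destination yields the delta in both directions:

  -- Rows that are new or changed at the source.
  SELECT * FROM staged_orders
  EXCEPT
  SELECT * FROM warehouse_orders;

  -- Rows that no longer exist at the source (candidate deletes).
  SELECT * FROM warehouse_orders
  EXCEPT
  SELECT * FROM staged_orders;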

Ensure CDC accuracy during initial loads and continuous changes

  • Unified processing for historical backfill and ongoing CDC
  • Delta logic for the clean application of incremental changes during table buildout, as illustrated below
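
As an illustration of that delta logic (not Sync's internal code; table and column names are hypothetical), changes that accumulate while a backfill runs can be collapsed to the latest change per key before they are applied, so each row converges to a single consistent state:

  -- Keep only the most recent change per key from the CDC stream that
  -- accumulated during the historical backfill.
  SELECT order_id, op, status, amount
  FROM (
      SELECT c.*,
             ROW_NUMBER() OVER (PARTITION BY order_id
                                ORDER BY change_lsn DESC) AS rn
      FROM cdc_changes_during_backfill AS c
  ) AS ranked
  WHERE rn = 1;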

Control downstream processing and reduce warehouse compute

  • Staging architecture for efficient processing of large batches before warehouse or lake loads
  • SQL-driven diffing to reduce unnecessary merges and downstream compute demand (see the sketch after this list)
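
A sketch of that pattern, assuming delta_orders holds the output of a set-operator diff like the one shown earlier (all names are hypothetical): only changed keys reach the warehouse MERGE, so unchanged rows consume no merge compute:

  -- Merge only the precomputed delta instead of the full staged batch.
  MERGE INTO warehouse_orders AS t
  USING delta_orders AS s
    ON t.order_id = s.order_id
  WHEN MATCHED THEN
    UPDATE SET t.status = s.status, t.amount = s.amount
  WHEN NOT MATCHED THEN
    INSERT (order_id, status, amount)
    VALUES (s.order_id, s.status, s.amount);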

High-volume architecture

Stream changes efficiently with a CDC engine

Sync's CDC engine supports consistent throughput under heavy load and prevents the bottlenecks common in polling-heavy or single-threaded ingestion models:

  • Streaming from binary, redo, write-ahead, and journal logs
  • Multi-gigabyte (multi-GB) staging with controlled file sizes (stage.file.max.rows, stagemaxsize)
  • Automatic pausing and resumption for continuous CDC ingestion
  • Continuous replay into cloud warehouses or lakes

Maintain stability during large loads with controlled staging

Staging allows Sync to buffer data in predictable batches, which reduces strain on both the source and the destination. This controlled approach creates a stable path for high-volume ingestion without overwhelming cloud resources.

  • Multi-GB batch handling without exceeding cluster limits
  • Controlled file sizes that prevent oversized merges or long-running load operations
  • Safe checkpointing and recovery mechanisms, as sketched below
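
One way to picture the checkpointing side (a hypothetical batch ledger, not Sync's actual schema): each staged batch is recorded before loading and marked committed only after the load succeeds, so recovery restarts at the first uncommitted batch instead of reprocessing everything:

  -- Hypothetical ledger of staged batches.
  CREATE TABLE stage_batches (
      batch_id  BIGINT       PRIMARY KEY,
      file_path VARCHAR(400) NOT NULL,
      row_count BIGINT       NOT NULL,
      status    VARCHAR(20)  NOT NULL DEFAULT 'PENDING'  -- PENDING | COMMITTED
  );

  -- On restart, resume at the earliest batch that never committed.
  SELECT MIN(batch_id) AS resume_from
  FROM stage_batches
  WHERE status <> 'COMMITTED';

  -- Mark a batch committed only after its warehouse load succeeds.
  UPDATE stage_batches
  SET status = 'COMMITTED'
  WHERE batch_id = 42;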

Accelerate throughput with parallel task execution

Sync increases throughput by splitting large schemas across parallel tasks. Each task runs independently with its own checkpoints, which prevents serial bottlenecks and supports large enterprise schema footprints.

  • Independent checkpoints for each parallel task
  • Parallelization across large schemas for faster overall execution

Optimize performance for extremely wide tables

Wide tables commonly degrade ingestion performance in other ETL tools because of excessive diffing or normalization. Sync's wide-table optimizations maintain speed and accuracy even when tables span hundreds of columns.

  • Efficient diffing of row state (one common approach is sketched after this list)
  • Intelligent column-level change tracking where supported
  • Optional column pruning through transformations to reduce early-stage footprint
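
One common way to diff very wide rows cheaply (an illustrative technique, not necessarily Sync's internal method; table names, column names, and the hash function are stand-ins) is to compare a single per-row hash instead of hundreds of individual columns:

  -- Compare one hash per row; only rows whose state changed survive.
  SELECT s.order_id
  FROM (
      SELECT order_id,
             MD5(CONCAT_WS('|', col_001, col_002, col_003 /* ... col_700 */)) AS row_hash
      FROM staged_orders
  ) AS s
  JOIN (
      SELECT order_id,
             MD5(CONCAT_WS('|', col_001, col_002, col_003 /* ... col_700 */)) AS row_hash
      FROM warehouse_orders
  ) AS t
    ON s.order_id = t.order_id
  WHERE s.row_hash <> t.row_hash;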

Enterprise example

High-volume Oracle migration for a Fortune 500 energy company

A Fortune 500 energy enterprise needed to migrate operational tables with more than 700 columns from Oracle to Databricks and keep them current by using continuous CDC. The company required a large historical backfill and ongoing change capture under strict SLAs. However, its previous ingestion platform could not complete the migration because of wide-table limits, performance issues, and CDC drift at scale.

How Sync helped

  • Oracle CDC supported both the initial backfill and continuous log-based streaming
  • Staged ingestion handled multi-GB batches without overloading Databricks clusters
  • Unified full-load and incremental processing kept downstream tables accurate throughout the migration
  • Transformations reduced unnecessary columns early to improve performance

Industry use cases

Sync supports high-volume replication needs across industries where wide tables, continuous CDC, and large historical datasets are common.

Energy and utilities

  • Replicate wide operational tables from Oracle and DB2
  • Stream Supervisory Control and Data Acquisition (SCADA) and telemetry updates with low source impact

Telecom

  • Process millions of daily call events with consistent CDC throughput
  • Maintain intraday freshness across very large event tables

Manufacturing

  • Move machine and Manufacturing Execution System (MES) data with extremely wide schemas
  • Run continuous CDC from on-premises databases without disrupting plant systems

Finance

  • Replicate large transaction tables with multi-terabyte (multi-TB) history
  • Combine full-load and incremental processes for intraday reporting

See how Sync handles your largest workloads