A Short Guide on Data Replication Types and Use Cases: Incremental and Upsert Replication

by CData Software | October 4, 2022

CData Sync

CData Sync is a powerful ETL/ELT pipeline tool that gives data teams the flexibility to always choose the right replication strategy for their specific requirements. It supports high-performance data integration with efficient incremental replication and granular control over how data is inserted and updated at the destination.

What is incremental replication?

Incremental replication is a lightweight replication strategy that keeps the footprint at the data source low, transfers only a small volume of information, and makes few inserts and updates at the destination.

Once you complete a full CopyOver replication, incremental replication becomes the more efficient path forward. It copies only newly added and updated data from the source table to the destination, focusing exclusively on what changes. This makes it less resource-intensive and faster than other replication types.

Use cases for incremental replication in CData Sync

CData Sync identifies which records are new or have changed since the last job run and replicates only those. The following scenarios are where this method works best.

The source table receives only new entries and updates, with no deletes (for example, a table tracking customer purchases).
The source table is large, but only a small number of rows change between runs.
You need to archive all records, even when the source deletes them.

Challenges of incremental replication

Incremental replication requires a reliable column in the source table that reflects when a row was last added or updated, such as a LastModified datetime column or an auto-incrementing integer ID. CData Sync preconfigures this automatically for many connectors. For others, such as generic database sources, you can specify the incremental check column manually in the task settings.

Without an incremental check column configured, Sync retrieves all rows on every job execution. Choose a column that accurately reflects new and changed records to avoid missing rows, duplicate rows, or stale data in the destination.

Since incremental replication does not build a record of each update, you have no previous stages to fall back to. Each time the job runs, Sync scans the target table, stores the maximum value of the incremental check column, and uses that as the starting point for the next run.

How incremental replication works in CData Sync

CData Sync identifies new and modified records using two methods: an incremental check column and Change Data Capture (CDC).

The incremental check column is either a datetime or integer-based column that Sync uses to detect new or modified records during each replication run. Each time someone adds or updates a record in the source, the value in this column increases. Sync uses it as a filter during extraction to pull only records that changed since the last run, then stores the new maximum value for the next job.

For sources that support it, Sync can also use CDC, reading directly from the source's log files to detect inserts, updates, and deletes without querying the full table.

During each job run, Sync retrieves the changed data and merges it into the destination table, inserting new records and updating existing ones based on the primary key.

What is upsert replication?

Upsert replication UPDATEs and INSERTs new data in the target table simultaneously, like a SQL MERGE operation. You can select multiple columns to update and identify new data using an SQL expression, typically referencing a modification timestamp.

Use upsert replication when you need to update and insert data in a single pass. Unlike batch replication, it lets you programmatically define which records to update. For example, if you have a products table and want to update prices only for products costing less than $150, upsert replication handles that with a single logic statement. Batch replication would force you to choose between deleting and updating, never both at the same time.

Upsert replication in CData Sync

CData Sync uses upsert replication to replicate a source table directly into a destination table. Sync determines whether to INSERT or UPDATE each record based on the destination table's primary key columns. When a record with a matching primary key already exists, Sync updates it. When no match exists, Sync inserts it as a new row.

Unlike history mode, upsert replication keeps only one entry per unique record, adding no extra timestamp columns or versioned rows to the destination table.

You can customize which columns Sync updates and apply in-flight transformations using custom REPLICATE queries or the SQL query builder in the task configuration.

Get started with CData Sync

CData Sync is trusted by businesses around the world to build reliable, scalable data pipelines across 200+ sources and destinations. Whether you need lightweight incremental updates or precise upsert logic, CData Sync gives you the tools to keep your destination systems accurate and always up to date.

Start a 30-day free trial today to see how CData Sync can streamline your data replication strategy.

Try CData Sync free

Download your free 30-day trial to see how CData Sync delivers seamless integration

Get The Trial

CData is the data layer that makes AI work in production—live connectivity and replication across hundreds of the most critical enterprise sources, semantic context, and built-in governance. Powering AI for Databricks, Microsoft, Google, Palantir, and 10,000+ customers worldwide.

Blog