Join us as we speak with the product teams about the next generation of cloud data connectivity.
Configuring Incremental Replication in CData Sync
This entry covers how to set up incremental replication for Data Sync.
Date Entered: 2/25/2020 Last Updated: 9/9/2021
When using REPLICATE queries, Sync will try as much as possible to retrieve only records that have been recently changed, rather than the entire table. For many of our sources, this is preconfigured; you can check this by clicking on your query to check the Task Settings:
Sync's incremental behavior relies on the source table having column that represents when a row was last updated, typically a datetime or integer. Certain connectors, such as generic database sources, will need to manually specify the column. This can be specified at the job or table level using the IncrementalCheckColumns job option:
Option 1: Job Level
If your tables use the same column name for the check column you want to use, you can set the check column once at the job level, rather than for each table. You can specify multiple columns in a comma-separated list and Sync will use the first available column as the check column. Add the following to the Additional Options filed in the advanced job settings:
Option 2: Table Level
YOu can specify the check column inside of the query using syntax like the following:
REPLICATE [table] WITH IncrementalCheckColumns = 'updated'
Alternatively, if you click on your query to bring up the Task Settings, for many generic databases there will be a dropdown next to Incremental Check Column which lets you select the column via the UI. Selecting a column will automatically write the query (above) for you.
A good IncrementalCheckColumns for a Source table satisfies three conditions:
- When a row is added to the Source table, this column's value is set to current datetime.
- When a row is modified in the Source table, this column's value is updated to current datetime.
- When querying the Source table, this column is a valid filter to be passed to the Source.
SQL Server Source Example
By default, all rows will be retrieved on every job execution:
Set the IncrementalCheckColumns property:
Update a couple records in source table:
UPDATE IncrementalTable SET job_title='Software Engineer', modified=CURRENT_TIMESTAMP WHERE id=0 UPDATE IncrementalTable SET job_title='CEO', modified=CURRENT_TIMESTAMP WHERE id=5
Now in future executions, only records updated after the previous job run will be retrieved:
We appreciate your feedback. If you have any questions, comments, or suggestions about this entry, please contact our support team at firstname.lastname@example.org.