by Kim Kaluba | December 15, 2021

Support for Change Data Capture-based Data Replication

Change Data Capture (CDC) is a proven data integration pattern that tracks when and which changes occur in data, then alerts other systems and services that must respond to those changes. Change data capture helps maintain consistency and functionality across all systems that rely on data.

Many database systems have CDC capabilities that address their specific data environment. This is good for the database but holds little value for the systems outside of that database. Given the complexity of the modern data landscape, how do organizations understand and effectively apply CDC practices across the entire data environment to ensure that the most up-to-date information is being used to drive business decisions?

CDC Methods and Challenges

According to Dataversity, there are four common methods to perform CDC and each one has its own challenges.

1. Date Modified

In transactional applications, metadata is captured on each row of data, including who created the data and when it was created or modified. CDC keeps track of columns or rows that were altered since the data was last extracted. This method comes with two drawbacks. First, when a row is deleted, its metadata is deleted with it, so the deletion leaves nothing to extract; the Date Modified column must also exist on every tracked table, which increases overhead on the transactional application. Second, extracting just the changed data uses a lot of resources, impacting application performance.
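As a minimal sketch of the Date Modified approach, the Python example below uses an in-memory SQLite database. The `customers` table, its `modified_at` column, and the function names are hypothetical, chosen only for illustration. It shows both the extraction query and the deletion blind spot described above.

```python
import sqlite3


def extract_changes(conn, since):
    """Return rows whose modified_at is later than the last extraction time."""
    cur = conn.execute(
        "SELECT id, name, modified_at FROM customers "
        "WHERE modified_at > ? ORDER BY id",
        (since,),
    )
    return cur.fetchall()


def demo_date_modified():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: every tracked table needs its own modified_at column.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            (1, "Ada", "2021-12-01T00:00:00"),
            (2, "Grace", "2021-12-01T00:00:00"),
            (3, "Linus", "2021-12-01T00:00:00"),
        ],
    )
    # ISO-8601 timestamps compare correctly as plain strings.
    last_extracted = "2021-12-10T00:00:00"

    # One update and one delete happen after the last extraction.
    conn.execute(
        "UPDATE customers SET name = 'Ada L.', "
        "modified_at = '2021-12-12T09:30:00' WHERE id = 1"
    )
    conn.execute("DELETE FROM customers WHERE id = 2")

    # Only the update is returned; the deleted row is simply gone.
    return extract_changes(conn, last_extracted)
```

Calling `demo_date_modified()` returns only the updated row for customer 1. The deletion of customer 2 never appears, because the row that would have carried the timestamp no longer exists.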

2. Diff

The diff method compares the current state of data with the last state of data to identify the changes. Diff can identify deleted data, solving a major problem with 'Date Modified,' but this approach requires a large compute environment to identify the changes between the two data states and is not ideal for understanding real-time data changes.
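The comparison can be sketched in a few lines of Python, assuming each snapshot fits in memory as a dictionary keyed by primary key. The function name `diff_snapshots` and the sample data are hypothetical; a real implementation would have to compare far larger data sets, which is where the compute cost comes from.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots (dicts keyed by primary key) and classify changes."""
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    updated = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical snapshots taken at two points in time.
previous_snapshot = {1: ("Ada", "London"), 2: ("Grace", "New York")}
current_snapshot = {1: ("Ada", "Paris"), 3: ("Linus", "Helsinki")}
changes = diff_snapshots(previous_snapshot, current_snapshot)
```

Here `changes` reports key 3 as inserted, key 1 as updated, and key 2 as deleted. Every row in both snapshots has to be read to produce that result, and changes that happened and were reverted between snapshots are invisible.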

3. Log-Based

Since transactional databases store all changes in a transactional log for recovery purposes, log-based CDC leverages this feature to keep up with the changes of the system. Understanding the transactional log can be difficult for several reasons: there are no standards for how that data is logged or stored; some database vendors do not provide an interface into the logs; and where an interface is present, it may be slow to query or monitor and resource intensive. Finally, most database systems use internal identifiers for database recovery and were not designed to support CDC requirements, so supplemental logging of primary key columns must be put in place to overcome this limitation.
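Because real transaction logs are vendor-specific, the Python sketch below does not read any actual database log. It simulates one as an ordered list of hypothetical `LogEntry` records, each with a log sequence number (`lsn`), and shows the general idea of replaying entries past a checkpoint onto a replica. All names here are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    lsn: int                    # log sequence number, assumed strictly increasing
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_log(replica, log, last_lsn):
    """Replay entries newer than last_lsn onto replica; return the new checkpoint."""
    for entry in log:
        if entry.lsn <= last_lsn:
            continue  # already applied in an earlier run
        if entry.op == "delete":
            replica.pop(entry.key, None)
        else:
            replica[entry.key] = entry.row
        last_lsn = entry.lsn
    return last_lsn


# Simulated log; a real one would come from a vendor-specific interface.
simulated_log = [
    LogEntry(1, "insert", 10, {"name": "Ada"}),
    LogEntry(2, "update", 10, {"name": "Ada L."}),
    LogEntry(3, "insert", 11, {"name": "Grace"}),
    LogEntry(4, "delete", 11),
]
replica_table = {}
checkpoint = apply_log(replica_table, simulated_log, last_lsn=0)
```

Note that every entry carries the primary key of the affected row. That is the information supplemental logging has to add when the database's own log identifies rows only by internal identifiers.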

4. Triggers

Trigger-based CDC uses database triggers to write to shadow tables, which store the entire data row or its primary keys and keep track of every column change associated with that row of data. Using the trigger approach may lower overhead to extract the changes, but it will increase the overhead to record the changes. It is important to note that a trigger will not record a change if the source application data is truncated. If changes are made to tables, then triggers and shadow tables may have to be modified, recreated, or recompiled, which adds extra overhead to the CDC process.
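The following Python sketch illustrates the trigger approach with an in-memory SQLite database. The `orders` table, the `orders_shadow` table, and the trigger names are hypothetical, and the shadow table here stores the full row along with an operation code.

```python
import sqlite3


def build_db():
    """Create a source table, a shadow table, and triggers that link them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE orders_shadow (
            change_id INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT, id INTEGER, status TEXT
        );
        CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
            INSERT INTO orders_shadow (op, id, status)
            VALUES ('I', NEW.id, NEW.status);
        END;
        CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
            INSERT INTO orders_shadow (op, id, status)
            VALUES ('U', NEW.id, NEW.status);
        END;
        CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
            INSERT INTO orders_shadow (op, id, status)
            VALUES ('D', OLD.id, OLD.status);
        END;
        """
    )
    return conn


def demo_triggers():
    conn = build_db()
    # Each statement on the source table costs one extra write to the shadow table.
    conn.execute("INSERT INTO orders VALUES (1, 'new')")
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
    conn.execute("DELETE FROM orders WHERE id = 1")
    return conn.execute(
        "SELECT op, id, status FROM orders_shadow ORDER BY change_id"
    ).fetchall()
```

`demo_triggers()` returns one shadow row per change, including the delete. The extra write per statement is the recording overhead mentioned above, and adding a column to `orders` would mean altering `orders_shadow` and recreating all three triggers.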

Automated Change Data Capture Replication

CData Sync leverages CDC to track every change applied to a table and records those changes via a shadow history table. However, CData has engineered efficient solutions that solve many of the challenges with trigger-based CDC.

First, the CData Sync change data capture feature does not capture only the primary key. CData Sync CDC records the full row of data to the history table, allowing CDC to work with tables that do not have primary keys. This provides customers with more data capabilities and allows them to keep track of data changes beyond primary keys.

Second, CData Sync selects from the history view to gather the changes, instead of the source table. As a result, CDC has less impact on the performance of source tables because the tool does not interact directly with the source table for incremental replication to other important data sources or applications.
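The general pattern of reading changes from a history table rather than from the source table can be sketched as follows. This is not CData Sync's implementation. It is a generic Python illustration using SQLite, with a hypothetical `inventory_history` table that stores full rows for a source table with no primary key, and a hypothetical `change_id` column used as a high-water mark.

```python
import sqlite3


def read_changes(conn, after_change_id):
    """Fetch history rows newer than the watermark; the source table is never queried."""
    rows = conn.execute(
        "SELECT change_id, op, sku, qty FROM inventory_history "
        "WHERE change_id > ? ORDER BY change_id",
        (after_change_id,),
    ).fetchall()
    watermark = rows[-1][0] if rows else after_change_id
    return rows, watermark


def demo_history_polling():
    conn = sqlite3.connect(":memory:")
    # Full-row history: sku and qty are data columns, not keys of the source table.
    conn.execute(
        "CREATE TABLE inventory_history ("
        "change_id INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT, sku TEXT, qty INTEGER)"
    )
    conn.executemany(
        "INSERT INTO inventory_history (op, sku, qty) VALUES (?, ?, ?)",
        [("I", "A-1", 5), ("U", "A-1", 3)],
    )
    first_batch, mark = read_changes(conn, 0)

    # A later change arrives; the next poll picks up only what is new.
    conn.execute(
        "INSERT INTO inventory_history (op, sku, qty) VALUES ('D', 'A-1', 3)"
    )
    second_batch, mark = read_changes(conn, mark)
    return first_batch, second_batch, mark
```

Each poll touches only the history table, so the incremental read adds no query load to the source table itself.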

With CData Sync, your data is always current across your data landscape, so you can make decisions using the most current data available, with no impact on source or target systems.

Try CData Sync Free for 30 Days

To incorporate high-performance Change Data Capture into your ETL/ELT data pipeline and data warehousing initiatives, sign up today for a free trial of CData Sync. Available on-prem, in the cloud, even on AWS or Azure, Sync makes it easy to get started with high-volume data replication and CDC.
