Slowly Changing Dimensions (SCD): A No-Code Approach



Slowly Changing Dimensions (SCDs) are essential in data warehousing to manage and track changes in dimension data over time. There are predominantly 8 types of SCDs, each handling changes differently:

  1. SCD Type 0 (Fixed): No changes are allowed. The data remains as it was initially loaded.
  2. SCD Type 1 (Overwrite): Changes overwrite the existing data, and no history is maintained.
  3. SCD Type 2 (Add New Row): A new row is added with a new version of the data, preserving the history of changes.
  4. SCD Type 3 (Add New Attribute): A new column is added to store the previous value, allowing for limited history tracking.
  5. SCD Type 4 (Add Historical Table): A separate historical table is created to track changes.
  6. SCD Type 5 (Mini-Dimension and Type 1 Outrigger): Combines a mini-dimension with a Type 1 attribute to track current values.
  7. SCD Type 6 (Hybrid): Combines SCD Types 1, 2, and 3 to provide a comprehensive approach to tracking changes.
  8. SCD Type 7 (Hybrid): Similar to Type 6 but includes additional features for more complex scenarios.

With CData Sync, a no-code data infrastructure tool, users can easily implement these SCD models. This article introduces the most common types—Types 1 to 3—and explains how CData Sync enables you to configure and automate the management of these SCD types without writing any code, making it accessible to users of all technical levels.

What is a Slowly Changing Dimension (SCD)?

An SCD is a data modeling technique used to classify and manage changes in data stored within a data warehouse—the core of a data analysis platform. It focuses on how historical changes in the attributes of master data, which serve as key analytical references, are tracked and stored.

There are eight types of SCDs, ranging from Type 0 to Type 7. However, as mentioned earlier, this article will introduce the three most common types: Types 1, 2, and 3.

Understanding SCD types 1 to 3 with an example

Let’s explore these types using an example of a customer’s billing address change. To simplify the model, we have excluded surrogate keys and focused only on city names.

Suppose a business, "CData Solutions," initially based in Chicago, IL, relocates to New York City, NY. As a result, the billing address was updated to New York City on December 3, 2022.

This example will help illustrate how each SCD type handles changes in master data attributes.

SCD Type 1: Overwrite (Store Only the Latest Value)

SCD Type 1 is the simplest approach to handling data changes. It overwrites the old value with the new one, meaning the previous address is permanently lost.

Before Update (Old Address - Chicago, IL)

IdNameBillingCityLastModifiedDate
101CData SolutionsChicago, IL2021-08-15

After Update (New Address - New York City, NY)

IdNameBillingCityLastModifiedDate
101CData SolutionsNew York City, NY2022-12-03

In this case, the billing address (BillingCity) will be stored as New York City, NY after the relocation.

Therefore, since past addresses are not retained in the master data that serves as the axis of analysis if the previous address (Chicago, IL) is not stored in transaction data such as invoices, it will be impossible to retrieve or analyze data related to that address at a specific point in time in the past. However, it is common for transaction data (such as invoices) to capture the billing address at the time of issue, allowing businesses to reference historical locations for financial and operational analysis.

SCD Type 2: Add a New Record (Track Full History)

SCD Type 2 is used when historical tracking is required. Instead of overwriting the old value, a new record is inserted for the updated address while keeping the previous address.

This approach allows businesses to track address changes over time.

Before Update

IdNameBillingCityStartDateEndDateIsCurrent
101CData SolutionsChicago, IL2021-08-15NULLYes

After Update

IdNameBillingCityStartDateEndDateIsCurrent
101CData SolutionsChicago, IL2021-08-152022-12-02No
101CData SolutionsNew York City, NY2022-12-03NULLYes

In the example below, CData Solutions originally had its billing address in Chicago, IL, but after relocating to New York City, NY on December 3, 2022, a new record was added while keeping the old address intact. This allows for tracking historical changes in customer data. Since most business systems that feed data into a data warehouse are OLTP-based and typically retain only the latest data, the SCD Type 2 approach ensures historical records are preserved, which is a key objective of a data warehouse.

By maintaining fields like StartDate (the date the record was created), EndDate (the date another new record was created), and IsCurrent (indicating if the record is the latest one), businesses can retrieve historical snapshots of customer data at any given point in time. This enables various analytical use cases, such as obtaining a cross-sectional dataset of customers on a specific date or filtering to display only the most recent records.

SCD Type 3: Add a New Field (Store Limited History)

SCD Type 3 stores a limited history by adding extra columns to track the previous and current values in the same record. Unlike SCD Type 2, which maintains unlimited history, SCD Type 3 is designed to keep only the most recent changes.

Before Update

IdNameCurrentBillingCityPreviousBillingCityLastModifiedDate
101CData SolutionsChicago, ILNULL2021-08-15

After Update

IdNameCurrentBillingCityPreviousBillingCityLastModifiedDate
101CData SolutionsNew York City, NYChicago, IL2022-12-03

In this method, the original record with the old address (Chicago, IL) remains in the database with an EndDate marking when the change occurred, while a new record with the updated address (New York City, NY) is created with a new StartDate.

This allows businesses to track past and present addresses while maintaining a clear timeline of changes. Since each change results in a new entry, multiple historical address changes can be stored, making it possible to analyze how customer locations evolve over time. However, this approach requires more storage space and additional processing to manage multiple records for the same entity.

Implementing SCD with CData Sync

CData Sync enables the creation of a data analysis infrastructure by loading data into a data warehouse (ELT) and then transforming it within the warehouse. This approach ensures efficient data processing and storage while leveraging the capabilities of modern data platforms.

SCD in CData Sync

Data modeling using SCD is typically performed during data transformation. CData Sync provides an integrated data transformation feature called dbt, which functions as the T (Transform) component in ELT processes. This allows users to build a modern data modeling environment within the data warehouse.

CData Sync New Features – dbt Core Integration

dbt includes a feature called Snapshots, which enables the implementation of SCD Type 2 by tracking changes in data over time. Organizations can leverage this function to maintain historical records efficiently.

ETL vs. ELT: Traditional and Modern Approaches

In addition to ELT, CData Sync also supports the traditional ETL (Extract, Transform, Load) method, where data is transformed before being loaded into the data warehouse. With the ETL approach, CData Sync allows users to implement SCD Type 1 and Type 2 without writing any code.

SCD Type 1: Overwriting Data (Default Behavior in CData Sync)

By default, CData Sync follows the SCD Type 1 approach, where only the latest values from the source system are stored in the data warehouse.

To optimize data updates, CData Sync includes an advanced differential update mechanism that uses Change Data Capture (CDC) for major relational databases (RDBMS). This ensures that only the modified records are synchronized, improving efficiency and reducing processing time. Even for SaaS applications that do not support incremental updates, CData Sync provides alternative methods to keep data up to date.

SCD Type 2: Maintaining Historical Data with CData Sync

A major enhancement in CData Sync V22 is History Mode, which allows users to implement SCD Type 2 and retain historical data changes. This feature enables organizations to track changes over time and maintain a complete historical record of data updates.

For businesses that require regular snapshots of data (e.g., monthly or quarterly), CData Sync allows job variables to be set, enabling automated historical data tracking.

SCD Type 3: Not Supported in the ETL Method

Unlike SCD Type 1 and Type 2, SCD Type 3 cannot be directly implemented using the ETL method in CData Sync. Instead, organizations must first import data using SCD Type 2 and then create a separate transformation table within the data warehouse using the ELT approach.

Conclusion

In this article, we explored how to implement different types of Slowly Changing Dimensions (SCDs) using CData Sync, a no-code data infrastructure tool. We demonstrated how to integrate dbt, a modern data modeling framework, within CData Sync and explained how to utilize its key features, including Change Data Capture (CDC), history mode, and job variable settings. By leveraging these functions, you can efficiently manage data history in a flexible and scalable manner, tailored to various use cases and design policies.

CData Sync offers a 30-day free trial, allowing you to experience its capabilities firsthand. Start consolidating your enterprise data to a cloud data warehouse today! As always, our world-class Support Team is ready to answer any questions you may have.