by Jerod Johnson | February 27, 2024

SSIS ETL: A Quick Guide on What It Is & How to Set It Up

SSIS ETL

It's no secret that today's data ecosystem is constantly growing and evolving. Data integration allows businesses to consolidate their information, enhancing analytics, improving data quality, and enabling real-time insights, thus supporting informed strategies, and optimizing customer experiences. This article helps organizations set up an Extract-Transform-Load (ETL) pipeline with SQL Server Integration Services (SSIS), a core component of Microsoft SQL Server designed for efficient data integration and transformation.

What is SSIS ETL?

SSIS, a key part of Microsoft SQL Server, is aimed at enabling organizations to create effective ETL processes. It's designed for data consolidation, cleansing, and migration across various sources. SSIS offers a full suite of tools for ETL workflow development, deployment, and management, highlighting data extraction and transformation, workflow control, scalability, data cleaning, flexibility, and broad connectivity. Tasks within SSIS are built and deployed as Integration Services packages, streamlining the data integration process.

SSIS supports a wide variety of data sources and destinations natively:

  • Flat files: SSIS can handle data stored in standard flat file formats, such as CSV, TXT, and XML, allowing for the integration of data from simple file-based storage systems.
  • Databases: It offers broad connectivity to relational databases like SQL Server, Oracle, DB2, and MySQL, enabling seamless data migration and synchronization between different database systems.
  • Web services: SSIS can connect to web services to extract or send data, facilitating integration with cloud-based applications and services, and expanding the horizons of data integration strategies to include online data sources.

CData's SSIS Data Flow Components and Tasks extend that connectivity to 300+ SaaS applications, big data stores, NoSQL databases, and more.

The 6 key features of SSIS ETL

SQL Server Integration Services provides a wide array of features to help build efficient data integration and transformation processes. Below is an overview of some of the key features offered by SSIS:

  1. Connection managers: Connection managers in SSIS are crucial for setting up and managing links to different data sources and destinations. They ensure secure and dependable data access and transfer, allowing SSIS packages to connect seamlessly with various supported data platforms, vital for integrating an organization's diverse data ecosystems.
  2. Built-in tasks, transformations, and functions: SSIS offers a wide range of built-in tasks and transformations for diverse data manipulation tasks like extraction, transformation, loading, and workflow management. It provides functions for handling text, dates, and numbers within data flows, enabling the creation of complex ETL processes without extensive custom coding.
  3. Fuzzy grouping transformations: Fuzzy grouping transformations in SSIS help identify duplicates in datasets by similarity, not just exact matches. This approach is key for data cleansing and standardization, effectively handling minor errors or variations to maintain data quality.
  4. Fuzzy lookup transformations: Fuzzy lookup transformations in SSIS enable comparison of incoming data with a reference table, facilitating the linking and enriching of datasets even when alignments aren't perfect. This flexibility enhances ETL flow accuracy and utility by allowing more lenient data matching.
  5. Data Profiling Task: The Data Profiling task in SSIS generates profiles from input data to spot potential issues needing resolution. It evaluates data quality attributes like column value distribution, candidate keys, and functional dependency, aiding in early identification of data quality problems during the ETL process.
  6. Data Profile Viewer: After executing the Data Profiling task in an SSIS data pipeline, organizations can utilize the Data Profile Viewer to examine analysis results, enabling them to effectively identify and address data quality issues.

How to set up an SSIS ETL package

Setting up an SSIS ETL package is straightforward and generally doesn't require any code.

As a note, you need to install a version of Visual Studio and be sure to add the SQL Server Integration Services Projects extension. To do so, open Visual Studio and click "Extensions" > "Manage Extensions" and search for "Integration Services" (see the screenshot below).

SSIS ETL

After installing the Integration Services extension, you can restart Visual Studio to get started or first download the CData SSIS Components for your data source to work with any data not natively supported in SSIS. The steps below will show how to move data from Salesforce to SQL Server, but the principles apply to any source or destination.

Step 1: Creating a new Project and Package

  1. In Visual Studio, click "File" > "New" > "Project"
  2. Select "Integration Services Project" and click "Next"
  3. Name the project, set the location where the project and associated files will be saved, and create a new solution (or add the project to an existing one) and click "Create"
    SSIS ETL

When you create the project, a package is automatically created to contain your ETL workflow.

Step 2: Configuring the Salesforce Connection Manager

  1. In the Connection Managers panel, right-click and select "New Connection" to add a new Connection Manager.
  2. Select the Connection Manager for your data source (CData Salesforce Connection Manager) and click "Add."
  3. Set the required connection properties for the data source. For Salesforce:
    • Set AuthScheme to Basic
    • Set User to your Salesforce username
    • Set Password to your Salesforce password
    • Set Security Token to the security token associated with your Salesforce account
      SSIS ETL
  4. Click "Test Connection" to ensure the connection is configured properly and click "OK."

Once you configure the data source connection, you are ready to configure the destination connection.

Step 3: Configuring the OLE DB Connection Manager (for SQL Server)

  1. In the "Connection Mangers" panel, right-click and click "New OLE DB Connection."
  2. In the "Configure OLE DB Connection Manager" wizard, click "New"
  3. In the Connection Manager, select the Server, Authentication type, and opt to select a database or attach a database file.
    SSIS ETL
  4. Click Test Connection to ensure you have configured the connection properly and click "OK."

With the source and destination connections configured, you are ready to create a data flow task.

Step 4: Configuring the Data Flow Task

  1. Open the Data Flow tab in the Integration Services Project and click to add a new Data Flow task.
  2. Drag a CData Salesforce Source component, a Lookup component, and an OLE DB Destination component onto the workspace.
    SSIS ETL

With the flow components adding to the Data Flow task, we are ready to configure the individual parts of the data flow.

Step 5: Configuring the CData Salesforce Source

The CData Salesforce Source component loads data from Salesforce by selecting an entity or using a pure Structured Query Language (SQL) query.

  1. In the Data Flow workspace, double-click the CData Salesforce Source.
  2. On the Connection Manager tab:
    • Set "Connection manager" to the CData Salesforce Connection Manager.
    • Set "Data access mode" to "Table Or View."
    • Set "Table Or View" to a Salesforce entity (e.g. Account)
      SSIS ETL
  3. Click "OK" to save the configuration.

Step 6: Configuring the Lookup Transformation

The Lookup Transformation checks to see if the information pulled from the source connection exists in the destination and lets you respond accordingly, typically adding new entries for new data and updating existing entries.

  1. In the Data Flow workspace, drag the output (blue arrow) from the CData Salesforce Source component to the Lookup component.
  2. Double-click the Lookup component.
  3. In the Lookup Transformation Editor, navigate to the Connection tab and select the OLE DB connection manager you previously configured and select a table or view to use for the lookup operation.
  4. On the Columns tab, map the identifying input Salesforce column to the lookup column for the SQL Server table.
    SSIS ETL
  5. Click "OK" to finalize the Lookup component.

Step 7: Configuring the OLD DB Destination

The OLE DB Destination lets you load data to your SQL Server database as a part of your SSIS ETL workflow.

  1. In the Data Flow task workspace, drag the output (blue arrow) from the Lookup component to the OLE DB Destination component.
  2. In the Input Output Selection, choose which output to use and click "OK." For this article, we choose "Lookup No Match Output," meaning the entry does not exist in our SQL Server database.
  3. Double-click the OLE DB Destination component.
  4. In the OLE DB Destination Editor, select the OLE DB connection manager you previously configured and select a table or view to use for the lookup operation.
  5. On the Mappings tab, map the Salesforce input columns to the SQL Server destination columns.
    SSIS ETL
  6. Click "OK" to save the configuration.

After following these steps, you have built an ETL workflow to pull data from Salesforce and insert it into a SQL Server database, using a Lookup transformation to handle existing Salesforce entries in the database. And you've built this data pipeline without writing a single line of code, showcasing the power and utility of SSIS.

CData lets you transform data the way you want

With CData's suite of SSIS tasks and components, you can quickly build ETL pipelines that work with 300+ different data sources and destinations. Between the built-in transformations available in SSIS and the tasks and components included when you install CData's SSIS solutions, organizations can transform exactly the data they want, exactly how they want. And because CData's solutions are fully-compatible with Azure Data Factory, you can deploy your ETL workflows to the cloud.

Download a free, 30-day trial of any CData SSIS task and component to get started today.