Discover how a bimodal integration strategy can address the major data management challenges facing your organization today.
Get the Report →Use Biml to Build SSIS Tasks to Replicate IBM Cloud Object Storage Data to SQL Server
Use Biml to dynamically build SSIS tasks using CData SSIS Components to replicate IBM Cloud Object Storage data to SQL Server.
Using SQL Server as a backup for critical business data provides an essential safety net against loss and enables business users to more easily connect that data with features like reporting, analytics, and more. Biml is an XML dialect that can be used to create Microsoft SQL Server BI objects, like SSIS packages. Pairing CData SSIS Components with Biml empowers you to easily build SSIS packages with access to IBM Cloud Object Storage data. Key benefits include:
- Built-in metadata discovery — CData SSIS components expose metadata just like working with SQL Server, even dynamically generating schema for schema-less data sources
- Dynamic SSIS task generation — Use code nuggets in Biml to build SSIS tasks by iterating over discovered metadata
- Read from IBM Cloud Object Storage — Native source components make IBM Cloud Object Storage look just like a database
This article demonstrates how to use Biml with the CData SSIS Components for IBM Cloud Object Storage to dynamically build SSIS tasks (one for each IBM Cloud Object Storage entity) to replicate IBM Cloud Object Storage data to a Microsoft SQL Server database. We step through the Biml file one section at a time but have included the complete Biml file at the end of the article.
In order to use Biml in an SSIS Project in Visual Studio, install BimlExpress. Once you install BimlExpress, open Visual Studio, create a new Integration Services project, and add a new Biml file.
Building the Biml File
With Biml, you can write scripting to dynamically generate SSIS projects, packages, and tasks. To see the Biml file for an existing project (and gain insights on using Biml with CData SSIS Tasks), simply create your tasks and then right-click the project and select Convert SSIS Packages to Biml.
C# Code
- Use directives <#@ .. #> to import necessary namespaces and the assembly for the CData SSIS Components for IBM Cloud Object Storage.
<#@ template language="C#" hostspecific="true"#> <#@ import namespace="System.Data"#> <#@ import namespace="System.IO"#> <#@ import namespace="System.Collections"#> <#@ import namespace="System.Data.CData.IBMCloudObjectStorage"#> <#@ assembly name="C:\Program Files\CData[product_name] 2018\lib\CData.SSIS2017.IBMCloudObjectStorage.dll"#>
In a new control nugget <# ... #>, create variables for values that will be used throughout the Biml script, including a connection string for IBM Cloud Object Storage and structures to store the IBM Cloud Object Storage metadata.
Register a New Instance of Cloud Object Storage
If you do not already have Cloud Object Storage in your IBM Cloud account, follow the procedure below to install an instance of SQL Query in your account:
- Log in to your IBM Cloud account.
- Navigate to the page, choose a name for your instance and click Create. You will be redirected to the instance of Cloud Object Storage you just created.
Connecting using OAuth Authentication
There are certain connection properties you need to set before you can connect. You can obtain these as follows:
API Key
To connect with IBM Cloud Object Storage, you need an API Key. You can obtain this as follows:
- Log in to your IBM Cloud account.
- Navigate to the Platform API Keys page.
- On the middle-right corner click "Create an IBM Cloud API Key" to create a new API Key.
- In the pop-up window, specify the API Key name and click "Create". Note the API Key as you can never access it again from the dashboard.
Cloud Object Storage CRN
If you have multiple accounts, you will need to specify the CloudObjectStorageCRN explicitly. To find the appropriate value, you can:
- Query the Services view. This will list your IBM Cloud Object Storage instances along with the CRN for each.
- Locate the CRN directly in IBM Cloud. To do so, navigate to your IBM Cloud Dashboard. In the Resource List, Under Storage, select your Cloud Object Storage resource to get its CRN.
Connecting to Data
You can now set the following to connect to data:
- InitiateOAuth: Set this to GETANDREFRESH. You can use InitiateOAuth to avoid repeating the OAuth exchange and manually setting the OAuthAccessToken.
- ApiKey: Set this to your API key which was noted during setup.
- CloudObjectStorageCRN (Optional): Set this to the cloud object storage CRN you want to work with. While the connector attempts to retrieve this automatically, specifying this explicitly is recommended if you have more than Cloud Object Storage account.
When you connect, the connector completes the OAuth process.
- Extracts the access token and authenticates requests.
- Saves OAuth values in OAuthSettingsLocation to be persisted across connections.
var ibmcloudobjectstorageConnectionString = "ApiKey=myApiKey;CloudObjectStorageCRN=MyInstanceCRN;Region=myRegion;OAuthClientId=MyOAuthClientId;OAuthClientSecret=myOAuthClientSecret;"; var replicationServer = "SERVER"; var replicationCatalog = "CATALOG"; var replicationUserID = "sqluser"; var replicationPassword = "sqlpassword"; List<string> allEntityNames = new List<string>(); Hashtable entitySchema = new Hashtable();
- In the same control nugget used to defined variables, use ADO.NET code to programmatically query the IBM Cloud Object Storage entities (tables) and fields (columns).
using (IBMCloudObjectStorageConnection connection = new IBMCloudObjectStorageConnection(ibmcloudobjectstorageConnectionString)) { connection.Open(); var entities = connection.GetSchema("Tables").Rows; foreach (DataRow entity in entities) { allEntityNames.Add(entity["TABLE_NAME"].ToString()); } foreach (string entity in allEntityNames){ var columns = connection.GetSchema("Columns", new string [] {entity}).Rows; entitySchema.Add(entity,columns); } }
Class Nugget
In our Biml script to create the replication tasks, there are several places where repeated XML elements are created dynamically (mostly for columns in SSIS tasks). Instead of repeating the code, add a class nugget <#+ ... #> and create a helper class with methods to consolidate repeated code (full code at the end of the article).
- Add public static variables to determine which type of XML element to create.
public static int OUTPUT_WITH_ERROR = 0; public static int EXTERNAL = 1; public static int OUTPUT = 2; public static int DATAOVERRIDE_COLUMN = 4;
- Add a public method to build a SQL statement for use in the ExecuteSQL task used to drop existing tables and create a new table for the replicated data.
// Dynamically builds a DROP TABLE and CREATE statement // for each entity (table) in IBM Cloud Object Storage using the table name and metadata. public static string GetDeleteAndCreateStatement(string tableName, DataRowCollection columns) { ... }
- Add a public method to build the collection of column-based XML elements.
// Dynamically build various column-based XML elements // for each entity (table) in IBM Cloud Object Storage based on the column // metadata and the parent element public static string GetColumnDefs(DataRowCollection columns, int columnType){ ... }
Biml Script
Now that you have the table metadata and a Helper class to reduce repeated code, write the Biml script to dynamically create your replication packages.
- Start by adding a CustomSsisConnection element for the CData SSIS Tasks. Note that the ObjectData attribute must be XML encoded. A typical connecting string looks similar to the following (note the use of the ibmcloudobjectstorageConnectionString variable for the ConnectionString property:
<IBMCloudObjectStorageConnectionManager> <Property Name="ConnectionString"><#=ibmcloudobjectstorageConnectionString#></Property> </IBMCloudObjectStorageConnectionManager>
After configuring the connection to the CData SSIS Task, configure a connection to the replication database. The completed Connections element looks like the following (note the use of text nuggets <#= ... #> to add variables for connection string values):
<Connections> <CustomSsisConnection Name="CData IBMCloudObjectStorage Connection Manager" CreationName = "CDATA_IBMCLOUDOBJECTSTORAGE" ObjectData = "<IBMCloudObjectStorageConnectionManager> <Property Name="ConnectionString"> <#=ibmcloudobjectstorageConnectionString#></Property> </IBMCloudObjectStorageConnectionManager>" /> <Connection Name="Destination" ConnectionString="Data Source=<#=replicationServer#>;User ID=<#=replicationUserID#>;Password=<#=replicationPassword#>;Initial Catalog=<#=replicationCatalog#>;Provider=SQLNCLI11.1;"/> </Connections>
With the Connections element configured, you are ready to build our replication package. In the package, the Biml script create an ExecuteSQL task and a Dataflow task for each table to be replicated.
To build each set of tasks, use a while loop in a control nugget to iterate through the entity (table) names:
int entityCounter = 0; while(entityCounter < allEntityNames.Count){ var tableName = allEntityNames[entityCounter].ToString(); DataRowCollection columns = ((DataRowCollection)entitySchema[tableName]);
- ExecuteSQL Task
In the ExecuteSQL task, execute a SQL query to drop any existing tables that have the same name as our IBM Cloud Object Storage entity (table) and create a new table based on the metadata discovered using the CData SSIS Component.
To create the query dynamically, use the Helper.GetDeleteAndCreateStatement() helper function.
- Dataflow Task
Within the Dataflow use a CustomComponent as the source component and an OleDbDestination as the destination.
- CustomComponent Element
The CustomComponent element uses the CData SSIS Source component to retrieve IBM Cloud Object Storage data. Start by configuring the component to use with the CData component.
<CustomComponent Name="CData IBM Cloud Object Storage Source" ComponentTypeName="CData.SSIS.IBMCloudObjectStorage.IBMCloudObjectStorageSource" Version="18" ContactInfo="[email protected]" UsesDispositions="true"> ... </CustomComponent>
DataflowOverrides and OutputPaths Elements
The next step after configuring the connection is to add Columns elements to the OutputPath child element of the DataflowOverrides element. To do so, call the Helper.GetColumnDefs() helper function.
Use the same Helper class to add columns to the OutputColumns and ExternalColumns child elements of the various OutputPaths elements.
The definitions created provide information about the input, output, and error information for the SSIS component.
<DataflowOverrides> <OutputPath OutputPathName="CData IBMCloudObjectStorage Source Output"> <Columns> <#=HelperClass.GetColumnDefs(columns,HelperClass.DATAOVERRIDE_COLUMN) #> </Columns> </OutputPath> </DataflowOverrides> ... <OutputPaths> <OutputPath Name="CData IBMCloudObjectStorage Source Output"> <OutputColumns> <#=HelperClass.GetColumnDefs(columns,HelperClass.OUTPUT_WITH_ERROR) #> </OutputColumns> <ExternalColumns> <#=HelperClass.GetColumnDefs(columns,HelperClass.EXTERNAL) #> </ExternalColumns> </OutputPath> <OutputPath Name="CData IBMCloudObjectStorage Source Error Output" IsErrorOutput="true"> <OutputColumns> <#=HelperClass.GetColumnDefs(columns,HelperClass.OUTPUT) # </OutputColumns> </OutputPath> </OutputPaths>
CustomProperties Element
The CData SSIS tasks are surfaced in SSIS as custom components with a series of required CustomProperties:
<CustomProperties> <CustomProperty Name="SQLStatement" DataType="Null" UITypeEditor="Microsoft.DataTransformationServices.Controls.ModalMultilineStringEditor, Microsoft.DataTransformationServices.Controls, Version= 10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91" SupportsExpression="true"></CustomProperty> <CustomProperty Name="AccessMode" DataType="Int32" TypeConverter="CData.SSIS.IBMCloudObjectStorage.AccessModeToStringConverter">0</CustomProperty> <CustomProperty Name="TableOrView" DataType="String" UITypeEditor="Microsoft.DataTransformationServices.Controls.ModalMultilineStringEditor, Microsoft.DataTransformationServices.Controls, Version= 10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91" SupportsExpression="true">[<#=tableName#>]</CustomProperty> <CustomProperty Name="ExecStoredProcedure" DataType="Boolean">false</CustomProperty> </CustomProperties>
Connections Element
The last element to add to the CustomComponent element is a Connections element, attaching the previously defined connection to the task:
<Connections> <Connection Name="IBMCloudObjectStorage 2018 Connection" ConnectionName="CData IBMCloudObjectStorage Connection Manager" /> </Connections>
- OleDbDestination Element
The final piece of the Dataflow task is the OleDbDestination element. Attach the previously defined OleDbConnection to the element, set the InputPath and ExternalTableOutput:
<OleDbDestination Name="OLE DB Destination" ConnectionName="Destination" CheckConstraints="false"> <InputPath OutputPathName="CData IBMCloudObjectStorage Source.CData IBMCloudObjectStorage Source Output" /> <ExternalTableOutput Table="[<#=tableName#>]" /> </OleDbDestination>
- CustomComponent Element
- ExecuteSQL Task
- Use a control nugget to increment the counter used to iterate over the collection of entity (table) names. Do this within the Tasks element, after the end of the Dataflow element:
... </Dataflow> <# entityCounter++;}#> </Tasks> </Package> </Packages> </Biml>
Build the SSIS Project
Once the Biml file is written, right-click on the Biml file in Server Explorer and select Generate SSIS Packages. At this point, Visual Studio and BimlExpress will translate the Biml file into SSIS package(s), ready to be run.
Run the package to begin replicating your IBM Cloud Object Storage data to a SQL Server database (or any other destination you choose).
Free Trial & More Information
With the CData SSIS Components for IBM Cloud Object Storage, you get SQL access to your IBM Cloud Object Storage data directly from SSIS packages. And with Biml, you can automatically generate those packages. For more information about the CData SSIS Components for IBM Cloud Object Storage, refer to the product page. You can always get started with a free, 30-day trial. As always, our world-class CData Support Team is available if you have any questions.
Complete Biml File
<#@ template language="C#" hostspecific="true"#> <#@ import namespace="System.Data"#> <#@ import namespace="System.IO"#> <#@ import namespace="System.Collections"#> <#@ import namespace="System.Data.CData.IBMCloudObjectStorage"#> <#@ assembly name="C:\Program Files\CData\CData SSIS Components for IBMCloudObjectStorage 2018\lib\CData.SSIS2017.IBMCloudObjectStorage.dll"#> <# var ibmcloudobjectstorageConnectionString = "ApiKey=myApiKey;CloudObjectStorageCRN=MyInstanceCRN;Region=myRegion;OAuthClientId=MyOAuthClientId;OAuthClientSecret=myOAuthClientSecret;"; var replicationServer = "JDG"; var replicationCatalog = "BIML"; var replicationUserID = "sqltest"; var replicationPassword = "sqltest"; List<string> allEntityNames = new List<string>(); Hashtable entitySchema = new Hashtable(); using (IBMCloudObjectStorageConnection connection = new IBMCloudObjectStorageConnection(ibmcloudobjectstorageConnectionString)) { connection.Open(); var entities = connection.GetSchema("Tables").Rows; foreach (DataRow entity in entities) { allEntityNames.Add(entity["TABLE_NAME"].ToString()); } foreach (string entity in allEntityNames){ var columns = connection.GetSchema("Columns", new string [] {entity}).Rows; entitySchema.Add(entity,columns); } }#> <Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections> <CustomSsisConnection Name="CData IBMCloudObjectStorage Connection Manager" CreationName="CDATA_IBMCLOUDOBJECTSTORAGE" ObjectData="<IBMCloudObjectStorageConnectionManager><Property Name="ConnectionString"><#=ibmcloudobjectstorageConnectionString#></Property></IBMCloudObjectStorageConnectionManager>"/> <Connection Name="Destination" ConnectionString="Data Source=<#=replicationServer#>;User ID=<#=replicationUserID#>;Password=<#=replicationPassword#>;Initial Catalog=<#=replicationCatalog#>;Provider=SQLNCLI11.1;"/> </Connections> <Packages> <Package Name="Replicate IBMCloudObjectStorage Package" Language="None" ConstraintMode="LinearOnCompletion" ProtectionLevel="EncryptSensitiveWithUserKey"> <Tasks> <# int entityCounter = 0; while(entityCounter < allEntityNames.Count){ var tableName = allEntityNames[entityCounter].ToString(); if (tableName.Equals("IdpEventLog")) break; DataRowCollection columns = ((DataRowCollection)entitySchema[tableName]);#> <ExecuteSQL Name="Create <#=tableName#> Replication Table" ConnectionName="Destination"> <DirectInput> <#=HelperClass.GetDeleteAndCreateStatement(tableName,columns)#> </DirectInput> </ExecuteSQL> <Dataflow Name="Replicate <#=tableName#>"> <Transformations> <CustomComponent Name="CData IBMCloudObjectStorage Source" ComponentTypeName="CData.SSIS.IBMCloudObjectStorage.IBMCloudObjectStorageSource" Version="18" ContactInfo="[email protected]" UsesDispositions="true"> <DataflowOverrides> <OutputPath OutputPathName="CData IBMCloudObjectStorage Source Output"> <Columns> <#=HelperClass.GetColumnDefs(columns,HelperClass.DATAOVERRIDE_COLUMN) #> </Columns> </OutputPath> </DataflowOverrides> <CustomProperties> <CustomProperty Name="SQLStatement" DataType="Null" UITypeEditor="Microsoft.DataTransformationServices.Controls.ModalMultilineStringEditor, Microsoft.DataTransformationServices.Controls, Version= 10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91" SupportsExpression="true"></CustomProperty> <CustomProperty Name="AccessMode" DataType="Int32" TypeConverter="CData.SSIS.IBMCloudObjectStorage.AccessModeToStringConverter">0</CustomProperty> <CustomProperty Name="TableOrView" DataType="String" UITypeEditor="Microsoft.DataTransformationServices.Controls.ModalMultilineStringEditor, Microsoft.DataTransformationServices.Controls, Version= 10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91" SupportsExpression="true">[<#=tableName#>]</CustomProperty> <CustomProperty Name="ExecStoredProcedure" DataType="Boolean">false</CustomProperty> </CustomProperties> <OutputPaths> <OutputPath Name="CData IBMCloudObjectStorage Source Output"> <OutputColumns> <#=HelperClass.GetColumnDefs(columns,HelperClass.OUTPUT_WITH_ERROR) #> </OutputColumns> <ExternalColumns> <#=HelperClass.GetColumnDefs(columns,HelperClass.EXTERNAL) #> </ExternalColumns> </OutputPath> <OutputPath Name="CData IBMCloudObjectStorage Source Error Output" IsErrorOutput="true"> <OutputColumns> <#=HelperClass.GetColumnDefs(columns,HelperClass.OUTPUT) #> </OutputColumns> </OutputPath> </OutputPaths> <Connections> <Connection Name="IBMCloudObjectStorage 2018 Connection" ConnectionName="CData IBMCloudObjectStorage Connection Manager" /> </Connections> </CustomComponent> <OleDbDestination Name="OLE DB Destination" ConnectionName="Destination" CheckConstraints="false"> <InputPath OutputPathName="CData IBMCloudObjectStorage Source.CData IBMCloudObjectStorage Source Output" /> <ExternalTableOutput Table="[<#=tableName#>]" /> </OleDbDestination> </Transformations> </Dataflow> <# entityCounter++;}#> </Tasks> </Package> </Packages> </Biml> <#+ public static class HelperClass { public static int OUTPUT_WITH_ERROR = 0; public static int EXTERNAL = 1; public static int OUTPUT = 2; public static int DATAOVERRIDE_COLUMN = 4; public static string GetDeleteAndCreateStatement(string tableName, DataRowCollection columns) { var dropAndCreateStatement = "IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[{0}]') AND type IN (N'U'))\r\n" + "DROP TABLE [{0}];\r\n" + "CREATE TABLE [{0}]\r\n" + "(\r\n" + "{1}\r\n" + ")\r\n" + "ON \"default\";"; string columnDefs = ""; foreach (DataRow column in columns){ string columnDef = " [{0}] {1}"; string dataType = column["DATA_TYPE"].ToString(); if (dataType.ToLower().StartsWith("bool")) { dataType = "bit"; } else if (dataType.ToLower().Equals("real")) { dataType = "float"; } else if (dataType.ToLower().Contains("varchar")) { var columnLength = column["CHARACTER_MAXIMUM_LENGTH"]; dataType = "nvarchar(" + ((int)columnLength > 4000 ? "MAX" : columnLength) + ")"; } columnDefs += String.Format(columnDef,column["COLUMN_NAME"],dataType) + ",\r\n"; } columnDefs = columnDefs.Remove(columnDefs.LastIndexOf(",\r\n"),",\r\n".Length); return String.Format(dropAndCreateStatement,tableName,columnDefs); } public static string GetColumnDefs(DataRowCollection columns, int columnType){ var columnDefTemplate = ""; var columnElements = ""; if (columnType == DATAOVERRIDE_COLUMN) { columnDefTemplate = " <Column ErrorRowDisposition=\"FailComponent\" TruncationRowDisposition=\"FailComponent\" ColumnName=\"{0}\" />\r\n"; foreach(DataRow column in columns) { var columnName = column["COLUMN_NAME"]; columnElements += String.Format(columnDefTemplate,columnName); } return columnElements; } if (columnType == OUTPUT_WITH_ERROR) columnDefTemplate = " <OutputColumn Name=\"{0}\" {1} ExternalMetadataColumnName=\"{0}\" ErrorRowDisposition=\"FailComponent\" TruncationRowDisposition=\"FailComponent\" />\r\n"; else if (columnType == EXTERNAL) columnDefTemplate = " <ExternalColumn Name=\"{0}\" {1} />\r\n"; else if (columnType == OUTPUT) columnDefTemplate = " <OutputColumn Name=\"{0}\" {1} />\r\n"; foreach(DataRow column in columns){ var columnName = column["COLUMN_NAME"]; var dataTypeRaw = column["DATA_TYPE"].ToString().ToLower(); var typeAndRelatedInfo = ""; if (dataTypeRaw.Equals("bool")) { typeAndRelatedInfo = "DataType=\"Boolean\""; } else if (dataTypeRaw.Equals("date")) { typeAndRelatedInfo = "DataType=\"Date\" SsisDataTypeOverride=\"DT_DBDATE\""; } else if (dataTypeRaw.Equals("datetime")) { typeAndRelatedInfo = "DataType=\"DateTime\""; } else if (dataTypeRaw.Equals("real")) { typeAndRelatedInfo = ((int)column["NumericPrecision"] > 0 ? "Precision=\"18\" " : " ") + ((int)column["NumericScale"] > 0 ? "Scale=\"15\" " : " ") + "DataType=\"Decimal\""; } else if (dataTypeRaw.Equals("varchar")) { var columnLength = column["CHARACTER_MAXIMUM_LENGTH"]; if ((int)columnLength > 4000) { typeAndRelatedInfo = "DataType=\"String\""; } else { typeAndRelatedInfo = "Length=\"" + columnLength + "\" DataType=\"String\" CodePage=\"1252\""; } } columnElements += String.Format(columnDefTemplate,columnName,typeAndRelatedInfo); } return columnElements; } } #>