C# component to read Excel data using OleDb in SSIS

This is how we can read data from Microsoft Excel using OleDb in SSIS;

public override void CreateNewOutputRows()
    {
        //Change this to your filename you do not need a connection manager
        string fileName = @"E:\SFTP\RSS\Results.xlsx";  
        string SheetName = "Sheet1";  
        string cstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1\"";

        using (System.Data.OleDb.OleDbConnection xlConn = new System.Data.OleDb.OleDbConnection(cstr))
        {
           xlConn.Open();
           System.Data.OleDb.OleDbCommand xlCmd = xlConn.CreateCommand();
           xlCmd.CommandText = "Select * from [" + SheetName + "$]"; //I assume this is the data you want
           xlCmd.CommandType = CommandType.Text;
           using (System.Data.OleDb.OleDbDataReader rdr = xlCmd.ExecuteReader())
           {
              while (rdr.Read())
              {
                 for (int i = 4; i < rdr.FieldCount; i++) //loop from 5th column to last
                 {
                    //The first 4 columns are static and added to every row
                    Output0Buffer.AddRow();
                    Output0Buffer.UniqueID = Int32.Parse(rdr[0].ToString());
                    Output0Buffer.Year = Int32.Parse(rdr[1].ToString());
                    Output0Buffer.ReportingWave = rdr.GetString(2);
                    Output0Buffer.SubmissionDate = rdr.GetString(3);
                    Output0Buffer.Question = rdr.GetName(i);
                    Output0Buffer.Answer = rdr.GetString(i);
                 }

            }
          }
          xlConn.Close();
       }
    }

We need to add output columns with correct data type. There is no need to setup connection manager because we are using it in the code.

This code will successfully import a file that looks like this;

My recent experience with Excel Sheet.

Check the data type of the column. If the type is “General”, then ACE will determine data type based on first 8-10 rows. If those are numeric then it might ignore alpha-numeric data in the column. Try to change column data type to “Text” and see what happens. It will work. You would be able to see numeric and non-numeric data in your data pipe line.

Resources;

https://stackoverflow.com/questions/61514481/get-an-ssis-c-sharp-script-component-to-read-from-excel-via-oledb

https://docs.microsoft.com/en-us/sql/integration-services/extending-packages-scripting/data-flow-script-component/configuring-the-script-component-in-the-script-component-editor?view=sql-server-ver15

Don’t’ fail parent package if child package fails

I like to keep running child packages inside sequence container even if one of them fails. Currently they are failing;

The work around is to set Sequence container “MaximumErrorCount” property to 2 from 1. After doing that, I am still getting failure error.

What we need to do is to fake the result assuming master and child packages have proper logging in place if they fail. Go to the Properties of “ExecPkg – Child Package” and set “ForceExecutionResult” to Success.

Do the same for master package. Run the package;

You can see that child package has no issues despite the fact that this has been failed. We can confirm that from log table if it’s enabled on package level.

These are the default values for a new container.

Now lets stop and study. If we compare the package behavior against the property settings, this looks wrong. Here we have set FailPackageOnFailure=False, yet a Sequence Container failure is causing a Package failure. Why is this? Unintuitive attribute names. See this Microsoft Connect issue. You are not alone in your confusion. The official explanation from Microsoft is this.

Despite some pretty circular previous messages, we believe that the feature is behaving as designed. When you set FailParentOnFailure to false, the parent will not fail until the number of failures in the child exceeds the MaximumAllowedErrors threshold. When you set FailparentOnFailure to true, the parent will fail on the first occurence of an error regardless of the MaximiumAllowedErrors threshold.

SQL Date Conversion from different data types

When we receive data feed from outside vendors, the date values are often like this;

Purchase Date
2020-07-12
NULL
'n/a'
''

The challenge is how to parse these dates and load them in SQL server table. Here is one work around;

DECLARE @purchaseDate nvarchar(10) = '9/30/2020 12:00:00 AM'
--DECLARE @purchaseDate nvarchar(10) = ''
--DECLARE @purchaseDate nvarchar(10) = 'n/a'
--DECLARE @purchaseDate nvarchar(10) = NULL

SELECT 
	CASE 
	WHEN ISDATE(ISNULL(@myDate, NULL)) = 1 THEN TRY_PARSE(@myDate AS date)
	END PurchaseDate

We are basically checking whether value is of date, if yes then we apply transformation logic.

Loading stored procedure results into tables

There are multiple methods that can be used to import stored procedures results into tables.

OPENROWSET AND OPENQUERY methods require that stored procedures don’t have;

  1. temp tables inside
  2. Don’t return duplicate columns

There is no need for a linked server, but you would need to get the connection string right. You need to specify the full path to the stored procedure including the database name and the stored procedure owner.

METHOD–1 Using OPENROWSET

This is one time step to configure database server.

--one time step
sp_configure 'Show Advanced Options', 1
GO
RECONFIGURE
GO
sp_configure 'Ad Hoc Distributed Queries', 1
GO
RECONFIGURE
GO

You can run your stored procedures that will load data into tables;

IF OBJECT_ID('tempdb..#STG_OPENROWSET') IS NOT NULL DROP TABLE #STG_OPENROWSET
SELECT * INTO #STG_OPENROWSET FROM OPENROWSET('SQLNCLI', 
'Server=(local);Trusted_Connection=yes;',
'EXEC DatabaseName.[dbo].[StoredProcedureName] @param1 = 1, @param2 = 2020')
SELECT * FROM #STG_OPENROWSET
DROP TABLE #STG_OPENROWSET

METHOD–2 Using OPENQUERY.

This is one time step to configure database server.

--one time step
Select @@ServerName
EXEC sp_serveroption @@ServerName, 'DATA ACCESS', TRUE

You can run your stored procedures that will load data into tables;

IF OBJECT_ID('tempdb..#STG_OPENQUERY') IS NOT NULL DROP TABLE #STG_OPENQUERY
SELECT  * INTO    #STG_OPENQUERY
FROM    OPENQUERY(DatabaseServerName, 
'EXEC	Database.[dbo].[StoredProcedureName] @param1 = 1, @param2 = 2020');
SELECT * FROM #STG_OPENQUERY
DROP TABLE #STG_OPENQUERY

If stored procedures are using temp tables or returning duplicate columns, you will get these errors using Method-1 and Method-2;

invalid object #tablename.

Duplicate column names are not allowed in result sets obtained through OPENQUERY and OPENROWSET”

OPENROWSET are not allowed in Azure SQL. They are allowed in a VM that is connected to Azure.

METHOD–3  Manually create temp tables

There is no database configuration required.

You can run your stored procedures that will load data into tables;

--the proce is using temp tables so this is work around
IF OBJECT_ID('tempdb..#STG_TempTable') IS NOT NULL DROP TABLE #STG_TempTable
CREATE TABLE #STG_TempTable
(
	ID int,
	[name] nvarchar(255),
	shortName nvarchar(25),
)
INSERT #STG_TempTable 
EXEC [dbo].[StoredProcedureName] @param1 = 1, @param2 = 2020
SELECT * FROM #STG_TempTable

By using this method, your database administrator will be happy because you are not making any security related changes at database server level.

References

https://stackoverflow.com/questions/653714/insert-results-of-a-stored-procedure-into-a-temporary-table

SQL Server NULL or empty value checking

This is how to check NULL or empty input values for a date column. Currently NULL or empty values produces ‘1900-01-01’ value which is not acceptable.

DECLARE @InputDate DATE

--set input to spaces
SET @InputDate = ''
--don't want 1900-01-01 output, instead NULL value
SELECT ISNULL(NULLIF(@InputDate, ''), NULL) AS InputDateSpaces

--set input to null
SET @InputDate = NULL
--don't want 1900-01-01 output, instead NULL value
SELECT ISNULL(NULLIF(@InputDate, ''), NULL) AS InputDateNULL

--set input to date
SET @InputDate = '2021-06-25'
--don't want 1900-01-01 output, instead date value
SELECT ISNULL(NULLIF(@InputDate, ''), NULL) AS InputDateRealDate

These are the results;