Using LINQ methods to compare objects of custom type

I have a master list;

v2-2021 – ‘cover page$’
v2-2021 – ‘i# milestones$’
v2-2021 – ‘ii# tasks$’
v2-2021 – ‘iii# spendplan$’

I have a sub list;

v2-2021 – ‘cover page$’
v2-2021 – ‘i# milestones$’
v2-2021 – ‘ii# tasks$’

I want to make sure that all elements in my sub list exists in master list.

To solve this i have created this class;

internal class ExcelVersions
{
    public string VersionNumber { get; set; }
    public string TableName { get; set; }

}

I have created following objects based on this class;

List<ExcelVersions> cfirstList = new List<ExcelVersions>
{
                new ExcelVersions { VersionNumber = "v2-2021", TableName = "'cover page$'" },
                new ExcelVersions { VersionNumber = "v2-2021", TableName = "'i# milestones$'" },
                new ExcelVersions { VersionNumber = "v2-2021", TableName = "'ii# tasks$'" },
                new ExcelVersions { VersionNumber = "v2-2021", TableName = "'iii# spendplan$'" }
            };
            List<ExcelVersions> csecondList = new List<ExcelVersions>
            {
                new ExcelVersions { VersionNumber = "v2-2021", TableName = "'cover page$'" },
                new ExcelVersions { VersionNumber = "v2-2021", TableName = "'i# milestones$'" },
                new ExcelVersions { VersionNumber = "v2-2021", TableName = "'ii# tasks$'" }
            };
            //var cexceptList = csecondList.Except(cfirstList, new ExcelVersionsComparer());
            var cexceptList = csecondList.Except(cfirstList);
            Console.WriteLine($"\ncSecondList-->cFirstList: Value in second list that are not in first List");
            foreach (var val in cexceptList)
            {
                Console.WriteLine($"{val.TableName}");
            }
            IsASubset = csecondList.All(i => cfirstList.Contains(i));
            Console.WriteLine($"\ncSecondList-->cFirstList: all members of subset (cSecondList) exists in list1 (cFirstList): {IsASubset}");
}

This is the result i get;

To my surprise, none of LINQ comparison method worked on custom class. What’s wrong? The answer is in the LINQ implementation. To be correctly processed by the Except method, a type must implement the IEquatable<T> interface and provide its own Equals and GetHashCode methods.

Re-writing out custom type;

internal class ExcelVersions : IEquatable<ExcelVersions>
{
        public string VersionNumber { get; set; }
        public string TableName { get; set; }

        public bool Equals(ExcelVersions other)
        {
            //check whether the compare object is null
            if (Object.ReferenceEquals(other, null)) return false;
            //check whether the compared object references the same data
            if (Object.ReferenceEquals(this, other)) return true;
            //check whether the object's properteis are equal
            return VersionNumber.Equals(other.VersionNumber) && TableName.Equals(other.TableName);
        }

        //if Equals returns true for a pair of objects
        //GetHashCode must return the same value for these objects
        public override int GetHashCode()
        {
            //Get the hash code for the version number
            int hashVersionNumber = VersionNumber == null ? 0 : VersionNumber.GetHashCode();
            //get the hash code for the table name
            int hashTableName = TableName.GetHashCode();

            //calculate the hash code for the object
            return hashVersionNumber ^ hashTableName;
    }
}

This time the results are;

OK. Custom class is working but what if we cannot modify the type? What if it was provided by a library and we have no way of implementing the IEquiatable<T> interface. The answer is to create our own equality comparer and pass it as a parameter to the Except method.

The equality comparer must implement the IEqualityComparer<T> interface and provide GetHashCode and Equals method like this;

internal class ExcelVersionsComparer : IEqualityComparer<ExcelVersions>
{
        public bool Equals(ExcelVersions x, ExcelVersions y)
        {
            if (Object.ReferenceEquals(x, y))
                return true;

            if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
                return false;
            return x.Equals(y);
        }

        public int GetHashCode(ExcelVersions excelVersion)
        {
            if (Object.ReferenceEquals(excelVersion, null)) return 0;

            int hashVersion = excelVersion.VersionNumber == null ? 0 : excelVersion.GetHashCode();
            int hashTable = excelVersion.TableName.GetHashCode();

            return hashVersion ^ hashTable;
        }
}

This is how we are going to pass the comparer to the Except method;

var cexceptList = csecondList.Except(cfirstList, new ExcelVersionsComparer());

These rules don’t just apply to Except method. For example, the same is true for the Distinct, Contains, Interset and Union methods. Generally, if you see that a LINQ method has an overload that accepts the IEqualityComparer<T> parameter, means that to use it with your own data type, you need to either implement IEquatable<T> in your class or create your own equality comparer.

If you want to use built-in class instead of creating custom class, consider this class;

Reference

https://docs.microsoft.com/en-us/archive/blogs/csharpfaq/how-to-use-linq-methods-to-compare-objects-of-custom-types

https://stackoverflow.com/questions/16824749/using-linq-except-not-working-as-i-thought

https://grantwinney.com/how-to-compare-two-objects-testing-for-equality-in-c/

https://www.tutorialspoint.com/how-to-find-items-in-one-list-that-are-not-in-another-list-in-chash


SSIS: Read Excel Tables

Define two variables; ExcelFile –> String and ExcelTables –> Object. Drop a script task on designer surface;

Here is the script;

public void Main()
{
            string excelFile;
            string connectionString;
            OleDbConnection excelConnection;
            DataTable tablesInFile;
            int tableCount = 0;
            string currentTable;
            int tableIndex = 0;

            //string[] excelTables = new string[5];
            string[] excelTables = new string[20];

            excelFile = Dts.Variables["ExcelFile"].Value.ToString();
            connectionString = $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={excelFile};Extended Properties=Excel 12.0";
            excelConnection = new OleDbConnection(connectionString);
            excelConnection.Open();
            tablesInFile = excelConnection.GetSchema("Tables");
            tableCount = tablesInFile.Rows.Count;

            foreach (DataRow tableInFile in tablesInFile.Rows)
            {
                currentTable = tableInFile["TABLE_NAME"].ToString();
                char lastCharacter = currentTable[currentTable.Length - 2];
                if (lastCharacter == '$')
                {
                    excelTables[tableIndex] = currentTable;
                    tableIndex += 1;
                }
            }

            Dts.Variables["ExcelTables"].Value = excelTables;

            Dts.TaskResult = (int)ScriptResults.Success;
}

You can display object variables values using this;

string[] tablesInFile = (string[])Dts.Variables["ExcelTables"].Value;
            foreach (string tableInFile in tablesInFile)
            {
                results += " " + tableInFile + EOL;
            }

            MessageBox.Show(results, "Results", MessageBoxButtons.OK, MessageBoxIcon.Information);

All Excel – Programmatically Reading/Writing

ERR: External table is not in the expected format;

The error might be the result of wrong extended properties in connection string. Check code examples below;

Following code will help you to open *.xls files;

string connString = string.Format("Provider = Microsoft.ACE.OLEDB.12.0; Data Source ={0}; Extended Properties = 'EXCEL 8.0 XML;HDR=YES;IMEX=1'", fullPathToExcel);
using (OleDbConnection conn = new OleDbConnection(connString))
{
   conn.Open();
  //...any other instruction
}

Following code will help you to open *.xlsx files;

string connString = string.Format("Provider = Microsoft.ACE.OLEDB.12.0; Data Source ={0}; Extended Properties = 'EXCEL 12.0 XML;HDR=YES;IMEX=1'", fullPathToExcel);
using (OleDbConnection conn = new OleDbConnection(connString))
{
   conn.Open();
  //...any other instruction
}

Excel ADO.NET properties for different excel version are;

Excel 97-2003 Workbook (.XLS) –> “Excel 80”

Excel 2007-2010 Workbook (.XLSX) –> “Excel 12.0 XML”

Excel 2007-2010 Macro-enabled Workbook (.XLSM) –> “Excel 12.0 MACRO”

This is how you can run a select query;

"SELECT * FROM [StoreSheet$A13:I]"

If you are not able to solve External table is not in the expected format error, consider changing driver. try NPOI or ExcelDataReader.

ExcelDataReader Library

This library doesn’t work with .NET core.

EPPlus Library

EPPlus library works with .NET core.

Working with Excel Files with the help of Script Task

Integration Services provides the Excel connection manager, Excel source, and Excel destination for working with data stored in spreadsheets in the Microsoft Excel file format. The techniques described in this topic use the Script task to obtain information about available Excel databases (workbook files) and tables (worksheets and named ranges).

You can read more here.

How to unprotect the Excel sheet

https://techcommunity.microsoft.com/t5/excel/how-to-unprotect-the-excel-sheet-if-forgot-the-password/m-p/1574559

Common variables and Namespaces used to read Excel files

Variables. Open the Variables window and define the following variables:
1- ExcelFile, of type String. Enter the complete path and filename to an existing Excel workbook.
2 -ExcelTable, of type String. Enter the name of an existing worksheet or named range in the workbook
named in the value of the ExcelFile variable. This value is case-sensitive.
3 – ExcelFileExists, of type Boolean.
4 – ExcelTableExists, of type Boolean.
5 – ExcelFolder, of type String. Enter the complete path of a folder that contains at least one Excel workbook.
6 – ExcelFiles, of type Object.
7 – ExcelTables, of type Object.

Imports statements
You will need following .NET Framework namespaces;
1 – System.IO, for file system operations.
2- System.Data.OleDb, to open Excel files as data sources.

References.
The code that read schema information from Excel files require an additional reference in the script project to the System.Xml namespace.

SSIS: Converting a string to byte-array

This is a short how-to convert a string to byte-array without any encoding (byte-by-byte) in SQL Server Integration Services (SSIS).

I would like to save this string “I love you” as BLOB. Let’s assume our Output column name is fileContent.

Row.filecontent.AddBlobData(GetBytes("I love you"));

AND

byte[] GetBytes(string str)
{
    byte[] bytes = new byte[str.Length * sizeof(char)];
    System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
    return bytes;
}

Reference

https://stackoverflow.com/questions/35703025/how-do-i-convert-a-string-to-a-blobcolumn-in-ssis/46837584

https://stackoverflow.com/questions/472906/how-do-i-get-a-consistent-byte-representation-of-strings-in-c-sharp-without-manu/10380166#10380166

SSIS: Use Sql connection inside script component

If the requirement is to run dynamic query on each row, you might get inside Script comoponent. Here is how you can grab SQL connection and run your queries;

Your connection is defined here on script component;

Edit Script…
public class ScriptMain : UserComponent
{
    IDTSConnectionManager100 connMgr;
    SqlConnection sqlConn;
    SqlDataReader sqlReader;

    public override void AcquireConnections(object Transaction)
    {
        connMgr = this.Connections.MyConn;
        sqlConn = (SqlConnection)connMgr.AcquireConnection(null);
    }

    public override void ReleaseConnections()
    {
        //base.ReleaseConnections();
        connMgr.ReleaseConnection(sqlConn);
    }

    public override void PreExecute()
    {
        base.PreExecute();
        string KeyValue = Variables.vFileOutputString;

        String vSqlStatement = "SELECT SqlStatement FROM [DataHub].[vw_LastStepForRglData] WHERE ADNumber = '" + KeyValue + "' "; 
        SqlCommand cmd = new SqlCommand(vSqlStatement, sqlConn);
        sqlReader = cmd.ExecuteReader();
    }

    public override void PostExecute()
    {
        base.PostExecute();
        sqlReader.Close();
    }

    public override void CreateNewOutputRows()
    {
        while (sqlReader.Read())
        {
            {
                DynamicSqlBuffer.AddRow();
                DynamicSqlBuffer.SqlStatement = sqlReader.GetString(0);
            }
        }
    }
}

Hope this will help.