How to load assembly in SSIS script task that isn’t in the GAC

The documented and recommended way to reference a custom assembly from an SSIS Script Task or Script Component is to install it in the Global Assembly Cache (GAC).  However this is not always possible or simple to do.

Here’s a simple workaround for loading an assembly from an arbitrary location.  The idea is to register an AppDomain.AssemblyResolve event handler.

The .NET Framework provides the AppDomain.AssemblyResolve event for applications that require greater control over assembly loading. By handling this event, your application can load an assembly into the load context from outside the normal probing paths, select which of several assembly versions to load, emit a dynamic assembly and return it, and so on. This topic provides guidance for handling the AssemblyResolve event.

Resolving Assembly Loads

Which does just what we need.  The question is where and how to wire it up.  The trick is to realize that .NET’s Just-in-Time (JIT) compilation is responsible for loading dependent assemblies, and therefore guarantees that the assemblies referenced in a type’s method won’t be loaded until just before the method is executed.  This means that in the type’s static constructor we can reliably wire up the AssemblyResolve event that will supply the referenced types.

When you create a SSIS Script Task or Script Component SSIS generates a class for you called ScriptMain, and you can wire-up the event in a Static Constructor for that type.  A Static Constructor is guaranteed to be called exactly once “before the first instance is created or any static members are referenced.”, so therefore before the dependent assemblies are loaded.

[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
     static ScriptMain()
     {
         AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve);
     }
     static System.Reflection.Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args)
     {
         if (args.Name.Contains("ssisHelper"))
         {
             string path = @"c:\temp\";
             return System.Reflection.Assembly.LoadFile(System.IO.Path.Combine(path, "ssisHelper.dll"));
         }
         return null;
     }
    . . .

You can then drop the referenced assemblies in a well-known location on the SSIS server, or use a package variable to refer to the location and pass that in to the Script Task or Script Component.

If you are having issues with getting the dll path from a variable, Here is the work around. Basically you need to remove the “static” qualifier from the constructor and the event handler declarations and just make them public. So: public static ScriptMain() becomes public ScriptMain() and public static Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args) becomes public Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args).

Here is the modified version;

[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
	public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
  //the assemblyl is not registered in GAC and referenced from a folder.
  public ScriptMain()
  {
    AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyReference);
  }

  public System.Reflection.Assembly CurrentDomain_AssemblyReference(object sender, ResolveEventArgs args)
  {
   //string path = @"C:\Dev\PathToSSISHelper\";
   string path = Dts.Variables["User::PathToSSISHelper"].Value.ToString();
   if (args.Name.Contains("SSISHelper"))
   {
      return System.Reflection.Assembly.LoadFrom(System.IO.Path.Combine(path, "SSISHelper.dll"));
   }
   return null;
  }
  //end of reflection
.....

}

If you try to read the assemblies from a network share, you might get (System.NotSupportedException). Therefore replaced the LoadFile(path) with the UnsafeLoadFrom(path) call as workaround. Please use it only for your own or other wellknown assemblies, not downloaded assemblies from unknown authors, because this would be a security issue.

Here is the working code, the referenced DLL is “System.Web.Helpers.dll” and the network share path gets configured in the user variable “LibPath” (VS 2015,SQL Server 2017):

public System.Reflection.Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args)
{
    string path = Variables.LibPath.ToString();

    if (args.Name.Contains("System.Web.Helpers"))
    {
        return System.Reflection.Assembly.UnsafeLoadFrom(System.IO.Path.Combine(path, "System.Web.Helpers.dll"));
    }

    return null;
}

/// <summary>
/// This method is called once, before rows begin to be processed in the data flow.
/// </summary>
public override void PreExecute()
{
    base.PreExecute();

    AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve);
}

Resource

Load custom assembly in SSIS

https://docs.microsoft.com/en-us/archive/blogs/dbrowne/how-to-load-an-assembly-in-a-ssis-script-task-that-isnt-in-the-gac

Using class library in SSIS Projects without GAC installation

I don’t want to add custom assembly in GAC. Here is my research.

There is no direct way to do that from integration services packages, because the only workaround is using CurrentDomain_AssemblyResolve function – loading an assembly from a location instead of installing it into GAC.

There are 3 ways for using custom dll’s on deployment:

  • Assigning DLL’s to the GAC
  • Using the AssemblyResolve Function
  • Copy all Dll’s to the sql server DTS Assemblies folder (example for SQL Server 2008: C:\Program Files\Microsoft SQL Server\100\DTS\Binn) and to the .Net framework assemblies folder.

There could be an interesting workaround by using a Web Service instead of direct dll, so you can create a web service which contains the methods you are using and add a Web reference instead of a Local assembly

Here is an example.

This might be your starting point.

Using custom DLL in script task

Resources

https://docs.microsoft.com/en-us/answers/questions/92003/unable-to-reference-custom-class-library-in-ssis-s.html

https://www.codeproject.com/Articles/895028/Refering-dlls-in-SSIS-Script-Task

https://www.dotnetspider.com/resources/45645-How-to-use-custom-DLL-in-SSIS-Package.aspx

The transaction log for database ‘SampleDb’ is full due to ‘LOG_BACKUP’.

The ETL process worked fine for the last 3 days. Today it started failing. The reason, Log_Backup. The database was in Full recovery mode and transaction log got full.

As a matter of fact, the staging database has to be in Simple recovery mode.

First to view disk space occupied by database, run this;

sp_helpdb SampleDb

To change database recovery model, run this;

USE SampleDb
GO
SELECT * FROM sys.database_files

--Truncate the log by chaning the database recovery model to SIMPLE
ALTER DATABASE SampleDb
SET RECOVERY SIMPLE
GO

--Shrink the truncated log file to 1MB
DBCC SHRINKFILE (SampleDb, 1)
GO

--Reset the database recovery model, if required
/*
ALTER DATABASE SampleDb
SET RECOVERY FULL
GO
*/

If DBCC SHRINKFILE takes longer, we can use following command to see the progress;

select * from sys.dm_exec_requests

There is a Percentage_Completed and Estimated_Completion_time columns. These columns are not populate for every operation, but they are for shrink. You can find the row for your connection during the shrinking, and inspect the values to get an estimate of completion time. If the values are not changing, you’ll need to investigate whether the process is blocking something.

Upon checking the database’s Log file growth setting, the log file was limited growth of 1GB. So what happened is when the job ran and it asked SQL server to allocate more log space, but the growth limit of the log declined caused the job to failed. I modified the log growth and set it to grow by 50MB and Unlimited Growth and the error went away.

Resource

https://stackoverflow.com/questions/21228688/the-transaction-log-for-database-is-full-due-to-log-backup-in-a-shared-host/21235001

Using SSIS to pull data in chunks from remote server

Recently I hit by server memory issue while running SSIS package. I was getting binary data from remote server but the production server was unable to process it because of limited memory.

There are two choices to resolve this. Increase server memory. This wouldn’t solve the problem because the data will grow day by day.

The second is to split the batch into multiple batches say, 300 records per batch. if there are 2000 records then it would be 7 round trips to the remote server to load the data. 6 full 300 and seventh one would be 200.

Here is the design;

I will be using SQL OFFSET FETCH Feature and SSIS Script component for this. For an OLEDB example you can click on the link under Resources.

Declare four variables as follows:

1) vRowCount (Int32): Stores the total number of rows in the source table
2) vRC_IncrementValue (Int32): Stores the number of rows we need to specify in the OFFSET 
3) vRC_Increment: (Int32): Stores the number of rows we need to specify in this operation
4) vRCChunkValue (Int32): Specifies the number of rows in each chunk of data in this operation 
5) vRCBatchValue (Int32): Specifies the number of rows in each chunk of data

After declaring the variables, we assign a default value for the vRC_ChunkValue variable; in this example, we will set it to 100.

Our select query inside Script source component is like this;

string vSqlStatement = $@"
            SELECT *
	     FROM [dbo].[tblSample] 
	     WHERE 1=1
            -- get chunks
            ORDER BY SampleID
            OFFSET {Variables.vRCIncrementValue} ROWS
            FETCH NEXT {Variables.vRCBatchValue} ROWS ONLY";

Next, Add an Execute SQL Task to get the total number of rows from source table and change result set property to Single Row;

Assign the return value to vRowCount variable.

Next, add two expression task to copy values from operating variables to query variables;

Next, Add a For Loop Container, with the following configuration;

Add a data flow task inside For Loop Container. Add a script component and configure it as source;

Configure Data output on Inputs and Outputs tab;

Configure Connection on Connection Manager tab;

Click “Edit Script” and make these changes;

IDTSConnectionManager100 connMgr;
SqlConnection sqlConn;
SqlDataReader sqlReader;

public override void AcquireConnections(object Transaction)
{
    //base.AcquireConnections(Transaction);
    connMgr = this.Connections.GoldenConn;
    sqlConn = (SqlConnection)connMgr.AcquireConnection(null);

}

public override void ReleaseConnections()
{
    //base.ReleaseConnections();
    connMgr.ReleaseConnection(sqlConn);
}

public override void PreExecute()
{
    base.PreExecute();

    //create sql statement
    string vSqlStatement = $@"
    SELECT *
    FROM [dbo].[tblSample] 
    WHERE 1=1
    -- get chunks
    ORDER BY SampleID
    OFFSET {Variables.vRCIncrementValue} ROWS
    FETCH NEXT {Variables.vRCBatchValue} ROWS ONLY";

    //MessageBox.Show(vSqlStatement);

    SqlCommand cmd = new SqlCommand(vSqlStatement, sqlConn);
    /*
   7200 sec = 120 min = 2 hours. This can be set to 0 for non timeout at all
   Will this work? It also depends on server timeout settings. In most SQL install, SQL default timeout for remote queries is 600 seconds, 10 minutes.
   */
   cmd.CommandTimeout = 7200;
   sqlReader = cmd.ExecuteReader();
}

public override void PostExecute()
{
    base.PostExecute();
    sqlReader.Close();
}

public override void CreateNewOutputRows()
{
    try
    {
        while (sqlReader.Read())
        {
            {
                SampleDataBuffer.AddRow();
                SampleDataBuffer.SampleID = sqlReader.GetString(0);
                SampleDataBuffer.AddDate = sqlReader.IsDBNull(7) ? null : sqlReader.GetString(7);
            }
        }
    }
    catch (Exception ex)
    {
        //set to true to cause execution to abort
        bool cancel = false;
        //raise the error event to SSIS
        ComponentMetaData.FireError(-1, "CreateNewOutputRows()", ex.Message, "", -1, out cancel);
    }
}

Next, Add Script task inside For Loop container to calculate remaining rows;

Edit Script and add this;

// make sure all rows are accounted for
int rowCount = (int)Dts.Variables["User::vRowCount"].Value;
int rowIncrement = (int)Dts.Variables["User::vRC_Increment"].Value;
int rowChunkValue = (int)Dts.Variables["User::vRC_ChunkValue"].Value;

//this is our new offset value
rowIncrement = rowIncrement + rowChunkValue;
Dts.Variables["User::vRC_IncrementValue"].Value = rowIncrement;

//calculate remaining rows
int remainingRows = rowCount - rowIncrement;
//MessageBox.Show($"RowCount: {rowCount}\nRowIncrmenet: {rowIncrement}\nRowChunkValue{rowChunkValue}\nRemainingRows{remainingRows}");
if ((remainingRows <= rowChunkValue))
{
      //short circuit
      Dts.Variables["User::vRC_BatchValue"].Value = remainingRows;
      //for loop assign expression is [@vRC_Increment = @vRC_Increment + @vRC_ChunkValue], let's reverse this for last loop iteration
      Dts.Variables["User::vRC_Increment"].Value = rowIncrement - rowChunkValue;
}

Hope this will help.

Resources

Getting Data Chunks using OLEDB in SSIS

SSIS Package Performance and Timeout Exception

Recently I deployed a package to production and started getting this error;

System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out     at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)     at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PostExecute()     at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPostExecute(IDTSManagedComponentWrapper100 wrapper)  ClientConnectionId:b35cae06-1c27-4b61-9a32-6bf173753803  Error Number:-2,State:0,Class:11

After doing some research I found this;

SQL Server remote query timeout default value is 600seconds which is 10 minutes wait. The server that runs the package will wait for this amount of time before terminating remote sessions.

Lets increase this. I am setting this to 7200 seconds that’s 120 min that’s 2 hours. both on client and server.

Run this command on server;

EXEC sp_configure 'remote query timeout', 7200
RECONFIGURE

This might take some time. On my server it took 4 minutes.

Now run SSIS Package without re-deployment. Will it work? If no, then you need to set timeout in destination task;

For script component We can use something like this;

SqlCommand cmd = new SqlCommand(vSqlStatement, sqlConn);
/*
   7200 sec = 120 min = 2 hours. This can be set to 0 for non timeout at all
   Will this work? It also depends on server timeout settings. In most SQL install, SQL default timeout for remote queries is 600 seconds, 10 minutes.
*/
cmd.CommandTimeout = 7200;
sqlReader = cmd.ExecuteReader();

SSIS moves data using buffers. The default buffer size is 10 megabytes, with a maximum buffer size of 2^31-1 bytes. The default maximum number of rows is 10,000.

IF you look into destination buffer size, it is set to 0, which is default. If server does not support more horsepower then we can change this value to reduce buffer size, say change it from 0 to 500.

There is another good property that can help to adjust buffer size is AutoAdjustBufferSize.  Set AutoAdjustBufferSize property to true on DataFlow Task. This helps to improve performance.

Resources

Execution timeout expired the timeout period elapsed prior to completion

https://stackoverflow.com/questions/16651945/timeout-expired-the-timeout-period-elapsed-prior-to-completion-of-the-operation

https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/configure-the-remote-query-timeout-server-configuration-option?view=sql-server-ver15

https://www.mssqltips.com/sqlservertip/4221/improving-data-flow-performance-with-ssis-autoadjustbuffersize-property/

https://www.mssqltips.com/sqlservertip/3217/improve-ssis-data-flow-buffer-performance/