Large Text import from Excel using SSIS

I had an Excel file with a single column of 6000 characters (huge text). I used ACE drive and SSIS Build-in Excel connection manager.

The package worked initially with huge text configuration. I removed all the content from column and added three characters for testing. The package started failing.

Status Failed-SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21. An OLE DB record is available. Source: "Microsoft Access Database Engine" Hresult: 0x80040E21 Description: "Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.".

Status Failed-Failed to retrieve long data for column "SQLStatement".
Status Failed-There was an error with Excel - Get worksheet detail data.Outputs[Excel Source Output] on Excel - Get worksheet detail data. The column status returned was: "DBSTATUS_UNAVAILABLE".

Status Failed-SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "Excel - Get worksheet detail data.Outputs[Excel Source Output]" failed because error code 0xC0209071 occurred, and the error row disposition on "Excel - Get worksheet detail data" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.

Why?

For some strange reasons, SSIS Built-in Excel connection manager doesn’t respect the variation of content. If the content size is less than 100 and we have configured as Unicode text stream [DT_NTEXT], it will blow up.

The work around is use Script Component and use it as source.

Add a column to Output column of Output buffer;

Click on Edit Script and add paste following code;

public class ScriptMain : UserComponent
{
    DataSet excelDataSet;

    public override void PreExecute()
    {
        base.PreExecute();
        string connString = $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\myExcelFile.xlsx;Extended Properties=Excel 12.0";
        //Connect Excel sheet with OLEDB using connection string
        using (OleDbConnection conn = new OleDbConnection(connString))
        {
            conn.Open();
            OleDbDataAdapter objDA = new OleDbDataAdapter("SELECT * FROM [Portfolio$A10:I] WHERE Performer IS NOT NULL", conn);
            excelDataSet = new DataSet();
            objDA.Fill(excelDataSet);
        }
   public override void CreateNewOutputRows()
    {
         DataTable tbl = excelDataSet.Tables[0];
        foreach (DataRow row in tbl.Rows)
        {
            Output0Buffer.AddRow();          Output0Buffer.Description.AddBlobData(Encoding.Unicode.GetBytes(row[0].ToString()));
        }
    }

Open your Excel file and add a lot of data, for example 6000 characters, in Description column. The package will run successfully.

Again open your Excel file and add merely 3 characters in Description column. The package will run successfully again.

The success reason, We are not using Built-in Excel connection manager and are not affected by the weird behavior.

These solutions are from Microsoft Social Site;

This one seems to be a workable solution. These are pre-requisite for this approach;

  1. ACE drive installation
  2. Running following commands. Make sure you know the risk because these are related to security configuration of your database server.
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'Ad Hoc Distributed Queries', 1;
GO
RECONFIGURE;
GO
USE [master] 
GO 

--if using 32 bit office
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1 
--if using 64 bit office
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.16.0', N'AllowInProcess', 1 
GO 

--if using 32 bit office
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1 
--if using 64 bit office
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.16.0', N'DynamicParameters', 1 
GO 

We can package following stored procedure and call it from SSIS.

DECLARE @tbl TABLE (
	col1 VARCHAR(50)
	, col2 VARCHAR(50)
	, col3 VARCHAR(50)
	, col4 VARCHAR(MAX)
);

INSERT INTO @tbl
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
   'Excel 12.0 Xml; HDR=NO;
    Database=c:\Users\Yitzhak\Documents\dada.xlsx',
    [data$]);

-- just to test
SELECT * FROM @tbl;

Output
-------
col1	col2	col3	col4
Header1	Header2	Header3	Header4
Google	Adobe	IBM	Microsoft
10	20	30	Very Long Text

These are alternative approaches just for reference;

  1. Metadata at the Excel Source component’s output (checked using Advanced Editor): DT_NTEXT
  2. Derived Column component between source and destination to cast to non-unicode from unicode using (DT_TEXT,1252)
  3. Metadata at the OLE DB Destination component’s input (checked using Advanced Editor): DT_TEXT
  4. Target Column data type: VARCHAR(MAX)


Another approach

The “IMEX=1” extended property for OLE DB Excel connection string is used when there are mixed data types in one column. In this case, you need to change the value of the TypeGuessRows registry key so that Excel driver scans more than 8 rows (by default) to find a value that is longer than 4000 characters. If the registry key is updated successfully, the data type of the external column should be DT_NTEXT, and the data type of the output column will default to DT_NTEXT other than DT_WSTR automatically. So, I am afraid that you have not modified the registry key successfully.

If the source Excel file is .xls file, the provider used will be Microsoft JET OLE DB 4.0. In this case, we need to modify the following registry key from 8 to 0:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows

If the source Excel file is .xlsx file, the provider used will be Microsoft ACE OLE DB 12.0. In this case, we need to modify the proper registry key according to the Office Excel version:

  • Excel 2007: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
  • Excel 2010: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
  • Excel 2013: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows

Another approach

You just follow the visakh’s approach or else go with below link
http://beyondrelational.com/modules/24/syndicated/398/Posts/14945/ssis-how-to-resolve-excel-import-255-character-truncation-issue.aspx

 if above will not solve your problem, then the best way to do is:

Suppose you have a column named ‘ColumnX‘ with more than 4000 characters. Then,
Load your excel data in a temp table with splitting of ColumnX into two derived columns. I mean to say, split the data of ColumnX into two parts and store it into two columns e.g.

ColumnX –> ColumnX1– Substring(ColumnX,1,3000)
                    ColumnX2– Substring(ColumnX,3001,len(ColumnX))

and finally load it into your physical destination table say ‘DestTable’ concatenating these two columns into one.

Another approach

To add up a bit more of info on this topic, there is another registry you might need to change if none of the ones mentioned fix your issue.

To change the registry key for the 32-bit version that is running on a 64-bit operating system:

  1. In Windows environments, select Start ► Run and type REGEDIT to display the Registry Editor.
  2. In the registry tree, select HKEY_LOCAL_MACHINE ► Software ► Wow6432Node ► Microsoft ► Office ► 12.0 ► Access Connectivity Engine ► Engines.
  3. Double-click the Excel node.
  4. In the right panel, double-click the TypeGuessRows entry.
  5. Change the value data from 8 to 0.
  6. Click OK.
  7. Select File ► Exit to exit the Registry Editor window.

Related Article;

Resource

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/bd8c25f5-0ab9-4c78-9cfe-e41e6f4a3463/how-to-load-excel-data-having-fields-with-more-than-4000-characters?forum=sqlintegrationservices

https://stackoverflow.com/questions/28199929/excel-in-ssis-how-to-import-a-column-that-may-have-more-than-255-characters-whe

https://www.codingame.com/playgrounds/9014/read-write-excel-file-with-oledb-in-c-without-interop

Large text import from Flat File using SSIS

To import large text including special and foreign characters using Flat File follow this;

External Column =  Text_Stream (DT_TEXT)

Output Column = Unicode Text Stream (DT_NTEXT)

This can handle up to 11000 characters. I think it’s more than sufficient. Make sure your SQL Database has NVARCHAR(MAX) column data type.

This will work if the column has 11000 characters or 1 character. It wouldn’t break the way Excel BLOB columns break. In Excel if the source column is configured as DT_NEXT and data is 1 character, the process blows up.

Read this article, how to configure Excel.

SQL Server Agent configuration to run SSIS Packages

By default, SQL Server services run under these accounts;

Sometimes administrator might have changed these service accounts to window accounts for management reasons. To get them back to default accounts, right click on service and change them to NT Service\[Account Name] where [Account Name] is above listed account without any password. Windows know how to setup the password.

We would like to create a windows proxy account so that we can use it inside SQL server. The intent is to use this account to read/write from windows file system.

I am using my local laptop for this demo but you can use a domain account. The only difference would be to change [ComputerName]\[AccountName] to [DomainName[\[AccountName].

  1. You might have received this user from your domain administrator. If not then create a Windows user “SQLService” on your local computer and make him member of “Users” group only.
  2. Next Open SIMS -> Security -> Login -> New Login.

Server Role -> public.

User Mapping -> msdb -> db_ssisoperator, public

User Mapping -> SSISDB -> ssis admin, public

  • Next  Security -> Credentials -> New Credentials

Next SQL Server Agent -> Proxies -> SSIS Package Execution -> New Proxy

I have selected “Operating system (CmdExec) and “PowerShell” additional subsystems. You might not need these.

Under Principals, Select your windows service account;

Deploy your integration service packages. You will be able to see them under SSISDB->[Folder Name]\[Project Name]. Run your package. It will successeded because you are running them under your security context;

Create a job with a job step in SQL Server Agent. Make sure to change the context “Run as: SQLServiceProxy”;

When you run the job, it will fail. The reason, its running under the context of SQLService account. Connections within integration services are made with integrated security not SQL server.

Since we will be using SQL Login so I am going to change SSIS connection string in environment variables.

Now when you run the package, it will succeed.

Its time to check file system access. Open c:\windows\temp folder and see effective permission of SQLService account;

Grant full permission on this folder;

Script tasks require access to this folder for .NET framework and other temporary operations.

Still if you get this error;

The cause, SQLService windows account is not part of “Distributed COM Users” group on the machine where SSIS is installed. Add SQLService account to this windows group. You can read more about this here;

If the package fails again on server and you see this error in windows event viewer;

Make sure your SSIS project is targeting correct SQL Server Version. Right Click on your SSIS project -> Properties;

Deploy your project.

For SQL Server 2019 you need to set “False” for Azure-Enabled Project. As of this writing Azure-Enabled Project works with SQL Server 2017 only.

Run the package and everything should be Green now (hopefully :). We are done.

The above setup will work if we are reading/writing files in a domain controller environment or on a single server.

Setting up Azure Storage for reading/writing files

Service accounts are normally not interactive accounts. If we are using Azure Storage for reading/writing files with SAS then we would need to save Azure credentials in Credential Manager. To get to Credential Manager we need to login on server with our service account that will make it interactive.

Follow these steps;

Login to server with [servername]\SQLService account. If you are doing RDP, make sure this account is member of “Remote Desktop users” user’s group. Search for credential manager.

Add these credentials to credential manager;

Make sure you can navigate to network share by doing \\[AzureStorageAccount.file.core.windows.net\ShareName

Remove this account from “Remote Desktop users”. Run agent, still fails.

Try to Run SQL Server Agent under windows service account;

This time this started working. This is an alternative approach. We are using service account to run SQL Server agent. If this is the case then we don’t need to setup credentials and proxy accounts in SQL Server.

What happened behind the scene. Windows service account is granted “Log on as service permission” by windows. We can verify this following “Enable service log on through a local group policy” section below.

If we want to revert to “NT Service\SqlServerAgent” as service account, simply change your service account from windows service to SqlServerAgent and it would still work.

Follow this to allow an account “Login as service” permission manually if required;

Enable service log on through a local group policy

Follow these steps:

  1. Sign in with administrator privileges to the computer from which you want to provide Log on as Service permission to accounts.
  2. Go to Administrative Tools, click Local Security Policy.
  3. Expand Local Policy, click User Rights Assignment. In the right pane, right-click Log on as a service and select Properties.
  4. Click Add User or Group option to add the new user.
  5. In the Select Users or Groups dialogue, find the user you wish to add and click OK.
  6. Click OK in the Log on as a service Properties to save the changes.

Resources;

https://docs.microsoft.com/en-us/system-center/scsm/enable-service-log-on-sm?view=sc-sm-2019

SSIS Project Deployment Model

This article talks about project deployment mode;

What is .ispac file ?

Project deployment file is a self-contained unit of deployment that includes the essential information about the package and parameters in the project.

How to create .ispac file?

The project deployment model is introduced in SQL Server 2012. This is something similar to create SSISDeploymentManifest in previous SQL Server SSIS versions. They were also called package deployment model.

When we build/run a package, Visual Studio will build .ispac file in bin\Development folder of the project. Using this file, we can deploy the project without Data Tool or Visual Studio.

How to deploy SSIS packages using .ispac?

Here are different ways to deploy .ispac file;

Using SSMS

We can use SSMS for this. Create a sub-folder under SSISDB database. Right-Click on the folder and select deploy project. Follow the wizard.

Using Visual Studio

We can import ispac file into visual studio

Open Visual Studio

File -> New -> Project (Business Intelligence -> Integrations Serices)

Select “Integration Services Import Project Wizard”

Using PowerShell

We can use Powershell to enumerate .ispac file. We would have more control using this option;

Open Powershell. We might have permission issues, so run this script first;

Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass 

Copy and save these two scripts on your drive. Run main script.

main.ps1

#################################################################################################
# Run following script if there is a execution policy error
#################################################################################################

#Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass

#PowerShell: main.ps1
#################################################################################################
# Change source, destination and environment properties
#################################################################################################
 
# Source
$IspacFilePath ="C:\Deploy\SSISProject.ispac"
 
# Destination
$SsisServer ="(local)"
$FolderName = "Demo"
$ProjectName = "SSISProject"
 
# Environment
$EnvironmentName = "STG"
$EnvironmentFolderName = ""
 
#################################################################################################
# Execute generic deployment script
. "$PSScriptRoot\support.ps1" $IspacFilePath $SsisServer $FolderName $ProjectName $EnvironmentName $EnvironmentFolderName

This is second script “support.ps1”.

################################
########## PARAMETERS ##########
################################ 
[CmdletBinding()]
Param(
    # IsPacFilePath is required
    [Parameter(Mandatory=$True,Position=1)]
    [string]$IspacFilePath,
     
    # SsisServer is required 
    [Parameter(Mandatory=$True,Position=2)]
    [string]$SsisServer,
     
    # FolderName is required
    [Parameter(Mandatory=$True,Position=3)]
    [string]$FolderName,
     
    # ProjectName is not required
    # If empty filename is used
    [Parameter(Mandatory=$False,Position=4)]
    [string]$ProjectName,
     
    # EnvironmentName is not required
    # If empty no environment is referenced
    [Parameter(Mandatory=$False,Position=5)]
    [string]$EnvironmentName,
     
    # EnvironmentFolderName is not required
    # If empty the FolderName param is used
    [Parameter(Mandatory=$False,Position=6)]
    [string]$EnvironmentFolderName
)
 
# Replace empty projectname with filename
if (-not $ProjectName)
{
  $ProjectName = [system.io.path]::GetFileNameWithoutExtension($IspacFilePath)
}
# Replace empty Environment folder with project folder
if (-not $EnvironmentFolderName)
{
  $EnvironmentFolderName = $FolderName
}
 
clear
Write-Host "========================================================================================================================================================"
Write-Host "==                                                         Used parameters                                                                            =="
Write-Host "========================================================================================================================================================"
Write-Host "Ispac File Path        : " $IspacFilePath
Write-Host "SSIS Server            : " $SsisServer
Write-Host "Project Folder Path    : " $FolderName
Write-Host "Project Name           : " $ProjectName
Write-Host "Environment Name       : " $EnvironmentName
Write-Host "Environment Folder Path: " $EnvironmentFolderName
Write-Host "========================================================================================================================================================"
Write-Host ""
 
###########################
########## ISPAC ##########
###########################
# Check if ispac file exists
if (-Not (Test-Path $IspacFilePath))
{
    Throw  [System.IO.FileNotFoundException] "Ispac file $IspacFilePath doesn't exists!"
}
else
{
    $IspacFileName = split-path $IspacFilePath -leaf
    Write-Host "Ispac file" $IspacFileName "found"
}
 
 
############################
########## SERVER ##########
############################
# Load the Integration Services Assembly
Write-Host "Connecting to server $SsisServer "
$SsisNamespace = "Microsoft.SqlServer.Management.IntegrationServices"
[System.Reflection.Assembly]::LoadWithPartialName($SsisNamespace) | Out-Null;
 
# Create a connection to the server
$SqlConnectionstring = "Data Source=" + $SsisServer + ";Initial Catalog=master;Integrated Security=SSPI;"
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection $SqlConnectionstring
 
# Create the Integration Services object
$IntegrationServices = New-Object $SsisNamespace".IntegrationServices" $SqlConnection
 
# Check if connection succeeded
if (-not $IntegrationServices)
{
  Throw  [System.Exception] "Failed to connect to server $SsisServer "
}
else
{
   Write-Host "Connected to server" $SsisServer
}
 
 
#############################
########## CATALOG ##########
#############################
# Create object for SSISDB Catalog
$Catalog = $IntegrationServices.Catalogs["SSISDB"]
 
# Check if the SSISDB Catalog exists
if (-not $Catalog)
{
    # Catalog doesn't exists. The user should create it manually.
    # It is possible to create it, but that shouldn't be part of
    # deployment of packages.
    Throw  [System.Exception] "SSISDB catalog doesn't exist. Create it manually!"
}
else
{
    Write-Host "Catalog SSISDB found"
}
 
 
############################
########## FOLDER ##########
############################
# Create object to the (new) folder
$Folder = $Catalog.Folders[$FolderName]
 
# Check if folder already exists
if (-not $Folder)
{
    # Folder doesn't exists, so create the new folder.
    Write-Host "Creating new folder" $FolderName
    $Folder = New-Object $SsisNamespace".CatalogFolder" ($Catalog, $FolderName, $FolderName)
    $Folder.Create()
}
else
{
    Write-Host "Folder" $FolderName "found"
}
 
 
#############################
########## PROJECT ##########
#############################
# Deploying project to folder
if($Folder.Projects.Contains($ProjectName)) {
    Write-Host "Deploying" $ProjectName "to" $FolderName "(REPLACE)"
}
else
{
    Write-Host "Deploying" $ProjectName "to" $FolderName "(NEW)"
}
# Reading ispac file as binary
[byte[]] $IspacFile = [System.IO.File]::ReadAllBytes($IspacFilePath)
$Folder.DeployProject($ProjectName, $IspacFile)
$Project = $Folder.Projects[$ProjectName]
if (-not $Project)
{
    # Something went wrong with the deployment
    # Don't continue with the rest of the script
    return ""
}
 
 
#################################
########## ENVIRONMENT ##########
#################################
# Check if environment name is filled
if (-not $EnvironmentName)
{
    # Kill connection to SSIS
    $IntegrationServices = $null
 
    # Stop the deployment script
    Return "Ready deploying $IspacFileName without adding environment references"
}
 
# Create object to the (new) folder
$EnvironmentFolder = $Catalog.Folders[$EnvironmentFolderName]
 
# Check if environment folder exists
if (-not $EnvironmentFolder)
{
  Throw  [System.Exception] "Environment folder $EnvironmentFolderName doesn't exist"
}
 
# Check if environment exists
if(-not $EnvironmentFolder.Environments.Contains($EnvironmentName))
{
  Throw  [System.Exception] "Environment $EnvironmentName doesn't exist in $EnvironmentFolderName "
}
else
{
    # Create object for the environment
    $Environment = $Catalog.Folders[$EnvironmentFolderName].Environments[$EnvironmentName]
 
    if ($Project.References.Contains($EnvironmentName, $EnvironmentFolderName))
    {
        Write-Host "Reference to" $EnvironmentName "found"
    }
    else
    {
        Write-Host "Adding reference to" $EnvironmentName
        $Project.References.Add($EnvironmentName, $EnvironmentFolderName)
        $Project.Alter() 
    }
}
 
 
########################################
########## PROJECT PARAMETERS ##########
########################################
$ParameterCount = 0
# Loop through all project parameters
foreach ($Parameter in $Project.Parameters)
{
    # Get parameter name and check if it exists in the environment
    $ParameterName = $Parameter.Name
    if ($ParameterName.StartsWith("CM.","CurrentCultureIgnoreCase"))
    {
        # Ignoring connection managers
    }
    elseif ($ParameterName.StartsWith("INTERN_","CurrentCultureIgnoreCase"))
    {
        # Internal parameters are ignored (where name starts with INTERN_)
        Write-Host "Ignoring Project parameter" $ParameterName " (internal use only)"
    }
    elseif ($Environment.Variables.Contains($Parameter.Name))
    {
        $ParameterCount = $ParameterCount + 1
        Write-Host "Project parameter" $ParameterName "connected to environment"
        $Project.Parameters[$Parameter.Name].Set([Microsoft.SqlServer.Management.IntegrationServices.ParameterInfo+ParameterValueType]::Referenced, $Parameter.Name)
        $Project.Alter()
    }
    else
    {
        # Variable with the name of the project parameter is not found in the environment
        # Throw an exeception or remove next line to ignore parameter
        Throw  [System.Exception]  "Project parameter $ParameterName doesn't exist in environment"
    }
}
Write-Host "Number of project parameters mapped:" $ParameterCount
 
 
########################################
########## PACKAGE PARAMETERS ##########
########################################
$ParameterCount = 0
# Loop through all packages
foreach ($Package in $Project.Packages)
{
    # Loop through all package parameters
    foreach ($Parameter in $Package.Parameters)
    {
        # Get parameter name and check if it exists in the environment
        $PackageName = $Package.Name
        $ParameterName = $Parameter.Name
        if ($ParameterName.StartsWith("CM.","CurrentCultureIgnoreCase"))
        {
            # Ignoring connection managers
        }
        elseif ($ParameterName.StartsWith("INTERN_","CurrentCultureIgnoreCase"))
        {
            # Internal parameters are ignored (where name starts with INTERN_)
            Write-Host "Ignoring Package parameter" $ParameterName " (internal use only)"
        }
        elseif ($Environment.Variables.Contains($Parameter.Name))
        {
            $ParameterCount = $ParameterCount + 1
            Write-Host "Package parameter" $ParameterName "from package" $PackageName "connected to environment"
            $Package.Parameters[$Parameter.Name].Set([Microsoft.SqlServer.Management.IntegrationServices.ParameterInfo+ParameterValueType]::Referenced, $Parameter.Name)
            $Package.Alter()
        }
        else
        {
            # Variable with the name of the package parameter is not found in the environment
            # Throw an exeception or remove next line to ignore parameter
            Throw  [System.Exception]  "Package parameter $ParameterName from package $PackageName doesn't exist in environment"
        }
    }
}
Write-Host "Number of package parameters mapped:" $ParameterCount
 
 
###########################
########## READY ########## 

We can run these scripts from PowerShell UI. Navigate to your folder and run these from command line like this;

.\main.ps1

Hope this will help.

Resources;

https://docs.microsoft.com/en-us/sql/integration-services/packages/deploy-integration-services-ssis-projects-and-packages?view=sql-server-ver15