Large Text import from Excel using SSIS

I had an Excel file with a single column of 6000 characters (huge text). I used ACE drive and SSIS Build-in Excel connection manager.

The package worked initially with huge text configuration. I removed all the content from column and added three characters for testing. The package started failing.

Status Failed-SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21. An OLE DB record is available. Source: "Microsoft Access Database Engine" Hresult: 0x80040E21 Description: "Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.".

Status Failed-Failed to retrieve long data for column "SQLStatement".
Status Failed-There was an error with Excel - Get worksheet detail data.Outputs[Excel Source Output] on Excel - Get worksheet detail data. The column status returned was: "DBSTATUS_UNAVAILABLE".

Status Failed-SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "Excel - Get worksheet detail data.Outputs[Excel Source Output]" failed because error code 0xC0209071 occurred, and the error row disposition on "Excel - Get worksheet detail data" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.

Why?

For some strange reasons, SSIS Built-in Excel connection manager doesn’t respect the variation of content. If the content size is less than 100 and we have configured as Unicode text stream [DT_NTEXT], it will blow up.

The work around is use Script Component and use it as source.

Add a column to Output column of Output buffer;

Click on Edit Script and add paste following code;

public class ScriptMain : UserComponent
{
    DataSet excelDataSet;

    public override void PreExecute()
    {
        base.PreExecute();
        string connString = $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\myExcelFile.xlsx;Extended Properties=Excel 12.0";
        //Connect Excel sheet with OLEDB using connection string
        using (OleDbConnection conn = new OleDbConnection(connString))
        {
            conn.Open();
            OleDbDataAdapter objDA = new OleDbDataAdapter("SELECT * FROM [Portfolio$A10:I] WHERE Performer IS NOT NULL", conn);
            excelDataSet = new DataSet();
            objDA.Fill(excelDataSet);
        }
   public override void CreateNewOutputRows()
    {
         DataTable tbl = excelDataSet.Tables[0];
        foreach (DataRow row in tbl.Rows)
        {
            Output0Buffer.AddRow();          Output0Buffer.Description.AddBlobData(Encoding.Unicode.GetBytes(row[0].ToString()));
        }
    }

Open your Excel file and add a lot of data, for example 6000 characters, in Description column. The package will run successfully.

Again open your Excel file and add merely 3 characters in Description column. The package will run successfully again.

The success reason, We are not using Built-in Excel connection manager and are not affected by the weird behavior.

These solutions are from Microsoft Social Site;

This one seems to be a workable solution. These are pre-requisite for this approach;

  1. ACE drive installation
  2. Running following commands. Make sure you know the risk because these are related to security configuration of your database server.
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'Ad Hoc Distributed Queries', 1;
GO
RECONFIGURE;
GO
USE [master] 
GO 

--if using 32 bit office
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1 
--if using 64 bit office
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.16.0', N'AllowInProcess', 1 
GO 

--if using 32 bit office
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1 
--if using 64 bit office
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.16.0', N'DynamicParameters', 1 
GO 

We can package following stored procedure and call it from SSIS.

DECLARE @tbl TABLE (
	col1 VARCHAR(50)
	, col2 VARCHAR(50)
	, col3 VARCHAR(50)
	, col4 VARCHAR(MAX)
);

INSERT INTO @tbl
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
   'Excel 12.0 Xml; HDR=NO;
    Database=c:\Users\Yitzhak\Documents\dada.xlsx',
    [data$]);

-- just to test
SELECT * FROM @tbl;

Output
-------
col1	col2	col3	col4
Header1	Header2	Header3	Header4
Google	Adobe	IBM	Microsoft
10	20	30	Very Long Text

These are alternative approaches just for reference;

  1. Metadata at the Excel Source component’s output (checked using Advanced Editor): DT_NTEXT
  2. Derived Column component between source and destination to cast to non-unicode from unicode using (DT_TEXT,1252)
  3. Metadata at the OLE DB Destination component’s input (checked using Advanced Editor): DT_TEXT
  4. Target Column data type: VARCHAR(MAX)


Another approach

The “IMEX=1” extended property for OLE DB Excel connection string is used when there are mixed data types in one column. In this case, you need to change the value of the TypeGuessRows registry key so that Excel driver scans more than 8 rows (by default) to find a value that is longer than 4000 characters. If the registry key is updated successfully, the data type of the external column should be DT_NTEXT, and the data type of the output column will default to DT_NTEXT other than DT_WSTR automatically. So, I am afraid that you have not modified the registry key successfully.

If the source Excel file is .xls file, the provider used will be Microsoft JET OLE DB 4.0. In this case, we need to modify the following registry key from 8 to 0:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows

If the source Excel file is .xlsx file, the provider used will be Microsoft ACE OLE DB 12.0. In this case, we need to modify the proper registry key according to the Office Excel version:

  • Excel 2007: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
  • Excel 2010: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
  • Excel 2013: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows

Another approach

You just follow the visakh’s approach or else go with below link
http://beyondrelational.com/modules/24/syndicated/398/Posts/14945/ssis-how-to-resolve-excel-import-255-character-truncation-issue.aspx

 if above will not solve your problem, then the best way to do is:

Suppose you have a column named ‘ColumnX‘ with more than 4000 characters. Then,
Load your excel data in a temp table with splitting of ColumnX into two derived columns. I mean to say, split the data of ColumnX into two parts and store it into two columns e.g.

ColumnX –> ColumnX1– Substring(ColumnX,1,3000)
                    ColumnX2– Substring(ColumnX,3001,len(ColumnX))

and finally load it into your physical destination table say ‘DestTable’ concatenating these two columns into one.

Another approach

To add up a bit more of info on this topic, there is another registry you might need to change if none of the ones mentioned fix your issue.

To change the registry key for the 32-bit version that is running on a 64-bit operating system:

  1. In Windows environments, select Start ► Run and type REGEDIT to display the Registry Editor.
  2. In the registry tree, select HKEY_LOCAL_MACHINE ► Software ► Wow6432Node ► Microsoft ► Office ► 12.0 ► Access Connectivity Engine ► Engines.
  3. Double-click the Excel node.
  4. In the right panel, double-click the TypeGuessRows entry.
  5. Change the value data from 8 to 0.
  6. Click OK.
  7. Select File ► Exit to exit the Registry Editor window.

Related Article;

Resource

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/bd8c25f5-0ab9-4c78-9cfe-e41e6f4a3463/how-to-load-excel-data-having-fields-with-more-than-4000-characters?forum=sqlintegrationservices

https://stackoverflow.com/questions/28199929/excel-in-ssis-how-to-import-a-column-that-may-have-more-than-255-characters-whe

https://www.codingame.com/playgrounds/9014/read-write-excel-file-with-oledb-in-c-without-interop

FavoriteLoadingAdd to favorites
Spread the love

Author: Shahzad Khan

Software developer / Architect