SSIS Excel Data Source Column overriding

When an Excel data source is used in SSIS, the data type of each column is derived from the data in that column. How do we override this behaviour?

Ideally, every column delivered from the Excel source would be a string data type, so that the data received from the source can be validated in a later step of the data flow.

Go to the output column list on the Excel source and set the data type of each column. Let Excel play its guessing game for the external columns; what matters downstream is the output column and its type. This works.

To get to the output column list, right-click the Excel source, select ‘Show Advanced Editor’, and click the tab labeled ‘Input and Output Properties’.

A potentially better solution is to use the Derived Column component, where you can build “new” columns for each column in Excel. This has the following benefits:

  1. You have more control over what you convert to.
  2. You can put in rules that control the change (e.g. if null give me an empty string, but if there is data then give me the data as a string).
  3. Your data source is not tied directly to the rest of the process (e.g. you can change the source and the only place you will need to do work is in the Derived Column).
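
To make the second rule concrete, here is a minimal sketch of the conversion logic written in Python rather than SSIS expression syntax. The helper name `to_text` is hypothetical and not part of SSIS; it only models what a Derived Column rule of the form "null becomes an empty string, anything else becomes a string" would do to each Excel value.

```python
from typing import Any, Optional


def to_text(value: Optional[Any]) -> str:
    """Model of the Derived Column rule: null -> "", data -> string."""
    if value is None:          # a NULL cell coming from Excel
        return ""
    return str(value)          # any other value is delivered as text


# Example row as it might arrive from the Excel source (made-up values)
row = [None, 42, "abc"]
converted = [to_text(v) for v in row]
```

After this step every value is a string, so later validation steps can treat all columns the same way.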

Here is another workaround, but it does not work at run time. You can see the data at design time though;

Write your SQL command and convert the columns to text.

You can verify this in the Advanced Editor of the Excel source, on the ‘Input and Output Properties’ tab. All of the converted columns under External Columns will have changed to the “Unicode text stream” data type.

SSIS Expression Sample List

Sometimes it is hard to remember the different SSIS expression syntaxes and how to use them, so I built up a list to help me out. The expressions used are from the AdventureWorks sample provided with Microsoft SQL Server.

Converting a string to a GUID in a Derived Column expression;

(DT_GUID)("{" + [ColumnName] + "}")

If using dynamic SQL in a Script Component or variables, get the GUID from the database as a string;

CAST([GuidColumn] AS NVARCHAR(60)) AS GuidColumn

The database will convert the string back to a GUID when the query is submitted.

Boolean expression

If incoming data type is text then use this for Boolean conversion;

(DT_BOOL)((DT_WSTR,1)Rejected == "1" ? TRUE : FALSE)

This will also handle null values in incoming data.

How to get the file name and file extension in an SSIS expression?

Suppose this is the file name;

@FileName = 6be8bf19-b715-ec11-b1cb-000d3adde0a7.xlsx

This is how we get the extension and the file name;

--get file extensions
REVERSE(left(REVERSE(@[User::FileName]), FINDSTRING(REVERSE(@[User::FileName]) , "." , 1 ) - 1))

Result
------
xlsx

--get file name
SUBSTRING(@[User::FileName], 1, FINDSTRING(@[User::FileName] , "." , 1 ) - 1)

Result
------
6be8bf19-b715-ec11-b1cb-000d3adde0a7

--the same result can be achieved by using this statement

LEFT(@[User::FileName], FINDSTRING(@[User::FileName] , "." , 1 ) - 1)

Result
------
6be8bf19-b715-ec11-b1cb-000d3adde0a7
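
The expressions above split the name at different dots: the extension is taken after the last dot (via REVERSE), while the file name is taken before the first dot. The following Python sketch mirrors that logic so the edge cases are easy to check. It is not SSIS syntax, and the function names, including the `file_name_last_dot` variant, are hypothetical.

```python
def file_extension(file_name: str) -> str:
    """Text after the last dot, like the REVERSE/FINDSTRING expression."""
    reversed_name = file_name[::-1]
    dot_pos = reversed_name.find(".") + 1   # 1-based like FINDSTRING, 0 = not found
    if dot_pos == 0:
        return ""                           # guard that the SSIS expression does not have
    return reversed_name[:dot_pos - 1][::-1]


def file_name_first_dot(file_name: str) -> str:
    """Text before the first dot, like SUBSTRING(..., 1, FINDSTRING(...) - 1)."""
    dot_pos = file_name.find(".") + 1
    if dot_pos == 0:
        return file_name
    return file_name[:dot_pos - 1]


def file_name_last_dot(file_name: str) -> str:
    """Hypothetical variant: text before the last dot."""
    base, dot, _extension = file_name.rpartition(".")
    return base if dot else file_name


sample = "6be8bf19-b715-ec11-b1cb-000d3adde0a7.xlsx"
multi_dot = "report.2021.xlsx"   # made-up name with more than one dot
```

For the sample name all variants agree. For a name such as `report.2021.xlsx`, the first-dot version returns `report`, while the last-dot variant returns `report.2021`, so it is worth deciding which behaviour the package needs.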

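How to get only the date from a DateTime value?

Cast the DateTime to a string and keep the first 10 characters (yyyy-mm-dd). This example returns yesterday's date;

SUBSTRING( (DT_STR,50, 1256)DATEADD("DAY",-1,GETDATE()) , 1, 10)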
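
The same three steps (subtract a day, convert to text, keep the first 10 characters) can be sketched in Python to see why the substring yields only the date. This is an illustration, not SSIS syntax; `date_only` is a hypothetical helper and it takes the timestamp as an argument instead of calling GETDATE().

```python
from datetime import datetime, timedelta


def date_only(moment: datetime) -> str:
    """Return the yyyy-mm-dd part of the day before `moment`."""
    yesterday = moment - timedelta(days=1)   # DATEADD("DAY", -1, ...)
    as_text = yesterday.isoformat(sep=" ")   # date part, a space, then the time part
    return as_text[:10]                      # SUBSTRING(..., 1, 10)
```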

Parameterize ADO.NET Source in Data Flow

Parameterizing an OLE DB source in a Data Flow is easy. It is not as straightforward with ADO.NET.

In SSIS you can’t parameterize an ADO.NET source directly. You have to use a workaround.

Luckily, there are a few workarounds. One is to create a Script Component that acts as the source and write the query logic in code. However, it is not always easy to convert an existing source into a script, especially without ADO.NET programming knowledge.

Another workaround is to build the SQL query before the ADO.NET source runs. However, when you open the ADO.NET source, you will notice that the Data access mode doesn’t allow variable input. So, how do you proceed?

You want to set the SQL command of the ADO.NET source dynamically, so you have to tell your Data Flow task to configure the ADO.NET source component by using an expression.

To make the long story short (or not-quite-so-short :), do this:

  • in your package, open the Data Flow task that contains the source/destination components
  • click anywhere on the background, so that the task's properties are shown in the Properties panel
  • in the Properties panel, find the Expressions property, which can configure various source/destination properties, and open it using the ellipsis button (…)
  • under Property, select the SqlCommand property of your source (e.g. [ADO.NET source].[SqlCommand]) to add one row
  • click the ellipsis button for the row to open the Expression Builder
  • build your dynamic query in the Expression Builder

The last step can be somewhat cumbersome for date/datetime parameters. Here’s an example, for your convenience:

"SELECT * FROM YOUR_SOURCE_TABLE WHERE your_date_column = '" + 
  (DT_WSTR,4)YEAR(@[User::VAR_CONTAINING_DATE]) + "-" +
  (DT_WSTR,2)MONTH(@[User::VAR_CONTAINING_DATE]) + "-" +
  (DT_WSTR,2)DAY(@[User::VAR_CONTAINING_DATE]) + "'"
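
Note that the expression above concatenates the month and day without zero-padding. The Python sketch below mirrors the string it builds and shows a padded variant. It is only a model of the expression, not SSIS syntax; the function names are hypothetical and the table and column names are the placeholders from the example.

```python
from datetime import date

QUERY_PREFIX = "SELECT * FROM YOUR_SOURCE_TABLE WHERE your_date_column = '"


def query_unpadded(d: date) -> str:
    """Mirrors YEAR + "-" + MONTH + "-" + DAY as written in the expression."""
    return QUERY_PREFIX + str(d.year) + "-" + str(d.month) + "-" + str(d.day) + "'"


def query_padded(d: date) -> str:
    """Hypothetical variant: zero-pad month and day to get yyyy-mm-dd."""
    return QUERY_PREFIX + f"{d.year:04d}-{d.month:02d}-{d.day:02d}" + "'"


print(query_unpadded(date(2021, 9, 5)))  # literal ends in '2021-9-5'
print(query_padded(date(2021, 9, 5)))    # literal ends in '2021-09-05'
```

If the padded form is preferred, a common way to get it in an SSIS expression is RIGHT("0" + (DT_WSTR,2)MONTH(@[User::VAR_CONTAINING_DATE]), 2), and likewise for DAY. Treat this as a suggestion to test in your own package.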

Here is a reference for using OLEDB as source;

Get File Extension using T-SQL

The easiest way to do this using SQL Server 2017 and up;

DECLARE @FileName NVARCHAR(255) = N'Need to get extension of this file.xlsx';
--PRINT @FileName
SELECT
	@FileName AS ExcelFileName,
	--check there is a '.' in ExcelFileName
	CASE WHEN @FileName LIKE '%.%' THEN
		--reverse the name, take everything before the first '.', reverse back
		REVERSE(LEFT(REVERSE(@FileName), CHARINDEX('.', REVERSE(@FileName)) - 1))
	ELSE ''
	END AS ExcelFileExtension;

Earlier versions can use this approach;

DECLARE @FileName NVARCHAR(255) = N'Need to get extension of this file.xlsx';
--PRINT @FileName
SELECT
	@FileName AS ExcelFileName,
	CASE
		WHEN @FileName LIKE '%.%' THEN
			--take the text after the last '.', so names with several dots still work
			RIGHT(@FileName, CHARINDEX('.', REVERSE(@FileName)) - 1)
		ELSE ''
	END AS ExcelFileExtension;