Download Excel Files

Recently i had to download binary format data using ASP.NET Core, Console Core and SSIS applications;

ASP.NET Core application;

Here is a work around for ASP.NET Core Where we are getting back File Result;

[HttpGet]
public IActionResult GetExcelData2(Guid? documentId)
{
            var sql = "SELECT TOP(1) FileName, ContentType, FileStreamCol FROM [dbo].[STG_Document] WHERE 1 = 1 AND DocumentId = @documentId";

            byte[] fileContent = null;
            string mimeType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
            string fileName = "C:\\Business\\myfile.xlsx";

            using (var connection = new SqlConnection(AppConnectionString.Value.PMBImportsConn))
            {
                connection.Open();
                var result = connection.Query(sql, new { DocumentId = documentId });
                result.AsList().ForEach(x =>
                {
                    fileContent = x.FileStreamCol;
                    mimeType = x.ContentType;
                    fileName = x.FileName;
                });

                return File(new MemoryStream(fileContent, 0, fileContent.Length - 1), mimeType, fileName);
            }
}

I am subtracting 1 element from byte array to fix a problem I encountered in a VB6 application. Otherwise, you don’t need to subtract this element.

 return File(new MemoryStream(fileContent, 0, fileContent.Length), mimeType, fileName);

This is how we call it from view;

<a asp-controller="Project" asp-action="GetExcelData2" asp-route-documentId="@p.DocumentId">Download</a>

If everything is correct, the download starts automatically.

C# Console application;

Here is how this will work in C# Console application;

internal void DownloadExcelFiles()
{
            // Create folder to save downloaded files
            string myFolderPath = @"C:\WorkToDo\";
                        Directory.CreateDirectory(myFolderPath + "\\FilesDownloaded");

            //run sql query to grab image files
            var sql = "SELECT DocumentId AS DocumentId, ContentType, FileName, FileStreamCol  FROM dbo.STG_Document WHERE 1 = 1";

            byte[] fileContent = null;
            string mimeType = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
            string fileName = "";

            using (var connection = new SqlConnection(DB2SqlConnectionString))
            {
                connection.Open();
                var result = connection.Query(sql);
                result.AsList().ForEach(x =>
                {
                    string fileExtension = "xlsx";
                    switch (x.ContentType)
                    {
                        case "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet":
                        case "application/vnd.ms-excel":
                        case "application/msexcel":
                            fileExtension = "xlsx";
                            break;
                        case "application/vnd.ms-excel.sheet.macroEnabled.12":
                            fileExtension = "xlsm";
                            break;
                        default:
                            break;
                    }
                    string fileToWrite = $"{myFolderPath}\\FilesDownloaded\\{x.DocumentId}.{fileExtension}";
                    fileContent = x.FileStreamCol;
                    mimeType = x.ContentType;
                    fileName = x.FileName;

                    FileStream fileStream = File.OpenWrite(fileToWrite);
                    fileStream.Write(fileContent, 0, fileContent.Length - 1);
                    fileStream.Flush();
                    fileStream.Close();
                });
                Console.WriteLine("Download complete..");
            }           
}

I am subtracting 1 element from byte array to fix a problem I encountered in a VB6 application. Otherwise, you don’t need to subtract this element.

FileStream fileStream = File.OpenWrite(fileToWrite);
fileStream.Write(fileContent, 0, fileContent.Length);

I ma using Dapper and SqlClient libraries ASP.NET Core and Console application to access data from Database.

SSIS application;

Here is how this will work in SQL Server Integration services;

ADO.NET Source

SELECT DocumentId, [FileName], [ContentType], [FileStreamCol]
FROM [dbo].[Document]
WHERE 1=1

SC-Transform data

This is a SSIS Transformation script component. This prepares the data that will be used later on in Export Column task;

Here is the code block;

private byte[] fileContent = null;
private string fileExtension = String.Empty;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
        //determine content type
        switch (Row.ContentType)
        {
            case "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet":
            case "application/vnd.ms-excel":
            case "application/msexcel":
                fileExtension = "xlsx";
                break;
            case "application/vnd.ms-excel.sheet.macroEnabled.12":
                fileExtension = "xlsm";
                break;
            default:
                break;
        }
Row.fileContentName = String.Concat(Row.DocumentId, ".", fileExtension);
//A VB6 application saves excel files as BLOB objects. For download it subtracts one element from 
//byte array. Following same dance here
        var blobLength = Convert.ToInt32(Row.FileStreamCol.Length);
        fileContent = Row.FileStreamCol.GetBlobData(0, blobLength - 1);
        Row.fileContent.AddBlobData(fileContent);
}

Derived column transform;

Export column settings;

Checking Force Truncate overwrites extracted files.

Hope this will help.

Filestream in SQL Server

We want to save multiple type of data, documents, image, audio or video. BLOB data type can be used to store this kind of data but it’s limited to 2GB in size. BLOB data slows down database performance and takes significant resources to read. For detail, refer here ;

https://shahzadblog.com/2021/03/29/store-and-retrieve-blob-data-without-file-stream/

Filestream allows storing these large documents onto the file system itself. There is not limit on storage as compared to 2GB in BLOB storage. Application can access these files using NTFS streaming API. Filestream is not a SQL Server data type to store data.

Filestream does not use the buffer pool memory for caching these objects. If we cache these large objects in the SQL Server memory, it will cause issues for normal database processing. Therefore, FILESTREAM provides caching at the system cache providing performance benefits without affecting core SQL Server performance.

This is how to enable Filestream feature in SQL server;

Since I will be doing a lot of inserts/update/delete so I am going to change recovery mode of selected database to “Simple”.

Make sure that you have enable FILESTREAM on database level;

Checking the status of Filestream;

--check filestream status
USE master
Go
EXEC sp_configure
Go

0 = FILESTREAM is disabled.

1 = only T-SQL access to FILESTREAM data is allowed.

2 = T-SQL access and local streaming access is allowed.

3 = T-SQL access and local and remote streaming access is allowed.

You can also use this query to check status;

exec sp_configure 'filestream access level'

I am going to create my first table with filestream enabled;

--Create a table to work with Filestream
USE TODO
GO
CREATE TABLE [dbo].[Document](
	[Doc_Num] uniqueidentifier ROWGUIDCOL unique NOT NULL DEFAULT newsequentialid(),
	[Extension] [varchar](50) NULL,
	[FileName] [varchar](200) NULL,
    [FileSize] numeric(10,5),
	[Doc_Content] [varbinary](max) FILESTREAM
) ON [PRIMARY] 

Oops. I got this error;

I haven’t enabled Filestream feature in SQL server. Open SQL Server Configuration manager. Right click on SQL Server Services, properties and FileSTREAM.

Restart SQL server service.

Re-run table creation query. This time you will see this error;

What it means that we have not created File group. Remember Filestream require its own File groups. Right click on your database;

After creating filegroup, open database properties again and click on files. Click on Add to add new file and connect with previously created filegroup.

Navigate to your filesystem path and you should see this;

As shown in the above image, the $FSLOG directory and the filestream.hdr file have been created. $FSLOG is like SQL server T-Log, and filestream.hdr contains metadata of FILESTREAM. Make sure that you do not change or edit those files.

re-run our table creation sql script. This time table should be created successfully.

It’s time to add a new excel document in this table. Run following sql script;

--insert a new excel document
INSERT [dbo].[Document1] ([Extension] ,[FileName], [FileSize], [Doc_Content] )
SELECT 'xlsx', 'SampleExcelDoc.xlsx', 10, [Doc_Data].*
FROM OPENROWSET 
    (BULK 'D:\SimpleExcelDoc.xlsx', SINGLE_BLOB)  [Doc_Data]

Run this statement;

SELECT * FROM [dbo].[Document]

Here is the file on file system;

This file can be opened directly with compatible application program, in this example Excel. Right click, open with and select Excel. When we insert the document using the FILESTREAM feature, SQL Server copies the file into the FILESTREAM path. It does not change the file properties.

To access Filestream data using managed API, we need to figure out Filestream content path. Here is the script;

--access Filestream data using managed API
SELECT Doc_Content.PathName() AS FilePath, * 
FROM [dbo].[Document]

You should see these results;

To delete a document from Filestream container, run this;

--delete files from Filestream container.
USE TODO
GO
DELETE FROM dbo.Document
WHERE Doc_Num = '658020B3-1690-EB11-90BB-00155D717606'

In this case, we can see that file still exists for FILESTREAM document. SQL Server removes the old files using the garbage collection process. This process will remove the old file if it is no longer required by the SQL Server. Do you remember about a folder, $log into the FILESTREAM path. It works similar to a transaction log in the SQL Server database.

SQL Server has a particular internal filestream_tombstone table, which contains an entry for this old file.

SELECT * FROM sys.internal_tables where name like ‘filestream_tomb%’

The garbage collection process removes the old file only if there is no entry for the file in the filestream_tombstone table. We cannot see the content of this filestream_tombstone table. However, if you wish to do so, use the DAC connection.

The database recovery model plays an essential role in the entry in the filestream_tombstone table.

  • Simple recovery mode: In a database in simple recovery model, the file is removed on next checkpoint
  • Full recovery model: If the database is in full recovery mode, we need to perform a transaction log backup to remove this file automatically

We can run the garbage collector process file manually to remove the files from the container before the automatic garbage collector process cleans the file. We can do it using ‘sp_filestream_force_garbage_collection’ stored procedure.

Resource

https://docs.microsoft.com/en-us/sql/relational-databases/blob/filestream-sql-server?view=sql-server-ver15

https://www.pmichaels.net/tag/default-filestream-filegroup-is-not-available-in-database/

https://www.sqlshack.com/managing-data-with-sql-server-filestream-tables/

https://www.pmichaels.net/tag/default-filestream-filegroup-is-not-available-in-database/

https://www.jitendrazaa.com/blog/sql/sqlserver/export-documents-saved-as-blob-binary-from-sql-server/