Difference between SSIS, Azure Data Factory and Azure Data Bricks

For Data Engineering workloads within Microsoft landscape, there are multiple options to carry out Data Engineering tasks to extract data from myriad of data sources. Currently three options are available:

  • SQL Server Integration Services (SSIS): It is part of Microsoft SQL Server Suite and SSIS is a very well-known popular ETL tool for Data Integration along with rich built in transformations. Introduced in 2005. Mainly for on-premises. Now you can run on-premise as well. Aggregations, splits and joins.
  • Azure Data Factory (ADF): Unlike SSIS, ADF is a ELT tool along with Data Orchestration tool to build pipelines to move data across different layers. From on-Premise to Cloud and within Cloud landscape. Movement and Orchestration but not Transformations.
    • Data movement & Orchestration
    • Extract, Load & Transform
    • Transformation activities.

People familiar with SSIS can use it and existing SSIS packages can also be migrated.

Azure Data Bricks: Azure Data Bricks is latest entry into this for Data engineering and Data Science workloads, unlike SSIS and ADF which are more of Extract Transform Load (ETL), Extract Load Transform (ELT) and data Orchestration tools, Azure data bricks can handle data Engineering and data science workloads.

In a nutshell, although you can compare and contrast these tools, they actually compliment each other. For example you can call existing SSIS packages using Azure Data Factory and trigger Azure data bricks notebooks using Azure Data Factory.

How to Read blob file from Microsoft Azure Storage with .NET Core

In order to read a blob file from a Microsoft Azure Blob Storage, you need to know the following:

  • The storage account connection string. This is the long string that looks like this:
    DefaultEndpointsProtocol=https;
    AccountName=someaccounfname;
    AccountKey=AVeryLongCrypticalStringThatContainsALotOfChars==
  • The blob storage container name. This is the name in the list of “Blobs”.
  • The blob file name. This is the name of the blob inside the container. A file name can be in form of a path, as blobs are structured as a file structure inside the container. For ecample: folder/folder/file.extension

You also need this NuGet package:

Windows.Azure.Storage

The code is pretty simple:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
 
public string GetBlob(string containerName, string fileName)
{
  string connectionString = $"yourconnectionstring";
 
  // Setup the connection to the storage account
  CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
   
  // Connect to the blob storage
  CloudBlobClient serviceClient = storageAccount.CreateCloudBlobClient();
  // Connect to the blob container
  CloudBlobContainer container = serviceClient.GetContainerReference($"{containerName}");
  // Connect to the blob file
  CloudBlockBlob blob = container.GetBlockBlobReference($"{fileName}");
  // Get the blob file as text
  string contents = blob.DownloadTextAsync().Result;
   
  return contents;
}

The usage is equally easy:

GetBlob(“containername”, “my/file.json”);

How to split string in SQL Server

There could be many method but here are two simple one;

DECLARE @ValueToSplit NVARCHAR(50) = N'12345VA [987]'

--This is classical method to split string
SELECT
	SUBSTRING(@ValueToSplit, 0, CHARINDEX('[', @ValueToSplit)) AS [FIRST],
	SUBSTRING(@ValueToSplit, CHARINDEX('[', @ValueToSplit)+1, LEN(@ValueToSplit)) AS [SECOND]

--This function can be used in SQL 2016 onward
SELECT split.*
FROM
(
	SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) RowNumber, value AS TextValue FROM STRING_SPLIT(@ValueToSplit,'[')
) split
WHERE 1=1
AND split.RowNumber = 1

Here are the results;

Microservice and Idempotent method

Idempotent method is the one that results in same behavior either in a single or mutiple calls. For example a delete method will delete a resource
and return 204 first time. on subsequent calls it will return 404. return codes has nothting to do with idempotent behavior but the fact is that
behavior of call will always be the same.

+———+——+————+
| Method | Safe | Idempotent |
+———+——+————+
| CONNECT | no | no |
| DELETE | no | yes |
| GET | yes | yes |
| HEAD | yes | yes |
| OPTIONS | yes | yes |
| POST | no | no |
| PUT | no | yes |
| TRACE | yes | yes |
+———+——+————+

Reference

https://stackoverflow.com/questions/630453/what-is-the-difference-between-post-and-put-in-http

An overview of Azure SQL Database and SQL Managed Instance security capabilities

This article outlines the basics of securing the data tier of an application using Azure SQL Database and Azure SQL Managed Instance. The security strategy described follows the layered defense-in-depth approach as shown in the picture below, and moves from the outside in:

You can read more about this here;

https://docs.microsoft.com/en-gb/azure/azure-sql/database/security-overview#transport-layer-security-tls-encryption-in-transit