SQL Server Change Data Capture

Change data capture (CDC) records insert, update, and delete activity that applies to a SQL Server table. This makes the details of the changes available in an easily consumed relational format.

To enable CDC, run this;

-- COMMAND TO ENABLE CDC on the Database
EXEC sys.sp_cdc_enable_db
GO

Look under Database Name -> Tables -> System Tables

When a database is enabled for change data capture, the cdc schema, cdc user, metadata tables, and other system objects are created for the database. The cdc schema contains the change data capture metadata tables and, after source tables are enabled for change data capture, the individual change tables serve as a repository for change data. The cdc schema also contains associated system functions used to query for change data.

-- COMMAND TO DISABLE CDC on the Database
EXEC sys.sp_cdc_disable_db
GO

This will remove all CDC related objects.

How do I use it?

Our business requirements is to handle data changes in underlying relational database. My team members are mostly web developers and for them it’s no brainer. They can handle changes in code and write those changes to the tables. Yes, it’s doable but this will increase solution complexity and cost of the project. Budget is another limitation.

Microsoft SQL server offers CDC feature. Why not to leverage this feature and customize it to handle our situation.

Here is a prototype for implementing this feature. I am assuming that a database already exists with “FM” schema, all tables has primary key and SQL Server Agent is running.

Follow these steps;

We need to enable CDC on all tables. Run this script to enable CDC;

USE FM
go
if exists(select 1 from sys.databases where name ='FM' and is_cdc_enabled =1)
EXEC sys.sp_cdc_disable_db

go
-- COMMAND TO ENABLE CDC on the Database
EXEC sys.sp_cdc_enable_db
GO

DECLARE @source_schema sysname, @source_name sysname

DECLARE #finstance CURSOR LOCAL fast_forward
FOR
	select table_name,TABLE_SCHEMA from information_schema.tables 
	where table_name in(
				select distinct table_name from INFORMATION_SCHEMA.KEY_COLUMN_USAGE where constraint_name like 'PK%')
				and table_schema ='FM'  and TABLE_NAME not in ('sysdiagrams')

OPEN #finstance
FETCH #finstance INTO @source_name,@source_schema
	
WHILE (@@fetch_status <> -1)
BEGIN
    print 'Schema is ' + @source_schema + ' , ' +  'Table Name is ' +@source_name 

	EXEC [sys].[sp_cdc_enable_table]
		@source_schema
		,@source_name
		,@role_name = NULL
		,@supports_net_changes = 1
			
	FETCH #finstance INTO @source_name,@source_schema
END
	
CLOSE #finstance
DEALLOCATE #finstance
GO

To test, Make sure Project Table is empty. Let’s test CDC;

SELECT * FROM [FM].[Project]
SELECT * FROM [cdc].[FM_Project_CT]

Add some records in Project Table;

INSERT [FM].[Project]
(ProjectTitle, ProjectStartDate, ProjectDescription)
VALUES
('kalpesh title', '6/1/2021', 'kalpesh big test')

Here is the result;

Make changes to the table;

UPDATE [FM].[Project]
SET 
	ProjectTitle = N'some other title',
	ProjectDescription = N'someones novel...'
WHERE ID = N'E90F6FD5-CCCA-EB11-ACB9-8CC84B4006DC'

Here is the result;

Great. I can see the feature works fine.

The developer can use following query to get complete picture;

SELECT 
	src.id, 
	Operation = CASE trgt.[__$operation]
		When 1 Then 'Delete'
		WHEN 2 Then 'Insert'
		WHEN 3 Then 'Update row before the change'
		WHEN 4 Then 'Update row after the change'
		ELSE 'Unknown'
	END,
	src.projectTitle sTitle, src.projectdescription sProjectDescription,
	trgt.projectTitle tTitle, trgt.projectdescription tProjectDescription
FROM [FM].[Project] src
JOIN [cdc].[FM_Project_CT] trgt on src.id = trgt.id

Here are the results;

A little bit explanation about tracking table;

When the feature is enabled on a table, the change table named cdc.<captured_instance>_CT is automatically created in the tracked database. The table contains a row for each insert and delete on the source table, and two rows for each update. The first one is identical to the row before the update, and the second one to the row after the update. To query the table, use the cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions

The first five columns contain the metadata necessary for the feature, the rest are the exact replica of the source table
__$start_lsn – the Log Sequence Number of the commited transaction. Every change committed in the same transaction has its own row in the change table, but the same __$start_lsn
__$end_lsn – the column is always NULL in SQL Server 2012, future compatibility is not guarantee
__$seqval – the sequence value used to order the row changes within a transaction
__$operation – indicates the change type made on the row

  1. Delete
  2. Insert
  3. Updated row before the change
  4. Updated row after the change

__$update_mask – similar to the update mask available in Change Tracking, a bit mask used to identify the ordinals of the modified columns

How do I customize this feature to work as per my requirements?

DDL changes (New tables, column changes) affect this process. Disabling CDC will remove all cdc generated tables and data. Enable CDC will recreate all cdc tables. I can not risk loosing data!

One method to solve is to create another database, say FMAudit. Copy all cdc generated tables to that database (use Generate schema feature of SQL Server). Create a schedule process (use SSIS or windows service) that will copy and truncate data from source database. Use SQL Agent and Keep process schedule time to 10 seconds.

Stay tuned for my next article about this method.

Resource

https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server?view=sql-server-ver15

FIND AND REPLACE using Regular Expression in SSMS

We can use Regular Expression to find and replace, valid with all versions of SSMS

  • Find what: {.+}
  • Replace with: ‘\1’,
  • Look in: Selection
  • Expand Find Option
  • Use: Regular expression (checked)

That regular expression indicates find everything and remember what we found Replace everything we found \1 by wrapping it with tic marks and a comma.

If you have more complex requirements, the right chevron next to the drop down arrow on Find what lists the regular expression dialect SSMS/Visual Studio understands

References

https://dba.stackexchange.com/questions/96371/applying-quotes-across-multiple-lines

Updating SSIS packages for specific SQL server version

Updating SSIS packages for a specific SQL server target version (valid only for SQL Server 2012 and above)

  1. Right click on a project, click properties and select your target server version from the drop down

Click Apply, read the warning then click Yes as shown below, and finally click Ok

Re-open the packages and notice how the development platform (control flow and data flow tabs) changes

Before

After

Check for any odd behavior (discontinued/deprecated tasks), build/rebuild your packages, test your packages, and deploy/redeploy them.

LEAD (Next) and LAG (Previous) Analytic function

SQL Server 2012 has two new analytical functions, LEAD() AND LAG(). These functions return data from Next row (LEAD) and Previous row (LAG) of the same dataset without using self-join.

Let’s go with an example;

This is my initial dataset;

Here is the query to create this dataset;

IF OBJECT_ID('tempdb..#SampleDataSet') is not null drop table #SampleDataSet

SELECT *
INTO #SampleDataSet
FROM
(
	SELECT 1 LineId, '2017-04-01' StockId, 'Toyota' AS Make, 'Brown' AS Color, 'Sale' AS ActionType, '2017-4-15' ActionDate UNION
	SELECT 2 LineId, '2017-04-01' StockId, 'Toyota' AS Make, 'Brown' AS Color, 'Quote' AS ActionType, '2017-4-10' ActionDate UNION
	SELECT 3 LineId, '2017-05-01' StockId,'Toyota' AS Make, 'Red' AS Color, 'Sale' AS ActionType, '2017-5-15' ActionDate UNION
	SELECT 4 LineId, '2017-05-01' StockId, 'Toyota' AS Make, 'Red' AS Color, NULL AS ActionType, '2017-5-10' ActionDate UNION
	SELECT 5 LineId, '2017-06-01' StockId, 'Toyota' AS Make, 'Pink' AS Color, 'Sale' AS ActionType, '2017-6-15' ActionDate UNION
	SELECT 6 LineId, '2017-06-01' StockId, 'Toyota' AS Make, 'Pink' AS Color, 'Quote' AS ActionType, '2017-6-10' ActionDate UNION
	SELECT 7 LineId, '2017-07-01' StockId, 'Toyota' AS Make, 'Green' AS Color, 'Sale' AS ActionType, '2017-7-15' ActionDate UNION
	SELECT 8 LineId, '2017-07-01' StockId, 'Toyota' AS Make, 'Green' AS Color, 'Quote' AS ActionType, '2017-7-10' ActionDate UNION
	SELECT 9 LineId, '2017-08-01' StockId, 'Honda' AS Make, 'Silver' AS Color, 'Sale' AS ActionType, '2017-8-15' ActionDate UNION
	SELECT 10 LineId, '2017-08-01' StockId, 'Honda' AS Make, 'Silver' AS Color, 'Quote' AS ActionType, '2017-8-10' ActionDate UNION
	SELECT 11 LineId, '2017-09-01' StockId, 'Honda' AS Make, 'Brown' AS Color, 'Sale' AS ActionType, '2017-9-15' ActionDate UNION
	SELECT 12 LineId, '2017-09-01' StockId, 'Honda' AS Make, 'Brown' AS Color, NULL AS ActionType, '2017-9-10' ActionDate UNION
	SELECT 13 LineId, '2017-10-01' StockId, 'Honda' AS Make, 'Red' AS Color, 'Sale' AS ActionType, '2017-10-15' ActionDate UNION
	SELECT 14 LineId, '2017-10-01' StockId, 'Honda' AS Make, 'Red' AS Color, 'Quote' AS ActionType, '2017-10-10' ActionDate
) src
WHERE 1=1

/*
	This is my initial dataset
*/

SELECT * FROM #SampleDataSet src

I am adding SeqCount and Seq column to identify each group of car (make and color) and each action in the group. LAG and Lead values of LineId column are used to demonstrate function output.

SELECT 
	src.LineId, src.StockId, 
	ROW_NUMBER() OVER (PARTITION BY StockId ORDER BY ActionDate DESC) Seq, 
	COUNT(*) OVER (PARTITION BY StockId) SeqCount, 
	src.Make, src.Color, src.ActionType, src.ActionDate,
	LEAD(src.LineId) OVER (ORDER BY src.LineId) LeadValue,
	LAG(src.LineId) OVER (ORDER BY src.LineId) LagValue
FROM #SampleDataSet src

This is the output of above query;

You can see the application of LEAD() and LAG() function in last two columns on the right side of above dataset.

One of the possible application of this function. As a business rule every quote action must be preceded by sale action in a car dealer dataset. If quote action is null in a group, fix it.

SELECT 
	src.StockId, src.Seq, src.SeqCount, src.Make, src.Color, src.ActionType,
	CASE WHEN src.LeadValue = 'Sale' AND src.ActionType IS NULL THEN 'Quote' ELSE src.ActionType END dActionType,
	src.ActionDate, src.LeadValue, src.LagValue
FROM
(
	--Dataset transformation
	SELECT 
		src.StockId, 
		ROW_NUMBER() OVER (PARTITION BY StockId ORDER BY ActionDate DESC) Seq, 
		COUNT(*) OVER (PARTITION BY StockId) SeqCount, 
		src.Make, src.Color, src.ActionType, src.ActionDate,
		LEAD(ActionType) OVER (ORDER BY ActionDate DESC) LeadValue,
		LAG(ActionType) OVER (ORDER BY ActionDate DESC) LagValue
	FROM #SampleDataSet src
) src
WHERE 1=1

Here is the output;

dActionType is a derived column that has all the correct sequence of actions with the help of Lead() and LAG() function.

Resources

https://docs.microsoft.com/en-us/sql/t-sql/functions/lead-transact-sql?view=sql-server-2017