Change data capture (CDC) records insert, update, and delete activity that applies to a SQL Server table. This makes the details of the changes available in an easily consumed relational format.
To enable CDC, run this;
-- COMMAND TO ENABLE CDC on the Database
EXEC sys.sp_cdc_enable_db
GO
Look under Database Name -> Tables -> System Tables
When a database is enabled for change data capture, the cdc schema, cdc user, metadata tables, and other system objects are created for the database. The cdc schema contains the change data capture metadata tables and, after source tables are enabled for change data capture, the individual change tables serve as a repository for change data. The cdc schema also contains associated system functions used to query for change data.
-- COMMAND TO DISABLE CDC on the Database
EXEC sys.sp_cdc_disable_db
GO
This will remove all CDC related objects.
How do I use it?
Our business requirements is to handle data changes in underlying relational database. My team members are mostly web developers and for them it’s no brainer. They can handle changes in code and write those changes to the tables. Yes, it’s doable but this will increase solution complexity and cost of the project. Budget is another limitation.
Microsoft SQL server offers CDC feature. Why not to leverage this feature and customize it to handle our situation.
Here is a prototype for implementing this feature. I am assuming that a database already exists with “FM” schema, all tables has primary key and SQL Server Agent is running.
Follow these steps;
We need to enable CDC on all tables. Run this script to enable CDC;
USE FM
go
if exists(select 1 from sys.databases where name ='FM' and is_cdc_enabled =1)
EXEC sys.sp_cdc_disable_db
go
-- COMMAND TO ENABLE CDC on the Database
EXEC sys.sp_cdc_enable_db
GO
DECLARE @source_schema sysname, @source_name sysname
DECLARE #finstance CURSOR LOCAL fast_forward
FOR
select table_name,TABLE_SCHEMA from information_schema.tables
where table_name in(
select distinct table_name from INFORMATION_SCHEMA.KEY_COLUMN_USAGE where constraint_name like 'PK%')
and table_schema ='FM' and TABLE_NAME not in ('sysdiagrams')
OPEN #finstance
FETCH #finstance INTO @source_name,@source_schema
WHILE (@@fetch_status <> -1)
BEGIN
print 'Schema is ' + @source_schema + ' , ' + 'Table Name is ' +@source_name
EXEC [sys].[sp_cdc_enable_table]
@source_schema
,@source_name
,@role_name = NULL
,@supports_net_changes = 1
FETCH #finstance INTO @source_name,@source_schema
END
CLOSE #finstance
DEALLOCATE #finstance
GO
To test, Make sure Project Table is empty. Let’s test CDC;
SELECT * FROM [FM].[Project]
SELECT * FROM [cdc].[FM_Project_CT]
UPDATE [FM].[Project]
SET
ProjectTitle = N'some other title',
ProjectDescription = N'someones novel...'
WHERE ID = N'E90F6FD5-CCCA-EB11-ACB9-8CC84B4006DC'
Here is the result;
Great. I can see the feature works fine.
The developer can use following query to get complete picture;
SELECT
src.id,
Operation = CASE trgt.[__$operation]
When 1 Then 'Delete'
WHEN 2 Then 'Insert'
WHEN 3 Then 'Update row before the change'
WHEN 4 Then 'Update row after the change'
ELSE 'Unknown'
END,
src.projectTitle sTitle, src.projectdescription sProjectDescription,
trgt.projectTitle tTitle, trgt.projectdescription tProjectDescription
FROM [FM].[Project] src
JOIN [cdc].[FM_Project_CT] trgt on src.id = trgt.id
Here are the results;
A little bit explanation about tracking table;
When the feature is enabled on a table, the change table named cdc.<captured_instance>_CT is automatically created in the tracked database. The table contains a row for each insert and delete on the source table, and two rows for each update. The first one is identical to the row before the update, and the second one to the row after the update. To query the table, use the cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes functions
The first five columns contain the metadata necessary for the feature, the rest are the exact replica of the source table __$start_lsn – the Log Sequence Number of the commited transaction. Every change committed in the same transaction has its own row in the change table, but the same __$start_lsn __$end_lsn – the column is always NULL in SQL Server 2012, future compatibility is not guarantee __$seqval – the sequence value used to order the row changes within a transaction __$operation – indicates the change type made on the row
Delete
Insert
Updated row before the change
Updated row after the change
__$update_mask – similar to the update mask available in Change Tracking, a bit mask used to identify the ordinals of the modified columns
How do I customize this feature to work as per my requirements?
DDL changes (New tables, column changes) affect this process. Disabling CDC will remove all cdc generated tables and data. Enable CDC will recreate all cdc tables. I can not risk loosing data!
One method to solve is to create another database, say FMAudit. Copy all cdc generated tables to that database (use Generate schema feature of SQL Server). Create a schedule process (use SSIS or windows service) that will copy and truncate data from source database. Use SQL Agent and Keep process schedule time to 10 seconds.
We can use Regular Expression to find and replace, valid with all versions of SSMS
Find what: {.+}
Replace with: ‘\1’,
Look in: Selection
Expand Find Option
Use: Regular expression (checked)
That regular expression indicates find everything and remember what we found Replace everything we found \1 by wrapping it with tic marks and a comma.
If you have more complex requirements, the right chevron next to the drop down arrow on Find what lists the regular expression dialect SSMS/Visual Studio understands