I read an interesting piece from Kalen Delaney this week regarding internal changes in SQL Server that reduce the need for two-step updates. For those who don't know, a two-step update is where SQL Server deletes the row and re-inserts it instead of updating the row in place. This is very common when a clustered key value is changed, because the row has to physically move location. This method of updating is clearly slower than an in-place update.
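If you want to see one of these for yourself, a quick sketch along these lines will do it. The demo table is made up, and the CHECKPOINT trick only truncates the log in SIMPLE recovery:

-- Hypothetical demo table with a clustered primary key
CREATE TABLE dbo.two_step_demo
(
    id   INT         NOT NULL PRIMARY KEY CLUSTERED,
    name VARCHAR(50) NOT NULL
);
INSERT INTO dbo.two_step_demo (id, name) VALUES (1, 'row one');
GO

CHECKPOINT  -- clear the active log (SIMPLE recovery only)
GO

-- Changing the clustering key forces the row to move, so instead of a
-- single LOP_MODIFY_ROW we get LOP_DELETE_ROWS followed by LOP_INSERT_ROWS
UPDATE dbo.two_step_demo SET id = 2 WHERE id = 1;
GO

SELECT [Operation], [Context], [AllocUnitName] FROM fn_dblog(null, null);
GO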

Kalen’s post discusses a change to this behaviour: SQL Server previously forced a two-step update whenever any column was changed on a table with a trigger. Since SQL Server 2005 this no longer happens, because the trigger can read the previous copy of the row from the version store rather than relying on it being logged.
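To picture what the version store is serving up there, this is the sort of trigger that benefits. The table and trigger below are entirely hypothetical, but the deleted and inserted tables are exactly the before and after images in question:

-- Hypothetical audit trigger; in SQL Server 2005+ the deleted and
-- inserted tables are materialised from the version store, so the
-- update that fired the trigger no longer has to be logged as a
-- delete/insert pair just to support them
CREATE TABLE dbo.orders (order_id INT PRIMARY KEY, status VARCHAR(20) NOT NULL);
GO
CREATE TRIGGER trg_audit_orders ON dbo.orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    SELECT d.order_id,
           d.status AS old_status,   -- before image, from deleted
           i.status AS new_status    -- after image, from inserted
    FROM deleted AS d
    JOIN inserted AS i ON i.order_id = d.order_id;
END;
GO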

This got me thinking about the effects of CDC on a table. CDC requires the previous version of the row to provide details of the change, but unlike a trigger it cannot use the version store, because it runs outside of the transaction that initiated the update; it is in fact asynchronous. I have spent much time improving performance on clients’ systems that rely heavily on triggers to perform event-based business logic and auditing, and in a number of cases I have opted to move the processing to CDC-based jobs.

So, using Kalen’s example (which you will need to run through first), let’s take a look at the effect of CDC on a table.

-- First drop the trigger as we are not testing this
DROP TRIGGER trg_update_objects
GO

-- Clear the active portion of the log (this assumes the database is in SIMPLE recovery)
CHECKPOINT
GO

-- This should return a nice, clean log
SELECT * FROM fn_dblog(null, null);
GO

-- Update a non-key column; this should be an in-place update
UPDATE objects SET parent_object_id = parent_object_id + 1 WHERE name = 'sysfiles1'; 
GO

-- Check the log. You should see LOP_MODIFY_ROW
SELECT * FROM fn_dblog(null, null); 
GO

-- Now enable CDC
EXEC sys.sp_cdc_enable_db
GO
EXECUTE sys.sp_cdc_enable_table
                @source_schema = N'dbo'
              , @source_name = N'objects'
              , @role_name = NULL;
GO

-- Wait a few seconds for the capture job (the log reader) to start, then clear the log again
CHECKPOINT
GO

-- Perform the same update again.
UPDATE objects SET parent_object_id = parent_object_id + 1 WHERE name = 'sysfiles1'; 
GO

-- Notice that we now see a LOP_DELETE_ROWS followed by a LOP_INSERT_ROWS
SELECT * FROM fn_dblog(null, null); 
GO
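
As an aside, fn_dblog returns a lot of columns, so if you want to spot the pair more easily you can narrow the output. Something like this (still reading the whole active log) does the job:

-- Narrow fn_dblog's output down to the interesting operations
SELECT [Current LSN], [Operation], [Context], [AllocUnitName]
FROM fn_dblog(null, null)
WHERE [Operation] IN (N'LOP_MODIFY_ROW', N'LOP_DELETE_ROWS', N'LOP_INSERT_ROWS');
GO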

From this we can see that updates which were previously performed in place are now performed as two-step updates.

So enabling CDC increases the amount of logging required for every transaction that touches the enabled table. Because SQL Server does not acknowledge a commit until the transaction's log records have been written successfully, this adds overhead to your application. This should not be confused with the expected additional workload of processing the CDC data itself, where you expect to handle two records for every update.
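You can see that processing side for yourself by reading the captured changes back through CDC's table-valued function. A quick sketch, assuming the default capture instance name of dbo_objects created by the sp_cdc_enable_table call above:

-- Read the captured changes back; 'dbo_objects' is the default capture
-- instance name for dbo.objects
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_objects');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

-- With 'all update old' an update comes back as two rows:
-- __$operation = 3 (before image) and __$operation = 4 (after image)
SELECT __$operation, __$start_lsn, name, parent_object_id
FROM cdc.fn_cdc_get_all_changes_dbo_objects(@from_lsn, @to_lsn, N'all update old');
GO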

I still prefer CDC over triggers by a million miles. Batching the processing, and essentially being able to deal with it as a queue rather than as a synchronous event, is much better for performance, but it is important to understand the impact of CDC. If your table has large update batches occurring on it, you could see performance issues.
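To make the queue idea concrete, here is a rough sketch of a batched consumer. The watermark table and all of its names are hypothetical, and it assumes the dbo_objects capture instance from the demo:

-- Hypothetical watermark table remembering how far we have processed
CREATE TABLE dbo.cdc_watermark
(
    capture_instance SYSNAME PRIMARY KEY,
    last_lsn         BINARY(10) NOT NULL
);
GO

DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);
SET @to_lsn = sys.fn_cdc_get_max_lsn();

-- Resume just after the last processed LSN, or from the start of the
-- capture instance on the first run
SELECT @from_lsn = sys.fn_cdc_increment_lsn(last_lsn)
FROM dbo.cdc_watermark
WHERE capture_instance = N'dbo_objects';

IF @from_lsn IS NULL
    SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_objects');

IF @from_lsn <= @to_lsn
BEGIN
    -- Process the whole window as one batch (here we just read it)
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_objects(@from_lsn, @to_lsn, N'all');

    -- Advance the watermark so the next run picks up where we left off
    UPDATE dbo.cdc_watermark
    SET last_lsn = @to_lsn
    WHERE capture_instance = N'dbo_objects';

    IF @@ROWCOUNT = 0
        INSERT INTO dbo.cdc_watermark (capture_instance, last_lsn)
        VALUES (N'dbo_objects', @to_lsn);
END;
GO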
