Deadlock graph/report experts wanted :)

  • TheSQLGuru (4/7/2016)


    Brandie Tarvin (4/7/2016)


    Grant Fritchey (4/7/2016)


    Since you're on SQL Server 2012, have you tried using extended events to capture query metrics? They're not free (nothing is, TANSTAAFL always applies), but they are much lower cost than trace events and, because they have much better filtering mechanisms, you can target specific databases and queries much more easily, further reducing their overhead on the system. It might help.

    Article or article link? @=)

    The lower overhead I can agree on, although honestly I don't think I have ever seen a definitive benchmark done that truly documents the overhead of each.

    Filtering I will have to disagree on, at least for the given examples. Both database_id and query text are easy filters in a trace.

    It has to do with where the filter is applied. Trace captures the full event and then filters what it keeps, whereas extended events apply the filter before the full event is captured. That's the biggest difference.

    I have seen performance measures done. I don't have one in front of me. I'm pretty sure Jonathan Kehayias has one somewhere.
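
    As a concrete (purely illustrative) example of an event-level predicate - the session name, database_id value and duration threshold below are made up - the WHERE clause is evaluated at the point of capture, so events that fail it are discarded before the full payload is collected:

    -- Illustrative sketch: the predicate is checked at capture time, not afterwards
    CREATE EVENT SESSION [filtered_statements] ON SERVER
    ADD EVENT sqlserver.sql_statement_completed (
        ACTION (sqlserver.session_id, sqlserver.sql_text)
        WHERE (sqlserver.database_id = 5      -- target database only (value assumed)
               AND duration > 100000)         -- longer than 100 ms (duration is in microseconds)
    )
    ADD TARGET package0.ring_buffer;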

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Brandie Tarvin (4/7/2016)


    Grant Fritchey (4/7/2016)


    Since you're on SQL Server 2012, have you tried using extended events to capture query metrics? They're not free (nothing is, TANSTAAFL always applies), but they are much lower cost than trace events and, because they have much better filtering mechanisms, you can target specific databases and queries much more easily, further reducing their overhead on the system. It might help.

    Article or article link? @=)

    Everything Jonathan Kehayias has written for the last three years or so.

    ----------------------------------------------------
    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Hi all, just a little update.

    By using SQL Profiler I managed to see exactly what the 2 transactions involved in the deadlock were doing. One is a user sign-in transaction, and the other is the cleanup transaction, so they are easy to identify.

    The victim, running under serializable, was doing the UPDATE from the Deadlock Graph and only that statement.

    The survivor, running under snapshot, was doing a couple of SELECTs and, as the last statement of the transaction, the UPDATE WITH (REPEATABLEREAD) that was shown in the deadlock graph. Since the survivor runs its SELECTs under snapshot, those statements don't cause any blocking. So pretty much back to square one.

    With the profiler running - as expected - the system slowed down a lot, and I never managed to generate a deadlock (which is already an edge case requiring a fast system with a lot of concurrency).

    So I created an extended events trace (code below for those interested), with which I managed to provoke the deadlock (by pulling the plug) and capture all statements running at the time of the deadlock. I had the XE trace up and running for about 90 seconds, which resulted in 2.5 GB of trace files. I've dumped those into a results table as per the script below (took about 17 minutes) for further analysis tomorrow. Hoping the key to reproducing it in SSMS is hidden somewhere in there.

    Thanks again to all for thinking along with me; I would appreciate any feedback that might help with analysing the results to find the root cause of the deadlock. We already know it can't be reproduced by simply running the identified victim and survivor from the deadlock graph.

    IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'test_trace')
        DROP EVENT SESSION [test_trace] ON SERVER;

    -- Capture every completed ad hoc and stored procedure statement,
    -- together with the session id and statement text.
    CREATE EVENT SESSION [test_trace]
    ON SERVER
    ADD EVENT sqlserver.sql_statement_completed (
        ACTION (sqlserver.session_id, sqlserver.sql_text)
    ),
    ADD EVENT sqlserver.sp_statement_completed (
        ACTION (sqlserver.session_id, sqlserver.sql_text)
    )
    ADD TARGET package0.asynchronous_file_target
        (SET filename = 'c:\temp\test_trace.xel', metadatafile = 'c:\temp\test_trace.xem');

    ALTER EVENT SESSION [test_trace] ON SERVER STATE = START;

    /*
    -- Results table
    CREATE TABLE [dbo].[TRACE_RESULT](
        [TS1] [varchar](50) NULL,
        [TS2] [datetimeoffset](7) NULL,
        [SQL_TEXT] [varchar](600) NULL,
        [SQL_SPID] [int] NULL
    ) ON [PRIMARY]
    GO
    */

    -- Shred the captured .xel files into the results table:
    -- TS1/TS2 = event timestamp, sql_text = statement text, SPID = session id.
    INSERT INTO TRACE_RESULT
    SELECT
        xevents.value('(/event/@timestamp)[1]', 'VARCHAR(50)') AS TS1,
        xevents.value('(/event/@timestamp)[1]', 'DATETIMEOFFSET') AS TS2,
        xevents.value('(event/action[2])[1]', 'VARCHAR(512)') AS sql_text,
        xevents.value('(event/action[1])[1]', 'INT') AS SPID
    FROM
    (
        -- below returns the entire XML document per captured event
        SELECT CONVERT(XML, event_data) AS xevents
        FROM sys.fn_xe_file_target_read_file('C:\Temp\test_trace*.xel',
                                             'C:\Temp\test_trace*.xem', NULL, NULL)
    ) T

    ALTER EVENT SESSION [test_trace] ON SERVER STATE = STOP;

    I've only just started looking at the results table and am already faced with something strange:

    Searching for SQL_TEXT LIKE '%serializable%', 2 records are returned, with different TS1 and TS2 values but the same SPID.

    When I run SELECT * FROM TRACE_RESULT WHERE SQL_SPID = <the SPID identified as running under serializable>, I only find the 2nd record; no trace of the first occurrence. Odd!
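
    To make that concrete, these are the two queries I'm comparing (a sketch; 57 is just a stand-in for the SPID I actually saw):

    -- 57 is a placeholder for the SPID of the serializable session
    SELECT * FROM dbo.TRACE_RESULT WHERE SQL_TEXT LIKE '%serializable%';  -- returns 2 rows, same SPID
    SELECT * FROM dbo.TRACE_RESULT WHERE SQL_SPID = 57;                   -- returns only the 2nd row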

  • It would have been helpful to get some info from sys.dm_tran_locks.
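
    For example, something along these lines run while the blocking is building up (a generic sketch, not specific to this system):

    -- Which sessions hold or are waiting on which locks right now, in the current database
    SELECT tl.request_session_id,
           tl.resource_type,
           tl.resource_description,
           tl.request_mode,
           tl.request_status
    FROM sys.dm_tran_locks AS tl
    WHERE tl.resource_database_id = DB_ID()
    ORDER BY tl.request_session_id, tl.request_status;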

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Thanks Perry. Correct me if I'm wrong, but that would only help if I were "witnessing" the actual blocking/deadlocking, right?

  • BREAKTHROUGH! 🙂

    Starting with a clear mind today, I took pencil and paper in hand and plotted out the timeline from the deadlock report.

    Looking at the timestamps in there (lasttranstarted, lastbatchcompleted, lastbatchstarted) gave me a clue how to reproduce it: an UPDATE - when it comes down to it - really is 2 operations: 1) the select (finding what to update) and 2) the actual update itself.

    So I split the UPDATE statement into a SELECT and an UPDATE, as 2 separate statements, both using the WITH (REPEATABLEREAD) query hint.

    And lo and behold:

    1. Starting T1 under snapshot, running T1.1: reading the record (by ID) with repeatable read (so the SCAN happens under repeatable read isolation)

    2. Starting T2 under serializable, running the (to be victimised) UPDATE statement.

    3. Running T1.2: updating the record (by ID) used in T1.1 ==> deadlocks and kills T2! (A T-SQL sketch of these steps follows below.)
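
    For reference, a sketch of the three steps, run interleaved across two SSMS sessions (table, column and key values are placeholders, not the real schema):

    -- Session 1 = T1 (the cleanup transaction), snapshot isolation
    -- (requires ALLOW_SNAPSHOT_ISOLATION ON for the database)
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRAN;
        -- T1.1: the "select" half of the update; the lock taken here is held
        SELECT *
        FROM dbo.UserSession WITH (REPEATABLEREAD)
        WHERE SessionId = 42;

    -- Session 2 = T2 (the sign-in transaction), serializable
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRAN;
        -- blocks behind the lock T1.1 is still holding
        UPDATE dbo.UserSession
        SET LastActivity = SYSDATETIME()
        WHERE SessionId = 42;

    -- Back in Session 1
        -- T1.2: now needs a lock that T2 already holds or is queued for
        -- ==> cycle; T2 (serializable) is chosen as the deadlock victim
        UPDATE dbo.UserSession WITH (REPEATABLEREAD)
        SET LastActivity = SYSDATETIME()
        WHERE SessionId = 42;
    COMMIT;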

    Amazing and beautiful in its simplicity :). I didn't even need the extended events, the statements running at the time, etc.; all the info was right in front of me.

    So now that I have the exact step-by-step to reproduce this scenario, time to sit down and really get my head around it (feel free to offer your own interpretation if you have any) so I can recognize and avoid these going forward.

    Thanks again for your input!
