Replaying deadlocks

  • I've been trying to assist some application developers with reducing deadlocks; for the most part (across hundreds of servers and applications) deadlocks are fairly rare, and where they do occur they happen in the same applications and the same places across servers (incidentally they are often not the "two tables accessed out of order" problem but "a clustered and non-clustered index on the same table becoming deadlocked" problem which is far less discussed and appears trickier to resolve).

    But this question is about the overall methodology for resolving them.

    What I've been doing is taking backups and capturing replay traces during periods of high activity, where I can see that deadlocks occurred during the trace. What I've been able to do from that is restore the backup and replay the traces elsewhere on a sandbox server. If you just do a normal replay in Profiler no deadlocks occur (likely because it's all single threaded). If you select the checkbox to have Profiler replay using multiple threads though, deadlocks do occur and the queries and objects being deadlocked are the same type as in production - though the frequency is different and a little unpredictable.

    I suspect that the replay goes as fast as it can and causes a much higher load than the real thing, and so can bring out deadlocks that weren't seen in the live environment, but which are based on the same underlying design faults.

    Anyway my co-worker and I were discussing my assumption of the validity of the tuning process; e.g. my replay can reproduce deadlocks of this kind, and so by refactoring the indexes and queries it should be possible to eliminate those deadlocks caused by the replay, and that I can then pass that onto the developers to trial and promote through the real environments and they deadlocks will likely go away.

    What do you think?

  • While that could work, it's possible you could end up fixing deadlocks that you won't get in prod. Unless you have a lot of spare time, maybe rather look at the deadlocks the prod server is experiencing (you can get the deadock graphs out of the default trace) and focus on those. You can always come back and look at the rest at a later time.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The deadlocks being reproduced in replay appear to be the same as the ones from production. So I suspect in that case then it's okay to proceed.

    The issue with only analysing production is that I need to validate my suggestions in some way before passing them on; hence the replay. The actual changes will take a long time for developers to test and migrate between all of their environments before reaching production ??

  • Double-check the deadlock graph you get from the replay against ones from prod, it's a rather large waste of time to fix stuff based on dev and have no effect on production. 🙂

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'm with Gail. It's very likely that your process is working, but capture the deadlock graphs from production just to be sure that you're dealing with the same objects. Then your tuning efforts will pay off.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • Yes the graphs are definitely the same. I mean it's not the same row/page but they're all on the same table, same queries, same combination of indexes.

  • Cody K (2/5/2015)


    Yes the graphs are definitely the same. I mean it's not the same row/page but they're all on the same table, same queries, same combination of indexes.

    Perfect. Then you're good to go.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply