What is a deadlock?

(From the first result of the Google query “define:deadlock”)
“This is an inter-blocking that occurs when two processes want to access at shared variables mutually locked. For example, let A and B two locks and P1 and P2 two processes:

  1. P1: lock A
  2. P2: lock B
  3. P1: lock B (so P1 is blocked by P2)
  4. P2: lock A (so P2 is blocked by P1)

Process P1 is blocked because it is waiting for the unlocking of B variable by P2. However P2 also needs the A variable to finish its computation and free B. So we have a deadlock”
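To make the definition concrete, here is a minimal Python sketch of the same P1/P2 scenario. It is only an illustration, not the sample code from this post: the function name `demonstrate_deadlock` and the timeout value are assumptions, and the second lock is requested with a timeout so the demo finishes instead of hanging forever.

```python
import threading


def demonstrate_deadlock(timeout=0.2):
    """Replay the P1/P2 scenario; timeouts keep the demo from hanging."""
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    # Both threads meet at the barrier twice: once after taking their
    # first lock, and once after their attempt on the second lock ends.
    barrier = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:                      # P1: lock A / P2: lock B
            barrier.wait()               # both first locks are now held
            got_second = second.acquire(timeout=timeout)
            results[name] = got_second   # False means "would have deadlocked"
            if got_second:
                second.release()
            barrier.wait()               # keep the first lock until both attempts end

    p1 = threading.Thread(target=worker, args=("P1", lock_a, lock_b))
    p2 = threading.Thread(target=worker, args=("P2", lock_b, lock_a))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return results


if __name__ == "__main__":
    print(demonstrate_deadlock())
```

Neither thread can get its second lock while the other still holds it. Without the timeout, both would wait forever, which is exactly the hang described above.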

What are the symptoms of a deadlock?
Usually the application stops responding for no obvious reason, and the CPU (if you open Task Manager) sits at 0% most of the time, assuming this is the only major application running on the machine.

What to do?
So… how can we find a deadlock, specifically in a production environment without a debugger or development environment? (just the kind of problem we like 🙂 )

What we want to find out is who is holding the locks and who is waiting on them.

Below are a few easy steps to figure this out.

  1. Attach WinDbg to the relevant process (F6 and select the process).
  2. Make sure the symbol paths are OK by calling .sympath. If you don’t have the symbols for .NET and the rest of Windows, just type “.symfix c:\symbols”. This sets the symbol path to Microsoft’s public symbol server and downloads the relevant symbols into c:\symbols as they are needed. If you have symbols of your own in some folder, use .sympath+ (like this: “.sympath+ c:\mypath”) to append the additional path.
  3. Type “.load clr10\sos.dll” to load the SOS extension (it might already be loaded, but loading it again does no harm).
  4. Run the command “!SyncBlk”.

You can see a sample output here.

As you can see, we have two locks, since we have two resources that are locked. Each resource is locked by a different thread, and each of these threads is trying to acquire a lock on the other resource.

In the “ThreadID” column we can see the ID of the thread that owns the lock (and, next to it, that thread’s index in the current debug session). Under each lock we can see the list of indices of the waiting threads.
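The owner and waiter information can be read as a small "wait-for" graph: each waiting thread points at the thread that owns the lock it wants, and a cycle in that graph is a deadlock. The Python sketch below shows the idea. The function names, the dictionary layout, and the sample thread indices are all hypothetical (this is not real !SyncBlk output), and it assumes each thread waits on at most one lock.

```python
def find_deadlock_cycle(owners, waiters):
    """
    owners:  {lock_name: index of the thread that owns the lock}
    waiters: {lock_name: [indices of threads waiting on the lock]}
    Returns the thread indices that form a cycle, or None if there is none.
    """
    # Wait-for graph: waiting thread -> thread that owns the wanted lock.
    waits_for = {}
    for lock, threads in waiters.items():
        for thread in threads:
            waits_for[thread] = owners[lock]

    # Follow the chain from every waiting thread until it ends or repeats.
    for start in waits_for:
        path = []
        current = start
        while current in waits_for and current not in path:
            path.append(current)
            current = waits_for[current]
        if current in path:
            return path[path.index(current):]
    return None


def clrstack_commands(thread_indices):
    """Build one per-thread !clrstack command for each thread index."""
    return ["~%de!clrstack" % index for index in thread_indices]


if __name__ == "__main__":
    # Hypothetical data transcribed by hand from a debugger session.
    owners = {"lock1": 3, "lock2": 4}
    waiters = {"lock1": [4], "lock2": [3]}
    cycle = find_deadlock_cycle(owners, waiters)
    print(cycle)
    print(clrstack_commands(cycle or []))
```

With only two locks you can spot the cycle by eye. The graph view is mostly useful when several locks and threads are involved.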

Now all we have left is to use the “!clrstack” command on each of the waiting threads and the locking threads to see where they are in the code, determine why the deadlock is happening, and figure out a way to avoid this situation.
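One common way to avoid this kind of deadlock, once the stacks show which locks are taken in opposite order, is to make every thread acquire the locks in the same fixed order. The Python sketch below illustrates the technique. It is an assumption about a possible fix rather than the fix used in this post's sample code, and the names `in_fixed_order`, `do_work` and `run` are hypothetical.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()


def in_fixed_order(*locks):
    """Sort locks by a fixed global order (here simply their id())."""
    return sorted(locks, key=id)


def do_work(first, second, log, name):
    # The caller may pass the locks in any order; we always acquire
    # them in the same global order, so no wait cycle can form.
    ordered = in_fixed_order(first, second)
    with ordered[0]:
        with ordered[1]:
            log.append(name)


def run():
    log = []
    p1 = threading.Thread(target=do_work, args=(lock_a, lock_b, log, "P1"))
    p2 = threading.Thread(target=do_work, args=(lock_b, lock_a, log, "P2"))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return sorted(log)


if __name__ == "__main__":
    print(run())
```

Both threads finish even though they ask for the locks in opposite order, because the actual acquisition order is the same for both.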

To run the !clrstack command on a specific thread, use a command such as “~3e!clrstack”. This displays the CLR stack of the thread whose index is 3.

You can see a sample output of !clrstack (specifically ~4e!clrstack) here

While debugging this I had the debug symbols available, which is why WinDbg was able to give me the exact line number where this is happening. If you don’t have the debug symbols for your code, you will only see the function name, which is good enough in most cases.

Can’t do a live debug? The problem is at a remote customer’s site?

If you cannot live debug this situation because it happens at a customer site and you cannot get there or cannot get direct access to the machine, you can perform all the steps above on a dump taken using the “adplus.vbs” script.

Just tell the customer to download WinDbg from here and to reproduce the problem. When he does reproduce it, tell him to run the following command line:

adplus.vbs -p [process Id] -hang

(Replace [process Id] with the process ID of the application.)

And tell him to send the dump back to you. You will be able to run the same commands as above and figure out where the deadlock happens.

Download sample code used in this post.

Happy Deadlock Hunt!