5 minute read

In my constant effort to make my job obsolete, I have created an “add-in” for windbg that automatically checks for hang conditions in a .NET memory dump.

Disclaimer: It is very crude and doesn’t have a lot in terms of exception handling so basically I don’t follow my own examples here:) but on the other hand, this is supposed to be used by one person at a time and if it crashes it won’t really hurt anyone so that is really why I omitted exception handling and code optimization.

This is written as a .NET console app that you run with the .shell command in windbg and pass in ~* kb 2000, !threads and ~* e !clrstack. The application then parses this output and look for a few conditions that are common for hangs. I have written it so that it should be easy to add checks for other conditions as well and the solution for the application is attached to this blog post.

BTW, if anyone knows how to emit DML in the output sent from a console app, please let me know… I have tried a thousand ways but I haven’t been able to do it.

Output

The output when running this on a dump that experienced a deadlock with a lot of different hang causes looks like this:

====================================================================================================================
                   Hang Analysis
====================================================================================================================

____________________________________________________________________________________________________________________

 GC RELATED INFORMATION
____________________________________________________________________________________________________________________

The following threads are GC threads:
    15 16

The following threads are waiting for the GC to finish:
    11 13 21 23 24 25 27 28 32 33 34 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 54 55 56 57 59 60 61 62 63 64 65 66 67 68 69 70 71 74 75

The GC was triggered by thread: 72

The GC is working on suspending threads to continue with garbage collection
The following threads can't be suspended because preemptive GC is disabled:
    20 22 30 31 35 53 58

The Finalizer (Thread 17) is not blocked


BLOG POSTS ABOUT GC RELATED ISSUES
===================================
GC-LoaderLock Deadlock       http://blogs.msdn.com/tess/archive/2007/03/12/net-hang-case-study-the-gc-loader-lock-deadlock-a-story-of-mixed-mode-dlls.aspx
High CPU in GC               http://blogs.msdn.com/tess/archive/2006/06/22/643309.aspx
High CPU in GC (Viewstate)   http://blogs.msdn.com/tess/archive/2006/11/24/asp-net-case-study-bad-perf-high-memory-usage-and-high-cpu-in-gc-death-by-viewstate.aspx
Blocked finalizer            http://blogs.msdn.com/tess/archive/2006/03/27/561715.aspx

____________________________________________________________________________________________________________________

 INFORMATION ABOUT LOCKS AND CRITICAL SECTIONS
____________________________________________________________________________________________________________________

The following threads are spinning waiting to enter a .NET Lock:
    20 22 30 31 35 53 58

The following threads are waiting for a critical section:	(TIP! Run !sieextPub.critlist to find out who the owner is)
    12 19

The following threads are waiting in a WaitMultiple:
    21 25


BLOG POSTS ABOUT LOCKS AND CRITICAL SECTIONS
==============================================
WaitOne and WebService calls  http://blogs.msdn.com/tess/archive/2006/02/23/537681.aspx
Locks and Critical sections   http://blogs.msdn.com/tess/archive/2006/01/09/510773.aspx

____________________________________________________________________________________________________________________

 INFORMATION ABOUT THREADS WAITING ON EXTERNAL PROCESSES
____________________________________________________________________________________________________________________

The following threads are waiting in a Socket.Receive:
    13 34 43 44 45 46 47 49 50 51 56 63 66 68 69


BLOG POSTS ABOUT DEBUGGING HANGS
==============================================
Hang debugging walk through   http://blogs.msdn.com/tess/archive/2006/10/16/net-hang-debugging-walkthrough.aspx
Things to ignore when you're debugging a hang   http://blogs.msdn.com/tess/archive/2007/04/02/things-to-ignore-when-debugging-an-asp-net-hang-update-for-net-2-0.aspx

In this particular case we can see that the GC is running since we have a lot of threads (pretty much all .net threads) waiting for the GC to finish. The GC however can not finish because it is waiting to suspend threads 20, 22, 30, 31, 35, 53 and 58, which can’t be suspended because they have preemptive GC disabled. This is in turn caused by these threads being stuck in a spin-loop trying to enter a .net Lock and for some reason they can’t exit that spin-loop. I might post more details about this case later but for the purposes of this post I am just showing the hanganalyzer.

In addition we can see that threads 13, 34, 43, 44, 45, 46, 47, 49, 50, 51, 56, 63, 66, 68 and 69 are in a socket.receive waiting for some results from a socket call. The observant reader will probably have noticed that these threads are also waiting for the GC to complete so it really won’t continue reading until the GC completes.

Armed with the info above you can go out and look at the stacks to get more detailed information about the threads, and I have also included a number of posts of interest for each section.

Writing an application for windbg

Writing an application to run from windbg is very easy.

The command for running an app from windbg looks like this

.shell -ci ["commands"] [application]

for example to find all the instances of the word Threadpool Worker in the !threads output using the DOS command FIND /C "word" you would do this

0:032> .shell -ci "!threads" FIND /C "Threadpool Worker"
47
.shell: Process exited

The FIND application will then receive the output of the commands you pass in after -ci and can read it using standard in (i.e. Console.ReadLine()), and whatever it prints out using standard out (i.e. Console.WriteLine()) gets displayed in the debugger.

So, a simple app that just reads the output of a windbg command and echoes it back to the debugger would then look something like this

static void Main(string[] args){
  string input = Console.ReadLine();
  while (input != null){
    Console.WriteLine(input);
    input = Console.ReadLine();
  }
}

And you would call it from windbg like this .shell -ci "~* kb" c:\MyApp

Hang analyzer implementation and usage

The HangAnalyzer takes the input ~* kb;.echo MANAGED THREADS;!threads;.echo MANAGED CALLSTACKS;~* e !clrstack (the echos for MANAGED THREADS and MANAGED CALLSTACKS are just there so i know when to stop parsing native call stacks and !threads output… this could easily be refined by better parsing but it serves the purpose)

.shell -ci "~* kb;.echo MANAGED THREADS;!threads;.echo MANAGED CALLSTACKS;~* e !clrstack;" F:\blog\HangAnalyzer\HangAnalyzer\bin\Debug\HangAnalyzer

Of course you would exchange the path for the location of your HangAnalyzer. However you could also set up a command script so that you don’t have to type in this path all the time, i.e. just copy this string into a text file and run the text file like this

$><c:\TOOLS\EXTENSIONS\ANALYZEHANGS.TXT

Note: Since the application will parse the kb and !clrstack output a few things are important

  1. You have to have symbols loaded or else it will not find the text (i.e. start off by running .symfix c:\pubsym and .reload to make sure you have symbols set, you may also want to run ~* kb just to check that the output looks normal). If you don’t have the correct symbols it may also falsely report that the finalizer is stuck since it cant find the call to WaitForFinalizerEvent
  2. You have to have sos loaded so that !clrstack will run, otherwise you will miss info derived from !threads and !clrstack output.

This is not an exhaustive list of hang reasons so even if this reports nothing that doesn’t necessarily mean that there isn’t a hang/performance issue.

In its current version it works for .NET 2.0 and up but you can easily change it yourself to work for 1.1. or anything else by changing the parsing.

Download the solution

You can find the source code at https://github.com/TessFerrandez/HangAnalyzer