Earlier this week I got an email from a reader who had a hang in an application and sent in some Debug Diag logs.
In order for me to dig in deeper I would need to open a support case as I would have to ask follow up questions and look at the dumps etc. However, I wanted to show some interesting details from his report to show you how to move forward if you get an issue like that.
Analyzing the debug diag output
At the top of the report generated by debug diag for the dump, you will see an analysis summary section.
In this section Debug Diag will report any issues that it found in the dump. This is based on a set of common issues that debug diag looks for.
Note: Debug Diag is so not looking for any managed (.net) issues in the standard hang/crash analysis scripts, so what you will see here is mostly if there are any native causes for the hang/crash, such as locked critical sections, waiting on socket calls etc.
|Warning||The following threads in DumpName.dmp are waiting on data to be returned from another server via WinSock.
The call to WinSock originated from 0x008fa1c3 and is destined for port 50001 at IP address 127.0.0.1
( 18 )
0.65% of threads blocked
|Ensure that any remote server this application may be calling is functioning properly and there are no network issues between the two servers. If the problem continues, please contact the application vendor for further assistance|
In this particular case we have one thread (thread 18) that is waiting to receive some data on a socket call to localhost (127.0.0.1) on port 50001. This could be anything from a web service call, to a database call, to a call to any type of service on the local box. You would have to look at the stack for thread 18 to get more information, or look at services listening on port 50001 in this case.
Drilling down to the data for thread 18 in the report we get
Thread 18 - System ID 10972 Entry point mscorwks!Thread::intermediateThreadProc Create time 1/7/2009 11:01:53 PM Time spent in user mode 0 Days 00:12:01.546 Time spent in kernel mode 0 Days 00:00:01.937 This thread is waiting on data to be returned from another server via WinSock. The call to WinSock originated from 0x008fa1c3 and is destined for port 50001 at IP address 127.0.0.1 Function Source ntdll!KiFastSystemCallRet ntdll!NtWaitForSingleObject+c mswsock!SockWaitForSingleObject+19d mswsock!WSPRecv+203 ws2_32!recv+83 0x008fa1c3 System_ni+1bf523 System_ni+1bf523 System_ni+1bf401 Socket properties: Source Port 3656 Destination IP 127.0.0.1 Destination Port 50001
From this we don’t get a whole lot of more information since the frames below the actual socket call are listed as System_ni… which means that it is .net code. To get any more info about this you would have to open the dump in windbg, load sos.dll (
.loadby sos mscorwks), move to thread 18 (
~18s) and run
!clrstack to get the .net stack and see what is doing the socket call.
Other interesting points in the debug diag output
Apart from the thread making the socket call, there were also some other interesting call stacks in the debug diag output
Over 60 threads were sitting in the following call stack
Thread 59 - System ID 10048 Entry point mscorwks!Thread::intermediateThreadProc Create time 1/7/2009 11:55:17 PM Time spent in user mode 0 Days 00:00:00.015 Time spent in kernel mode 0 Days 00:00:00.00 Function Source ntdll!KiFastSystemCallRet ntdll!NtWaitForMultipleObjects+c kernel32!WaitForMultipleObjectsEx+11a mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+6f mscorwks!Thread::DoAppropriateAptStateWait+3c mscorwks!Thread::DoAppropriateWaitWorker+13c mscorwks!Thread::DoAppropriateWait+40 mscorwks!CLREvent::WaitEx+f7 mscorwks!CLREvent::Wait+17 mscorwks!AwareLock::EnterEpilog+8c mscorwks!AwareLock::Enter+61 mscorwks!AwareLock::Contention+199 mscorwks!JITutil_MonContention+a3 mscorlib_ni+3fc3b1 mscorlib_ni+3fc32b mscorlib_ni+406d71
Again here we have the mscorlib_ni calls, which means that it there is actually .net code at the bottom of this stack, but since debug diag doesn’t resolve .net calls since it is using native debugger APIs they show up as mscorlib_ni or System_ni calls.
A little further up we see
mscorwks!AwareLock::Enter, which means that this thread is waiting for a .net lock, (i.e. a
So what we can tell from this is that we have a lot of contention in the process because all the threads are waiting for a lock and the thread holding the lock has probably held it for some time.
See Lab1 for more info about troubleshooting .net locks.
Chances are pretty big that they are all waiting for a lock held by thread 18, who is waiting for the socket call, but we really can’t tell for sure without looking at the stacks and the lock itself in windbg.
In order to figure out what is going on here and stalling the process, the best way to go about it would be to open the dump in windbg and load sos. Then run
~* e !clrstack to look at the .net stacks for all threads, specifically for the socket thread and the threads waiting for the lock to understand what lock we are waiting for and why.
Then to find out the owner of the lock we can run !syncblk, and focus on the thread that owns the lock to figure out why we are stuck or what could be done to reduce the contention on the lock.
When you look at a .net hang or .net crash dump, and analyze it with debug diag, you need to be aware that debug diag currently mostly looks for common native issues, so it is often necessary to make a second pass with windbg to get to the bottom of the issue.
Debug diag will spit out the following info about the thread in the cases where it can’t properly resolve the call stack
“This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.”
which is the case with .net stacks. So if you see something like that, that is a clue that either it is a .net stack, and you should be looking at it in windbg, or it is a stack where there are custom components and the symbols for those were not available.