10 minute read

The situation here is that the server at times slows down considerably, or even hangs completely. Requests start timing out and no new requests seem to be getting through. The scenario below is a bit simplified but it shows the technique to find owners, waiters and locations for locks.

A quick look with ~* kb (listing the native stack for all threads) shows us that most threads are sitting in this type of stack…

  30  Id: d78.c68 Suspend: 1 Teb: 7ff5a000 Unfrozen
ChildEBP RetAddr  Args to Child
0c6becb4 7c822114 77e6711b 00000001 0c6bed04 ntdll!KiFastSystemCallRet
0c6becb8 77e6711b 00000001 0c6bed04 00000000 ntdll!NtWaitForMultipleObjects+0xc
0c6bed60 791e0b3b 00000001 00134f7c 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0c6bed90 791e0bdd 00000001 00134f7c 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0c6bede4 791fccfe 00000001 00134f7c 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0c6bee68 791fcc17 01fcc738 ffffffff 00000000 mscorsvr!AwareLock::EnterEpilog+0x9d
0c6bee84 791fd43e 0c6bef7c 064aecf4 064ad210 mscorsvr!AwareLock::Enter+0x78
0c6bef18 0c1c0744 064ad210 064aecf4 0c6bef7c mscorsvr!JITutil_MonContention+0x124
WARNING: Frame IP not in any known module. Following frames may be wrong.
0c6bef40 0c1c0697 064aeca0 064ae094 064ae094 0xc1c0744
0c6bef84 0c525abd 064aeca0 064ad210 064ae094 0xc1c0697
0c6bf0cc 799a67a2 064ac400 064ac704 799b4168 0xc525abd
0c6bf0d8 799b4168 00000002 00000000 0210177c mscorlib_79990000+0x167a2
0c6bf160 79217188 00000000 00da9a60 061bc490 mscorlib_79990000+0x24168
0c6bf1d4 7923c206 01eb2f60 0c6bf2dc 0015d990 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
0c6bf214 791cc3c8 0c6bf284 791b3d64 0c6bf300 mscorsvr!ComCallMLStubCache::CompileMLStub+0x1af
0c6bf25c 7923c1c3 0015d990 7923c1ce 0c6bf284 mscorsvr!Thread::DoADCallBack+0x5c
0c6bf2c4 01afa0bd 01fcc738 0c6bf300 0c6bffdc mscorsvr!ComCallMLStubCache::CompileMLStub+0x2ba
0c6bf3b8 791cc310 000de250 00000000 791cc32b 0x1afa0bd
0c6bf41c 791b3e6f 791cc131 0010f14c 00000000 mscorsvr!Thread::SafeSetStaticData+0x63
0c6bf4f8 7c822124 77e6baa8 000002e4 00000000 mscorsvr!Frame::Pop+0x9

So from managed code at the bottom of the stack, we go into something called JITutil_MonContention which then goes into AwareLock::Enter. This means that the thread is sitting in some type of synchronization statement, likely a lock(){} waiting for that lock to be free so we can enter the critical section.

There are two questions to answer here

  1. Where are we locking?
  2. Who owns the lock?

For the answer to the second question, “who owns the lock?” we can turn to !syncblk. !syncblk without any parameters will list SyncBlocks corresponding to objects owned by active threads.

0:018> !syncblk
Index SyncBlock MonitorHeld Recursion   Thread  ThreadID     Object Waiting
  987 0x00134f68           19         1 0x10eb28 0xcc8  17   0x022841fc System.Object
       Waiting threads: 18 19 23 24 25 26 27 29 30
-----------------------------
Total           1039
ComCallWrapper  4
ComPlusWrapper  1
ComClassFactory 0
Free            12

MonitorHeld shows 19. The owner has 1 and then each waiter has 2, so this means we have one owner and 9 waiters. We are synchronizing on a System.Object (0x022841fc) and the owning thread is thread 17.

The waiting threads are 18, 19, 23, 24…

So thread 18 for example… looks like this

0:018> !clrstack
Thread 18
ESP         EIP
0x0c3ef490  0x7c82ed54 [FRAME: GCFrame]
0x0c3ef54c  0x7c82ed54 [FRAME: HelperMethodFrame]
0x0c3ef5a0  0x0c1c0744 [DEFAULT] Void SuperSlowWebApp.CacheUpdater.Update()
0x0c3ef5c8  0x0c1c0697 [DEFAULT] [hasThis] Void SuperSlowWebApp.MyPage.btnUpdateCache_Click(Object,Class System.EventArgs)
0x0c3ef610  0x0c525abd [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.OnClick(Class System.EventArgs)
0x0c3ef624  0x0c52589a [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String)
0x0c3ef634  0x0c52584b [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Web.UI.IPostBackEventHandler,String)
0x0c3ef63c  0x0c5257aa [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Collections.Specialized.NameValueCollection)
0x0c3ef64c  0x0c4c6ead [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0c3ef690  0x0c4c5937 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0c3ef6cc  0x0c4c53a3 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x0c3ef6d4  0x0c4c537c [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x0c3ef6e4  0x0c1da288 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x0c3ef72c  0x0c1d9322 [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x0c3ef774  0x0c1d91eb [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x0c3ef790  0x01eb6897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x0c3ef7cc  0x01eb6448 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)
0x0c3ef7d8  0x01eb2fc5 [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)
0x0c3ef8a0  0x79217188 [FRAME: ContextTransitionFrame]
0x0c3ef980  0x79217188 [FRAME: ComMethodFrame]

Waiting in CacheUpdater.Update(), and if we disassemble we can see this…

0:018> !u 0x0c1c0744
Will print '>>> ' at address: 0x0c1c0744
Normal JIT generated code
[DEFAULT] Void SuperSlowWebApp.CacheUpdater.Update()
Begin 0x0c1c0718, size 0x68
0c1c0718 55               push    ebp
0c1c0719 8bec             mov     ebp,esp
0c1c071b 83ec14           sub     esp,0x14
0c1c071e 57               push    edi
0c1c071f 56               push    esi
0c1c0720 53               push    ebx
0c1c0721 33c0             xor     eax,eax
0c1c0723 8945ec           mov     [ebp-0x14],eax
0c1c0726 c745f800000000   mov     dword ptr [ebp-0x8],0x0
0c1c072d c745ec00000000   mov     dword ptr [ebp-0x14],0x0
0c1c0734 a12c66170a       mov     eax,[0a17662c] (Object: System.Object)
0c1c0739 8945ec           mov     [ebp-0x14],eax
0c1c073c 8b4dec           mov     ecx,[ebp-0x14]
0c1c073f e8e712ff6c       call    mscorsvr!JIT_MonEnter (791b1a2b)
>>> 0c1c0744 b988130000       mov     ecx,0x1388
0c1c0749 e85da79f6d       call    mscorlib_79990000+0x22aeab (79bbaeab) (System.Threading.Thread.Sleep)
0c1c074e 90               nop
0c1c074f c745f400000000   mov     dword ptr [ebp-0xc],0x0
0c1c0756 c745f8fc000000   mov     dword ptr [ebp-0x8],0xfc
0c1c075d 686f071c0c       push    0xc1c076f
0c1c0762 eb00             jmp     0c1c0764
0c1c0764 8b4dec           mov     ecx,[ebp-0x14]
0c1c0767 e8e114ff6c       call    mscorsvr!JIT_MonExit (791b1c4d)
0c1c076c 58               pop     eax
0c1c076d ffe0             jmp     eax
0c1c076f c745f800000000   mov     dword ptr [ebp-0x8],0x0
0c1c0776 eb00             jmp     0c1c0778
0c1c0778 90               nop
0c1c0779 5b               pop     ebx
0c1c077a 5e               pop     esi
0c1c077b 5f               pop     edi
0c1c077c 8be5             mov     esp,ebp
0c1c077e 5d               pop     ebp
0c1c077f c3               ret

So we are sitting just before a call to Sleep(0x1388) or Sleep(5000) in a JIT_MonEnter call

In fact, the function looks like this… and we are stuck on the lock(_syncRoot) line

public static void Update()
{
  lock(_syncRoot)
  {
    System.Threading.Thread.Sleep(5000);
  }
}

And thread 17 (the lock owner) looks like this

0:017> !clrstack
Thread 17
ESP         EIP
0x009ff610  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Void System.Threading.Thread.Sleep(I4)
0x009ff620  0x0c1c074e [DEFAULT] Void SuperSlowWebApp.CacheUpdater.Update()
0x009ff648  0x0c1c0697 [DEFAULT] [hasThis] Void SuperSlowWebApp.MyPage.btnUpdateCache_Click(Object,Class System.EventArgs)
0x009ff690  0x0c525abd [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.OnClick(Class System.EventArgs)
0x009ff6a4  0x0c52589a [DEFAULT] [hasThis] Void System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(String)
0x009ff6b4  0x0c52584b [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Web.UI.IPostBackEventHandler,String)
0x009ff6bc  0x0c5257aa [DEFAULT] [hasThis] Void System.Web.UI.Page.RaisePostBackEvent(Class System.Collections.Specialized.NameValueCollection)
0x009ff6cc  0x0c4c6ead [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x009ff710  0x0c4c5937 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x009ff74c  0x0c4c53a3 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x009ff754  0x0c4c537c [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x009ff764  0x0c1da288 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x009ff7ac  0x0c1d9322 [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x009ff7f4  0x0c1d91eb [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x009ff810  0x01eb6897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x009ff84c  0x01eb6448 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)
0x009ff858  0x01eb2fc5 [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)
0x009ff920  0x79217188 [FRAME: ContextTransitionFrame]
0x009ffa00  0x79217188 [FRAME: ComMethodFrame]

So in summary all the requests are waiting for thread 17 to come out of its sleep and release the lock

If we grab the System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6 from !dso (dumpstackobjects)

0:017> !dso
Thread 17
ESP/REG    Object     Name
…
0x009ff820 0x024961f4 System.Web.HttpContext
0x009ff824 0x02495ed8 System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
0x009ff828 0x061c3460 System.Web.HttpRuntime
0x009ff84c 0x061ab370 System.String    c:\inetpub\wwwroot\SuperSlowWebApp\
0x009ff850 0x02495ff0 System.String    c:\inetpub\wwwroot\SuperSlowWebApp\
0x009ff870 0x061bc490 System.Web.Hosting.ISAPIRuntime
0x009ff934 0x021a0770 System.Runtime.Remoting.Messaging.LogicalCallContext

And look at the _startTime

0:017> !do 0x02495ed8
Name: System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
MethodTable 0x02061fec
EEClass 0x01ee8460
Size 200(0xc8) bytes
GC Generation: 0
mdToken: 0x02000115  (c:\windows\assembly\gac\system.web\1.0.5000.0__b03f5f7f11d50a3a\system.web.dll)
FieldDesc*: 0x02061e78
        MT      Field     Offset                 Type       Attr      Value Name
0x01b28a80 0x4000798      0x4            VALUETYPE   instance start at 02495edc _startTime
0x01b28a80 0x4000799        0                CLASS     shared   static s_HTTPStatusDescriptions
    >> Domain:Value 0x000de250:NotInit  0x0010ee48:0x061c0e38 <<
0x01b28a80 0x400079a      0x4                CLASS     shared   static s_requestHeaderNames
    >> Domain:Value 0x000de250:NotInit  0x0010ee48:0x061c0f80 <<
0x01b28a80 0x400079b      0x8                CLASS     shared   static s_responseHeaderNames
    >> Domain:Value 0x000de250:NotInit  0x0010ee48:0x061c1030 <<
0x01b28a80 0x400079c      0xc                CLASS     shared   static s_requestHeadersLoookupTable
    >> Domain:Value 0x000de250:NotInit  0x0010ee48:0x061c10b8 <<
0x01b28a80 0x400079d     0x10                CLASS     shared   static s_responseHeadersLoookupTable
    >> Domain:Value 0x000de250:NotInit  0x0010ee48:0x061c1194 <<
0x01b2938c 0x400096f     0x5c         System.Int32   instance 14358560 _ecb
0x01b2938c 0x4000970     0x60         System.Int32   instance 0 _token

…we can use !convertvtdatetodate to find out when the request started …

0:017> !convertvtdatetodate 02495edc
01/09/2006 11:38:35

…and compare this to the current time given by .time

0:017> .time
Debug session time: Mon Jan  9 12:40:15.000 2006 (GMT+1)
System Uptime: 2 days 11:36:47.637
Process Uptime: 0 days 0:19:48.000
  Kernel time: 0 days 0:00:01.000
  User time: 0 days 0:00:00.000

Note: the time for the request is one hour off since it is given in GMT time, and I am located in the GMT+1 timezone

So we can see that this request has been executing for 1 minute and 40 seconds so far (11:38:35 -> 11:40:15), probably because it has been blocked waiting to get the lock in the first place.

You can use this method to find out how long any request has been executing in the same way, and if there is not an ISAPIWorkerRequestInProcForIIS6 on the stack, you can use the HttpContext._request._wr to get to it (by using !dumpobj)

As a side note, when you use .time in a live debug session it will give you the time now, even if you stopped the debugger a while back, so this technique for finding out how long a request has been executing works best in dumps.

A short note on Critical Sections

Similarly if you have threads stuck in a native synchronization block like ntdll!RtlEnterCriticalSection

0:021> kb
ChildEBP RetAddr  Args to Child
0210e7cc 7c822124 7c83970f 000009f8 00000000 ntdll!KiFastSystemCallRet
0210e7d0 7c83970f 000009f8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0210e80c 7c839620 00000000 00000004 00000001 ntdll!RtlpWaitOnCriticalSection+0x19c
0210e82c 4bfa4c16 027c0714 0277a5b4 4bf727af ntdll!RtlEnterCriticalSection+0xa8
0210e838 4bf727af 0277a5b4 0277a3a0 0210e868 odbc32!CCriticalSection::Enter+0xf
0210e848 4bf79ae6 0277a5b4 021c2d60 0210e9e8 odbc32!ODBCEnterCriticalSection+0xf
0210e868 4bf923bb 0277a3a0 00000001 000003ee odbc32!FreeStmt+0xe4
0210e884 026f0a35 00000003 4bfa0000 0210e8dc odbc32!SQLFreeHandle+0x1f3
WARNING: Frame IP not in any known module. Following frames may be wrong.
0210e928 791b33e4 00000000 0260aa80 0210e948 0x26f0a35
0210e938 03720601 0210e960 791b3208 0210eb14 mscorsvr!EEClass::GetFixedUpSlot+0x1d
0210e948 791f4512 0210e994 00000000 0210e96c 0x3720601
0210eb14 021824ee 0210eb20 10311208 00000001 mscorsvr!COMMember::InvokeMethod+0x95a
0210eb64 799dd5a2 00000001 00000000 00000000 0x21824ee
0210eba8 799dd3e0 00000001 00000000 1525e724 mscorlib_79990000+0x4d5a2
00000000 00000000 00000000 00000000 00000000 mscorlib_79990000+0x4d3e0

…you can use the first parameter to the RtlEnterCriticalSection function (0x027c0714) and pass it to the !critsec function to find out who owns the critical section

0:021> !critsec 027c0714

CritSec +27c0714 at 027C0714
WaiterWoken        No
LockCount          1
RecursionCount     1
OwningThread       1490
EntryCount         0
ContentionCount    4
*** Locked

In this case the owning thread is 1490 and you can find the logical thread number by using ~

0:021> ~
#  0  Id: 1748.558 Suspend: 1 Teb: bffdf000 Unfrozen
   1  Id: 1748.15d4 Suspend: 1 Teb: bffdd000 Unfrozen
   2  Id: 1748.1628 Suspend: 1 Teb: bffdc000 Unfrozen
   3  Id: 1748.1524 Suspend: 1 Teb: bffd8000 Unfrozen
   4  Id: 1748.17d0 Suspend: 1 Teb: bffd7000 Unfrozen
   5  Id: 1748.12b4 Suspend: 1 Teb: bffd6000 Unfrozen
   6  Id: 1748.17e8 Suspend: 1 Teb: bffd5000 Unfrozen
   7  Id: 1748.1220 Suspend: 1 Teb: bffd4000 Unfrozen
   8  Id: 1748.1148 Suspend: 1 Teb: bffaf000 Unfrozen
   9  Id: 1748.16a8 Suspend: 1 Teb: bffae000 Unfrozen
  10  Id: 1748.12e0 Suspend: 1 Teb: bffad000 Unfrozen
  11  Id: 1748.1278 Suspend: 1 Teb: bffac000 Unfrozen
  12  Id: 1748.1700 Suspend: 1 Teb: bffab000 Unfrozen
  13  Id: 1748.9f0 Suspend: 1 Teb: bffa8000 Unfrozen
  14  Id: 1748.1690 Suspend: 1 Teb: bffa7000 Unfrozen
  15  Id: 1748.11f8 Suspend: 1 Teb: bffa6000 Unfrozen
  16  Id: 1748.2c0 Suspend: 1 Teb: bffa5000 Unfrozen
  17  Id: 1748.134c Suspend: 1 Teb: bffa4000 Unfrozen
  18  Id: 1748.1680 Suspend: 1 Teb: bffa3000 Unfrozen
  19  Id: 1748.a7c Suspend: 1 Teb: bffa2000 Unfrozen
  20  Id: 1748.1490 Suspend: 1 Teb: bff9e000 Unfrozen
. 21  Id: 1748.ab0 Suspend: 1 Teb: bff9d000 Unfrozen
  22  Id: 1748.c70 Suspend: 1 Teb: bff9c000 Unfrozen
  23  Id: 1748.170c Suspend: 1 Teb: bff9b000 Unfrozen
  24  Id: 1748.1538 Suspend: 1 Teb: bff4a000 Unfrozen
  25  Id: 1748.1470 Suspend: 1 Teb: bff47000 Unfrozen
  26  Id: 1748.3c8 Suspend: 1 Teb: bff43000 Unfrozen
  27  Id: 1748.ce8 Suspend: 1 Teb: bffa9000 Unfrozen
  28  Id: 1748.16c4 Suspend: 1 Teb: bff99000 Unfrozen
  29  Id: 1748.11e4 Suspend: 1 Teb: bff9a000 Unfrozen
  30  Id: 1748.131c Suspend: 1 Teb: bffaa000 Unfrozen
  31  Id: 1748.12ac Suspend: 1 Teb: bffdb000 Unfrozen
  32  Id: 1748.bfc Suspend: 1 Teb: bffda000 Unfrozen
  33  Id: 1748.1238 Suspend: 1 Teb: bffde000 Unfrozen

Or go to it directly with ~~[1490]s

Normally this thread will be executing some code inside the critical section protected by this critsec, but in this particular case, thread 20/1490 is a regular worker thread waiting for work

0:020> kb 2000
ChildEBP RetAddr  Args to Child
020bfaf8 7c822124 77e6baa8 00000250 00000000 ntdll!KiFastSystemCallRet
020bfafc 77e6baa8 00000250 00000000 020bfb40 ntdll!NtWaitForSingleObject+0xc
020bfb6c 77e6ba12 00000250 00009c40 00000000 kernel32!WaitForSingleObjectEx+0xac
020bfb80 791d401f 00000250 00009c40 00000000 kernel32!WaitForSingleObject+0x12
020bfba4 791fdacc 00000000 79172262 79172270 mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x3a
020bffb8 77e66063 0012c938 00000000 00000000 mscorsvr!ThreadpoolMgr::intermediateThreadProc+0x44
020bffec 00000000 791fda8b 0012c938 00000000 kernel32!BaseThreadStart+0x34

Since there is nothing on this thread that should be using a critical section in odbc32, most likely the critical section is either transitioning/moving to a different thread or orphaned, which means that it is lost forever because the thread that owned it got an access violation or similar while owning the critical section.

In a case like this there are two avenues. The first is to take two dumps after each other (with about 1-2 minutes in between) to see if the critical section is transitioning, if it is, we should see that the first thread is no longer waiting for the critical section in dump #2. The second avenue is to take crash dumps instead of hang dumps so that we record any access violations or similar that occur prior to the hang and can find out what caused us to orphan the critical section.

Just so you don’t lay sleepless over this:) I will tell you the cause of this particular orphaning… the dump itself was about 1.4 GB so the process had suffered a number of OutOfMemoryExceptions leaving threads in unstable states like this…

Happy hunting!