5 minute read

The other day I came across an issue where an ASP.NET site stopped responding and didn’t start serving requests again until the W3WP.exe process was restarted.

We grabbed some memory dumps with debug diag before restarting the process to see what was going on.

Debugging the dumps

I opened the dump in windbg, and loaded up sos (.loadby sos mscorwks).

As with all hang dumps I usually check what the threads are doing (~* kb) and I found that one thread (10) was initiating a garbage collection.

0:010> kb
ChildEBP RetAddr  Args to Child
01cceeb4 7c827d0b 77e61d1e 00000280 00000000 ntdll!KiFastSystemCallRet
01cceeb8 77e61d1e 00000280 00000000 01cceefc ntdll!NtWaitForSingleObject+0xc
01ccef28 79e718fd 00000280 000000fa 00000000 kernel32!WaitForSingleObjectEx+0xac
01ccef6c 79e718c6 00000280 000000fa 00000000 mscorwks!PEImage::LoadImage+0x199
01ccefbc 79e7187c 000000fa 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
01ccefcc 7a0e288e 000000fa 00000000 00000000 mscorwks!CLREvent::Wait+0x17
01ccf020 79f25226 00000000 9985f121 00032ca8 mscorwks!Thread::SysSuspendForGC+0x52a
01ccf10c 79f25011 00000001 00000024 00000000 mscorwks!WKS::GCHeap::SuspendEE+0x16c
01ccf130 79f24ef2 00000000 00000000 00000024 mscorwks!WKS::GCHeap::GarbageCollectGeneration+0x25b
01ccf1b8 79e74f4f 7a3867e8 00000024 00000000 mscorwks!WKS::gc_heap::try_allocate_more_space+0x1a0
01ccf1d4 79e74d29 7a3867e8 00000024 00000000 mscorwks!WKS::gc_heap::allocate_more_space+0x18
01ccf1f4 79e74cc2 00000022 00000000 00000009 mscorwks!WKS::GCHeap::Alloc+0x6c
01ccf208 79ea3646 00000022 00000000 00000000 mscorwks!Alloc+0x72
01ccf244 79f13708 00000009 9985f2c5 00000000 mscorwks!SlowAllocateString+0x29
*** WARNING: Unable to verify checksum for mscorlib.ni.dll
01ccf2e8 79342fb3 00000048 00000000 00000000 mscorwks!FramedAllocateString+0xa0
01ccf308 79342f5a 00000000 00000008 00000006 mscorlib_ni+0x282fb3
00000000 00000000 00000000 00000000 00000000 mscorlib_ni+0x282f5a

This is really not all that uncommon since GCs happen quite frequently. What was interesting about this one though was that it was sitting in mscorwks!Thread::SysSuspendForGC which basically means that it is trying to suspend all the .net threads in order to get started with the garbage collection.

Whenever you see a thread stuck in this, you are almost always running into some GC deadlock condition.

To find out why we can threads can’t be suspended we can look at the !threads output to find out if any threads (other than the GC initiator) has preemptive GC disabled. (This means that it can’t be suspended)

0:000> !threads
ThreadCount: 12
UnstartedThread: 0
BackgroundThread: 11
PendingThread: 0
DeadThread: 1
Hosted Runtime: yes
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
  10    1 5e5c 0010e5a0   1808220 Disabled 00000000:00000000 00136610     3 MTA (GC) (Threadpool Worker)
  12    2 52c0 00117b20      b220 Enabled  00000000:00000000 000dc7d0     0 MTA (Finalizer)
  13    3 4ca0 0011c638      1220 Enabled  00000000:00000000 000dc7d0     0 Ukn
   8    4 4034 0012f6b8   880a220 Enabled  00000000:00000000 000dc7d0     0 MTA (Threadpool Completion Port)
  14    5 5424 00133818    80a220 Enabled  00000000:00000000 000dc7d0     0 MTA (Threadpool Completion Port)
XXXX    8    0 0226e710      9820 Enabled  00000000:00000000 000dc7d0     0 Ukn
  19    7 42ac 0226e350   880b220 Enabled  00000000:00000000 000dc7d0     0 MTA (Threadpool Completion Port)
  20    6 5dbc 09d756e8   180b220 Enabled  00000000:00000000 00136610     2 MTA (Threadpool Worker)
   5    9 4f74 0014f238       220 Enabled  00000000:00000000 000dc7d0     0 Ukn
  17    a 4378 09d90880       220 Enabled  00000000:00000000 000dc7d0     0 Ukn
  23    c 5a4c 09ead790   880b220 Enabled  00000000:00000000 000dc7d0     0 MTA (Threadpool Completion Port)
  26    b 5230 09ece8f0   180b222 Disabled 00000000:00000000 00136610     1 MTA (Threadpool Worker)

From the !threads output we can see that in fact thread 26 has preemptive GC disabled. I’ve talked about this in earlier posts, but in short, some technologies like fusion (used to load assemblies) will disable GC preemption to avoid getting interrupted during some critical operations.

So what is this thread doing?

The MyTools.MyAssemblyLoader constructor is trying to load up an assembly using Assembly.LoadFile, so that is why preemptive GC is disabled. This is usually a very quick operation but here it is stuck waiting on some event before it can finish.

0:026> kb
ChildEBP RetAddr  Args to Child
0df0cd90 7c827d0b 77e61d1e 00000c5c 00000000 ntdll!KiFastSystemCallRet
0df0cd94 77e61d1e 00000c5c 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0df0ce04 79e718fd 00000c5c ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0df0ce48 79e718c6 00000c5c ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
0df0ce98 79e7187c ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
0df0cea8 79f90f81 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
0df0ceb8 79f917ba 7a390970 0230cc18 ffffffff mscorwks!CExecutionEngine::WaitForEvent+0x37
0df0cecc 7a15cfce 0230cc18 ffffffff 00000000 mscorwks!ClrWaitEvent+0x17
0df0cedc 7a153339 0df0d2ac 00000000 00000000 mscorwks!CSimpleFusionBindSink::Wait+0xe
0df0cef8 7a1545d2 09ed0180 0013a868 00000200 mscorwks!BindHelper+0x7d
0df0d194 7a2d0c7c 0df0d2ac 0013a868 00000000 mscorwks!ExplicitBind+0x1a6
0df0d4b0 793f583f 00000000 17b75570 17b75498 mscorwks!AssemblyNative::LoadFile+0x1de
0df0d4d0 7934f2ed 00000000 00000000 17b7546c mscorlib_ni+0x33583f
0df0d4f4 085f6ec4 7a3177c0 17b3df74 00000000 mscorlib_ni+0x28f2ed
WARNING: Frame IP not in any known module. Following frames may be wrong.
0df0d530 085f6646 17b70088 17b6f1b4 17b70088 0x85f6ec4
0df0d7c8 085f48a2 790d57b4 17b6e648 17b40284 0x85f6646
*** WARNING: Unable to verify checksum for System.Data.ni.dll
0df0dc10 651a82d5 17b5275c 79103ae8 00000000 0x85f48a2
0df0dff0 79f23aa2 95b9e085 026d1f54 17893ecc System_Data_ni!_load_config_used+0x56c45
0df0e0a0 00000000 00000000 00000000 00000000 mscorwks!COMNlsInfo::nativeChangeCaseString+0x18f
0:026> !sos.clrstack
OS Thread Id: 0x5230 (26)
ESP       EIP
0df0d200 7c8285ec [HelperMethodFrame_PROTECTOBJ: 0df0d200] System.Reflection.Assembly.nLoadFile(System.String, System.Security.Policy.Evidence)
0df0d4b8 793f583f System.Reflection.Assembly.LoadFile(System.String)
0df0d4cc 085f704d MyTools.MyAssemblyLoader..ctor(System.String)
...
0df0ebb0 083e85b0 MyAPP.Default.Page_Load(System.Object, System.EventArgs)
0df0ed10 698a1928 System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
0df0ed20 6629769f System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
0df0ed34 66143a84 System.Web.UI.Control.OnLoad(System.EventArgs)
0df0ed44 66143ad0 System.Web.UI.Control.LoadRecursive()
0df0ed58 66155106 System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
0df0ef10 66154a1b System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
0df0ef48 66154967 System.Web.UI.Page.ProcessRequest()
0df0ef80 66154887 System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)
0df0ef88 6615481a System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
0df0ef9c 083e826e ASP.default_aspx.ProcessRequest(System.Web.HttpContext)
0df0efa8 65ff27d4 System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
0df0efdc 65fc15b5 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
0df0f01c 65fd32e0 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
0df0f06c 65fc0225 System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
0df0f088 65fc550b System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
0df0f0bc 65fc5212 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
0df0f0c8 65fc3587 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
0df0f278 79f35ee8 [ContextTransitionFrame: 0df0f278]
0df0f2c8 79f35ee8 [GCFrame: 0df0f2c8]
0df0f420 79f35ee8 [ComMethodFrame: 0df0f420]

Looking around at other threads, we find another thread that is also loading up an assembly from the same location so this is probably the thread that will set the event so that thread 26 can continue.

0:020> kL 2000
ChildEBP RetAddr
0993cec0 7c827d0b ntdll!KiFastSystemCallRet
0993cec4 77e61d1e ntdll!NtWaitForSingleObject+0xc
0993cf34 79e718fd kernel32!WaitForSingleObjectEx+0xac
0993cf78 79e718c6 mscorwks!PEImage::LoadImage+0x199
0993cfc8 79e7187c mscorwks!CLREvent::WaitEx+0x117
0993cfd8 79f40d79 mscorwks!CLREvent::Wait+0x17
0993cfe8 79f40e96 mscorwks!WKS::GCHeap::WaitUntilGCComplete+0x32
0993d024 79e7ee77 mscorwks!Thread::RareDisablePreemptiveGC+0x1a1
0993d05c 79e75c65 mscorwks!CrstBase::Enter+0x1e6
0993d060 79e75c59 mscorwks!EEEnterCriticalSection+0x9
0993d068 79e75c3b mscorwks!CExecutionEngine::AcquireLock+0x9
0993d074 79eb2f2d mscorwks!ClrEnterCriticalSection+0xf
0993d080 7a1481d3 mscorwks!CCriticalSection::Lock+0x17
0993dd2c 7a14893a mscorwks!CAssemblyCacheItem::MoveAssemblyToFinalLocation+0x31f
0993dd74 7a146e2d mscorwks!CAssemblyCacheItem::LegacyCommit+0x245
0993ddb0 7a16198e mscorwks!CAssemblyCacheItem::Commit+0x5d
0993e45c 7a161e12 mscorwks!CAsmDownloadMgr::CreateAssembly+0x85b
0993e498 7a162550 mscorwks!CAsmDownloadMgr::DoSetupPushToCache+0x50
0993e720 79f90e65 mscorwks!CAsmDownloadMgr::DoSetup+0x26a
0993e76c 79f8f2cc mscorwks!CAssemblyDownload::DoSetup+0x7b
0993e7a0 79f8f1e4 mscorwks!CAssemblyDownload::DownloadComplete+0xb6
0993e9fc 79f8f0ca mscorwks!CAssemblyDownload::KickOffDownload+0x19c
0993ea64 79f2fd56 mscorwks!CAssemblyName::BindToObject+0x568
0993eaac 7a1545d2 mscorwks!BindHelper+0x6b
0993ed48 7a2d0c7c mscorwks!ExplicitBind+0x1a6
0993f064 793f583f mscorwks!AssemblyNative::LoadFile+0x1de
0993f0bc 79e7ae3e mscorlib_ni+0x33583f
0993f0d0 00000000 mscorwks!MetaSig::MetaSig+0x125
0:020> !sos.clrstack
OS Thread Id: 0x5dbc (20)
ESP       EIP
0993edb4 7c8285ec [HelperMethodFrame_PROTECTOBJ: 0993edb4] System.Reflection.Assembly.nLoadFile(System.String, System.Security.Policy.Evidence)
0993f06c 793f583f System.Reflection.Assembly.LoadFile(System.String)
0993f080 085f704d MyTools.MyAssemblyLoader..ctor(System.String)
0993f0b0 083e8612 MyAPP.Default.Page_Load(System.Object, System.EventArgs)
0993f210 698a1928 System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr, System.Object, System.Object, System.EventArgs)
0993f220 6629769f System.Web.Util.CalliEventHandlerDelegateProxy.Callback(System.Object, System.EventArgs)
0993f234 66143a84 System.Web.UI.Control.OnLoad(System.EventArgs)
0993f244 66143ad0 System.Web.UI.Control.LoadRecursive()
0993f258 66155106 System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
0993f410 66154a1b System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
0993f448 66154967 System.Web.UI.Page.ProcessRequest()
0993f480 66154887 System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)
0993f488 6615481a System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
0993f49c 083e826e ASP.default_aspx.ProcessRequest(System.Web.HttpContext)
0993f4a8 65ff27d4 System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
0993f4dc 65fc15b5 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
0993f51c 65fd32e0 System.Web.HttpApplication+ApplicationStepManager.ResumeSteps(System.Exception)
0993f56c 65fc0225 System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
0993f588 65fc550b System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
0993f5bc 65fc5212 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
0993f5c8 65fc3587 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
0993f778 79f35ee8 [ContextTransitionFrame: 0993f778]
0993f7c8 79f35ee8 [GCFrame: 0993f7c8]
0993f920 79f35ee8 [ComMethodFrame: 0993f920]

The problem here is that this thread has enabled preemptive GC again and the GC has suspended it so that it is now waiting for the GC to finish before it can finish.

In other words we have a deadlock situation caused by a race condition with two threads loading the same dll at the same time:

  • Thread 10 – initiating GC – suspending all threads
  • Thread 26 – can’t be suspended until it has finished loading the dll but is waiting for thread 20 to do so
  • Thread 20 – waiting for the GC to finish to continue loading its dll

Solutions

This issue could be resolved by putting a lock around the Assembly.Load in the MyTools.MyAssemblyLoader constructor.

However, this issue is actually fixed in a hotfix which is included in .NET 2.0 SP1 so if you are running into this you can install SP1 for .NET Framework 2.0 to fix the problem.

Have a good one,

Tess