9 minute read

For a long time all my case studies have been on 1.1. it’s time to venture out in 2.0 land and look at what may seem like a 2.0 specific issue.

I say “may seem” because this case study will only directly crash if you are using 2.0, but as you’ll learn later the problem existed in 1.1 and 1.0, it was just way harder to track down.

Problem description

Once in a while ASP.NET crashes and we see events in the system event log like this one

Event Type:       Warning
Event Source:     W3SVC
Event Category:   None
Event ID:         1009
Date:             2006-04-25
Time:             09:41:22
PM User:          N/A
Computer:         SUBSPACE1
Description:
A process serving application pool 'ASP.NET V2.0' terminated unexpectedly. The process id was ‘1732’. The process exit code was ‘0xe0434f4d’.

Or this one

Event Type:       Warning
Event Source:     W3SVC
Event Category:   None
Event ID:         1011
Date:             2006-04-25
Time:             09:41:22
User:             N/A
Computer:         SUBSPACE1
Description:
A process serving application pool 'ASP.NET V2.0' suffered a fatal communication error with the World Wide Web Publishing Service. The process id was '6256'. The data field contains the error number.

And in the application event log we get a pretty cryptic error message like this one

Event Type:       Error
Event Source:     .NET Runtime 2.0 Error Reporting
Event Category:   None
Event ID:         5000
Date:             2006-04-25
Time:             09:41:20
User:             N/A
Computer:         SUBSPACE1
Description:
EventType clr20r3, P1 w3wp.exe, P2 6.0.3790.1830, P3 42435be1, P4 app_code.pn5mfdcr, P5 0.0.0.0, P6 444dcf44, P7 5, P8 5, P9 system.dividebyzeroexception, P10 NIL.

Initial thoughts

Ok, so what do we know about the issue? We know that asp.net terminated unexpectedly, and that right before this we got a System.DivideByZeroException. We also know that the process exit code was 0xe0434f4d whatever that means, hmm…

Usually when you get a stopped unexpectedly error message the exit code will be the type of exception that caused the crash. For example a 0xC0000005 means you got a second chance access violation, 0x800703e9 means you suffered a StackOverflowException but what about 0xe0434f4d?

0xe0434f4d is the exception code for CLR (.net) exceptions, so any managed exception like a NullReferenceException or InvalidOperationException or SQLException, basically all managed exception are natively referred to as 0xe0434f4d. In this case, if we look closer at the application event log entry we can see that it is in fact a System.DivideByZero exception.

Trivia: Just a piece of info of no particular value that you might want to pull out of pocket on your next date 0xe0434f4d or at least 43 4f 4d are the ASCII values for the letters COM.

But hey now… should a .net exception cause the asp.net process to crash??? If you divide by zero in your page and don’t have a try catch block around it, surely you will get one of those “nice” white and yellow error pages saying that an exception occurred on your page, but the process doesn’t just exit.

The answer is yes, you will get one of those pages because the asp.net global error handler will eventually catch your exception, format it for you and print it out on the screen. But what happens if it is not on an asp.net request, so there is no-one to feedback the exception to? It’s the old paradox: If a tree falls in the forest and nobody is there, does it still make a sound?

In 1.0 and 1.1 it didn’t. For example if you throw an exception in a piece of code called on a timer, or use QueueUserWorkItem and throw an exception in code executing there, or otherwise throw exceptions in code that is not running inside the context of an asp.net request, the framework will swallow the exception and continue. Or rather it will stop that thread of execution but it won’t die.

Doesn’t sound all that bad right? Really???

That thread could have been doing anything, and we will never be the wiser that it died. It could have been holding a lock of some sort, or it could have been in the middle of cleaning up resources, or really a number of different things that will now never happen, but that may immediately or eventually have really bad side effects like hangs or crashes or memory issues, but the exception will be long gone so we cant figure out what it was.

The policy for unhandled exceptions was changed in ASP.NET 2.0 to the default for .net which is a process exit. This can be changed back by adding the following to the aspnet.config in the frameworks directory, but I wouldn’t recommend it without putting in some preventive measures to take care of potential unhandled exceptions on non ASP.NET threads.

<configuration>
    <runtime>
        <legacyUnhandledExceptionPolicy enabled="true" />
    </runtime>
</configuration>

Troubleshooting the issue

The main task here is to find out where this DivideByZeroException is coming from and why it occurred so there are two ways to figure this out (short of complete code inspection).

Strategy #1 – logging the exception

The first way, and this is the way I would probably recommend, is to create an UnhandledExceptionHandler to log the exception along with its stack trace in the event log as shown in this KB article

You add the handler like this to the web.config:

<system.web>
    <httpModules>
        <add type="WebMonitor.UnhandledExceptionModule, <strong name>" name="UnhandledExceptionModule"/>
    </httpModules>
    ...
</system.web>

And it hooks an event handler up to the UnhandledException event of the current app domain.

You don’t actually need to strong name it and add it to the GAC, however if you plan it in multiple applications you should to avoid for the dll being loaded multiple times.

Now the next time you get one of these unhandled exceptions, the process will still exit (unless you change the unhandled exception policy), but you have a very good chance of fixing the issue.

The event for the exception in this particular sample looks like this…

Event Type:       Error
Event Source:     ASP.NET 2.0.50727.0
Event Category:   None
Event ID:         0
Date:             2006-04-25
Time:             09:41:20
User:             N/A
Computer:         SUBSPACE1
Description:
UnhandledException logged by UnhandledExceptionModule.dll:
appId=/LM/w3svc/1/ROOT/CrashMe
type=System.DivideByZeroException
message=Attempted to divide by zero.
stack=
   at MyFinalizerClass.Finalize()
.

Bingo!!! So the exception occurs in MyFinalizerClass.Finalize() in the CrashMe application.

In fact the code for the finalizer for this class looks like this, so it is pretty obvious what caused it, and our work here is done…

~MyFinalizerClass()
{
    int i = 0;
    int j = 9;
    i = j / i;
}

Setting up an UnhandledException handler like this is not limited to 2.0. You can absolutely do this in 1.1 as well to determine if you are throwing any unhandled exceptions. The only modification you need to do to the code outlined in the article is to choose a different dll to gather the version from. A prime candidate for this would be mscorlib.dll.

Strategy #2 – debugging with windbg

If you have read any of my previous posts you know I have a special place in my heart for debugging. If you don’t have an unhandled exception event handler you can still find out what the exception was by running adplus in crash mode to generate dumps when the process exits.

In this particular case doing this will result in the following files being generated.

C:\debuggers\Crash_Mode__Date_04-25-2006__Time_10-11-3030>dir
 Volume in drive C has no label.
 Volume Serial Number is 30D7-F806

 Directory of C:\debuggers\Crash_Mode__Date_04-25-2006__Time_10-11-3030

2006-04-25  10:12    <DIR>          .
2006-04-25  10:12    <DIR>          ..
2006-04-25  10:11                      4 404 ADPlus_report.txt
2006-04-25  10:11    <DIR>          CDBScripts
2006-04-25  10:11                      9 420 059 PID-3368__W3WP.EXE__1st_chance_IntegerDivide__mini_12D8_2006-04-25_10-11-51-487_0D28.dmp
2006-04-25  10:12                      108 142 158 PID-3368__W3WP.EXE__1st_chance_Process_Shut_Down__full_12D8_2006-04-25_10-12-02-440_0D28.dmp
2006-04-25  10:12                      108 294 606 PID-3368__W3WP.EXE__2nd_chance_NET_CLR__full_12D8_2006-04-25_10-11-54-690_0D28.dmp
2006-04-25  10:12                      35 324 PID-3368__W3WP.EXE__Date_04-25-2006__Time_10-11-3030.log
2006-04-25  10:11                      9 014 Process_List.txt
               6 File(s)    225 905 565 bytes
               3 Dir(s)   4 592 435 200 bytes free

So at 10-11-51 a 1st chance IntegerDivide exception occurred, followed by a 2nd chance .NET exception (2nd chance meaning that it wasn’t handled), and finally this was followed by a process shutdown.

Concentrating on the 2nd chance exception dump and loading up sos.dll from the framework directory

0:020> .load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll

We can run !threads to give us a list of the threads along with info about which thread the exception occurred on (in this case thread 20, the finalizer thread).

0:020> !threads
ThreadCount: 7
UnstartedThread: 0
BackgroundThread: 7
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
                                                        PreEmptive   GC Alloc                          Lock
   ID    OSID    ThreadOBJ       State        GC          Context                  Domain   Count   APT   Exception
   16    1  c20  000b9778      3808220    Enabled  00000000:00000000 000fddc0     1 Ukn (Threadpool Worker)
   20    2 16d8  000e3bd0      b220 Enabled  0242ab08:0242bf94 000caad8     0 MTA (Finalizer) System.DivideByZeroException (02428f60)
   21    3  b2c  000fa588    80a220 Enabled  00000000:00000000 000caad8     0 MTA (Threadpool Completion Port)
   22    4 16c0  000fd690      1220 Enabled  00000000:00000000 000caad8     0 Ukn
   14    6 1570  0014a970   880a220 Enabled  00000000:00000000 000caad8     0 MTA (Threadpool Completion Port)
   23    7 1b44  00145178   180b220 Enabled  00000000:00000000 000caad8     0 MTA (Threadpool Worker)
   24    5  bac  0014a5b0   880b220 Enabled  00000000:00000000 000caad8     0 MTA (Threadpool Completion Port)

So from this we know that we are looking at an exception happening during finalization.

Since this dump was triggered by the exception this will also be the active thread when you load up the dump.

If we look at the native and managed callstack it doesn’t really bring us any closer to the solution…

0:020> kb 2000
ChildEBP RetAddr  Args to Child
01eaf9fc 79f97065 e0434f4d 00000001 00000001 kernel32!RaiseException+0x53
01eafa5c 7a05b941 02428f60 00000000 00000000 mscorwks!RaiseTheExceptionInternalOnly+0x226
01eafa70 7a05b9b2 02428f60 00000000 01eafb88 mscorwks!RaiseTheException+0x4c
01eafa98 7a05b9ea 00000000 000e3bd0 4914714b mscorwks!RaiseTheException+0xbe
01eafac4 7a05b9f5 02428f60 00000000 7a0e0701 mscorwks!RealCOMPlusThrow+0x37
01eafad0 7a0e0701 02428f60 7740e5f5 7a36a738 mscorwks!RealCOMPlusThrow+0xa
01eafc00 7a0e16e4 00000000 01eafc38 01eafca8 mscorwks!Thread::RaiseCrossContextException+0x63
01eafcb4 79f3a09c 000fddc0 79f3a0a1 01eafeb4 mscorwks!Thread::DoADCallBack+0x25a
01eafcd0 79f3581e 01eafeb4 79f3ab82 000fddc0 mscorwks!Thread::UserResumeThread+0xf3
01eafcd8 79f3ab82 000fddc0 79f3abb0 01eafd08 mscorwks!ManagedThreadBase::FinalizerAppDomain+0x26
01eafd34 79f350ea 000e3bd0 00000000 01eafd5b mscorwks!SVR::CreateGCHeap+0x163
01eafdb0 79f3500e 00000000 00000000 01eafe6c mscorwks!SVR::CreateGCHeap+0x204
01eafdc8 79ecb4a4 01eafeb4 00000000 00000001 mscorwks!SVR::GCHeap::FinalizerThreadWorker+0xe7
01eafdd8 79ecb442 01eafeb4 01eafe60 79f93fe6 mscorwks!Thread::UserResumeThread+0xfb
01eafe6c 79ecb364 01eafeb4 7740e0c9 00000000 mscorwks!Thread::DoADCallBack+0x355
01eafea8 79ed5e8b 01eafeb4 00000000 000fddc0 mscorwks!Thread::DoADCallBack+0x541
01eafed0 79ed5e56 79f34f4a 00000008 79f6fd87 mscorwks!ManagedThreadBase_NoADTransition+0x32
01eafedc 79f6fd87 79f34f4a 7740e175 00000000 mscorwks!ManagedThreadBase::FinalizerBase+0xb
01eaff14 79ecb00b 00000000 00000000 00000000 mscorwks!SVR::GCHeap::FinalizerThreadStart+0xbb
01eaffb8 77e66063 000e4390 00000000 00000000 mscorwks!Thread::intermediateThreadProc+0x49
01eaffec 00000000 79ecafc5 000e4390 00000000 kernel32!BaseThreadStart+0x34
0:020> !clrstack
OS Thread Id: 0x16d8 (20)
ESP       EIP
01eafaac 77e55dea [GCFrame: 01eafaac]
01eafb14 77e55dea [GCFrame: 01eafb14]
01eafd14 77e55dea [GCFrame: 01eafd14]

But from the !threads output we get the address of the exception (02428f60) and we can take a deeper look…

0:020> !do 02428f60
Name: System.DivideByZeroException
MethodTable: 7915e020
EEClass: 791e97f8
Size: 72(0x48) bytes
 (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fa3e0  40000b5        4        System.String  0 instance 02429e78 _className
79109208  40000b6        8 ...ection.MethodBase  0 instance 00000000 _exceptionMethod
790fa3e0  40000b7        c        System.String  0 instance 0242a18c _exceptionMethodString
790fa3e0  40000b8       10        System.String  0 instance 02429fb4 _message
79113dfc  40000b9       14 ...tions.IDictionary  0 instance 00000000 _data
790fa9e8  40000ba       18     System.Exception  0 instance 00000000 _innerException
790fa3e0  40000bb       1c        System.String  0 instance 00000000 _helpURL
790f9c18  40000bc       20        System.Object  0 instance 00000000 _stackTrace
790fa3e0  40000bd       24        System.String  0 instance 00000000 _stackTraceString
790fa3e0  40000be       28        System.String  0 instance 0242aab0 _remoteStackTraceString
790fed1c  40000bf       34         System.Int32  0 instance        0 _remoteStackIndex
790f9c18  40000c0       2c        System.Object  0 instance 00000000 _dynamicMethods
790fed1c  40000c1       38         System.Int32  0 instance -2147352558 _HResult
790fa3e0  40000c2       30        System.String  0 instance 0242a3c4 _source
790fe160  40000c3       3c        System.IntPtr  0 instance        0 _xptrs
790fed1c  40000c4       40         System.Int32  0 instance        0 _xcode

In particular we can dump out the _remoteStackTraceString and find out where exactly the error occurred.

0:020> !do 0242aab0
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 88(0x58) bytes
 (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String:    at MyFinalizerClass.Finalize()

Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fed1c  4000096        4         System.Int32  0 instance       36 m_arrayLength
790fed1c  4000097        8         System.Int32  0 instance       35 m_stringLength
790fbefc  4000098        c          System.Char  0 instance       20 m_firstChar
790fa3e0  4000099       10        System.String  0   shared   static Empty
    >> Domain:Value  000caad8:790d6584 000fddc0:790d6584 <<
79124670  400009a       14        System.Char[]  0   shared   static WhitespaceChars
    >> Domain:Value  000caad8:023d03f0 000fddc0:023d4374 <<

Voila, there we have it again… our MyFinalizerClass.Finalize()

If you want to take away just one thing from this post, it should be to be careful with the code running on non ASP.NET threads, making sure that you have proper try / catch blocks around any code that could cause an exception.

Until next time…