It’s been a while again. There has been lots of prepping for the debugging workshops and a bit of remodeling of the house, but now it’s time to write again…

The other day I received an email about my blog with the following question (slightly paraphrased):

I wanted to create a quick and dirty site map, so in my override of BuildSiteMap() I set up a small loop to add some fake sitemap nodes.

public override SiteMapNode BuildSiteMap(){
    for (int i = 0; i < 5; i++)
        myRoot.ChildNodes.Add(new SiteMapNode(this, i.ToString(), i.ToString(), i.ToString()));
    return myRoot;
}

When I run the application, I get a StackOverflowException and the server crashes. And when I step through this code in the debugger, I see something really strange:

1) int i = 0
2) i < 5
3) myRoot...
4) int i = 0
5) i < 5

So it seems like i is never incremented, and some rudimentary tests show that unless I call into a SiteMapNode (access a property, call a method…), the loop works correctly. What is making this loop run indefinitely? It looks like a bug in the compiler, or perhaps in the CLR…

When I got this question, I really didn’t know anything about the site navigation features in ASP.NET 2.0, but I found these articles from ScottGu and 4GuysFromRolla to be really nice overviews.

Initial thoughts

The great thing about this problem is that it is consistently reproducible, which means that it is even possible to live debug it. But before we get that far, let’s take a few steps back and see what we have here…

  1. A StackOverflowException
  2. A loop that seems to start over and over and over

I have talked about StackOverflowExceptions earlier on this blog, and just to recap… a stack overflow happens when we exceed the amount of memory reserved for the thread’s stack by pushing too many return addresses, local variables and parameters onto it. By far the most common reason for a stack overflow is never-ending recursion. In other words, functionA calls functionB, which calls functionA, which calls functionB, and so on, and so on…
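
As a minimal sketch of that functionA/functionB pattern, here is a small modern C# console program (top-level statements, so C# 9 or later). The depth parameter and the MaxDepth limit are hypothetical additions so the demo stops. Without them, the calls never end. Note also that you cannot rescue yourself from a real StackOverflowException with try/catch: since .NET 2.0, the process is simply terminated.

```csharp
using System;
using System.Collections.Generic;

// Each entry is one simulated frame, oldest first ("depth: name").
var callStack = new List<string>();

// Hypothetical limit so the demo stops; the real recursion has none,
// which is why it keeps going until the stack is exhausted.
const int MaxDepth = 6;

void FunctionA(int depth)
{
    callStack.Add($"{depth}: FunctionA");
    if (depth < MaxDepth)
        FunctionB(depth + 1);   // A calls B...
}

void FunctionB(int depth)
{
    callStack.Add($"{depth}: FunctionB");
    if (depth < MaxDepth)
        FunctionA(depth + 1);   // ...and B calls A again
}

FunctionA(1);
callStack.ForEach(Console.WriteLine);
```

Every extra level adds another frame. In the unbounded version, those frames keep piling up until the reserved stack space is gone.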

So we end up with a call stack where functionA and functionB alternate, frame after frame, until there is no stack space left.

OK, that’s all fine and dandy, but that only explains the stack overflow. What about the crazy looping?

Well… picture that you have this function (with a breakpoint on the marked line):

void MyRecursiveFunction(){
     for(int i=0; i<5; i++){
         MyRecursiveFunction();   // <-- breakpoint here
     }
}

When you stop on the breakpoint the first time, i is 0 and the call stack contains a single MyRecursiveFunction frame (on top of whatever called it).

Now we call into MyRecursiveFunction (ourselves), which starts a new pass through the function with its own i, so i = 0 again (since we are not really in the same loop)…

So if we let this go a few rounds and replace each call to MyRecursiveFunction with the code inside it, we are really executing something like this:

for(int i=0; i<5; i++){
    for(int i2=0; i2<5; i2++){
        for(int i3=0; i3<5; i3++){
            for(int i4=0; i4<5; i4++){
                for(int i5=0; i5<5; i5++){
                    for(int i6=0; i6<5; i6++){
                        for(int i7=0; i7<5; i7++){   // ...and so on, one level per recursive call
                        }
                    }
                }
            }
        }
    }
}

… but when we look at it in Visual Studio it looks like we are always in the same loop and that we are looping around without changing the value of i. You don’t get the depth perception until you actually look at the call stack.
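
To make that missing depth perception visible, this sketch passes the recursion depth along and logs (depth, i) every time execution reaches the spot where the breakpoint would be. The depth parameter and the MaxDepth limit are hypothetical additions so the program terminates. The original function has neither. The code is written as a modern C# top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Every time we reach the breakpoint line, record which frame (depth) we are in and its i.
var breakpointHits = new List<(int Depth, int I)>();

// Hypothetical limit so the sketch terminates; the original recursion never stops.
const int MaxDepth = 4;

void MyRecursiveFunction(int depth)
{
    if (depth > MaxDepth) return;
    for (int i = 0; i < 5; i++)
    {
        breakpointHits.Add((depth, i));   // where the breakpoint sits
        MyRecursiveFunction(depth + 1);   // the new frame gets its own, fresh i
    }
}

MyRecursiveFunction(1);

// The first hits are what you see when stepping: i is 0 every time,
// only the depth (which the source view doesn't show) changes.
foreach (var (depth, i) in breakpointHits.GetRange(0, MaxDepth))
    Console.WriteLine($"depth {depth}: i = {i}");
```

The first hits all report i = 0, each one level deeper. i only reaches 1 once the deepest call returns, and in the real, unbounded recursion that never happens.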

If we looked at the call stack now, we would see one MyRecursiveFunction frame per level of nesting, each with its own i, and each of them still at 0.

So the conclusion from the initial thoughts is that we are definitely looking at some kind of recursion… but where? The code in the sample

myRoot.ChildNodes.Add(new SiteMapNode(this, i.ToString(), i.ToString(), i.ToString()));

doesn’t look all that complex…

The most likely suspects, new SiteMapNode() and myRoot.ChildNodes.Add(), don’t really seem to do anything too weird when I look at them with Reflector.

Debugging the problem

Finally :) A little less conversation, a little more windbg action please…

Since it was so easy to repro, and I could repro it on my dev machine, I just attached windbg (File / Attach to process) to w3wp.exe and hit g to let it go. Then I reproduced the problem, and it stopped nicely at this prompt, showing me that the issue was a stack overflow (like we already knew).

(7e4.ddc): Stack overflow - code c00000fd (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=0fa4235c ebx=02beca74 ecx=02beca74 edx=02becb54 esi=02becb54 edi=02beca74
eip=686b5cb4 esp=02163000 ebp=02163004 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00210246
686b5cb4 56 push esi

If we take a look at the stack with !clrstack to see how we ended up here, we only see this…

0:016> !clrstack
OS Thread Id: 0xddc (16)
02163000 686b5cb4 System.Web.StaticSiteMapProvider.GetChildNodes(System.Web.SiteMapNode)

… which doesn’t really tell us very much. Sometimes, when we are in a stack overflow, !clrstack will have problems enumerating the stack, so we have to look at the raw stack instead using !dumpstack.

Note: !dumpstack doesn’t show the true stack (some functions may be wrong) but it gives us a very good idea of what is going on…

0:016> !dumpstack
OS Thread Id: 0xddc (16)
Current frame: (MethodDesc 0x68b03720 +0x4 System.Web.StaticSiteMapProvider.GetChildNodes(System.Web.SiteMapNode))
ChildEBP RetAddr Caller,Callee
02163004 686b1fc4 (MethodDesc 0x68aeff30 +0x18 System.Web.SiteMapNode.get_ChildNodes())
0216300c 0f765641 (MethodDesc 0xfa42328 +0x59 ViewSiteMapProvider.BuildSiteMap())
0216303c 686b5cdf (MethodDesc 0x68b03720 +0x2f System.Web.StaticSiteMapProvider.GetChildNodes(System.Web.SiteMapNode))
02163074 686b1fc4 (MethodDesc 0x68aeff30 +0x18 System.Web.SiteMapNode.get_ChildNodes())
0216307c 0f765641 (MethodDesc 0xfa42328 +0x59 ViewSiteMapProvider.BuildSiteMap())
021630ac 686b5cdf (MethodDesc 0x68b03720 +0x2f System.Web.StaticSiteMapProvider.GetChildNodes(System.Web.SiteMapNode))
021630e4 686b1fc4 (MethodDesc 0x68aeff30 +0x18 System.Web.SiteMapNode.get_ChildNodes())
021630ec 0f765641 (MethodDesc 0xfa42328 +0x59 ViewSiteMapProvider.BuildSiteMap())
0216311c 686b5cdf (MethodDesc 0x68b03720 +0x2f System.Web.StaticSiteMapProvider.GetChildNodes(System.Web.SiteMapNode))
02163154 686b1fc4 (MethodDesc 0x68aeff30 +0x18 System.Web.SiteMapNode.get_ChildNodes())
0216315c 0f765641 (MethodDesc 0xfa42328 +0x59 ViewSiteMapProvider.BuildSiteMap())

OK, so the problem seems to be the ChildNodes property. It calls into StaticSiteMapProvider.GetChildNodes, which in turn calls our BuildSiteMap function again, which reads the ChildNodes property again, and so it goes until we end up with a StackOverflowException.
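
To make the cycle in the !dumpstack output concrete, here is a toy model written as a modern C# top-level program. The local functions are hypothetical stand-ins, not the real System.Web implementations. Each one records a frame name before calling the next. A hypothetical MaxFrames limit plays the role of the finite stack: when it is hit, the sketch throws an ordinary exception, whereas a real StackOverflowException just takes the process down.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical frame log, oldest first; none of these functions are System.Web code.
var frames = new List<string>();
const int MaxFrames = 9;   // stand-in for the limited stack space

void Enter(string frame)
{
    frames.Add(frame);
    if (frames.Count >= MaxFrames)
        throw new InvalidOperationException("simulated stack overflow");
}

// Models the SiteMapNode.ChildNodes getter: it asks the provider for the children.
void GetChildNodesProperty()
{
    Enter("SiteMapNode.get_ChildNodes");
    ProviderGetChildNodes();
}

// Models StaticSiteMapProvider.GetChildNodes, which is documented to call BuildSiteMap.
void ProviderGetChildNodes()
{
    Enter("StaticSiteMapProvider.GetChildNodes");
    BuildSiteMap();
}

// Models the override from the question, which reads myRoot.ChildNodes while building.
void BuildSiteMap()
{
    Enter("ViewSiteMapProvider.BuildSiteMap");
    GetChildNodesProperty();
}

try
{
    BuildSiteMap();
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}

// Newest frame first, the same order !dumpstack uses.
for (int k = frames.Count - 1; k >= 0; k--)
    Console.WriteLine(frames[k]);
```

Printed newest first, the frames repeat GetChildNodes, get_ChildNodes, BuildSiteMap in the same order as the windbg output above.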


In the documentation for BuildSiteMap you can find the following passage:

The BuildSiteMap method is called by the default implementation of the FindSiteMapNode, GetChildNodes, and GetParentNode methods. If you override the BuildSiteMap method in a derived class, ensure that it loads site map data only once and returns on subsequent calls.

To avoid recursion and stack overflows here, it is best to avoid calling these functions from BuildSiteMap. Instead, we can call the AddNode function to add child nodes, like in the BuildSiteMap sample.
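
Here is a toy model of the fix, written as a modern C# top-level program. It is not System.Web code: plain dictionaries stand in for the provider's internal node tables, and the hypothetical AddNode, GetChildNodes and BuildSiteMap local functions only mimic the documented contract. That contract is that the default GetChildNodes calls BuildSiteMap, and BuildSiteMap should load its data once and return on subsequent calls. In a real StaticSiteMapProvider, you would call the protected AddNode(node, parentNode) method instead of the stand-in shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the provider's internal node tables.
var childTable = new Dictionary<string, List<string>>();
bool built = false;
int buildCount = 0;
const string RootKey = "root";

// Mimics the documented behavior: the default GetChildNodes calls BuildSiteMap first.
List<string> GetChildNodes(string node)
{
    BuildSiteMap();
    return childTable.TryGetValue(node, out var kids) ? kids : new List<string>();
}

// Plays the role of AddNode(node, parent): it writes straight into the tables
// and never goes through GetChildNodes, so it cannot re-enter BuildSiteMap.
void AddNode(string node, string parent)
{
    if (!childTable.TryGetValue(parent, out var kids))
        childTable[parent] = kids = new List<string>();
    kids.Add(node);
}

string BuildSiteMap()
{
    // Load once: every later call, re-entrant or not, returns right away.
    if (built)
        return RootKey;
    built = true;
    buildCount++;

    for (int i = 0; i < 5; i++)
        AddNode(i.ToString(), RootKey);
    return RootKey;
}

var children = GetChildNodes(BuildSiteMap());
Console.WriteLine($"{children.Count} child nodes, BuildSiteMap did real work {buildCount} time(s)");
```

Because the flag is set before any nodes are added, even an accidental call back into BuildSiteMap during the build returns immediately instead of starting the cycle again.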

This is also documented in the Site Map Providers article, which is well worth a read btw.

BuildSiteMap should generally not call other site map provider methods or properties, because many of the default implementations of those methods and properties call BuildSiteMap. For example, the simple act of reading RootNode in BuildSiteMap causes a recursive condition that terminates in a stack overflow.

I think this is a good example of how you can debug things you don’t really know all that much about, like me with SiteMaps. But it also makes me think about how much stuff there is out there that I have yet to look at, and there sure is a lot of it. My colleague Doug just posted a list of some upcoming stuff, including .NET Framework 3.0, IronPython 1.0 and Visual Studio code name “Orcas”.