It was really exciting to see that so many people answered the .NET GC PopQuiz, especially seeing that so many had great answers. Perhaps the questions were too easy :)

The reason I posted the pop quiz in the first place is that, unlike Phil, who commented that none of this should really matter to the developer :), I do think it is important to have a good understanding of what happens behind the scenes when you are programming on top of a lot of code that you don’t control, since it tells you a lot about how to design your app for best performance. Granted, some of it might be of less importance than other things, but still…

Without further ado, here are my answers…

1. How many GC threads do we have in a .NET process running the Server version of the GC on a dual-core machine?

Two, one per processor, or rather one per logical processor, so as many of you pointed out it would have been four if it was hyper-threaded. In a process running the workstation version of the GC we would have no dedicated GC threads. Instead, garbage collection runs on the thread that initiates the GC, since there is no point in switching to a different thread for garbage collection when only one thread is doing the GC.

Chris Lyon has a good writeup about GC modes and also an interesting post about GC latency modes coming in Orcas.

Why is this important to you? Since different GC modes are optimized for different things, your memory usage, GC latency etc. may vary a lot depending on what GC mode you are using. For example, a Windows service gets the workstation GC by default, but if there is a lot of throughput (lots of short-lived allocations), you are probably a lot better off running the server GC for memory usage and performance.
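If you want to check which mode a process actually ended up with, you can ask the runtime from inside the process. The following is a minimal sketch, assuming the standard GCSettings.IsServerGC property and Environment.ProcessorCount; the class and method names (GcModeReport, Describe) are made up for this example.

```csharp
using System;
using System.Runtime;

public static class GcModeReport
{
    // Builds a one-line description of the GC configuration of the current process.
    // GcModeReport and Describe are hypothetical names used only for illustration.
    public static string Describe()
    {
        string mode = GCSettings.IsServerGC ? "server" : "workstation";
        return string.Format(
            "GC mode: {0}, logical processors: {1}",
            mode,
            Environment.ProcessorCount);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running this inside the process you care about (a Windows service, a console app, a web app) tells you what you really got, which is not always what you expected.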

Btw, I really enjoyed the fact that Brian used the debugger to find this out, that’s the spirit :)

2. What GC mode is used in the web development server (Cassini) on a quad proc machine, and why (you can choose from server, workstation or concurrent-workstation)?

Nice catch for those of you who figured out that it was workstation GC because it is a WinForms app. More specifically it is concurrent workstation, meaning that most of the GC work is done while other threads are executing, to avoid pausing the process.

Typically you wouldn’t stress test against the web development server, but if you do any kind of memory investigation, looking at when memory is released etc. you need to know that what happens on your web dev server is probably not the same thing that will happen on your multi processor web server in terms of garbage collection.

3. How many finalizers do we have in a .NET process running the Server version of the GC on a quad proc machine?

This is probably one of the more important questions, and there were different bids on this, but I think most people answered one finalizer thread per process, which is correct.

Ok, so why is this important to you? Well, it’s just a reminder that any objects you create that require finalization (i.e. that have a finalizer method or a destructor) will have to go through one single point (the finalizer thread) unless the object is disposed and GC.SuppressFinalize is called.
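As a rough illustration of how an object can skip the finalizer thread, here is a minimal sketch of the usual Dispose plus GC.SuppressFinalize combination. The NativeResourceHolder class and its fake handle are hypothetical and only stand in for a real unmanaged resource.

```csharp
using System;

// Hypothetical wrapper around an unmanaged resource; the name and the
// fake handle value are illustrative only.
public sealed class NativeResourceHolder : IDisposable
{
    private IntPtr handle = new IntPtr(1); // stand-in for a real native handle

    public bool IsDisposed
    {
        get { return handle == IntPtr.Zero; }
    }

    public void Dispose()
    {
        Release();
        // Cleanup already ran on the calling thread, so tell the GC that this
        // object no longer needs a trip through the finalizer thread.
        GC.SuppressFinalize(this);
    }

    // Finalizer: safety net that only does work if Dispose was never called.
    ~NativeResourceHolder()
    {
        Release();
    }

    private void Release()
    {
        if (handle == IntPtr.Zero) return; // already released
        handle = IntPtr.Zero;              // a real implementation would free the handle here
    }
}

public static class Program
{
    public static void Main()
    {
        using (NativeResourceHolder holder = new NativeResourceHolder())
        {
            Console.WriteLine("Disposed inside using block: " + holder.IsDisposed);
        } // Dispose runs here, on the calling thread
    }
}
```

The using block is just the compiler-generated try/finally around Dispose, so the cleanup happens deterministically instead of waiting for the single finalizer thread to get around to it.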

4. When is an object garbage collected?

Now this is interesting, and there were a lot of different bids on this one, but I think Brian said it best:

When a GC occurs which happens either when you allocate an object which makes gen 0 exceed its current capacity, or when GC.Collect is explicitly called and the object is no longer referenced.

A few clarifications here:

  1. If we exclude GC.Collect calls, this means that a GC will only occur on allocation. I have mentioned this before, but I think it is worth mentioning again… a classic mistake to make is to run a stress test and then come back 10 mins after the stress test and wonder why memory is not being released. In other words, objects may well be ready to be released but no allocations are made, meaning no GCs will occur, so memory usage will stay flat.
  2. There are places in the framework where GC.Collect is called. NativeCPP mentioned that a GC would occur when there is memory pressure, I suspect he meant when a gen limit was reached, but it will also occur in ASP.NET apps when we are closing in on the memory limit set in IIS/machine.config. To see if the app is calling GC.Collect, you can look at .NET CLR Memory/# Induced GC. Recently I also found another location where GC.Collect is called. In some parts of the System.Drawing namespace GC.Collect is called to avoid having too many handles to brushes/fonts etc. in the process and getting a stale process because of this. Typically no-one should run into this since it is only likely to happen in a server app and System.Drawing is not supported in server apps according to the MSDN docs, but still it happens.
  3. Regarding the object no longer being referenced, a reference in this case can be a lot of things, but in short, references are typically
    • strong - static/global objects, including cache or in-proc sessions since they are rooted in static objects
    • reference counts like x.static suggested - mostly for COM wrappers
    • stack - objects that are still alive on a thread
    • pinned objects - usually happens when there are native API calls, or remoting/web services involved
    • finalizer - if the object has a finalizer the object will still be garbage collected, but it will hang out afterwards, waiting to be finalized
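To see the "GCs only happen on allocation" point from the first clarification in action, here is a minimal sketch that counts gen 0 collections with GC.CollectionCount while allocating and while sitting idle. The class name, the block size and the allocation count are arbitrary choices for the example, not anything prescribed by the runtime.

```csharp
using System;
using System.Threading;

public static class AllocationDrivenGc
{
    // Storing each block in a static field makes sure the allocation really
    // happens on the GC heap; the previous block becomes garbage immediately.
    private static byte[] lastBlock;

    // Allocates many short-lived arrays and returns how many gen 0
    // collections happened in the process while doing so.
    public static int CountGen0CollectionsDuring(int allocations)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < allocations; i++)
        {
            lastBlock = new byte[1024];
        }
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("Gen 0 GCs while allocating: "
            + CountGen0CollectionsDuring(1000000));

        int idleStart = GC.CollectionCount(0);
        Thread.Sleep(2000); // this thread allocates nothing while it sleeps
        Console.WriteLine("Gen 0 GCs while idle: "
            + (GC.CollectionCount(0) - idleStart));
    }
}
```

On most machines the first number should be well above zero and the second should stay at or near zero, which is the same thing you see when memory stays flat after a stress test has finished.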

After some of the comments I realized that the wording of the question is a bit ambiguous. What I meant by the question was basically “when does a collection occur”, but I am adding in some comments from Maoni that answer the question I actually posed, “when is an object collected” :)… plus a comment on what NativeCPP had mentioned. Just goes to show, you learn something new every day :)

Tess, re question 4. When is an object garbage collected? When a GC occurs which happens either when you allocate an object which makes gen 0 exceed its current capacity, or when GC.Collect is explicitly called and the object is no longer referenced.

This is a bit incorrect - the correct answer should be:

When a GC that collects the generation your object is in happens, your object, if it is dead, is collected. If your object is in gen2 and we are only doing a gen0 collection, your object is not gonna be collected.

NativeCPP mentioned that a GC would occur when there is memory pressure, I suspect he meant when a gen limit was reached, but it will also occur in ASP.NET apps when we are closing in on the memory limit set in IIS/machine.config.

Actually he/she is right - we do trigger GCs when the machine is under memory pressure. This is described in my first Using GC Efficiently blog entry.

5. What causes an object to move from Generation 0 to Generation 1 or to Generation 2?

If the object is still referenced during a garbage collection it will automatically move into the next generation. This includes objects referenced by the finalizer (freachable) queue.

Brian mentioned that they would not be moved if they are pinned… I don’t want to make a categorical statement here since I might very well be wrong, but I don’t see why pinning would keep an object from moving into the next generation. The term “move” here is somewhat figurative. The objects don’t necessarily move; instead the generation boundaries move, so that at the end of each garbage collection Gen 0 will always be empty, meaning that if a pinned object was located in Gen 0, by the end of the GC it would have to be in Gen 1.
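You can watch the promotion happen with GC.GetGeneration. This is a minimal sketch that forces collections with GC.Collect purely for demonstration purposes (not something to do in production code); the PromotionDemo class and ObserveGenerations method are hypothetical names.

```csharp
using System;

public static class PromotionDemo
{
    // Returns the generation of a live object before any GC, after one
    // induced GC and after two induced GCs.
    public static int[] ObserveGenerations()
    {
        object survivor = new object();
        int[] seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);
        GC.Collect(); // induced GC, for demonstration only
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor); // keep the reference live across both collections
        return seen;
    }

    public static void Main()
    {
        int[] seen = ObserveGenerations();
        Console.WriteLine("Before any GC:  gen " + seen[0]);
        Console.WriteLine("After one GC:   gen " + seen[1]);
        Console.WriteLine("After two GCs:  gen " + seen[2]);
        Console.WriteLine("Max generation: " + GC.MaxGeneration);
    }
}
```

Typically the object starts in gen 0 and climbs one generation per collection it survives until it reaches gen 2, where it stays.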

6. If you look at the GC sizes for Generation 0, 1 and 2 in perfmon, why is most of the memory in the process in Gen 2?

As Stefan and others mentioned, gen 0 and 1 have fairly small sizes, and once these are reached the objects in there that are still referenced move into Gen 2. The sizes are not “fixed” as Stefan suggests, but rather dynamic over the life of the process, in order to get the most value out of each GC. They are still limited though, and objects will only stay there until the next Gen 0/Gen 1 GC, as opposed to Gen 2 where objects stay for as long as they are referenced.

In other words, given a limit x for Gen 0 and y for Gen 1, the rest of the .NET memory usage (for managed objects) has to be in either Gen 2 or the large object heap. No matter how good your allocation pattern is, there is no way that you can fit say 100 MB in Gen 0 and Gen 1 :) The trick is just to not let objects spill over to Gen 2 and then die immediately so that you have a lot of turnover in Gen 2.

7. How many heaps will you have at startup on a 4 proc machine running the server GC? How many would you have if the same machine was running the workstation GC? Will the memory used for these show up in private bytes or virtual bytes in perfmon or both?

In retrospect I should have been a bit more specific here. Some people mentioned the runtime heaps, NT heaps, loader heap etc. and I have to admit, I was just too snowed in in my own little .NET object world when I wrote this question. What I meant was: how many .NET GC heaps will you have? Even there the question is debatable. In an interview situation I would have said that 4 was ok, but what I really wanted the answer to be was 8: 4 small object heaps and 4 large object heaps.

Ok, so why am I such a stickler for this number of threads and number of heaps bladibladibla? Well, a lot of people pose the question: why do I have so many virtual bytes at the startup of a .NET process, and why do virtual bytes go up in chunks? When you look at that, it is important to know how much of that memory goes to these GC heaps, and to know that they will probably eventually be filled with .NET objects, so a large variation of private bytes/virtual bytes at the startup of the process is not necessarily a sign of something really bad going on.

8. (Leading question :)) Is the fact that you have mscorwks.dll loaded in the process in 2.0 an indication that you are running the workstation version of the GC?

Ok, that was probably not one of the best questions :) As pretty much all of you figured out, both workstation and server now live in one single dll called mscorwks.dll. You can check out !eeversion to see which one you are running and, in the server case, with how many heaps.

9. Can you manually switch GC modes for a process? If so, how and under what circumstances?

Surprisingly, a lot of people talked about gcserver enabled=true, and then answered no to this question :) For the correct answer, check dal’s response

concurrent

<configuration>
  <runtime>
    <gcConcurrent enabled="true" />
  </runtime>
</configuration>

server

<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>

The restrictions here are

  1. you cannot run the server version on a single proc box, it will default to workstation
  2. you cannot run concurrent while also running server
  3. if the runtime is hosted, the host’s GC mode will override the configuration

10. Name at least 2 ways to make objects survive GC collections unnecessarily

There are plenty of ways to do this and a lot of you had good answers on this one. To mention two: create an unnecessary finalize method, or write code that causes objects to have a mid-life crisis. For example, create a function that sets up a lot of objects and then goes on to call a long running operation (database request or web service call). The objects stay rooted by the thread during the whole long running operation, giving the process a good chance to perform a GC in the meantime and promote them.

In the first case (finalizer), dispose the objects when you are done. In the second case (mid-life crisis), set the objects to null if you are not planning on using them anymore so that the GC knows that they are ready for cleanup.
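Here is a minimal sketch of the mid-life crisis fix, assuming a hypothetical ReportJob class that keeps its working data in a field and then makes a slow call. SlowRemoteCall is just a Thread.Sleep standing in for a database request or web service call. I use a field here because in optimized builds the JIT can often tell on its own that a local is dead after its last use, so the explicit null matters most for fields and other longer-lived references.

```csharp
using System;
using System.Threading;

// Hypothetical job class; all names here are illustrative only.
public class ReportJob
{
    // As long as this field points at the buffer, the buffer is reachable
    // through the job object and will survive any GC that happens meanwhile.
    private byte[] workingSet;

    public int Run()
    {
        workingSet = new byte[64 * 1024];
        for (int i = 0; i < workingSet.Length; i++)
        {
            workingSet[i] = 1;
        }

        int checksum = Summarize(workingSet);

        // Done with the buffer: drop the reference before the slow call so a
        // GC during the call can reclaim it instead of promoting it.
        workingSet = null;

        SlowRemoteCall();
        return checksum;
    }

    private static int Summarize(byte[] data)
    {
        int sum = 0;
        foreach (byte b in data)
        {
            sum += b;
        }
        return sum;
    }

    // Stand-in for a long running database request or web service call.
    private static void SlowRemoteCall()
    {
        Thread.Sleep(100);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine("Checksum: " + new ReportJob().Run());
    }
}
```

The same idea applies to anything that outlives its usefulness: keep only the small result (here the checksum) around for the long running part, and let go of the big intermediate objects first.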

11. Can a .NET application have a real memory leak? In the C++ sense where we allocate a chunk of memory and throw away the handle/pointer to it?

Again there were a lot of good answers in the comments. Although you can’t leak a .NET object in the classic sense of the word, i.e. create an object and throw away the pointer, unless you are in unsafe mode, you can do plenty of things to create memory leaks.

Here are some examples:

  1. leaking dynamic assemblies, like in the XmlSerializer case study
  2. blocking the finalizer thread - this is a bit borderline for a real memory leak, but it certainly causes ever increasing memory usage
  3. calling native code that is leaking

Btw, there are also plenty of ways to create high and increasing memory usage in .NET apps by rooting objects without realizing that you are rooting them.

12. Why is it important to close database connections and dispose of objects? Doesn’t the GC take care of that for me?

I think pretty much all of you got this one :) To paraphrase Arnaud,

The finalizer will eventually be called, after the object has been made available for garbage collection. Knowing that there may be quite some time until an object gets GC’ed, and that many resources are limited, you call Dispose yourself as soon as you’re over with an object. It doesn’t get GC’d when you call Dispose, but it releases its resources.

And of course, you avoid dragging it through the Finalizer thread.

Oh, btw, if you enjoy this kind of thing, and you live in the Seattle area, you may just want to check out Maoni’s blog, I hear they have a job opening in the GC team, although I’m sure the interview questions there will be a bit harder than this quiz :)

Laters y’all,
Tess