
A few days ago I posted a question that I had received by email:

We use Page.Cache to store temporary data, but we have recently discovered that it causes high memory consumption. The bad thing is that the memory never goes down even though the cache items have expired, and we suspect a possible memory leak in its implementation.

We have created this simple page

protected void Page_Load(object sender, EventArgs e)
{
    this.Page.Cache.Add(Guid.NewGuid().ToString(), Guid.NewGuid().ToString(), null, DateTime.MaxValue, TimeSpan.FromMinutes(1), CacheItemPriority.NotRemovable, new CacheItemRemovedCallback(this.OnRemoved));
}

public void OnRemoved(string key, object value, CacheItemRemovedReason r)
{
    value = null;
}

Which we stress with ACT (Application Center Test) for 5 minutes. Memory usage peaks at 450 MB, and after some time it decreases to 253 MB, but never goes down completely even though we waited for 10 minutes after the stress test. Our expectation is that the memory should go down to about 50-60 MB.

The question is does the above scenario fall into the category of memory leaks?

I had thoroughly enjoyed the quizzes that Rico Mariani and Mike Stall ran on their blogs, so I did a copycat… and I have to say I really liked the results, both because the answers were really good and contained details that I would probably have missed, and because they brought up some new questions that I hadn’t thought of.

If you haven’t looked at the quiz yet, I would recommend you do, and especially the comments…

Answers/Summary

Before I start, I want to say that I am not so naive as to claim that our products are completely bug free; no software ever is… but when I get a proof and don’t agree with the results (especially when it concerns such a commonly used feature as Cache) I get as suspicious as Dr. House :) and start scrutinizing the test. This is partly why I brought this question up as a post to begin with: whenever you do a stress test or a proof of concept, you need to know the underlying platform in order to interpret the results correctly.

As I mentioned there were a lot of good comments on the quiz, as well as many good questions so I will divide the points into different sections.

  1. The CacheItemRemovedCallback
  2. The stress test and garbage collection
  3. Sliding Expiration and Absolute Expiration
  4. CacheItemPriority.NotRemovable
  5. Page.Cache vs. Cache vs. Application
  6. Real-life scenario vs. Stress test for repro
  7. A small comment on timers

The CacheItemRemovedCallback

The first thing I noticed when looking at the results was that 450 MB seemed like an insane amount for this test. Even if nothing was removed from cache, the items stored in cache (GUIDs) are relatively small and would never amount to that much, so something smells very fishy. As Matt correctly pointed out in his comment, we are running into an issue where we connect an instance event handler to a cache item. The callback delegate holds a reference to the page instance it belongs to, so every cache entry effectively caches the whole page and all its contents.
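To make the rooting concrete, here is a minimal sketch that runs outside ASP.NET. FakePage and its members are hypothetical stand-ins, not ASP.NET types; the only point is that a delegate to an instance method carries its target object with it, while a delegate to a static method does not.

```csharp
using System;

// Hypothetical stand-in for an ASP.NET page. A real Page also holds its
// whole control tree, which is what makes this expensive.
public class FakePage
{
    public byte[] ControlTree = new byte[64 * 1024];

    public void OnRemovedInstance(string key, object value) { }

    public static void OnRemovedStatic(string key, object value) { }
}

public static class DelegateRootDemo
{
    public static void Main()
    {
        FakePage page = new FakePage();

        // Instance method: the delegate stores the page in Target, so whatever
        // holds the delegate (such as a cache entry) also holds the page.
        Action<string, object> instanceHandler = page.OnRemovedInstance;

        // Static method: there is no target object to keep alive.
        Action<string, object> staticHandler = FakePage.OnRemovedStatic;

        Console.WriteLine(object.ReferenceEquals(instanceHandler.Target, page)); // prints True
        Console.WriteLine(staticHandler.Target == null);                         // prints True
    }
}
```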

To get more info about this see my earlier post about “the event handlers that made the memory balloon”. In this particular case it is not necessary to set the value to null. The GUID will be un-rooted when it is removed. If you have a situation where you do need to dispose the object that is stored in cache, you would have to use a static event handler of some kind.
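A sketch of what that could look like for the sample page, assuming the callback needs nothing from the page itself. The class name StressPage is hypothetical; the Cache.Add arguments are the same as in the original sample.

```csharp
using System;
using System.Web.Caching;
using System.Web.UI;

// Hypothetical code-behind class for the test page.
public class StressPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Same call as in the sample, but the callback is a static method,
        // so the cache entry no longer holds a reference to this page.
        this.Page.Cache.Add(
            Guid.NewGuid().ToString(),
            Guid.NewGuid().ToString(),
            null,
            DateTime.MaxValue,
            TimeSpan.FromMinutes(1),
            CacheItemPriority.NotRemovable,
            new CacheItemRemovedCallback(OnRemoved));
    }

    // Static: no page instance is captured by the delegate.
    public static void OnRemoved(string key, object value, CacheItemRemovedReason reason)
    {
        // Nothing to do for a GUID string. If the cached value implemented
        // IDisposable, this would be the place to dispose it.
    }
}
```

If no cleanup is needed at all, passing null as the last argument avoids the delegate altogether.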

With this minor change the same stress test peaks at 50 MB instead of 450, which is a major improvement. But even so, the objects are not removed from memory when the cache entries expire… so on to the next point…

The stress test and garbage collection

When doing a stress test like this it is very important to understand a few things. The first is how to interpret the results and the second is to understand the platform we are working with and the behavior of the garbage collector.

In this case the data that we looked at was memory usage for the process in Task Manager, or alternatively Private Bytes in Performance Monitor. The question doesn’t say which, but in my own tests I was looking at Private Bytes in Performance Monitor.

What I would be really interested in is

  1. are the objects removed properly from cache?
  2. does the size of the managed heaps decrease (i.e. are the .NET objects stored in cache actually collected)? and
  3. what happens with the size of the process and what will happen if we run the test a second time, i.e. will it increase by the same amount or will memory be reused etc.

So I added the counter ASP.NET Apps v2.0.50727/Cache Total Entries and saw that it increased gradually during the test, with a dip every so often that created a sawtooth pattern, indicating that cache entries were being released while new ones came in. Then I stopped the test and waited. After approximately 1 minute there was a huge dip, after another minute there was another huge dip, and after about 5 minutes my Cache Total Entries count was down to 0.

Conclusion #1: My objects are no longer rooted by the cache and should be available for garbage collection and removal, but… they are never collected. Why, oh why?

Petros pointed out that this is because we had no activity after the stress test, so nothing triggered a GC, and thus nothing gets collected even if it is available for collection. This is the most common mistake when performing a stress test (see my post about why un-rooted objects are not garbage collected for more info).

So, what can we do? Well, if I am trying to stress a leak and want to verify whether it really is a leak, I usually introduce a page with the following code

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

And run this after the stress test.

Caution! I wouldn’t recommend calling this in production just to clean up memory unless you have a really good reason and can make sure that it is not called so often that it throws off the GC’s own mechanism for determining generation limits etc., or causes high CPU issues. Think twice, and three and four times, before putting this in production. It is great after a stress test though… to simulate the next full GC without having to create allocations to cause it.

With this in place I could see that my .NET CLR Memory/# Bytes in all Heaps went all the way down to 1.5 MB after the collection, which means that all the objects I cached etc. were gone, and everything was successful, so we don’t have a leak in the Cache mechanism. Yay!!!

Conclusion #2: The objects would have been gone if a full collection had occurred.
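The same effect can be seen in a small console sketch, with no cache involved. The names UnrootedDemo, AllocateUnrooted and ForceFullCollection are made up for illustration. The object is unreachable as soon as AllocateUnrooted returns, but nothing reclaims it until a collection actually runs.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class UnrootedDemo
{
    // Allocate in a separate, non-inlined method so that no local variable
    // in the caller keeps the array alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AllocateUnrooted()
    {
        return new WeakReference(new byte[1024 * 1024]);
    }

    // The same sequence as the test page above.
    public static void ForceFullCollection()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }

    public static void Main()
    {
        WeakReference weak = AllocateUnrooted();

        // Unrooted, but normally still in memory because no GC has run yet.
        Console.WriteLine("Alive before collection: " + weak.IsAlive);

        ForceFullCollection();

        // After a full collection the array should be gone.
        Console.WriteLine("Alive after collection: " + weak.IsAlive);
    }
}
```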

But, wait a minute… my private bytes didn’t go down as much as I would have thought… hmmm…

JCornwell brought up an interesting point… Even if we do collect and the bytes in all heaps are nicely down to a minimum, we don’t necessarily de-commit all memory and return it to the OS. If we had to do this all the time, performance would decrease significantly… but… it is not really a problem for us, it is just food for thought when looking at the results. See, if I run the same test again, we will reuse this memory, so it is by no means leaked…
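To watch the two numbers side by side without Performance Monitor, a rough sketch like the following can help. MemorySnapshot and its methods are hypothetical names; GC.GetTotalMemory only approximates the # Bytes in all Heaps counter, while Process.PrivateMemorySize64 corresponds to Private Bytes. The actual figures depend on the machine and runtime, so none are shown here.

```csharp
using System;
using System.Diagnostics;

public static class MemorySnapshot
{
    // Approximates .NET CLR Memory/# Bytes in all Heaps.
    public static long ManagedHeapBytes()
    {
        return GC.GetTotalMemory(false);
    }

    // Corresponds to Process/Private Bytes for the current process.
    public static long PrivateBytes()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.PrivateMemorySize64;
        }
    }

    public static void Print(string label)
    {
        Console.WriteLine("{0}: managed heap = {1:N0} bytes, private bytes = {2:N0} bytes",
            label, ManagedHeapBytes(), PrivateBytes());
    }

    public static void Main()
    {
        Print("Start");

        // Allocate roughly 20 MB of managed memory.
        byte[][] blocks = new byte[200][];
        for (int i = 0; i < blocks.Length; i++)
        {
            blocks[i] = new byte[100 * 1024];
        }
        Print("After allocating");

        // Drop the only root and force a full collection.
        blocks = null;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Print("After full collection");
    }
}
```

The managed heap figure should fall back after the collection, while private bytes may stay higher because the runtime can keep the memory committed for reuse.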

Conclusion #3: It is important to know how the garbage collector works and what the important counters and values are to correctly interpret a result.

Sliding Expiration and Absolute Expiration

No one touched on this, but I wanted to bring it up because it was something that caught my eye when I saw the sample. In this case we had a sliding expiration of 1 minute and an absolute expiration of DateTime.MaxValue

Ok, I have to admit, I don’t know all the method signatures by heart, but just seeing the sample I was a bit confused about whether the cache items should expire after 1 minute or after whatever insanely long amount of time DateTime.MaxValue might be. And I got even more confused when I looked it up in MSDN to see which took precedence, and found that I was supposed to get an argument exception if I used both a sliding and an absolute expiration, but I clearly didn’t get an exception…

I even got so perturbed that I had to hook up WinDbg (surprise, surprise) to make super sure that I didn’t get an exception, and even then I didn’t trust it…

So I went to the code and found that Cache.NoAbsoluteExpiration is simply DateTime.MaxValue, so passing DateTime.MaxValue means “no absolute expiration”… I later found out that this was documented in MSDN :) but I had more fun finding it out using Reflector.

So… we were using the sliding expiration but I wanted to bring it up, since it confused me, and perhaps would confuse other people too…
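One way to avoid the confusion is to spell out the intent with the named constants. This is only a sketch; CacheExpirationHelper and its two methods are hypothetical, while Cache.NoAbsoluteExpiration, Cache.NoSlidingExpiration and Cache.Insert are part of System.Web.Caching.

```csharp
using System;
using System.Web;
using System.Web.Caching;

// Hypothetical helper that makes the expiration policy explicit.
public static class CacheExpirationHelper
{
    // Sliding expiration only: the entry expires after 'sliding' without access.
    public static void InsertSliding(string key, object value, TimeSpan sliding)
    {
        HttpRuntime.Cache.Insert(key, value, null, Cache.NoAbsoluteExpiration, sliding);
    }

    // Absolute expiration only: the entry expires at a fixed point in time.
    public static void InsertAbsolute(string key, object value, DateTime expiresAt)
    {
        HttpRuntime.Cache.Insert(key, value, null, expiresAt, Cache.NoSlidingExpiration);
    }
}
```

A call such as CacheExpirationHelper.InsertSliding(key, value, TimeSpan.FromMinutes(1)) would then express the same policy as the sample, without a bare DateTime.MaxValue to puzzle over.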

CacheItemPriority.NotRemovable

This one caused a bit of discussion when Scott said he thought that it is better not to use NotRemovable and to build in logic to re-populate the cache when needed. I think it is a good point, but NotRemovable has its benefits too in some cases.

Just to clarify: NotRemovable means that the cache will not evict the item to free memory. The item is removed only when it expires (or is removed explicitly), and only then does it become available for collection. A removable cache item, on the other hand, can be evicted before its expiration if memory usage is high.

One specific location where it has a benefit is in ASP.NET’s session implementation. As you are probably all aware by now (from my ramblings in previous posts), InProc session state is stored in cache. The session objects are stored with a CacheItemPriority of NotRemovable since there is no way to repopulate these if they are deleted. I believe you should choose CacheItemPriority based on the cost and possibility of re-populating the cache. But do feel free to disagree with me :)
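For data that can be rebuilt, the re-populate approach Scott described could look roughly like this. RepopulatingCache and GetOrLoad are hypothetical names, and the sketch assumes the loader is cheap enough that two requests occasionally loading the same item at the same time is acceptable.

```csharp
using System;
using System.Web;
using System.Web.Caching;

// Hypothetical helper: read from cache, reload on a miss.
public static class RepopulatingCache
{
    public static T GetOrLoad<T>(string key, Func<T> load, TimeSpan sliding) where T : class
    {
        // The entry may have expired or been evicted under memory pressure.
        T cached = HttpRuntime.Cache.Get(key) as T;
        if (cached != null)
        {
            return cached;
        }

        T loaded = load();
        if (loaded != null)
        {
            // Normal priority: the cache is free to evict this entry early.
            HttpRuntime.Cache.Insert(key, loaded, null,
                Cache.NoAbsoluteExpiration, sliding,
                CacheItemPriority.Normal, null);
        }
        return loaded;
    }
}
```

Session data has no such loader to fall back on, which is why NotRemovable makes sense there.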

Page.Cache vs. Cache vs. Application

What I really wanted to get out of the question about the difference between Page.Cache and Cache was that there really is no difference. They are pointing to the same object. The cache is application specific (app domain specific), and in the event of an app domain recycle it is emptied out.
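If you want to convince yourself, a throwaway page along these lines compares the references. CacheIdentityPage is a hypothetical name; if the statement above holds, both lines should report True.

```csharp
using System;
using System.Web;
using System.Web.UI;

// Hypothetical test page comparing the different ways to reach the cache.
public class CacheIdentityPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Page.Cache, HttpContext.Cache and HttpRuntime.Cache are compared
        // by reference, not by content.
        bool pageIsRuntime = object.ReferenceEquals(this.Cache, HttpRuntime.Cache);
        bool contextIsRuntime = object.ReferenceEquals(this.Context.Cache, HttpRuntime.Cache);

        Response.Write("Page.Cache is HttpRuntime.Cache: " + pageIsRuntime + "<br />");
        Response.Write("Context.Cache is HttpRuntime.Cache: " + contextIsRuntime + "<br />");
    }
}
```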

The Application object is very similar to the Cache, in that it is a static object with a dictionary-like structure. It is kept as a legacy from ASP, and I have yet to find a reason to use it instead of just using Cache.

Real-life scenario vs. Stress test for repro

A few people mentioned that the test didn’t seem realistic. I agree, but I also don’t think the intent of the sample above was to be realistic, rather I think the person who wrote the email wrote the sample this way to quickly and easily determine if there was a memory leak, since it is a lot faster and cleaner than trying to repro with the full application.

A small comment on timers

Finally, I just want to comment on a timers issue that I have mentioned before.

If you run on 1.1 and see that your cache items aren’t expiring (judging by Cache Total Entries), you may be running into a problem where timers are not firing properly. That is not the case in my stress test, though.

Laters y’all.