I was looking at a memory dump recently for an issue where the process would grow to over 1 GB and return OutOfMemory Exceptions.
Debugging the issue
A full user dump of the w3wp.exe process was taken with debug diag 1.1 when memory was really high.
If you are reading this and have no clue what debug diag is, how to get dumps with it, or use leak tracking, here is the download and the usage instructions.
The dump itself was 1.34 GB which is basically the memory we have committed (used) so it was pretty big, and the first thing we need to check for in this case is whether we are dealing with an issue with high .NET GC memory usage (lots of .net objects), .NET loader heap usage (lots of dlls) or native memory usage (native heaps or dlls)
I opened the dump in windbg and loaded sos (
.loadby sos mscorwks) to take a peak at the managed memory
0:000> !eeheap -gc Number of GC Heaps: 8 ------------------------------ ... ------------------------------ GC Heap Size 0xffd12a8(268,243,624)
!eeheap –gc showed that the size of the .net GC heaps was only 268 MB so at first sight it didn’t really look like a .NET leak, which left us with a loader heap issue or a native leak.
0:000> !address -summary -------------------- Usage SUMMARY -------------------------- TotSize ( KB) Pct(Tots) Pct(Busy) Usage 3521d000 ( 870516) : 41.51% 46.09% : RegionUsageIsVAD cb9d000 ( 208500) : 09.94% 00.00% : RegionUsageFree 8018000 ( 131168) : 06.25% 06.95% : RegionUsageImage 8c00000 ( 143360) : 06.84% 07.59% : RegionUsageStack 8c000 ( 560) : 00.03% 00.03% : RegionUsageTeb 2d58f000 ( 742972) : 35.43% 39.34% : RegionUsageHeap 0 ( 0) : 00.00% 00.00% : RegionUsagePageHeap 1000 ( 4) : 00.00% 00.00% : RegionUsagePeb 1000 ( 4) : 00.00% 00.00% : RegionUsageProcessParametrs 1000 ( 4) : 00.00% 00.00% : RegionUsageEnvironmentBlock Tot: 7fff0000 (2097088 KB) Busy: 73453000 (1888588 KB)
!address –summary shows that most of the memory is used by RegionUsageIsVAD and RegionUsageHeap, i.e. virtual allocations and heap allocations. Now .NET memory goes into RegionUsageIsVAD but with this much RegionUsageHeap I suspected a native leak or high native memory usage. In order to figure out what was causing this high NT Heap usage I asked for a debug diag dump with leak tracking enabled.
Leak tracking in debug diag will basically track any allocations and de allocations you make, and at the end of the run you will end up with a dump containing information about what stacks allocated the memory currently in use in the dump.
In this case they had already ran debug diag with leak tracking enabled which was great, but I will show you a few gotchas with leak tracking that they had unfortunately ran into.
When you have used leak tracking, you need to run the dump through advanced analysis-Memory pressure analysis in debug diag to parse the leak tracking results. Debug diag will then give you information at the top about the biggest offenders, along with some heap usage information and sample stacks for allocations.
In this case debug diag was showing the following:
|Warning||gdiplus.dll is responsible for 35.86 MBytes worth of outstanding allocations. The following are the top 2 memory consuming functions: gdiplus!GpMemoryBitmap::AllocBitmapData+4b: 35.37 MBytes worth of outstanding allocations. gdiplus!GpMalloc+12: 496.51 KBytes worth of outstanding allocations.||If this is unexpected, please contact the vendor of this module, Microsoft Corporation, for further assistance with this issue.|
|Warning||UNKNOWN_MODULE is responsible for 30.11 MBytes worth of outstanding allocations. The following are the top 2 memory consuming functions: 0x1aab1569: 30.11 MBytes worth of outstanding allocations. </>||If this is unexpected, please contact the vendor of this module for further assistance with this issue|
Ok, so 35 MB worth of GDI+ objects and 30 MB from some unknown module. But we are dealing with a ~1.4 GB dump so this seem like a tiny drop of water in a huge ocean.
A couple of questions may appear in your head here (at least they do in mine when I see this)
- Why didn’t the report show more allocations?
- What is the GDI+ stuff?
- and what the heck is the Unknown module?
Why didn’t the report show more allocations?
If we look at the report we first have a section with machine name, operating system, process name etc. Here we also have data about the process uptime, which in this case is ~17 hrs. So this memory (1.4 GB) we have in the process was built up over 17 hours.
System Up-Time 3 day(s) 06:47:23 Process Up-Time 17:15:10
If we then scroll down to the Leak Analysis report section we find
Outstanding allocation summary Number of allocations 14,959 allocations Total outstanding handle count 180 handles Total size of allocations 88.38 MBytes Tracking duration 00:07:01
So this means that leak tracking has only been turned on for 7 minutes out of the 17 hrs that the process was up. During the time leak tracking was turned on we allocated 88.38 MB and 180 handles. Debug diag will not be able to report “leaks” that happened before leak tracking was turned on, so these 35 MB for GDI+ should be compared to the 88 MB of total allocations.
In reality, given that leak tracking was turned on for such a short amount of time we can’t really affirmatively say much at all about the results here since the size of the allocations are so small compared to the size of the process. We would need to rerun debug diag leak tracking, preferably from the start of the process to say anything for sure.
But, if we ignore this and extrapolate freely, assuming that the allocation pattern is uniform, we can assume that a lot of the memory in the dump is GDI+ items, and a lot comes from this “Unknown module”. I wouldn’t recommend this as proof, but it’s worth a try spinning on this thread to see if we can get any more info, until we get a new memory dump with leak tracking turned on for a longer time.
What is the GDI+ stuff?
Well, looking at the message debug diag gave us, it seems like most of the GDI+ objects come from this method: gdiplus!GpMemoryBitmap::AllocBitmapData. The name implies that we are creating a bitmap, which can either be done explicitly from some native component, or indirectly by creating a System.Drawing.Bitmap.
Using System.Drawing objects from a service is unsupported because of the nature of such applications (i.e. running under the context of a user context that might not be logged, and the potential for desktop heap exhaustion)
Caution: Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions.
But… I am not going to harp on the unsuitability of System.Drawing objects in a server app. What I would say though, is that if you choose to go this unsupported route, you need to make extremely sure that you got your ducks in a row and dispose of your System.Drawing objects appropriately to avoid handle leaks, desktop heap exhaustion etc.
Going back to the dump in windbg, we can take a look at the heap to see what types and how many System.Drawing objects we have here
0:000> !dumpheap -type System.Drawing -stat Statistics: MT Count TotalSize Class Name 7ae7a658 1 12 System.Drawing.SR 7ae73bf0 1 16 System.Drawing.SizeF 7ae73ae8 1 16 System.Drawing.Size 7ae73cf4 1 24 System.Drawing.Rectangle 7ae74194 4 48 System.Drawing.Printing.PrintingPermission 7ae73e6c 3 48 System.Drawing.Point 7ae7a28c 4 64 System.Drawing.ToolboxBitmapAttribute 7ae84314 13 208 System.Drawing.TextureBrush 7ae78784 12 288 System.Drawing.Imaging.ImageFormat 7ae7580c 22 704 System.Drawing.Graphics 7ae778e8 27 972 System.Drawing.Pen 7ae78308 299 5980 System.Drawing.Internal.GPStream 7ae78190 356 8544 System.Drawing.Bitmap 7ae75d7c 389 9336 System.Drawing.Color 7ae7747c 593 21348 System.Drawing.SolidBrush 7ae75684 1424 28480 System.Drawing.FontFamily 7ae742b0 1424 56960 System.Drawing.Font Total 4574 objects
Among other things we can see that we have 356 Bitmaps, so since those are still in memory, unless they are disposed they will still hold on to a lot of native GDI+ mem. Well how much will of course depend on how big the respective bitmaps are.
We’ll get back to why these are sticking around in a moment.
What the heck is the Unknown module?
If you see unknown modules in debug diag, it means that debug diag couldn’t resolve the symbols for the module. This often happens if it is a .net method.
Clicking on the unknown module in the report header takes us to some sample stacks like this one. So some kind of data related stuff.
0x1AAB1569 System.Data.Common.DataAdapter.FillLoadDataRow(System.Data.ProviderBase.SchemaMapping) System.Data.Common.DataAdapter.FillFromReader(System.Data.DataSet, System.Data.DataTable, System.String, System.Data.ProviderBase.DataReaderContainer, Int32, Int32, System.Data.DataColumn, System.Object) System.Data.Common.DataAdapter.Fill(System.Data.DataSet, System.String, System.Data.IDataReader, Int32, Int32) System.Data.Common.DbDataAdapter.FillInternal(System.Data.DataSet, System.Data.DataTable, Int32, Int32, System.String, System.Data.IDbCommand, System.Data.CommandBehavior) System.Data.Common.DbDataAdapter.Fill(System.Data.DataSet)
Let’s go back to windbg to figure out why these bitmaps are still around
0:000> !dumpheap -type System.Drawing.Bitmap ------------------------------ Heap 0 Address MT Size 00e4d510 7ae78190 24 00e4d9a4 7ae78190 24 00e7d4a8 7ae78190 24 0106d20c 7ae78190 24 01074bf0 7ae78190 24 01074d1c 7ae78190 24 0108e08c 7ae78190 24 0108f3b4 7ae78190 24 0109185c 7ae78190 24 01091874 7ae78190 24 ...
0:000> !gcroot 00e4d510 ... Finalizer queue:Root:00e4d510(System.Drawing.Bitmap)
The bitmap that I happened to pick and do
!gcroot on (to figure out why it was not collected), was rooted in the finalize queue, meaning that it is ready for finalization but waiting for the finalizer to finalize it.
The other bitmaps follow the exact same pattern, so the next step is to figure out what the finalizer is doing and figure out what is stopping it from finalizing these bitmaps.
0:000> !threads ThreadCount: 279 UnstartedThread: 0 BackgroundThread: 118 PendingThread: 0 DeadThread: 161 Hosted Runtime: no PreEmptive GC Alloc Lock ID OSID ThreadOBJ State GC Context Domain Count APT Exception 1 1 190c 0015b598 8808220 Enabled 00000000:00000000 0014b380 0 Ukn (Threadpool Completion Port) 14 2 257c 0016c6d8 b220 Enabled 00000000:00000000 0019c920 0 MTA (Finalizer) ...
0:000> ~14s eax=00166878 ebx=00000000 ecx=00167208 edx=00000000 esi=40000ee0 edi=00000000 eip=77f88f13 esp=19a0f12c ebp=19a0f19c iopl=0 nv up ei pl nz na po nc cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000202 NTDLL!NtWaitForSingleObject+0xb: 77f88f13 c20c00 ret 0Ch
0:014> !clrstack OS Thread Id: 0x257c (14) ESP EIP 19a0f444 77f88f13 [InlinedCallFrame: 19a0f444] System.Data.Common.UnsafeNativeMethods.OCIHandleFree(IntPtr, HTYPE) 19a0f440 6589a74c System.Data.OracleClient.TracedNativeMethods.OCIHandleFree(IntPtr, HTYPE) 19a0f480 658717b7 System.Data.OracleClient.OciHandle.Release() 19a0f4b4 6587183d System.Data.OracleClient.OciHandle.ReleaseHandle() 19a0f6c8 79e88f63 [GCFrame: 19a0f6c8] 19a0f8c0 79e88f63 [GCFrame: 19a0f8c0] 19a0f92c 79e88f63 [HelperMethodFrame_1OBJ: 19a0f92c] System.Runtime.InteropServices.SafeHandle.InternalFinalize() 19a0f984 7934dc52 System.Runtime.InteropServices.SafeHandle.Dispose(Boolean) 19a0f988 7934db81 System.Runtime.InteropServices.SafeHandle.Finalize() 19a0fc38 79f7b546 [ContextTransitionFrame: 19a0fc38] 19a0fd14 79f7b546 [GCFrame: 19a0fd14]
0:014> kL ChildEBP RetAddr 19a0f128 77f8f295 NTDLL!NtWaitForSingleObject+0xb 19a0f19c 77f87f26 NTDLL!RtlpWaitForCriticalSection+0x9e 19a0f1a4 6111e091 NTDLL!RtlEnterCriticalSection+0x46 WARNING: Stack unwind information not available. Following frames may be wrong. 19a0f1b4 6067819d ORACORE9!sltsmna+0x11 19a0f3fc 60677e82 oraclient9!kpufhndl0+0x30d 19a0f410 6060121a oraclient9!kpufhndl+0x12 19a0f420 25f9c85f oraclient9!OCIHandleFree+0x1a 19a0f430 6589a74c oci!OCICharSetToUnicode+0xaaae 19a0f478 658717b7 System_Data_OracleClient_ni!_load_config_used+0x39554 19a0f4ac 6587183d System_Data_OracleClient_ni!_load_config_used+0x105bf 19a0f4c0 79e88ee4 System_Data_OracleClient_ni!_load_config_used+0x10645 19a0f540 79e88e31 mscorwks!CallDescrWorkerWithHandler+0xa3 19a0f678 79e88d19 mscorwks!MethodDesc::CallDescr+0x19c 19a0f690 79eeea68 mscorwks!MethodDesc::CallTargetWorker+0x20 19a0f6b0 79f01004 mscorwks!MethodDescCallSite::Call_RetBool+0x1c 19a0f7b0 79f00f8a mscorwks!SafeHandle::RunReleaseMethod+0x75 19a0f8b0 79f00eb6 mscorwks!SafeHandle::Release+0xf2 19a0f8dc 79f357b7 mscorwks!SafeHandle::Dispose+0x2b 19a0f97c 7934dc52 mscorwks!SafeHandle::Finalize+0xab 19a0f9ac 79f7b546 mscorlib_ni+0x28dc52
Although I don’t have the oracle symbols so the oracore9, oraclient9 and oci methods are incorrect, we can see that the finalizer is currently trying to finalize an OciHandle, and in order to do so it needs to enter a critical section which is currently locked.
Until it has finalized this OciHandle it can’t finalize anything else, so our Bitmaps can’t be garbage collected.
If we want to go a bit further we can find out who owns the critical section and have a look at what it’s doing.
0:014> !locks ... CritSec +40000ee0 at 40000EE0 LockCount 1 RecursionCount 1 OwningThread 1ca0 EntryCount 1 ContentionCount 1 *** Locked
!locks will give us a list of all the locks in the process and any locks that have a lock count higher than 0 are currently locked. In this case, the only one that fits that criteria is the one above, so this is the critsec that we are waiting for.
It is owned by the thread with the OSID 1ca0 so we can move to this thread to find out what it is working on.
0:014> ~~[1ca0]s eax=00000380 ebx=ffffffff ecx=29820e2c edx=00000000 esi=7fffffff edi=00000102 eip=77f88f13 esp=20b798dc ebp=20b79914 iopl=0 nv up ei ng nz ac pe cy cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000297 NTDLL!NtWaitForSingleObject+0xb: 77f88f13 c20c00 ret 0Ch
0:069> kL ChildEBP RetAddr 20b798d8 74fd1394 NTDLL!NtWaitForSingleObject+0xb 20b79914 74fd160d msafd!SockWaitForSingleObject+0x1a8 20b79990 750313fa msafd!WSPRecv+0x1e9 20b799cc 61915574 ws2_32!WSARecv+0x8a WARNING: Stack unwind information not available. Following frames may be wrong. 20b79a00 61912f74 orantcp9!nttini+0x4554 20b79a28 615050a3 orantcp9!nttini+0x1f54 20b79a94 614d68b0 oran9!ztvull+0x17263 20b7a2fc 614c12c8 oran9!nsrdr+0x140 20b7a3ac 614905ba oran9!nsdo+0xd38 20b7a3f4 60538544 oran9!nioqrc+0x3ea 20b7ad18 61493acd oracommon9!ttcdrv+0x404 20b7ad30 606ba7db oran9!nioqwa+0x2d 20b7cde0 606b06fd oraclient9!upicui2+0x75b 20b7ce14 6069e727 oraclient9!upirtr+0xbd 20b7ce50 6068266b oraclient9!kpurcs+0x97 20b7ef3c 60683b41 oraclient9!kpufGetRcvInfo+0xe2b 20b7ef94 60601cc4 oraclient9!kpufGetRcvInfo+0x2301 20b7efb4 25f9cf91 oraclient9!OCIStmtFetch+0x24 20b7f008 6589b9b3 oci!OCICharSetToUnicode+0xb1e0 20b7f028 6589b9b3 System_Data_OracleClient_ni!_load_config_used+0x3a7bb
0:069> !clrstack OS Thread Id: 0x1ca0 (69) ESP EIP 20b7f020 77f88f13 [NDirectMethodFrameStandaloneCleanup: 20b7f020] System.Data.Common.UnsafeNativeMethods.OCIStmtFetch(System.Data.OracleClient.OciHandle, System.Data.OracleClient.OciHandle, UInt32, FETCH, MODE) 20b7f03c 6589b9b3 System.Data.OracleClient.TracedNativeMethods.OCIStmtFetch(System.Data.OracleClient.OciHandle, System.Data.OracleClient.OciHandle, Int32, FETCH, MODE) 20b7f05c 6588596d System.Data.OracleClient.OracleDataReader.ReadInternal() 20b7f0a8 658856de System.Data.OracleClient.OracleDataReader.Read() 20b7f0d8 652330ad System.Data.Common.DataAdapter.FillLoadDataRow(System.Data.ProviderBase.SchemaMapping) 20b7f110 65232e82 System.Data.Common.DataAdapter.FillFromReader(System.Data.DataSet, System.Data.DataTable, System.String, System.Data.ProviderBase.DataReaderContainer, Int32, Int32, System.Data.DataColumn, System.Object) 20b7f168 65232cf4 System.Data.Common.DataAdapter.Fill(System.Data.DataTable, System.Data.IDataReader, Int32, Int32) 20b7f1b4 6524068e System.Data.Common.DbDataAdapter.FillInternal(System.Data.DataSet, System.Data.DataTable, Int32, Int32, System.String, System.Data.IDbCommand, System.Data.CommandBehavior) 20b7f20c 652404d3 System.Data.Common.DbDataAdapter.Fill(System.Data.DataTable, Int32, Int32, System.Data.IDbCommand, System.Data.CommandBehavior) 20b7f24c 6524029c System.Data.Common.DbDataAdapter.Fill(System.Data.DataTable) ... <removing customer specific code as it is not important for the discussion>
Again, the oracle methods are incorrect in this stack since we don’t have symbols, but in essence we are getting some data from an Oracle db, waiting for it to return the data.
What do we have so far?
We know that we have high memory usage, and that there is a big potential for this high memory usage to come from bitmaps.
We also know that there are indeed some System.Drawing.Bitmaps in the process that are waiting to be finalized, which means that these bitmaps are not explicitly disposed in code after we are finished with them. If they were explicitly disposed they would not have to be finalized, and thus we would not have to wait for the finalizer in order to release the underlying GDI+ resources.
We also know that the finalizer is blocked while trying to release some Oracle OCIHandle and that in order to unblock the request on thread 69 has to finish.
This in turn also means that we know that the applications code is not explicitly closing Oracle connections, command objects etc. when it’s finished using them. As with the Bitmaps, if it was explicitly closing them, they would not have to be finalized on the finalizer.
In a nutshell
To answer the 1 GB question, to dispose or not to dispose? If you create objects that have pointers to native resources, like connections, bitmaps, files etc. you need to dispose them explicitly when you are done with them, if you don’t want to end up in a situation like this one.
MSDN will usually tell you if you need to dispose or not, like in the case of the bitmap dispose method (derived from Image dispose)
Note: Always call Dispose before you release your last reference to the Image. Otherwise, the resources it is using will not be freed until the garbage collector calls the Image object’s Finalize method.
Until next time, Tess