ASP.NET Case Study: Bad perf, high memory usage and high CPU in GC - Death By ViewState
I get enough issues relating to bad perf caused by large views tate that I felt like it is time to dedicate some blog space to it, so here is a typical problem description for this type of problem.
Under high load the ASP.NET application will completely go down and stop serving pages, at the same time we can see in perfmon that the ASP.NET process is hogging the CPU and that memory (
process\private bytes and
.NET CLR Memory\#bytes in all heaps) also seems to keep rising for the ASP.NET process.
With issues that are both memory and CPU related, my first inclination is to look at
.NET CLR Memory\% Time in GC and
.NET CLR Memory\Allocated Bytes/sec to see if the issue is that we have a very high allocation rate leading to high CPU in GC. Additionally the ratio between Gen 0, Gen 1 and Gen 2 Collections is interesting as well as
.NET CLR Memory\Large Object Heap Size, since most of these issues tend to relate around allocating a lot of memory on the large object heap, incurring a lot of full Garbage Collections.
See my previous post on High CPU in GC for more details.
If we do that in this case we will see that indeed allocation rates are very high, 6-10 MB/sec and the ratio between Gen 2 and Gen 1 collections are pretty much 1:1.
Debugging the issue
We can take a memory dump with
adplus -hang w3wp.exe when the CPU is really high, so I started with that and this is what I found.
The first thing I’m looking for when debugging these is what active .NET threads we have (using
~* e !clrstack)
OS Thread Id: 0x8b8 (13) ESP EIP 0219f298 7d61c824 [HelperMethodFrame: 0219f298] 0219f304 66008731 System.Web.HttpUtility+UrlDecoder.AddByte(Byte) 0219f310 660081ee System.Web.HttpUtility.UrlDecodeStringFromBytesInternal(Byte, Int32, Int32, System.Text.Encoding) 0219f344 660077a7 System.Web.HttpUtility.UrlDecode(Byte, Int32, Int32, System.Text.Encoding) 0219f358 6600a08a System.Web.HttpValueCollection.FillFromEncodedBytes(Byte, System.Text.Encoding) 0219f37c 65ff83df System.Web.HttpRequest.FillInFormCollection() 0219f3b4 65ffb226 System.Web.HttpRequest.get_Form() 0219f3c0 65ffb2a6 System.Web.HttpRequest.get_HasForm() 0219f3c8 66326b52 System.Web.UI.Page.GetCollectionBasedOnMethod(Boolean) 0219f3d4 6613854d System.Web.UI.Page.DeterminePostBackMode() 0219f3f8 6613d076 System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean) 0219f5b0 6613cd41 System.Web.UI.Page.ProcessRequest(Boolean, Boolean) 0219f5e0 6613cca7 System.Web.UI.Page.ProcessRequest() 0219f618 6613cbc7 System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext) 0219f620 6613cb5a System.Web.UI.Page.ProcessRequest(System.Web.HttpContext) 0219f634 0ef00125 ASP.allthepeople_aspx.ProcessRequest(System.Web.HttpContext) 0219f638 65fe90df System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() 0219f66c 65fba191 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef) 0219f6ac 65fba4bb System.Web.HttpApplication.ResumeSteps(System.Exception) 0219f6f4 65fb924d System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object) 0219f710 65fbe244 System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest) 0219f744 65fbde92 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest) 0219f750 65fbc567 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32) 0219f900 79f2d0f3 [ContextTransitionFrame: 0219f900] 0219f950 79f2d0f3 [GCFrame: 0219f950] 0219faa8 79f2d0f3 [ComMethodFrame: 0219faa8]
… and if we look at the native part of this particular thread with
kb we find:
13 Id: 1630.8b8 Suspend: 1 Teb: fff88000 Unfrozen ChildEBP RetAddr Args to Child 0219ef68 7d4d8c56 000002f4 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15 0219efd8 79e87d19 000002f4 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac 0219f01c 79e87ce2 000002f4 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199 0219f06c 79e87c98 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117 0219f07c 79f5c7cd ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17 0219f0a0 79f5c6d0 ffffffff 001a5718 0219f194 mscorwks!SVR::gc_heap::wait_for_gc_done+0x99 0219f0c4 79f5c4a1 00000000 00000000 000cb618 mscorwks!SVR::GCHeap::GarbageCollectGeneration+0x267 0219f154 79f50f95 0219f194 000cb618 00000003 mscorwks!SVR::gc_heap::try_allocate_more_space+0x1bc 0219f174 79f54a70 0219f194 000cb618 00000003 mscorwks!SVR::gc_heap::allocate_more_space+0x2f 0219f1b0 7a0e7c4a 000cb612 00000000 00000000 mscorwks!SVR::gc_heap::allocate_large_object+0x86 0219f1cc 79e84c69 001b0b88 000cb612 00000000 mscorwks!SVR::GCHeap::Alloc+0xbc 0219f1e8 79e8e3a1 000cb612 00000000 00000000 mscorwks!Alloc+0x60 0219f23c 79e8e401 791244f4 000cb606 00000000 mscorwks!FastAllocatePrimitiveArray+0xbd *** WARNING: Unable to verify checksum for System.Web.ni.dll 0219f2fc 66008731 00000002 0b4063d0 660081ee mscorwks!JIT_NewArr1+0x148 0219f308 660081ee 0000000c 0000002f 00000001 System_Web_ni+0xe8731 0219f334 660077a7 02b27338 000cb606 000dff7f System_Web_ni+0xe81ee 0219f348 6600a08a 02b27338 000cb606 033c98ac System_Web_ni+0xe77a7 0219f370 65ff83df 02b27338 41300000 00000000 System_Web_ni+0xea08a *** WARNING: Unable to verify checksum for System.ni.dll 0219f38c 7a4a995b 00000000 033c6d34 033c98ac System_Web_ni+0xd83df 0219f3ac 65ffb226 033c6dcc 00000000 65ffb2a6 System_ni+0x6995b
So this thread is in the middle of executing a HTTP request (aspx page) and more specifically it is filling in the forms collection, and during this call it needs to make an allocation. From the native stack we can see that we are allocating a large object (0xcb612 bytes in size), and that this triggers a garbage collection.
If we examine this thread a little bit further by looking at the objects on the stack (
!dso) we will find a few interesting objects:
0:013> !dso OS Thread Id: 0x8b8 (13) ESP/REG Object Name 0219f120 0b4063d0 System.Byte 0219f1a0 0b4e6370 System.Char ... 0219f35c 033fb3e4 System.String __VIEWSTATE ... 0219f3b4 033c6dcc System.Web.HttpRequest ... 0219f3c8 033c7674 ASP.allthepeople_aspx ...
First off, we can see that the page we are executing on this thread is called AllThePeople.aspx and that the forms element we are looking at is __VIEWSTATE.
If we examine the HttpRequest we will notice that the _contentLength is 917375 indicating that it is a pretty large page, and we can see that we don’t have a query string (the reason I still show the query string is that is good to have this data for repro purposes).
0:013> !do 033c6dcc Name: System.Web.HttpRequest MethodTable: 6637e1c0 EEClass: 6637e150 Size: 172(0xac) bytes (C:\WINDOWS\assembly\GAC_32\System.Web\184.108.40.206__b03f5f7f11d50a3a\System.Web.dll) Fields: MT Field Offset Type VT Attr Value Name ... 790fa4c8 4000fe9 28 System.String 0 instance 00000000 _queryStringText ... 790fee08 4000fee 9c System.Int32 0 instance 917375 _contentLength ...
And if we dump out the byte array at the top of the stack we find
0:013> dc 0b4063d0 0b4063d0 791244f4 000dff7f 49565f5f 54535745 .D.y....__VIEWST 0b4063e0 3d455441 77463225 77445045 6a4d4b55 ATE=%2FwEPDwUKMj 0b4063f0 7a4e7841 544f3041 674d3363 67466b39 AxNzA0OTc3Mg9kFg 0b406400 77414349 67466b39 51414349 78445138 ICAw9kFgICAQ8QDx 0b406410 67484359 55496674 47646852 33624346 YCHgtfIURhdGFCb3 0b406420 475a7556 42456b64 6a544f57 5751444a VuZGdkEBWOTjJDQW 0b406430 575a6738 325a7356 53597335 32627042 8gZWVsZ25sYSBpb2 0b406440 6d62346b 44493074 434e7052 79617042 k4bmt0IDRpNCBpay
0:013> d 0b406450 47627235 6d525631 575a6868 3263504e 5rbG1VRmhhZWNPc2 0b406460 57647378 45627146 57614646 44496835 xsdWFqbEFFaW5hID 0b406470 474e4d4a 57597468 6b616839 444c4268 JMNGhtYW9hakhBLD 0b406480 434e6749 47497442 58643531 47534d64 IgNCBtIG15dXdMSG 0b406490 32617064 57646874 6a635639 54597a6c dpa2thdW9VcjlzYT 0b4064a0 56616741 3252754a 304d6f64 79596846 AgaVJuR2doM0FhYy 0b4064b0 54596842 3263704a 30636e6c 30593068 BhYTJpc2lnc0h0Y0 0b4064c0 48497378 47536c4d 57517678 7a536730 xsIHMlSGxvQW0gSz
So it looks to be a byte array containing view state.
Since we are dealing with the large object heap, the next logical step is to look at the contents of the large object heap (remembering that we are in a GC so we might not see the complete heap) to see what we have there.
0:016> !dumpheap -min 85000 The garbage collector data structures are not in a valid state for traversal. It is either in the "plan phase," where objects are being moved around, or we are at the initialization or shutdown of the gc heap. Commands related to displaying, finding or traversing objects as well as gc heap segments may not work properly. !dumpheap and !verifyheap may incorrectly complain of heap consistency errors. ------------------------------ Heap 0 Address MT Size 02cb4a2c 001a4738 109,096 1 Free 0ab2ce08 791244f4 1,048,592 3 System.Byte 0ac2de18 001a4738 258,192 3 Free 0ac6fec8 791244f4 917,336 3 System.Byte 0ad4fe30 7912474c 1,666,080 3 System.Char 0aee6a50 791244f4 833,048 3 System.Byte 0afb2078 790fa4c8 1,666,072 3 System.String /wEPDwUKMjAxNzA0OTc3Mg9kFgICAw9kFgICAQ8QDxYCHgtfIURhdGFCb3VuZGdkEBWOTjJDQW 0b148ca0 79124304 524,304 3 System.Object 0b1c8cc0 79124304 524,304 3 System.Object 0b248ce0 790fa4c8 1,666,072 3 System.String /wEPDwUKMjAxNzA0OTc3Mg9kFgICAw9kFgICAQ8QDxYCHgtfIURhdGFCb3VuZGdkEBWOTjJDQW 0b3df908 790fa4c8 158,392 3 System.String /wEWk04Cv62+jwkC+5ieywgClJiipggCgem0hgwCh4zztQEC0vP73QoCvpCj1AQCh7rr9wQC6e+ 0b4063d0 791244f4 917,392 3 System.Byte 0b4e6370 7912474c 1,666,072 3 System.Char total 13 objects ------------------------------ Heap 1 Address MT Size 06c67d80 001a4738 119,976 1 Free 06c9c130 001a4738 111,820 1 Free 06d30f4c 001a4738 131,104 1 Free 07013420 001a4738 2,232,128 1 Free total 4 objects ------------------------------ total 17 objects Statistics: MT Count TotalSize Class Name 79124304 2 1,048,608 System.Object 001a4738 6 2,962,316 Free 7912474c 2 3,332,152 System.Char 790fa4c8 3 3,490,536 System.String 791244f4 4 3,716,368 System.Byte Total 17 objects, Total size: 14,549,980
These strange looking strings are Base64 encoded view state, so from this we can see that this application has some pages with view state around 1.6 MB. If we were to look at the contents of the byte arrays and char arrays, we would see that these too contain the same type of view state strings:
0:016> dc 0ad4fe30 0ad4fe30 7912474c 000cb608 0077002f 00500045 LG.y..../.w.E.P. 0ad4fe40 00770044 004b0055 006a004d 00780041 D.w.U.K.M.j.A.x. 0ad4fe50 007a004e 00300041 0054004f 00330063 N.z.A.0.O.T.c.3. 0ad4fe60 0067004d 006b0039 00670046 00430049 M.g.9.k.F.g.I.C. 0ad4fe70 00770041 006b0039 00670046 00430049 A.w.9.k.F.g.I.C. 0ad4fe80 00510041 00510038 00780044 00430059 A.Q.8.Q.D.x.Y.C. 0ad4fe90 00670048 00660074 00550049 00680052 H.g.t.f.I.U.R.h. 0ad4fea0 00470064 00430046 00330062 00750056 d.G.F.C.b.3.V.u.
0:016> d 0ad4feb0 0047005a 006b0064 00420045 004f0057 Z.G.d.k.E.B.W.O. 0ad4fec0 006a0054 0044004a 00570051 00670038 T.j.J.D.Q.W.8.g. 0ad4fed0 0057005a 00730056 0032005a 00730035 Z.W.V.s.Z.2.5.s. 0ad4fee0 00530059 00700042 00320062 0034006b Y.S.B.p.b.2.k.4. 0ad4fef0 006d0062 00300074 00440049 00700052 b.m.t.0.I.D.R.p. 0ad4ff00 0043004e 00700042 00790061 00720035 N.C.B.p.a.y.5.r. 0ad4ff10 00470062 00560031 006d0052 00680068 b.G.1.V.R.m.h.h. 0ad4ff20 0057005a 0050004e 00320063 00730078 Z.W.N.P.c.2.x.s.
How do I know they are view state? Well, Base64 encoded strings usually start with dDW followed by a long string of gobbly gook, similarly in 2.0 it starts with /wE, and typically when you see these on the LOH it is view state.
Taking a step back to look at the facts
I think most people reading this blog know what view state is and what it is used for. If you don’t, have a look at Dave Reed’s post on truly understanding view state.
For the purposes of this article, there are a few things you need to know
- View state tracks the state of a web control
- View state is stored as a string in a hidden form field on the page (Base64 encoded) and is serialized and deserialized on each request
The known truths that we have in this case, from the information above are:
- We have high CPU
- We have increasing memory usage during the time we have high CPU
- Our % Time in GC is high
- The ratio between Gen 1 collections and Gen 2 collections is basically 1:1
- On at least one occasion during the time we had high CPU a full garbage collection was triggered by decoding large view state (on AllThePeople.aspx)
- Some pages in the application have very large view state (1.6 MB)
- To decode view state we need to allocate memory for a string and some buffers containing the encoded view state, a string and some buffers with decoded view state and also a hashtable containing the key/value pairs.
Before I continue, I just want to mention that just because one dump showed that a full collection occurred because we were decoding the view state for AllThePeople.aspx, this does not mean that this is necessarily the culprit, but when it comes to looking at dumps of high CPU situations we will have to make some assumptions and move further on those. What is true though is that it is bad to have large view state so whether this is the source of the high CPU or not, some work should definitely be done to reduce it. I will come to the reasons for this in a little bit.
So, if we look at the facts and make some assumptions to create a good plan for further troubleshooting we can make the following assumptions:
- The high CPU in GC is caused by excessive use of the large object heap. This is almost proven by the fact that we have a 1:1 ratio between Gen 2 and Gen 1 collections, since you will do a full collection (LOH, Gen 2, Gen 1 and Gen 0) anytime you need more space on the large object heap. If the reverse was true, i.e. that it was not excessive use of the large object heap we would have to do some Gen 1 collections to move items into Gen 2 before there would be a need for a Gen 2 collection, so the ratio would be something like 1:3 or 1:5. (Keep in mind from previous posts that an ideal ratio is 1:10).
- The excessive usage of the LOH is caused by large view state. Again, this is almost completely proven by the fact that the objects that we have found on the Large Object Heap are pretty much exclusively related to view state, and that we caught a thread “red-handed” decoding large view state.
- The AllThePeople.aspx page likely contains some type of large grid, listbox or dropdown list, showing a large amount of data. For a page to have a massive amount of view state it either has to contain data bound controls, for example a dropdown list with a large amount of rows, or a grid with a large amount of rows shown per data grid page, or it can contain custom controls that store large amounts of custom view state.
If we stop there and take a look at the code for the AppThePeople.aspx page, it looks like this:
<form id="form1" runat="server"> <div> Select User: <asp:DropDownList ID="DropDownList1" runat="server" DataSourceID="SqlDataSource1" DataTextField="FName" DataValueField="ID" OnSelectedIndexChanged="DropDownList1_SelectedIndexChanged"></asp:DropDownList> <asp:SqlDataSource ID="SqlDataSource1" runat="server" ConnectionString="<%$ ConnectionStrings:BigDBConnectionString %>" ProviderName="<%$ ConnectionStrings:BigDBConnectionString1.ProviderName %>" SelectCommand="SELECT [ID], [FName], [LName], [Address] FROM [Person]"></asp:SqlDataSource> <br /> First Name: <asp:TextBox ID="txtFName" runat="server"></asp:TextBox><br /> Last Name:<asp:TextBox ID="txtLName" runat="server"></asp:TextBox><br /> Address: <asp:TextBox ID="txtAddress" runat="server"></asp:TextBox> <asp:Button ID="Button1" runat="server" OnClick="Button1_Click" Text="Button" /> </div> </form>
So basically we have a dropdown list where the contents are
SELECT ID, FName, LName, Address FROM Person, and the Person table contains 10.000 rows. That will sure do the trick :)
Of course, our work doesn’t stop there. We would have to look for any additional pages that may have the same issue, and of course also look for ways to resolve this particular issue.
Before we do, just let me mention that I was not able to run a stress test against this, because not only does this cause the server to go hey wire with the GC, but my client couldn’t handle sending that many concurrent requests since the data sent back and forth in the view state field caused my client to go nuts as well… just some food for thought when you think about how long time the transfer of this data takes for a client, especially if they are on a slow connection.
This brings up an other little detail…
If you look at exceptions in dumps where you have large view state you will often find the following type of exception
_ASP.AllThePeople_aspx === System.Web.HttpException: Invalid_Viewstate_Client_Disconnected Client IP: 220.127.116.11 Port: 62164 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0....) ViewState: dDwtNjY5ODIyMjI4O3Q8cDxsPFNUQVJUREFURTtFTkREQVRFO1BMVVN
described in this KB article.
This is not actually an exception that a client will see in the browser, but it occurs because a client got tired of waiting for the full page to load (probably because of the time it took to download the view state) so they disconnected before the page finished loading. That in itself should be an indication that you need to consider reducing the amount of view state on the page.
So what can you do to reduce the view state? A good place to start is by understanding what caused it, so for example in this case, taking a look at the AllThePeople.aspx page and running it with no query string (or whatever query string we found earlier during debugging), viewing the source of the page to see how much view state we have (in the __VIEWSTATE form field), and understanding why we bring back that much data.
I would also look at some of the other pages in the application that may have controls bound to data sources that may contain a lot of data.
The next step would be to disable view state if it is possible, and also allowing the user to filter the data in some way to reduce the amount of data being retrieved.
Let’s face it, no user is ever going to drill through 10.000 items in the dropdown list, so reducing the amount of data doesn’t only reduce the amount of view state and save the users bandwidth, it also increases the usability of the page greatly. If they need all the data, they should get it in chunks (i.e. paged data grid for example), but most of the time filtering is the best option.
I know I might be a bit picky, but usually, if a page on the internet takes more than a few seconds to load, I’m on to the next page. If it is a web site for internal use where that much data is actually needed, perhaps a thick client would be a better option.
Now, having said that, it might be ok for a single page that is used very infrequently to use that much view state, for example some report that someone only needs to run every 3rd monday of the month if the moon is blue, but for commonly used pages like default.aspx some special care is crucial.
Have a weekend y’all…