.NET Debugging Demos Lab 4: High CPU hang

February 22, 2008 5 minute read

So we finished the first round of performance issues, crashes and memory leaks.

This time we are going to dive in to a high CPU situation and I know that this is giving it away a little but before you go through the lab you might want to have a look at my GC Pop Quiz to familiarize yourself with how the GC works if you feel a bit shaky about it.

Previous labs and setup instructions

If you are new to the debugging labs, here you can find information on how to set up the labs as well as links to the previous labs in the series.

Reproduce the issue

Start the application and browse to All Products, this should take about 5 seconds or more depending on what machine you’re on
Watch the CPU usage in task manager while browsing to this page
- What is the CPU usage like for iisexpress.exe during the processing of this request?

Watch the .net performance counters

Start monitoring the performance counters for the iisexpress.exe process

 dotnet-counters monitor -n iisexpress --refresh-interval 1 --counters System.Runtime

Prepare to gather a memory dump

Prepare to capture a memory dump with procdump or dotnet-dump but don’t hit enter
```
 procdump -ma iisexpress.exe
```
or
```
 dotnet-dump collect -n iisexpress
```

Reproduce the issue again and gather the memory dump

Note: Watch the performance counters - specifically the collections - while reproducing

Generate load with .\ToolsAndScripts\tinyget.ps1 -url https://localhost:44350/Products -numTimes 5
- How high does the CPU Usage (%) get?
- What is the ratio between Gen 0, Gen 1, and Gen 2 GC Count? What should it be in an optimal scenario and why?
- What can cause a ratio like this?
Reproduce the issue again
When the CPU usage is high, press enter in the procdump/dotnet-dump window to get the dump - alternatively you can also right-click in task manager to generate a memory dump.

Open the dump to figure out what it is doing

Open the dump in windbg
Load up the symbols and sos

Verify that you took the dump at the right time

Run !threadpool to see the CPU usage of the system to make sure that you got the dump at the right time
- What is the CPU Utilization?
- How does the CPU Utilization shown here correlate to the CPU usage of the process?

Determine which threads have been consuming most of the CPU time

Run .time to see the uptime and CPU User time of the process
Run !runaway to see the User Mode Time for all the threads
- From !runaway which threads are consuming most of the CPU?
- How much user mode CPU time was consumed by the process in total? (from .time)
Note: Looking at !runaway output can be a bit tricky for multiple reasons.
- First off, on a multi processor machine you have to remember that the user mode time (i.e. clock cycles spent in user mode code) is CPU time spent on all processors, therefore the user mode time may add up to more than elapsed time (process uptime).
- Secondly, the !runaway output shows the total user mode time spent on that thread since it was started. In asp.net for example a thread may be reused for multiple requests so a high user mode time on a thread does not necessarily mean that the requests running on that thread is a big CPU consumer.
- And finally, some threads like the GC threads (in multi processor, serverGC processes) stay alive for the duration of the process, so they have a higher chance of accumulating a lot of user mode time than worker threads, therefore it is really more interesting to look at two consecutive dumps and compare the differences in user mode time.
Pick the threads that have the highest user mode time and look at their stacks
```
 ~#s         # set thread context, pick a thread number from !runaway
 kb 2000     # look at the native stack
 !clrstack   # look at the .net stack
```
- What are they doing? Can you make an assumption based on this as to what is causing the high CPU usage?
[Only on multi-proc machines since they have dedicated GC threads]. Sum up the user mode time for the GC threads in !runaway and divide this by total user mode time from .time.
- How much of the total user mode time is spent by the GC threads?

Look at the memory dump to figure out what caused the high CPU in GC

Run ~* kb 2000 to get all native stacks and search for the thread that triggered the GC (coreclr!SVR::GCHeap::GarbageCollectGeneration)
- Why was a GC triggered?
- What type of object was it allocating?
- What is the thread doing?
- Could this be causing the bad Gen 0, Gen 1, Gen 2 ratio? Why?
Find out what is on the large object heap

Note: if you unfortunate you may have caught the GC in the plan or relocate phase in which case the !dumpheap output may not be correct. If that is the case try to make do with what you get from the stacks and !dumpheap -min 85000)
```
 !dumpheap -min 0n85000
```
- What is on the large object heap?
- Is there any pattern in the sizes of the strings?
Dump out a few of the strings to see the contents
```
 !do <address of string>
```
- Based on the stacks, the knowledge of the collection ratios and the string contents, can you determine what went wrong. You might want to do some internet searches on some of the calls on the stacks if you don’t know the answer immediately.

Verify your assumption in the code and modify the code to resolve the issue

Look at the code for the function you identified
- What line of code is causing the problem?
- How do you resolve it? Why does this resolve the problem?
Fix the problem and rerun the tests to see the results.

Note: Because there is a lot of looping to build the dataset the page may still take about half a second to a second to render but the CPU usage should be all gone and requests should not be queueing up.

Have fun, Tess

Twitter Facebook LinkedIn

Tess Ferrandez

.NET Debugging Demos Lab 4: High CPU hang

Previous labs and setup instructions

Reproduce the issue

Watch the .net performance counters

Prepare to gather a memory dump

Reproduce the issue again and gather the memory dump

Open the dump to figure out what it is doing

Verify that you took the dump at the right time

Determine which threads have been consuming most of the CPU time

Look at the memory dump to figure out what caused the high CPU in GC

Verify your assumption in the code and modify the code to resolve the issue

You May Also Enjoy

Naming tests

Organizing ASP.NET Core Minimal APIs

Debugging .NET Core memory issues (on Linux) with dotnet dump

Reviving the blog and updating the Buggy Bits Debugging Labs