Windbg.exe and its friends can be installed from the Debugging Tools for Windows download page.
Once you have them installed on a machine, you can simply copy the directory where they are installed (usually
c:\program files\debugging tools for windows) to any machine that you need them on. No other installation is really necessary.
Before we even start with how you get the dumps, you might be interested in what a memory dump actually is…
A memory dump is a snapshot of a process or a system at a given time. There are various types of memory dumps with varying degree of data included.
User vs. Kernel Dumps
If you take a memory dump of a process, you have taken a user dump. If you need a memory dump of a whole system, you take a kernel dump. I’m going to skip the discussion of kernel dumps completely because, for the problems I deal with (hung or crashing processes, or processes with memory leaks and exceptions), I don’t generally need to know what the operating system is doing at the time; a kernel dump is a waste of space, and .net debugging in kernel dumps is almost impossible.
Degrees of data included
Usually dumps are referred to as either mini dumps or full dumps, even though this notation is not really correct, since what we refer to as full dumps are really mini dumps with extra information.
Either way, a full dump is usually a dump taken with the /ma switch (a for all), which means full memory data, handle data, unloaded module information, basic memory information, module information, and thread and stack information, including thread time information. In essence, you get everything you could want, and more, in one file. The size of a full dump is roughly the same as the private bytes used by the process.
A mini dump, on the other hand, is usually a dump taken with the /mdi switch, which means modules, threads, stacks, any memory that is referenced by a pointer on a stack, and some read-write segments. Mostly these are used to look at what threads are executing at the time the dump was taken. The size of a mini dump is usually only a few MB, so its benefit is that it’s very fast to write and of course doesn’t take up much space, but you can’t get much .net data from it.
The switches I am talking about are switches to the .dump command in windbg.exe; to learn more about the different options, look up .dump in the windbg help files.
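For example, assuming you are already attached to a process in windbg (the folder and file names here are just placeholders), the two variants look like this:

```
$$ full dump: all memory, handles, modules, threads and stacks
.dump /ma c:\dumps\myapp_full.dmp

$$ mini dump: modules, threads, stacks and stack-referenced memory
.dump /mdi c:\dumps\myapp_mini.dmp
```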
As a side note: When you run an application and get an application failure where you are asked if you want to send the data to Microsoft, what you normally send is a mini dump, so no real personal data is sent. I know many people avoid sending this data, fearing that big brother will now know everything about them. If you are one of these people, fear no longer:) it’s a good thing to send this data; you are helping to eliminate bugs so you won’t run into them later. And the people and applications that look at these dumps don’t care who you are or about anything personal; they are just looking to solve the problems.
How do you get the dumps
There are a few tools that you can use to generate dumps. Some of you might have heard about DebugDiag, error reporting, or Dr. Watson. But my tool of choice is windbg, or a script file that comes with it called adplus.vbs, which basically scripts windbg’s command-line equivalent, cdb.exe.
If you are attached to a process with windbg.exe you can create a memory dump by typing
.dump /ma c:\mydumpfolder\mydump.dmp at windbg’s command line.
What is more common, when we ask for dumps from customers, is that we automatically create dumps with adplus since it’s much easier and nicer to deal with.
Adplus takes a number of arguments; I’m not going to bore you with all of them, but just show you some of the more common ones.
-hang creates a snapshot (full dump) of the process right now. It really has nothing to do with the process hanging, but it got its name because the most common usage for these snapshots is to debug hangs.
When the problem occurs you simply run
adplus -hang -pn processname.exe
then cdb.exe attaches in non-invasive mode (so it can get out later without shutting down the process), takes a snapshot and leaves.
-crash attaches cdb.exe to the process in invasive mode and leaves it attached until either you close the debugger (generating a ctrl-c event), or until the process crashes or gets an interrupt (breakpoint). Whilst attached, it creates mini dumps for all access violation exceptions, and logs all other exceptions it has been set up to catch in a log file… and if the process crashes, it generates a full dump when it exits.
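As an example, a typical crash-mode session against a hypothetical w3wp.exe might look like this (-o just overrides the default output directory; both names here are placeholders):

```
adplus -crash -pn w3wp.exe -o c:\dumps
```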
Both types of dumps are created in a directory under your debuggers directory marked with the date and timestamp of the debug session.
-pn specifies what process you want to attach to, by process name.
-p specifies what process you want to attach to, by process ID.
-c allows you to pass a configuration file to adplus so that you can configure your own breakpoints, exceptions etc. An example of when you might want to use a configuration file can be seen in my post about debugging exceptions.
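As a sketch of what such a file can look like (the element names follow the configuration file format described in the adplus help; the process name, output directory, actions and the exception code 0xe0434f4d — the native code the CLR uses for managed exceptions — are just one possible setup, not something you should copy blindly):

```xml
<ADPlus>
  <Settings>
    <RunMode> CRASH </RunMode>
    <ProcessName> w3wp.exe </ProcessName>
    <OutputDir> c:\dumps </OutputDir>
  </Settings>
  <Exceptions>
    <Config>
      <!-- 0xe0434f4d: the exception code the CLR raises for managed exceptions -->
      <Code> 0xe0434f4d </Code>
      <!-- log on 1st chance, take a full dump on 2nd chance -->
      <Actions1> Log </Actions1>
      <Actions2> FullDump </Actions2>
    </Config>
  </Exceptions>
</ADPlus>
```

You would then run adplus -c myconfig.cfg and let the settings in the file drive the session.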
For more information on adplus usage, look up adplus in the windbg.exe help file.
Knowing how to take the dumps is one tenth of the battle; the hard part is knowing when to take them, and that is often harder than the actual debugging. Looking at a dump taken at the right time in the right way is almost a breeze if you are a somewhat experienced debugger.
Crashes
This is the easy one. You attach the debugger in crash mode, leave it running until the process dies, and you’re done. Hmm… in most cases… The caveat here is that the process might only crash every 2 weeks, and meanwhile it might be throwing a lot of NullReferenceExceptions, generating so many access violation mini dumps that it isn’t feasible to leave the debugger running. In that case you have to configure adplus to not dump on 1st chance access violations.
The process might also be recycled once in a while for maintenance or other reasons, so you may get a false positive; match it up with the event log to make sure that you are actually catching what you think you’re catching.
And third, sometimes you will only see one thread in the crash dump, so it won’t give you much. In that case it might be good to create a config file that allows you to break on kernel32!ExitProcess, so that you catch the threads while the process is trying to shut down.
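The breakpoint part of such a config file could be sketched like this (again, the element names follow the adplus help file; taking a FullDump on the breakpoint gives you all the threads before they are torn down):

```xml
<ADPlus>
  <Breakpoints>
    <NewBP>
      <!-- break when the process starts shutting down and take a full dump -->
      <Address> kernel32!ExitProcess </Address>
      <Actions> FullDump </Actions>
    </NewBP>
  </Breakpoints>
</ADPlus>
```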
The hardest crashes to debug are ones where nothing in the process caused it to die, but rather an external application, or condition (no available memory for example) caused it to die.
In each case, when you get a crash dump, look at the faulting thread (the thread you are on when you open the dump), look in the log file for what exceptions occurred right before the crash, and finally match that up with any recent entries in the event log.
Hangs and performance issues
A real hang is usually pretty easy to catch, the web admin can grab a hang dump right before restarting the application pool.
If your server starts slowing down and you don’t know if it is really a hang, or just very slow performance, then take two hang dumps after each other, with perhaps 2 minutes in between. (Make sure that the first dump has completely finished writing before you take the 2nd one, otherwise you will get two identical dumps). That way you can compare the dumps and see if anything is moving at all.
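In practice that is just two hang-mode snapshots back to back, for example (the process name is hypothetical):

```
adplus -hang -pn w3wp.exe
rem ...wait a couple of minutes, and check that the first .dmp file
rem has finished writing before taking the second one...
adplus -hang -pn w3wp.exe
```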
The trickier ones are the ones where one out of 5000 requests takes 3 seconds instead of 1 second. They are so tricky that I really don’t have a good answer to what to do. Mainly in these cases, if it is a web server, we try to look at the IIS logs and see if there is a specific request that seems to take longer (time-taken) at times, and try to correlate it to other things happening around that time, along with data about the user, the location of the browser on the network, etc. In short, this is one place where debugging may not be the best way. Once we know a bit more about the conditions, stress testing to make the problem happen more readily is a good strategy.
Memory issues
There are three different situations here:
- you want to know why you are using so much memory
- you want to know why your memory is constantly rising
- you want to know what caused an OutOfMemoryException
In the first case, a hang dump when the memory is high is a good start.
In the second case, a number of dumps spaced 100-200 MB apart is recommended, so you can compare the memory usage.
And finally, for the OutOfMemoryException, you can either look at a hang dump taken when the OOM has occurred and see what is using the memory, or you can run adplus in crash mode and set the GCFailFastOnOOM registry value to 2 so that the process recycles when you hit an OOM. See this KB article for details.
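If you go the GCFailFastOnOOM route, the registry change can be sketched as a .reg file like this (the value lives under the .NETFramework key for 1.x framework versions; double-check the details against the KB article before using it):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework]
"GCFailFastOnOOM"=dword:00000002
```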
I’m working on a post for debugging memory issues but it is taking some time, so stay tuned…
What is sos.dll
You can’t really talk about debugging managed processes without mentioning sos.dll. Windbg and cdb let you load extensions that automate tasks you would otherwise have to do manually, and some of those tasks, like building the managed stack from the native stack, are very hard to do by hand. That is where sos.dll comes in.
You can find sos.dll in the clr10 directory under your debuggers directory (for 1.1 applications), or in the framework directory, and you load it using
.load clr10\sos or .load c:\blablamypath\sos.dll
Then you can start running commands using !commandname, like
!clrstack for example to get the managed stack of a thread.
The full list of commands you can run can be found by running
!sos.help, and throughout my posts I am using a lot of them.
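A minimal session might look something like this (output omitted; the commands shown are just a few common ones):

```
$$ load sos for a 1.1 process and look around
0:000> .load clr10\sos
0:000> !threads
0:000> ~5s
0:000> !clrstack
0:000> !dumpheap -stat
```

Here !threads lists the managed threads, ~5s switches to thread 5, !clrstack gives you the managed stack for the current thread, and !dumpheap -stat summarizes the managed heap by type.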
A long, long time ago, and many versions of sos.dll ago, a colleague and I wrote some very basic help files for sos.dll. If you have the SDK for .net 1.0 or 1.1 installed, you can find them in a directory similar to
C:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\Tool Developers Guide\Samples\sos\SOS.htm but beware, they are a bit raw to say the least:) and many commands have been added to sos.dll since then, so they are by no means complete help files.
It’s a bit of a nostalgic trip, really, to look at them and realize how little we (as a collective) knew about managed debugging back then:) and how hard it used to be in the days before sos.dll
Snap, snap… enough writing, back to debugging some nasty deadlocks :)