In a server application, even the smallest leak can quickly grow into a major issue.

In a purely unmanaged world, finding these leaks can be challenging, although there are more than a few ways of doing so (perhaps I shall discuss this in a later post). In .NET, with the help of our faithful friends WinDbg and the SOS extension, we can find and eliminate these leaks quickly if we stick to a proven methodology.

As you already know, pure managed code has no memory leaks in the traditional sense: you can’t forget to free something, since we are in a garbage-collected environment. What you can do, however, is forget to drop a reference to an object, in which case the GC will never consider it garbage and will never collect it.

A common situation is a static variable (a Shared variable, for our VB.NET audience) that holds a collection of some type (usually a Hashtable) to which you add various objects.
Since it’s a static variable, it is rooted and never becomes unreferenced, so if we don’t explicitly remove the objects we added to that collection, we have a managed memory leak.

So how can we overcome this problem?

Well… there are three ways.

  1. You can go the money way, which involves buying one of the various .NET memory profilers available today (I can personally recommend the fine and very easy to use .NET Memory Profiler by SciTech. It’s an excellent tool, you get a fully functional 14-day evaluation, and the new version also has a command line and an instrumentation API that lets you initiate the profiler at certain places in the code without user intervention).
  2. You can go the programming way by implementing your own memory profiler using the .NET profiling API, which involves implementing a COM interface and handling all the plumbing yourself (yet more material for future posts 😉 and it’s even very interesting to do).
  3. You can use WinDbg and the SOS extension, following a methodology that reliably tracks down this kind of leak.

Obviously, we are going to talk about option #3 in this post.

The tools and commands we are going to use are:

  • adplus.vbs – a sophisticated VBScript, written by some fine SIE engineers at Microsoft Support, that drives cdb and automates the process of taking memory dumps. It comes with the Debugging Tools for Windows installation (the WinDbg installation).
  • WinDbg (duh!)
  • !dumpheap – An SOS extension command used to list the objects allocated on the managed heap.
  • !gcroot – An SOS extension command that finds which objects are referencing the object we are currently checking, all the way up to its root.
  • !dumpobj (or !do for short) – An SOS extension command to investigate a certain object and see what its fields are and what they reference.

Before we can start working in WinDbg, we need a set of dumps taken at a predefined, regular interval. If the memory leak is small, we will use a longer interval (e.g. 20 minutes); if the leak is big, we will use a shorter interval (e.g. 1 minute).

To take the dump we will use the following adplus.vbs command line:

([UPDATE] 6/16/2005: Thanks to the anonymous reader who pointed out my mistake. I WAS referring to the -hang option and NOT the -crash option, which is good for other situations.)

adplus.vbs -hang -p [process Id] -quiet

  • -hang tells adplus.vbs to attach to the target process, take a memory dump and detach (this will only work on Windows 2000 and above, since the detach option is only available on Windows 2000 and above).
  • Replace [process Id] with the PID of the process you want to dump.
  • -quiet silences unnecessary message boxes that adplus might pop up in certain cases.
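
If you would rather not run the command by hand every few minutes, the interval can be scripted. Below is a minimal sketch in Python, assuming that adplus.vbs is reachable from the current directory and that it is launched through cscript (the console Windows Script Host). The function names and default values are hypothetical and only serve the illustration; the adplus switches are the ones shown above.

```python
import subprocess
import time


def build_adplus_command(pid, script="adplus.vbs"):
    """Build the hang-mode command line from the post for the given PID."""
    # Launching through cscript is an assumption of this sketch; adjust the
    # script path to wherever Debugging Tools for Windows is installed.
    return ["cscript", "//nologo", script, "-hang", "-p", str(pid), "-quiet"]


def take_dumps(pid, count=3, interval_seconds=20 * 60,
               run=subprocess.run, sleep=time.sleep):
    """Take `count` dumps of process `pid`, pausing between consecutive dumps."""
    for i in range(count):
        # check=True raises if the script exits with a non-zero code
        run(build_adplus_command(pid), check=True)
        if i < count - 1:  # no need to wait after the last dump
            sleep(interval_seconds)
```

Calling take_dumps(1234, count=3, interval_seconds=60) would take three dumps of process 1234, one minute apart. The run and sleep parameters exist only so the loop can be exercised without actually launching anything.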

After we have a series of dumps (usually 3 will suffice to show a trend) we can get down and dirty with WinDbg.

TIP: To save your WinDbg session, including all the commands you executed and all the output you received, use the .logopen [file name] command after opening the memory dump and before running any other command.

Since our problem is growing memory and this is a pure managed application, it means we are allocating objects and some object that is still alive holds a reference to them, so the GC does not consider them garbage and therefore does not collect them.

To find them we will use the following set of commands:

  1. Run !dumpheap -stat to get a statistical view of all objects currently allocated on the heap. If the problem is indeed objects that are allocated and never freed, you will find that certain classes show a steadily increasing number of live instances over the series of dumps, and this points to the objects we should take a closer look at. Click here to see a Sample Output.
    In our case, we see 4500 instances of type MemoryLeakSample.MyObject, which should turn on a red light in our heads since this is a good candidate for a leaking object.
  2. Run !dumpheap -type MemoryLeakSample.MyObject where “MemoryLeakSample.MyObject” is the full namespace and class name of the type we found in the step above. This should give us a list of all the instances sorted by their addresses. Since we are in a GC environment and the .NET GC compacts the heap, the instance with the smallest address is most likely our oldest instance, and we should focus on it. Click here to see a Sample Output.
  3. Run !gcroot 0x04ab8348 where 0x04ab8348 is the address of the oldest object (the first one in the list) that we saw earlier, and we will get a full list of references showing who is holding whom. In our case, since we have a static Hashtable, we will see something like “HANDLE(Strong):931c0:Root ….”, which means that the parent at the top of the reference chain is rooted, so it will never be freed unless the AppDomain unloads (in a multiple AppDomain scenario) or the process dies. Click here to see a Sample Output.
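
Comparing the !dumpheap -stat output of several dumps by eye gets tedious once the heap holds hundreds of types. The comparison from step 1 can be sketched as a small Python script that reads the saved output of each dump (for example from the .logopen log files) and reports the types whose instance count rises in every successive dump. It assumes each statistics row has the layout MT, Count, TotalSize, Class Name; the function names are hypothetical.

```python
import re

# Assumed row layout of `!dumpheap -stat`: MT  Count  TotalSize  Class Name
_ROW = re.compile(r"^\s*([0-9a-fA-F]{8,16})\s+(\d+)\s+(\d+)\s+(\S.*?)\s*$")


def parse_dumpheap_stat(text):
    """Return {class name: instance count} for every statistics row in `text`."""
    counts = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            name = match.group(4)
            # Sum the counts in case the same class name appears more than once
            counts[name] = counts.get(name, 0) + int(match.group(2))
    return counts


def growing_types(snapshots):
    """Return (name, counts) for types whose count rises in every successive dump.

    `snapshots` is a list of `!dumpheap -stat` outputs, oldest dump first.
    The result is sorted by total growth, largest first.
    """
    if len(snapshots) < 2:
        return []  # a trend needs at least two dumps
    parsed = [parse_dumpheap_stat(text) for text in snapshots]
    result = []
    for name in parsed[-1]:
        series = [counts.get(name, 0) for counts in parsed]
        if all(a < b for a, b in zip(series, series[1:])):
            result.append((name, series))
    result.sort(key=lambda item: item[1][-1] - item[1][0], reverse=True)
    return result
```

Feed it the text of each log in the order the dumps were taken. The types at the top of the result are the first candidates for steps 2 and 3. A type that only grows in some of the dumps is deliberately ignored here, so loosen the comparison if your leak is bursty.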

Now all we have left to do is find out where in the code this static variable is defined and either NOT use a static variable (the best option, but not always possible) or make sure we clean up this collection.

This methodology can be used not only to find rooted objects, but also to spot a big increase in instances of a certain type over a period of time and see who is holding them.

The best thing about this methodology is that you can use it at a customer’s site without a problem and find 80% of the problem without having your symbols and/or your code.

Happy leak hunt!