Archive for the 'Performance' Category

If you are using the AJAX Control Toolkit you’ve probably noticed the ToolkitScriptManager server control.

One of the great features of the ToolkitScriptManager that comes with the AJAX Control Toolkit is its ability to combine all the client side scripts that need to be loaded in the browser into one request, which saves the browser from issuing multiple requests to the server and speeds up page loading.

The URL of that combined request contains a hash code of each script that should be combined, so if you have different combinations of client side scripts due to different combinations of controls you are using from the AJAX Control Toolkit you will have a unique URL for each such request.

“So what’s in it for me?”, you ask.

It’s simple. Since these scripts only change when you change the AJAX Control Toolkit version (and that only happens once in a while) and these scripts can be quite large (even when gzipped), they are perfect candidates for being delivered from a Content Delivery Network (CDN).

In short, a CDN can make your site faster by delivering static content (i.e. images, scripts, static HTML files, etc.) from a location closer to the user making the request. It does that by fetching the resource on behalf of the user who requested it, caching it for a certain period of time and distributing it to servers that are geographically closer to the user.

In most CDN systems you need to change the hostname of the resource you wish to be fetched from a CDN to something that was preconfigured to work with your site. The problem with the AJAX Control Toolkit is that it is the one that renders the link to the combined javascripts and you have no control over it.

Luckily the designers of the ToolkitScriptManager class thought about a hook that allows you to change the handler URL of the combined scripts.

If you set the “CombineScriptsHandlerUrl” property to a URL that refers to a CDN, these scripts will be downloaded through the CDN, thus making your site load faster.
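Here is a minimal sketch of how that could look, not a definitive implementation. The hostnames www.example.com and cdn.example.com, the handler file name CombineScriptsHandler.ashx, the control ID and the CdnUrlHelper class are all assumptions made up for illustration; use whatever hostname your CDN provider mapped to your site. The helper only swaps the host part of an absolute URL, and the actual assignment to the property is shown in a comment since it needs the toolkit assembly and a page instance.

```csharp
using System;

public static class CdnUrlHelper
{
    // Returns the same URL with its host replaced by the CDN host.
    // "cdnHost" is the (hypothetical) hostname mapped to your CDN.
    public static Uri ToCdnUrl(Uri handlerUrl, string cdnHost)
    {
        if (handlerUrl == null)
            throw new ArgumentNullException("handlerUrl");
        if (!handlerUrl.IsAbsoluteUri)
            throw new ArgumentException("An absolute URL is required.", "handlerUrl");

        UriBuilder builder = new UriBuilder(handlerUrl);
        builder.Host = cdnHost; // path and query string stay untouched
        return builder.Uri;
    }
}

// In a page's code-behind (requires the AJAX Control Toolkit; the control ID
// "ToolkitScriptManager1" is hypothetical):
//
//   ToolkitScriptManager1.CombineScriptsHandlerUrl = CdnUrlHelper.ToCdnUrl(
//       new Uri("http://www.example.com/CombineScriptsHandler.ashx"),
//       "cdn.example.com");
```

Keeping the path and query string intact matters here, because the query string is what identifies the combination of scripts being requested.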

For example, instead of having the combined scripts URL look like this:


It will look something like this:…

This will actually make a request to the CDN instead of going directly to your site, since the new hostname is a domain mapped to the CDN (of course not all CDNs work exactly like that, but most of them work in this manner, where you map a certain subdomain or another domain to the CDN).

It’s a rather quick and easy way to boost your site’s load speed with very little effort. Keep that in mind when you need to optimize the client load side of things.

A typed DataSet is a type safe wrapper around a DataSet that mirrors your database structure. It was created to make sure that the code accessing the database is type safe, so that any change to the database structure that affects tables, columns or column types is caught at compilation time rather than at runtime.

If you have a big typed dataset that contains a lot of tables, columns and relations it might be quite expensive to create it in terms of memory and time.

The main reason creating a big typed dataset is expensive is that all of the meta data contained within it (tables, columns, relations) is created when you instantiate the typed dataset, even if eventually all you’ll use it for is to retrieve data from a single table.

I can speculate that the reason all of the typed dataset meta data is created during the instantiation of the typed dataset is due to the fact that it inherits from a generic DataSet and accessing the meta data (tables, columns) can also be done in a non type safe manner (i.e. access the Tables collection and/or Columns collection of a table).

If you are using a typed dataset (or dataset in general) you might be interested in the following tips:

  • If you have a big typed dataset, avoid creating it too many times in the application. This is specifically painful for web applications where each request might create the dataset. You can use a generic DataSet instead, but this might lead to bugs due to database changes and the fact that you’ll only be able to find these bugs during runtime rather than compilation time (which basically misses the whole point of using a typed dataset in the first place).
  • DataSets (typed datasets included) inherit from the MarshalByValueComponent class. That class implements IDisposable and declares a finalizer, which means DataSets will actually be garbage collected only after finalization (you can read more about finalization and the finalizer thread here). To make sure datasets are collected sooner and are not hanging around waiting for finalization, make sure you call the “Dispose” method of the dataset (or typed dataset) or use the “using” statement, which will call “Dispose” for you at the end of the code block.
  • Don’t use DataSets at all 🙂 Consider using a different data access layer with a different approach such as the one used in the SubSonic project.
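The second tip above can be sketched roughly like this. It uses a plain DataSet with made-up table and column names (“Orders”, “Order”, “Id” are all hypothetical), but a typed dataset would be wrapped in exactly the same way.

```csharp
using System;
using System.Data;

public static class DataSetUsage
{
    // Builds a small DataSet, reads from it, and disposes it deterministically.
    public static int CountRows()
    {
        using (DataSet orders = new DataSet("Orders"))
        {
            DataTable table = orders.Tables.Add("Order");
            table.Columns.Add("Id", typeof(int));
            table.Rows.Add(1);
            table.Rows.Add(2);
            return table.Rows.Count;
        } // Dispose is called here, even if an exception is thrown
    }
}
```

The “using” block is just shorthand for a try/finally that calls Dispose, so nothing is left waiting for the finalizer thread.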

I guess it would be rather trivial to create a typed dataset that is lazy in nature and creates the meta data objects only when they are accessed for the first time. That would reduce the memory footprint of a large typed dataset, but would make the cost of creating these objects a little less predictable. If you are up to it, or have already built a lazy typed dataset, ping me through the contact form 🙂

Maoni (who has a great blog that I’d recommend everyone read) published a series of posts (first, second and third – 3 so far…) about the differences between the performance data reported by different tools and what that data actually means.

When you use a tool that collects performance data (or any data for that matter) you should always be aware of how this data is being collected and what it says in the context of that tool.

Maoni does a fine job of describing numerous tools such as perfmon and WinDbg (with the use of the SOS extension).

See my previous post “Perfmon – Your debugging buddy” for some information about how various .NET counters can help you debug your application.

Disclaimer: I usually like to keep this blog clean of link posts (posts that only have links to other posts in other blogs), but this time the information was too valuable so I had to make an exception.
Although this information is not exactly pure .NET debugging, it does have some information that can actually save you debugging time 🙂

Tess has some interesting tips that she expanded from a post her colleague Doug wrote about:

Read, learn and implement where needed.

There is a good article online from the MSDN Magazine February 2006 issue named “Improving Application Startup Time”.

It talks about how to improve application startup time and has some very good tips, some of which we indirectly talked about here.

Below is a quick summary of issues in the article with some quick comments of my own:

  • Load fewer modules at startup
  • Avoid unnecessary initialization
  • Place strong-named assemblies in the GAC
  • Use NGEN (don’t forget to test whether it actually helps you, because its benefit may be small and therefore irrelevant)
  • Avoid rebasing – Set your DLLs’ base addresses so the loader won’t have to find a different base address for them.
  • Application Configuration – XML configuration files are nice and all but they do come at a cost.
  • The Impact of AppDomains
    • Try to load assemblies as AppDomain neutral
    • Enforce efficient Cross-AppDomain communication
    • Use NeutralResourcesLanguageAttribute – it will make the lookup of a resource faster
  • Use Serialization wisely – using it (or deserialization) at startup can be costly, especially if you have this huge graph of objects to serialize/deserialize 😉

While I usually don’t speak too much about performance, I’ve decided to write about this subject since I’ve stumbled upon it one time too many.

This issue was at the bottom of my post ideas list, but after seeing this article in ZDNet, I’ve decided to pay a bit more attention to it.

What’s Hyper-Threading?
Hyper-threading is Intel’s implementation of simultaneous multithreading, a technology that enables the processor to utilize empty cycles that are wasted when the currently running thread is waiting for a long operation such as RAM access or disk access.

While on paper this technology should speed up certain operations, under certain workloads and in certain scenarios of intensive server applications performance actually decreases.

A Sample Scenario
Let’s imagine the following scenario (which is very common to a lot of server applications).
Let’s say we have a few worker threads that are handling requests from clients. If too many requests come in, they are queued.

Since this queue is shared among all threads and is accessed frequently, it will also be in the L1 or L2 cache of the CPU.

Now consider a different thread that is also running in the background. It belongs to the application but it is not a worker thread that handles requests. It’s a thread that is awakened periodically or by a trigger.
This thread runs and scans a large chunk of the memory / objects / cache / (put your memory intensive task here) for changes. While it is running, sometimes getting some of those free cycles while one or more of the worker threads are waiting for a long operation, it will trash the L1 and L2 cache on the CPU due to the various tasks that it’s performing.

In that case, when the worker threads return to work they try to access the memory that was previously in the L1 and L2 cache, but since the non-worker thread trashed the cache we will get a cache miss that will cause us to fetch things from RAM, or even worse, from the page file.
Instead of taking 2-4 cycles, these operations will take somewhere between 10 and 100 cycles, or even more.

What actually happens is that the L2 and L3 caches in the processor are being trashed whenever it switches to a different thread while the current one is waiting for some I/O operation.
This is specifically bad for applications such as ASP.NET and SQL Server.

ASP.NET and SQL Server
ASP.NET and SQL Server are two common server applications that use .NET (SQL Server 2005 hosts the CLR for stored procedures).

These two server applications are heavy on thread usage, and since they both use the CLR (probably with the server GC) they will have a dedicated GC thread per heap (and there is a heap per logical CPU).
GC threads are memory intensive since they scan through all the memory of the generations being collected and traverse the various pointers to determine which objects are garbage.

In addition to the GC threads, SQL Server has additional system threads running in the background that can also lead, in certain situations, to a decrease in performance.

While I’m not aware of system threads in ASP.NET, I know that some applications have additional threads running inside an ASP.NET application, which may cause them, in certain situations (of course), to suffer a decrease in performance.

You can read more about the effect Hyper-Threading has on SQL Server in this blog post by an MS developer.

So what should you do?
As I’ve said numerous times during this post, this behavior only happens in certain situations.
This means that the best way of handling these issues is to stress test your application under high load with Hyper-Threading both enabled and disabled.

Only then will you be able to determine whether, under the tested load, Hyper-Threading is working with you or against you.

Server GC pre-allocation in Hyper-Threaded enabled environments
There is another benefit to disabling Hyper-Threading, in addition to the cache miss issues that we talked about above.

The Server version of the GC allocates a separate heap and a GC thread per logical CPU. Hyper-Threading causes Windows to see a single physical Hyper-Threading enabled CPU as 2 logical CPUs (that’s one of the tricks it uses to schedule other threads for execution while others are waiting for their costly operations).
This means that, if you have a 2-CPU machine with Hyper-Threading enabled, upon starting your ASP.NET application (or any application that uses the Server GC) the GC will pre-reserve 64 MB x 4 logical CPUs = 256 MB of your virtual address space (which is 2 GB per process, or 3 GB if you set the /3GB switch in boot.ini).

If you have a memory intensive application, that might be more than you can spare. By disabling Hyper-Threading you will be able to reduce that to 128 MB.
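The arithmetic above can be sketched as a tiny helper. Note that the 64 MB per logical CPU figure is simply the number quoted in this post and is not read from the CLR, and the ServerGcReserve class and its members are names I made up for this example. Environment.ProcessorCount reports logical processors, so on a Hyper-Threading enabled machine it already includes the extra logical CPUs.

```csharp
using System;

public static class ServerGcReserve
{
    // Assumption: 64 MB per logical CPU, as quoted in this post.
    public const int ReserveMbPerLogicalCpu = 64;

    // Estimated address space (in MB) pre-reserved by the Server GC.
    public static int EstimateReserveMb(int logicalCpus)
    {
        if (logicalCpus < 1)
            throw new ArgumentOutOfRangeException("logicalCpus");
        return logicalCpus * ReserveMbPerLogicalCpu;
    }

    // Same estimate for the machine the code is running on.
    public static int EstimateForThisMachineMb()
    {
        return EstimateReserveMb(Environment.ProcessorCount);
    }
}
```

Calling EstimateReserveMb(4) gives the 256 MB from the 2-CPU Hyper-Threading example, and EstimateReserveMb(2) gives the 128 MB you get with Hyper-Threading disabled.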

This is another factor one should consider when enabling or disabling Hyper-Threading.

The funny thing about Hyper-Threading and .NET is that I’ve found this post in MSDN saying that it can boost .NET and everything will be great and fine. The problem is that they neglected to mention some of the issues we’ve talked about here.

In my previous post about GC.AddMemoryPressure, Tomer Gabel commented that people should not use this feature since it’s not the correct thing to do.

In addition to that Yuval Ararat wrote a post about this issue as well.

First of all, if I wasn’t clear enough in my first post, they are technically right BUT (and there is a big but here) there are situations, mainly around interoperability, that might require us to better notify the GC that behind this small and insignificant .NET object lies a whole lot of unmanaged memory that is allocated and will only be released when this .NET object dies.

Calling GC.AddMemoryPressure will simply make the GC consider this object for who it really is, a memory hungry object that hides behind it a lot of unmanaged allocated memory.

Originally I just wanted to introduce this feature to people and make sure they know about it, even though it’s quite esoteric and will mostly be used by applications with specific needs (and believe me, I know at least one application that could have benefited from this feature if it had been available in .NET 1.1).

And Tomer, regarding the typos and stuff, I usually try to spell check and proof my posts, but sometimes they slip by 😉

This post is a bit futuristic, but since .NET 2.0 and Visual Studio 2005 release is very near, I thought I should start to talk about it a bit more.

.NET 2.0 will introduce an improved and better GC.
One of the parameters that the GC takes into account is the amount of memory a certain class instance takes. By doing so, the GC can better understand how much memory will be gained by collecting this object.

One of the inputs the GC takes into account when deciding whether to initiate a collection or not is the amount of managed memory allocated.
If we have a managed class instance that doesn’t allocate a lot of managed memory but holds a pointer to a large chunk of unmanaged memory (either a reference to a COM object that allocates a lot of memory, or memory allocated directly using functions such as Marshal.AllocHGlobal), the GC will not know about the unmanaged memory allocated and will not consider scheduling a GC sooner.

This means that if there is no reference to that object and its finalizer releases the unmanaged memory, then until there is a GC and the finalizer thread reaches that object, this unmanaged memory will not be released and may add pressure on the application’s memory usage.

For this purpose, in .NET 2.0 the “AddMemoryPressure” function was added to the GC class.

“AddMemoryPressure” allows the developer to notify the GC about the amount of additional unmanaged memory that was allocated in different places in the application. The GC will take this into account when considering the scheduling of a collection.

For example, if at some point in the application I create a COM object that I know allocates a big chunk of memory, after creating it I will call “AddMemoryPressure” and give it a rough estimate of the amount of unmanaged memory this COM object takes.

For example:

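class MySpecialBitmapClass
{
    private long size;

    MySpecialBitmapClass(string fileName)
    {
        size = new FileInfo(fileName).Length;
        GC.AddMemoryPressure(size);    // report the unmanaged memory to the GC
    }

    ~MySpecialBitmapClass()
    {
        GC.RemoveMemoryPressure(size); // remove the pressure on finalization
    }
}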

When I create the class I say that I will add a memory pressure which is at least as large as the file I’m working on. When the instance of this class is being finalized it will remove the pressure.

Be sure to tune in to hear some more changes and updates that are coming in .NET 2.0’s GC.

.NET allows creating Windows Services which are commonly used for unattended services such as Remoting containers.

For some reason, when using a Windows Service as a container for your Remoting application, or just as a Windows Service that performs various tasks, the GC used is the workstation GC.

We have previously talked a bit about the difference between the workstation GC and the server GC but I’ll explain a bit about them again.

Workstation GC
The workstation GC is, as its name implies, used in a workstation scenario. It is optimized for single CPU machines and for desktop applications by using the main thread to perform the GC.
It uses 16 MB segments that it reserves and sub allocates.

It has an option called “Concurrent GC” which allows the GC to run on a dedicated thread.

Server GC
The Server GC is optimized for server operations and works only on multi processor machines (a CPU that has Hyper-Threading enabled is considered as two CPUs).

It has a GC heap per CPU and a thread per CPU that performs the garbage collection.
It uses 32 MB segments that it reserves and sub allocates.

All of these features make the Server GC more appropriate for demanding server applications due to its higher throughput.

Who uses the Server GC?
The only containers that use the Server GC by default are ASP.NET and COM+ (through Enterprise Services).

All other applications including Windows Services use the Workstation GC by default.

This means that even if you wrote a cool Windows Service in .NET that does cool stuff, it may suffer from using a non optimized GC even though it’s a high throughput service that serves millions of users.

So, what can we do about it?
Before .NET Framework 1.1 SP1 you had to implement your own container for the CLR.
Since .NET Framework 1.1 SP1 you can just add to your app.config file the following tag and it will tell the GC to use the Server GC (of course, only if you have more than one CPU):

<gcServer enabled="true" />
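If you want to verify which GC your service actually ended up with, here is a minimal sketch. It assumes .NET 2.0 or later, since GCSettings.IsServerGC is not available in 1.1, and the GcModeReporter class is a name I made up for this example. The comment shows where the tag is expected to sit in app.config; note that configuration element names are case sensitive, so it is gcServer with a capital S.

```csharp
using System;
using System.Runtime;

public static class GcModeReporter
{
    // app.config fragment this check is meant to verify
    // (placed under the <configuration> element):
    //
    //   <runtime>
    //     <gcServer enabled="true" />
    //   </runtime>
    public static string Describe()
    {
        // True only when the Server GC is actually in use by this process.
        return GCSettings.IsServerGC ? "Server GC" : "Workstation GC";
    }
}
```

Logging the result of GcModeReporter.Describe() when the service starts is a cheap way to confirm that the flag was picked up on the target machine.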

You can read more about it (though not too much) in this Microsoft KB article.

For .NET Framework 1.0 you’ll still have to implement your own container for the CLR.
There are a bunch of these hanging around. A nice one is this one, which is posted on The Code Project.

Just think about the fact that with one simple flag you can boost your Windows Service performance.

Rico Mariani, The man (with a capital “T”) for CLR performance and other related information, has posted this on his blog.

This post features a wealth of links to information and programs such as VADump (a fine program for listing the memory usage of a given process), along with some information about how to use it.

It also links to the CLR Profiler (which I’m hoping to cover in one of the next posts) for both 1.1 and Beta2, and to a small LogDump analyzer that he wrote.

I would recommend checking his blog in general. He has some nice information there that can help anyone.