16 minute read

This issue has been pretty frequent since .net came out, and I am bringing it up for two reasons.

  • It is something I think every web developer calling web services should be aware of… and
  • It is a good example of how the product changes based on feedback from support incidents etc. to better suit our customers

Problem Description

We have an ASP.NET application acting as a GUI to our middle tier (Web Service) and at peak times the application is dead slow. Requests start timing out but some do go through so it doesn’t appear that we are completely deadlocked.

After the peak in volume the application seems to work fine again.

During this time CPU utilization is reasonably low in the GUI process, and it is very low in the Web Service process. So low in fact that it looks like hardly anything is running in the web service application.

Gathering information

In a multi-tier application of any kind it is always hard to determine where to grab memory dumps, and often times the tiers are on different machines so it’s not as easy as just setting the debugger up to grab a dump from all processes at once.

Let’s call the GUI server “Server A”, and the backend server (web services server) “Server B”.

If we take a hang dump of Server A we temporarily stall Server A (even if just for a few seconds) and thus requests to Server B will be drained so we can’t go in and take a dump of Server B and expect all requests to still be alive. If on the other hand we take a hang dump of Server B, stalling it momentarily, Server A’s request might time out because we are debugging Server B. So we have to focus on one of them at a time.

There are a number of possible reasons we could be “hanging” here.

  • The web service is slow executing the requests, or possibly even deadlocked.
  • The GUI app is slow in parts not related to the web service
  • The GUI app is slow getting requests through to the web service
  • other

If we focus on Server A (GUI) we can at least see if we are waiting for a response from Server B, in which case, next time we can take dumps from Server B, and find out what we are waiting for. So a hang dump of the asp.net process on Server A, or preferably two, taken about a minute or so apart so we can see if the threads are moving or completely stalled, would be good. Note: if you are taking multiple dumps, make sure that the first dump is ready before you take the next one so you don’t get two identical dumps.

Debugging the issue

Drum roll please!… Introducing a command I haven’t shown before !threadpool

0:065> !threadpool
CPU utilization 7%
Worker Thread: Total: 37 Running: 37 Idle: 1 MaxLimit: 40 MinLimit: 2
Work Request in Queue: 0
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread: Total: 2 Free: 2 MaxFree: 4 CurrentLimit: 2 MaxLimit: 40 MinLimit: 2

CPU Utilization is 7%. This is the complete CPU utilization on the box, so not specifically for the Asp.Net process, either way it is pretty low, just like it was mentioned in the problem description.

Currently there are no work requests in the queue and no timers waiting.

We have a maximum of 40 worker threads to execute our requests, and a minimum of 2 alive at any time (this comes from our machine.config settings). And currently we are executing on 37 of these, but it doesn’t appear that the thread pool is exhausted. This jives well with that there are no work requests in the queue.

So what are all these threads doing?

Running ~* kb it appears that most of the threads are in some kind of wait. More specifically in a CorWaitOneNative

  61  Id: 1164.a44 Suspend: 1 Teb: 7ff3a000 Unfrozen
ChildEBP RetAddr  Args to Child
0d58f430 7c822114 77e6711b 00000001 0d58f480 ntdll!KiFastSystemCallRet
0d58f434 77e6711b 00000001 0d58f480 00000000 ntdll!NtWaitForMultipleObjects+0xc
0d58f4dc 791e0b3b 00000001 0d58f5bc 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0d58f50c 791e0bdd 00000001 0d58f5bc 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0d58f560 791d5ba9 00000001 0d58f5bc 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0d58f5a8 024840c8 0d58f5b4 00000000 000186a0 mscorsvr!WaitHandleNative::CorWaitOneNative+0x6f
WARNING: Frame IP not in any known module. Following frames may be wrong.
0d58f614 0c9a5cb7 00000000 06c60b10 06c60b10 0x24840c8
0d58f644 0c9a00c2 06c60b10 0677c0c0 00000000 0xc9a5cb7
0d58f688 0c870721 06c60f9c 06c5fe04 06c60b10 0xc9a00c2
0d58f6ac 0c870532 06c60af0 06c5fe04 067592f0 0xc870721
0d58f6d0 0c952ea4 067592f0 06c5fe04 06c5f13c 0xc870532
0d58f800 799a67a2 06c5f1a4 06c5f4a8 799b4168 0xc952ea4
0d58f80c 799b4168 00000002 00000000 024a1710 mscorlib_79990000+0x167a2
0d58f890 791b3208 0d58f9a8 791b3ad7 0d58f8e4 mscorlib_79990000+0x24168
0d58f898 791b3ad7 0d58f8e4 00000000 0d58f8bc mscorsvr!CallDescrWorker+0x30
0d58f9a8 791d3ef0 00bf71bb 79b7c000 00000004 mscorsvr!MethodDesc::CallDescr+0x1b8
0d58fa64 791d3fa4 79bf71bb 79b7c000 79bd46de mscorsvr!MethodDesc::CallDescr+0x4f
0d58fa8c 791ebc5a 0d58faa4 00101ae0 0254c470 mscorsvr!MethodDesc::Call+0x97
0d58fab4 791cc3c8 02554bb0 791b3d64 ffffffff mscorsvr!RegisterWaitForSingleObjectCallback_Worker+0x139
0d58fafc 791ebb31 00101ae0 791ebbcc 02554bb0 mscorsvr!Thread::DoADCallBack+0x5c
0d58fb70 791d4096 02554bb0 02537ae8 793ecec8 mscorsvr!QueueUserWorkItemCallback+0xe3
0d58fb84 791d4055 02537ae8 00000000 791d3fe5 mscorsvr!ThreadpoolMgr::ExecuteWorkRequest+0x19
0d58fba4 791fdacc 00000000 79172262 79172270 mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x129
0d58ffb8 77e66063 0254c108 00000000 00000000 mscorsvr!ThreadpoolMgr::intermediateThreadProc+0x44
0d58ffec 00000000 791fda8b 0254c108 00000000 kernel32!BaseThreadStart+0x34

…this means that someone from managed code used WaitHandle for synchronization, and calls WaitOne().

Checking the managed bit of this thread we can see that we are in WebForm1.Page_Load, calling into a function SlowStuff.seventoo.Service.DoSomethingSlow(), which in turn invokes a web service call, and now sits waiting in a function called GetRequestStream().

0:061> !clrstack
Thread 61
ESP         EIP
0x0d58f5d8  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Boolean System.Threading.WaitHandle.WaitOneNative(I,UI4,Boolean)
0x0d58f5ec  0x799e4bb1 [DEFAULT] [hasThis] Boolean System.Threading.WaitHandle.WaitOne(I4,Boolean)
0x0d58f620  0x0c9a5cb7 [DEFAULT] [hasThis] Class System.IO.Stream System.Net.HttpWebRequest.GetRequestStream()
0x0d58f64c  0x0c9a00c2 [DEFAULT] [hasThis] SZArray Object System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String,SZArray Object)
0x0d58f694  0x0c870721 [DEFAULT] [hasThis] String SlowStuff.seventoo.Service.DoSomethingSlow()
0x0d58f6b4  0x0c870532 [DEFAULT] [hasThis] Void SlowStuff.WebForm1.Page_Load(Object,Class System.EventArgs)
0x0d58f6dc  0x0c952ea4 [DEFAULT] [hasThis] Void System.Web.UI.Control.OnLoad(Class System.EventArgs)
0x0d58f6ec  0x0c952de4 [DEFAULT] [hasThis] Void System.Web.UI.Control.LoadRecursive()
0x0d58f700  0x0c952097 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0d58f744  0x0c950eef [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0d58f780  0x0c95095b [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x0d58f788  0x0c950934 [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x0d58f798  0x0c76de50 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x0d58f7e0  0x0c76d8ba [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x0d58f828  0x0c76d783 [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x0d58f844  0x02446897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x0d58f880  0x0cb43cf4 [DEFAULT] [hasThis] Void System.Web.RequestQueue.WorkItemCallback(Object)
0x0d58fac0  0x791b3208 [FRAME: ContextTransitionFrame]

If we run ~* e !clrstack to get the managed stack for all the threads we find that 33 requests are waiting like this in GetRequestStream and 2 requests are waiting in GetResponse.

0:042> kb 2000
ChildEBP RetAddr  Args to Child
0d0cebc4 7c822114 77e6711b 00000001 0d0cec14 ntdll!KiFastSystemCallRet
0d0cebc8 77e6711b 00000001 0d0cec14 00000000 ntdll!NtWaitForMultipleObjects+0xc
0d0cec70 791e0b3b 00000001 0d0ced50 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0d0ceca0 791e0bdd 00000001 0d0ced50 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0d0cecf4 791d5ba9 00000001 0d0ced50 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0d0ced3c 024840c8 0d0ced48 00000000 000186a0 mscorsvr!WaitHandleNative::CorWaitOneNative+0x6f
WARNING: Frame IP not in any known module. Following frames may be wrong.
0d0ceda8 0ca29fd5 00000000 02970580 02970580 0x24840c8
0d0ceddc 0ca29e49 02970580 02970580 0677c0c0 0xca29fd5
0d0cee0c 0ca29e06 0c9a0110 02970580 0677c0c0 0xca29e49
0d0cee54 0c870721 02970a0c 0296f868 02970580 0xca29e06
0d0cee78 0c870532 02970560 0296f868 067592f0 0xc870721
0d0cee9c 0c952ea4 067592f0 0296f868 0296c800 0xc870532
0d0cefcc 799a67a2 0296c868 0296cb6c 799b4168 0xc952ea4
0d0cefd8 799b4168 00000002 00000000 024a14c4 mscorlib_79990000+0x167a2
0d0cf060 79217188 00000000 00dc9160 0672a508 mscorlib_79990000+0x24168
02442f60 24548bca 2c15ff10 8b020b95 8bcb8bd8 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
02442f60 00000000 2c15ff10 8b020b95 8bcb8bd8 0x24548bca

Thread 42
ESP         EIP
0x0d0ced6c  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Boolean System.Threading.WaitHandle.WaitOneNative(I,UI4,Boolean)
0x0d0ced80  0x799e4bb1 [DEFAULT] [hasThis] Boolean System.Threading.WaitHandle.WaitOne(I4,Boolean)
0x0d0cedb4  0x0ca29fd5 [DEFAULT] [hasThis] Class System.Net.WebResponse System.Net.HttpWebRequest.GetResponse()
0x0d0cede4  0x0ca29e49 [DEFAULT] [hasThis] Class System.Net.WebResponse System.Web.Services.Protocols.WebClientProtocol.GetWebResponse(Class System.Net.WebRequest)
0x0d0cee14  0x0ca29e06 [DEFAULT] [hasThis] Class System.Net.WebResponse System.Web.Services.Protocols.HttpWebClientProtocol.GetWebResponse(Class System.Net.WebRequest)
0x0d0cee18  0x0c9a0110 [DEFAULT] [hasThis] SZArray Object System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String,SZArray Object)
0x0d0cee60  0x0c870721 [DEFAULT] [hasThis] String SlowStuff.seventoo.Service.DoSomethingSlow()
0x0d0cee80  0x0c870532 [DEFAULT] [hasThis] Void SlowStuff.WebForm1.Page_Load(Object,Class System.EventArgs)
0x0d0ceea8  0x0c952ea4 [DEFAULT] [hasThis] Void System.Web.UI.Control.OnLoad(Class System.EventArgs)
0x0d0ceeb8  0x0c952de4 [DEFAULT] [hasThis] Void System.Web.UI.Control.LoadRecursive()
0x0d0ceecc  0x0c952097 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0d0cef10  0x0c950eef [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0d0cef4c  0x0c95095b [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x0d0cef54  0x0c950934 [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x0d0cef64  0x0c76de50 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x0d0cefac  0x0c76d8ba [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x0d0ceff4  0x0c76d783 [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x0d0cf010  0x02446897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x0d0cf04c  0x02446448 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)
0x0d0cf058  0x02442fc5 [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)
0x0d0cf120  0x79217188 [FRAME: ContextTransitionFrame]
0x0d0cf200  0x79217188 [FRAME: ComMethodFrame]

Hmm… so what are these waiting for?

Well… when we are waiting in GetResponse we have sent the request to the web service server and we are waiting for it to return, in which case we can either be blocked/waiting on the server or we could be waiting on a callback thread to process the results on. In our case we have not exhausted the thread pool so we would have had free threads to use as callback threads. So… we are likely waiting on results back from the server on the two threads that are in GetResponse().

Although these two threads are interesting I find it even more interesting that we have 33 threads waiting in GetRequestStream(). This function is just supposed to get us a stream that we can send the request on. OK! So that seems like something that would be rather quick, however in doing so it also needs to get a connection.

So… what would be limiting us from getting a connection?

System.Net.Connections are managed by a class called ServicePoint, that contains information about the URI it services, the maximum number of connections, whether it expects 100-continue etc.

From !dso (!dumpstackobjects) on thread 61 (which is waiting on GetRequestStream() we can get the HttpWebRequest to the Web Service.

0:061> !dso
Thread 61
ESP/REG    Object     Name
...
0xd58f5f0 0x6c8b0d0 System.Net.HttpWebRequest
0xd58f5f4 0x6c8f23c System.Net.LazyAsyncResult
0xd58f5f8 0x6c95868 System.Threading.ManualResetEvent
0xd58f610 0x6c8b0d0 System.Net.HttpWebRequest
0xd58f620 0x6c60b10 SlowStuff.seventoo.Service
0xd58f624 0x6c60b10 SlowStuff.seventoo.Service
0xd58f628 0x677c0c0 System.String    DoSomethingSlow
0xd58f630 0x6c8b0d0 System.Net.HttpWebRequest
0xd58f634 0x6c8c2e0 System.String    http://tempuri.org/DoSomethingSlow
0xd58f638 0x677335c System.String    http://tempuri.org/DoSomethingSlow
0xd58f64c 0x6c60b10 SlowStuff.seventoo.Service
...

And if we dump this out we find that it has a member variable called _ServicePoint, so that looks pretty interesting…

0:061> !do 0x6c8b0d0
Name: System.Net.HttpWebRequest
MethodTable 0x0c978b94
EEClass 0x0c96e798
Size 204(0xcc) bytes
GC Generation: 0
mdToken: 0x020001fa  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c977cf4
        MT      Field     Offset                 Type       Attr      Value Name
0x79b9788c 0x40000b6      0x4                CLASS   instance 0x00000000 __identity
0x0c9774d0 0x4000f0c        0                CLASS     shared   static s_PrefixList
    >> Domain:Value 0x000d5330:NotInit  0x001024b8:0x0274a580 <<
0x0c978b94 0x4000f30     0x10                CLASS   instance 0x06c8f23c _WriteAResult
0x0c978b94 0x4000f31     0x14                CLASS   instance 0x00000000 _ReadAResult
0x0c978b94 0x4000f32     0x18                CLASS   instance 0x00000000 _ContinueDelegate
0x0c978b94 0x4000f33     0x1c                CLASS   instance 0x0274e5c4 _ServicePoint
0x0c978b94 0x4000f34     0x20                CLASS   instance 0x06c91c58 _WriteEvent
0x0c978b94 0x4000f35     0x24                CLASS   instance 0x00000000 _WriteInProgressEvent
0x0c978b94 0x4000f36     0xb4       System.Boolean   instance 0 _HaveResponse
0x0c978b94 0x4000f37     0xb5       System.Boolean   instance 0 _OnceFailed
0x0c978b94 0x4000f38     0xb6       System.Boolean   instance 0 _WriteNotifed
0x0c978b94 0x4000f39     0x28                CLASS   instance 0x00000000 _HttpResponse
0x0c978b94 0x4000f3a     0x2c                CLASS   instance 0x00000000 _ResponseException
0x0c978b94 0x4000f3b     0x30                CLASS   instance 0x00000000 _CoreResponse
0x0c978b94 0x4000f3e     0x34                CLASS   instance 0x027447d0 _Verb
0x0c978b94 0x4000f3f     0x38                CLASS   instance 0x027447d0 _OriginVerb
0x0c978b94 0x4000f40     0x3c                CLASS   instance 0x06c8b6b4 _HttpRequestHeaders
0x0c978b94 0x4000f41     0x40                CLASS   instance 0x00000000 _WriteBuffer
0x0c978b94 0x4000f42     0x94         System.Int32   instance 2 _HttpWriteMode
0x0c978b94 0x4000f43     0xb7       System.Boolean   instance 0 _AllowAutoRedirect
0x0c978b94 0x4000f44     0x44                CLASS   instance 0x06c60b60 _Uri
0x0c978b94 0x4000f45     0x48                CLASS   instance 0x06c60b60 _OriginUri
0x0c978b94 0x4000f46     0x4c                CLASS   instance 0x00000000 _ChallengedUri
0x0c978b94 0x4000f47     0x50                CLASS   instance 0x00000000 _ChallengedSpn
0x0c978b94 0x4000f48     0x54                CLASS   instance 0x00000000 _MediaType
0x0c978b94 0x4000f49      0x8         System.Int64   instance -1 _ContentLength
0x0c978b94 0x4000f4a     0x58                CLASS   instance 0x0274c80c _Version

Btw, we can verify that we have picked the right one by dumping out the Uri

0:061> !do 0x06c60b60
Name: System.Uri
MethodTable 0x020b4eb0
EEClass 0x02472e10
Size 76(0x4c) bytes
GC Generation: 0
mdToken: 0x020001c4  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x020b4804
        MT      Field     Offset                 Type       Attr      Value Name
0x79b9788c 0x40000b6      0x4                CLASS   instance 0x00000000 __identity
0x020b4eb0 0x4000e38      0x8                CLASS   instance 0x06c60e44 m_AbsoluteUri
0x020b4eb0 0x4000e39     0x3c       System.Boolean   instance 0 m_AlreadyEscaped
0x020b4eb0 0x4000e3a      0xc                CLASS   instance 0x066f0230 m_Fragment
0x020b4eb0 0x4000e3b     0x3d       System.Boolean   instance 0 m_FragmentEscaped
0x020b4eb0 0x4000e3c     0x2c         System.Int32   instance -41085911 m_Hash
...
0:061> !do 0x06c60e44
Name: System.String
MethodTable 0x79b94638
EEClass 0x79b94984
Size 108(0x6c) bytes
GC Generation: 0
mdToken: 0x0200000f  (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
String: http://seventoo/myslowwebservice/service.asmx
FieldDesc*: 0x79b949e8
        MT      Field     Offset                 Type       Attr      Value Name
0x79b94638 0x4000013      0x4         System.Int32   instance 46 m_arrayLength
0x79b94638 0x4000014      0x8         System.Int32   instance 45 m_stringLength
0x79b94638 0x4000015      0xc          System.Char   instance 0x68 m_firstChar
0x79b94638 0x4000016        0                CLASS     shared   static Empty
    >> Domain:Value 0x000d5330:0x066f0230 0x001024b8:0x066f0230 <<
0x79b94638 0x4000017      0x4                CLASS     shared   static WhitespaceChars
    >> Domain:Value 0x000d5330:0x066f0244 0x001024b8:0x027122b8 <<
Back to the service point
0:061> !do 0x0274e5c4
Name: System.Net.ServicePoint
MethodTable 0x0c9c5d08
EEClass 0x0c9b6840
Size 92(0x5c) bytes
GC Generation: 1
mdToken: 0x02000247  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c9c5730
        MT      Field     Offset                 Type       Attr      Value Name
0x0c9c5d08 0x400112d     0x48       System.Boolean   instance 0 m_ProxyServicePoint
0x0c9c5d08 0x400112e     0x49       System.Boolean   instance 0 m_UserChangedLimit
0x0c9c5d08 0x400112f     0x2c         System.Int32   instance 2 m_ConnectionLimit
0x0c9c5d08 0x4001130     0x30         System.Int32   instance 2 m_ConnectionMode
0x0c9c5d08 0x4001131     0x4a       System.Boolean   instance 1 m_SupportsPipelining
0x0c9c5d08 0x4001132      0x4                CLASS   instance 0x0274e620 m_ConnectionGroupList
0x0c9c5d08 0x4001133      0x8                CLASS   instance 0x0677bf00 m_Address
0x0c9c5d08 0x4001134     0x34         System.Int32   instance 900000 m_MaxIdleTime
0x0c9c5d08 0x4001135     0x50            VALUETYPE   instance start at 0x0274e614 m_ExpiresAt
0x0c9c5d08 0x4001136      0xc                CLASS   instance 0x0677bf4c m_ConnectionName
0x0c9c5d08 0x4001137     0x10                CLASS   instance 0x0677e2e8 m_IPHostEntryInfos
0x0c9c5d08 0x4001138     0x38         System.Int32   instance 0 m_CurrentAddressInfoIndex
0x0c9c5d08 0x4001139     0x3c         System.Int32   instance 2 m_ConnectionsSinceDns
0x0c9c5d08 0x400113a     0x40         System.Int32   instance 2 m_CurrentConnections
0x0c9c5d08 0x400113b     0x14                CLASS   instance 0x00000000 m_ServerCertificate
0x0c9c5d08 0x400113c     0x18                CLASS   instance 0x00000000 m_ClientCertificate
0x0c9c5d08 0x400113d     0x4b       System.Boolean   instance 1 m_UseNagleAlgorithm
0x0c9c5d08 0x400113e     0x4c       System.Boolean   instance 1 m_Expect100Continue
0x0c9c5d08 0x400113f     0x1c                CLASS   instance 0x0274df08 m_LookupString
0x0c9c5d08 0x4001140     0x20                CLASS   instance 0x0677e248 m_CachedRemoteIPAddressInfo
0x0c9c5d08 0x4001141     0x24                CLASS   instance 0x027543e4 m_Version
0x0c9c5d08 0x4001142     0x4d       System.Boolean   instance 1 m_Understands100Continue
0x0c9c5d08 0x4001143     0x28                CLASS   instance 0x0274e72c DnsLock
0x0c9c5d08 0x4001144     0x44         System.Int32   instance 140 m_HashCode
0x0c9c5d08 0x4001145     0x4e       System.Boolean   instance 1 m_ComputedHashCode

Hmm, it has a connection limit of 2. That sorta rings a bell, we have two requests in GetResponse(), i.e. in the middle of some web service requests, and the rest waiting for a connection, so this makes total sense.

We could actually stop right here and conclude that this is the source of the problem. I.e. that we only have two available connections, but let’s talk about that a little bit later and dig a bit further instead…

If we go back to thread 42 (GetResponse()) and run !dso there, we find a ConnectStream on the stack

0:042> !dso
Thread 42
ESP/REG    Object     Name
...
0xd0cee2c 0x6cef7b8 System.Net.ConnectStream
...

This connect stream has a connection (one of the two created by the service point)

0:042> !do 0x6cef7b8
Name: System.Net.ConnectStream
MethodTable 0x0c9c44c0
EEClass 0x0c9b6270
Size 104(0x68) bytes
GC Generation: 0
mdToken: 0x02000272  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c9c3e5c
        MT      Field     Offset                 Type       Attr      Value Name
0x79b9788c 0x40000b6      0x4                CLASS   instance 0x00000000 __identity
0x79b9c140 0x4000b17        0                CLASS     shared   static Null
    >> Domain:Value 0x000d5330:0x066f1eec 0x001024b8:0x0674a16c <<
0x0c9c44c0 0x400122f     0x3c         System.Int32   instance 1 m_CallNesting
0x0c9c44c0 0x4001230     0x20                CLASS   instance 0x06cf0c28 m_BufferedData
0x0c9c44c0 0x4001231     0x58       System.Boolean   instance 1 m_WriteBufferEnable
0x0c9c44c0 0x4001232     0x59       System.Boolean   instance 1 m_BufferOnly
0x0c9c44c0 0x4001233      0x8         System.Int64   instance 293 m_TotalBytesToWrite
0x0c9c44c0 0x4001234     0x10         System.Int64   instance 0 m_BytesLeftToWrite
0x0c9c44c0 0x4001235     0x24                CLASS   instance 0x0274f03c m_Connection
0x0c9c44c0 0x4001236     0x5a       System.Boolean   instance 1 m_WriteStream
0x0c9c44c0 0x4001237     0x28                CLASS   instance 0x00000000 m_ReadBuffer
0x0c9c44c0 0x4001238     0x40         System.Int32   instance 0 m_ReadOffset
0x0c9c44c0 0x4001239     0x44         System.Int32   instance 0 m_ReadBufferSize
0:061> !do 0x0274f03c
Name: System.Net.Connection
MethodTable 0x0c9c52f0
EEClass 0x0c9b6454
Size 152(0x98) bytes
GC Generation: 1
mdToken: 0x02000270  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c9c4bb8
        MT      Field     Offset                 Type       Attr      Value Name
0x0c9c52f0 0x40011f3     0x80       System.Boolean   instance 1 m_CanPipeline
0x0c9c52f0 0x40011f4     0x81       System.Boolean   instance 0 m_Pipelining
0x0c9c52f0 0x40011f5     0x82       System.Boolean   instance 1 m_KeepAlive
0x0c9c52f0 0x40011f6     0x58         System.Int32   instance 0 m_Error
0x0c9c52f0 0x40011f7      0x4                CLASS   instance 0x027be068 m_ReadBuffer
0x0c9c52f0 0x40011f8     0x5c         System.Int32   instance 25 m_BytesRead
0x0c9c52f0 0x40011f9     0x60         System.Int32   instance 25 m_HeadersBytesUnparsed
0x0c9c52f0 0x40011fa     0x64         System.Int32   instance 25 m_BytesScanned
0x0c9c52f0 0x40011fb     0x68         System.Int32   instance 0 m_TotalResponseHeadersLength
0x0c9c52f0 0x40011fc     0x6c         System.Int32   instance 65536 m_MaximumResponseHeadersLength
0x0c9c52f0 0x40011fd      0x8                CLASS   instance 0x06d0f0bc m_ResponseData
0x0c9c52f0 0x40011fe     0x70         System.Int32   instance 1 m_ReadState
0x0c9c52f0 0x40011ff      0xc                CLASS   instance 0x027bf074 m_WaitList
0x0c9c52f0 0x4001200     0x10                CLASS   instance 0x027bf0dc m_WriteList
0x0c9c52f0 0x4001201     0x74         System.Int32   instance 25 m_CurrentRequestIndex
0x0c9c52f0 0x4001202     0x14                CLASS   instance 0x027bf1c0 m_StatusLineInts
0x0c9c52f0 0x4001203     0x18                CLASS   instance 0x066f0230 m_StatusDescription
0x0c9c52f0 0x4001204     0x78         System.Int32   instance 0 m_StatusState

The connection in turn has a waitlist that contains all the requests that are waiting for this particular connection

0:061> !do 0x027bf074
Name: System.Collections.ArrayList
MethodTable 0x79ba2ee4
EEClass 0x79ba3020
Size 24(0x18) bytes
GC Generation: 0
mdToken: 0x02000100  (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
FieldDesc*: 0x79ba3084
        MT      Field     Offset                 Type       Attr      Value Name
0x79ba2ee4 0x4000362      0x4                CLASS   instance 0x02a7bd44 _items
0x79ba2ee4 0x4000363      0xc         System.Int32   instance 17 _size
0x79ba2ee4 0x4000364     0x10         System.Int32   instance 35 _version
0x79ba2ee4 0x4000365      0x8                CLASS   instance 0x00000000 _syncRoot

In this case, 17 requests are waiting for this connection, WOW!!!

And if we would run

0:061> .foreach (req {!do -v 0x02a7bd44 -short}) {!gcroot ${req}}
To find out on what threads all these requests are rooted/used we would see them rooted on thread 24, 26, 28, 30, 32, 34, 36, 37, 38, 44, 46, 48, 50, 52, 54, 56, and 58

The roots look something like this:

Scan Thread 46 (0x1540)
ESP:d1ceb9c:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebb8:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebbc:Root:0x29aeb18(System.Net.HttpWebRequest)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebc0:Root:0x68b7fb0(System.Net.LazyAsyncResult)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebdc:Root:0x29aeb18(System.Net.HttpWebRequest)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebec:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebf0:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebfc:Root:0x29aeb18(System.Net.HttpWebRequest)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cec18:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cec30:Root:0x29afcb0(System.Web.Services.Protocols.SoapClientMessage)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec34:Root:0x29aeb18(System.Net.HttpWebRequest)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec38:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec50:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec64:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec68:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29ae580(SlowStuff.seventoo.Service)

So all these threads are waiting for our thread 42 to finish. In other words, the last thread in the wait list will have to wait for (16*Web Service execution time) + time for currently executing request, before it can even start with its request. Now, that can be a pretty long time:)

When a new request comes in it gets scheduled in on one of the two connections and stands in line awaiting its turn, just like a busy day at the DMV.

A little background

Why 2 connections??? that seems like an awfully small number…

Let’s take a look at RFC 2616, the RFC for Hypertext Transfer Protocol – HTTP/1.1. This is what it says about HTTP connections.

“Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy. A proxy SHOULD use up to 2N connections to another server or proxy, where N is the number of simultaneously active users. These guidelines are intended to improve HTTP response times and avoid congestion.”*

So now you know where the number 2 likely came from. But as you can see, in our case it presents a bit of a problem.

Solutions please

In the <system.net> section of machine.config you can set the maxConnection value to indicate how many concurrent connections you want to allow. By default this is set to * and 2, which means that we will have a maximum of 2 simultaneous connections to any given address/port.

Each service point handles a particular URI, so in essence this means that any given service point will have a maximum of 2 connections.

<connectionManagement>
    <add address="*" maxconnection="2"/>
</connectionManagement>

You can either change the general maxconnection setting or add one for the particular ipaddress / server name where your web service resides.

Along with setting this up, you should also look over the numbers for maxWorkerThreads/maxIOThreads, minFreeThreads and minLocalRequestFreeThreads when calling web services, per KB821268, but the biggest contention point is definitely maxconnections.

Better yet, in 2.0 a new machine config setting was introduced, and made the default…

<ProcessModel>
  <autoconfig="true">
</ProcessModel>

This automatically configures these values for you per the article above

Wrap up

I know that in this post in particular I might have made huge leaps of logic in some places, my college algebra teacher definitely wouldn’t have been proud :) but I want to defend myself by saying that I am trying to show debugging in a step-by-step fashion, when really a lot of the time, when I am debugging, I use the “poke around until I find something interesting” technique that I mentioned earlier…

So far the posts have touched as distant topics as garbage collection, pinvokes, xmlserialization, asp.net intrinsics, and now connection management. It is impossible to know how everything is implemented which is why I am trying to post case studies on a lot of different areas so at least you will know bits and pieces of how things work and build upon that.

Have fun out there!