I spent most of the day looking into a WCF issue in our system, whereby the same request submitted in bulk against the same WCF service would fail intermittently:
System.ServiceModel.EndpointNotFoundException: Could not connect to http://localhost:8000/MyService. TCP error code 10061: No connection could be made because the target machine actively refused it 127.0.0.1:8000
I had noticed from the logs that about 10 of the requests would succeed and most of the others would fail, which led me to believe that a default setting somewhere in the WCF configuration was having a throttling effect. When I finally got to the bottom of the issue I was somewhat annoyed with myself, as it was a classic socket issue - not a WCF-specific one. Therein, I guess, lies the problem with dealing with ever higher layers of abstraction – you begin to forget about the low-level details that underpin them (and which eventually leak through). Fortunately my copy of Programming WCF Services by Juval Lowy arrived the other day, so I had a chance to do some reading to see which setting could be having an effect.
ServiceThrottling “maxConcurrent*”
Lowy dedicates an entire chapter to service request throttling. Three settings are described for the Service Throttling Behaviour: maxConcurrentSessions, maxConcurrentCalls and maxConcurrentInstances, which allow the maximum load on each service to be controlled. The effects of these settings depend on the transport and concurrency mode of the service, but it’s all spelt out clearly in the book. The most interesting of the lot for me was the number of concurrent sessions, which defaults to 10 under the Per-Session model that we are using.
I bumped the setting up to 1000 (way over the 100 needed to service my test harness load), but it had no effect. To humour myself I dropped all three settings to 1 to ensure it had some impact. It did. Oh.
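For reference, a throttling section in the service's App.config might look something like the sketch below. The behaviour name is made up for illustration, and the post only says that "the setting" was raised to 1000, so applying that value to all three attributes is an assumption.

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <!-- "ThrottledBehavior" is a hypothetical name; the service element
           would reference it via its behaviorConfiguration attribute. -->
      <behavior name="ThrottledBehavior">
        <!-- Illustrative values only; tune them to the expected load. -->
        <serviceThrottling maxConcurrentCalls="1000"
                           maxConcurrentSessions="1000"
                           maxConcurrentInstances="1000" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```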
NetTcpBinding “MaxConnections”
Back to the drawing board, or more accurately Google and the book. So convinced was I that I had discovered the cause that I never bothered to read on to the end of the chapter, where I would have discovered another setting that limits the maximum number of TCP connections for the binding:- maxConnections. This setting, which also has a default of 10, goes hand-in-hand with the previous settings as the smaller of it and maxConcurrentSessions becomes the effective throttle.
Once again I changed my service config only to become immediately disappointed as it also failed to have the desired result.
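A binding configuration with this setting raised might look roughly like the following. The binding name and the value are assumptions, since the post does not show the actual config that was used.

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- "HighLoadTcpBinding" is a hypothetical name; the endpoint would
           reference it via its bindingConfiguration attribute. -->
      <binding name="HighLoadTcpBinding" maxConnections="1000" />
    </netTcpBinding>
  </bindings>
</system.serviceModel>
```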
NetTcpBinding “listenBacklog”
Switching back to Google I read various posts about firewall issues which didn’t apply as I was running on the same machine. However I did start to pay closer attention to some of the settings that were being included in the App.config files. One in particular caught my eye – listenBacklog. Anyone who has done any raw sockets programming will know that when you create a server-side socket and start listening on it you need to specify how many pending connections you’re willing to buffer. I quickly Googled again to see what the default value was – yup another 10!
I quickly plugged my ridiculous value into the .config file and bingo! This time it worked. So, a new twist on a very old problem…
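As a sketch, the listenBacklog attribute sits on the same netTcpBinding element as maxConnections. The binding name and both values here are illustrative assumptions; the post does not say what the "ridiculous value" actually was.

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- Hypothetical binding name and values. listenBacklog sets how many
           pending connection requests the listening socket will queue. -->
      <binding name="HighLoadTcpBinding"
               listenBacklog="1000"
               maxConnections="1000" />
    </netTcpBinding>
  </bindings>
</system.serviceModel>
```

The backlog only covers connections waiting to be accepted, so it helps with bursts of simultaneous connects rather than with sustained overload.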
Having this exact issue. I think you may have helped us out with ListenBacklog setting. Thanks!
This is an EXCELLENT find. If I'm not mistaken, though, hitting the listenBacklog limit means that the WCF engine itself is simply not able to issue accept() calls (and presumably assign the corresponding incoming request to a thread) fast enough to prevent the backlog from being hit. What would cause such a WCF failure? Processor load, or some kind of thread starvation, or...? Thanks!
Nope, as I mentioned above this is not a WCF specific setting - it's a TCP/IP setting. When you call listen() on the listening socket you have to say how big the queue is for pending connect requests.
WCF should be able to accept() a connection request very quickly but you still need a queue to handle multiple connect requests as it still takes some time to process them. In the case above we were making 100 simultaneous connect requests (give or take a few hundred ms here and there).
I don't know what the architecture of WCF is. It's possible that WCF does an accept(), spawns a thread, then moves on. Or it may use a dedicated thread (or work stealing from a thread pool) just for handling accept()s. Whatever it does we maxed it out :-).
It's been way too long since I read up on how TCP/IP works... I've just checked the Winsock Programmer's FAQ to remind myself what the backlog is for:
http://tangentsoft.net/wskfaq/advanced.html#backlog
Where on earth did ListenBacklog make any difference? Why don't you post an example showing how the listen backlog changed anything?
Lowy is funny by the way, he complicated WCF in a very funny way!
Thank you! That helped me a lot.
Hi, even with huge values of ListenBacklog=10000 and MaxConnections=10000 I am still getting "tcp error code 10061 target machine actively refused". It's not every time, but when the load increases the error starts appearing and one service is not able to communicate with the other.
Following are the other values,
tempBinding.ListenBacklog = 10000;
tempBinding.MaxBufferPoolSize = 500000;
tempBinding.MaxBufferSize = 2000000000;
tempBinding.MaxConnections = 10000;
tempBinding.MaxReceivedMessageSize = 2000000000;