Radim, do these problems happen with the HotRod server, or only with memcached?
HotRod requests handled by non-owners should be very rare; the vast majority should be handled by the primary owner directly. So if this happens with HotRod, we should focus on fixing the HotRod routing rather than on how to handle a large number of requests from non-owners.
That being said, even if a HotRod put request is handled by the primary owner, it "generates" (numOwners - 1) extra OOB requests. So if you have 160 HotRod worker threads per node, you can expect 4 * 160 OOB messages per node. Multiply that by 2, because responses are OOB as well, and you can get 1280 OOB messages before you even start reusing any HotRod worker thread. Have you tried decreasing the number of HotRod workers?
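To make the arithmetic above explicit, here is a rough sketch. The class and method names are made up for illustration, and numOwners = 5 is only what the "4 * 160" figure implies, not a value taken from an actual configuration.

```java
public class OobEstimate {
    // Rough upper bound on OOB messages per node when every HotRod worker
    // is busy with a put handled by the primary owner.
    // Assumption: numOwners = 5, inferred from the "4 * 160" figure.
    static int oobMessages(int hotRodWorkers, int numOwners) {
        // One replication request to each backup owner per worker thread.
        int requests = (numOwners - 1) * hotRodWorkers;
        // Responses are OOB as well, so count each request twice.
        return 2 * requests;
    }

    public static void main(String[] args) {
        System.out.println(oobMessages(160, 5)); // prints 1280
    }
}
```

Halving the number of HotRod workers halves this bound, which is why the question about decreasing them matters.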
The thing is, our OOB thread pool can't use queueing because we'd get a queue full of commit commands while all the OOB threads are waiting on keys that those commit commands would unlock. So when the OOB thread pool is full, we discard messages, which I suspect slows things down quite a bit (especially if it's a credit request/response message). It may well be that a lower number of HotRod worker threads would perform better.
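For illustration only, the discard behaviour of a pool without a queue can be reproduced with a plain java.util.concurrent.ThreadPoolExecutor. This is a sketch of the general mechanism, not the actual JGroups OOB pool setup; the class name, the latch standing in for a locked key, and the counting handler are all assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NoQueuePoolDemo {
    // Submits `messages` blocking tasks to a pool of `maxThreads` threads
    // that has no queue, and returns how many tasks were discarded.
    static int countDiscarded(int maxThreads, int messages) {
        AtomicInteger discarded = new AtomicInteger();
        // Stands in for a key lock that only a later commit would release.
        CountDownLatch keyUnlocked = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 30, TimeUnit.SECONDS,
                new SynchronousQueue<>(),                        // hand-off only, no queueing
                (task, executor) -> discarded.incrementAndGet()); // count instead of running
        for (int i = 0; i < messages; i++) {
            pool.execute(() -> {
                try {
                    keyUnlocked.await(); // every worker waits on the "key"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        keyUnlocked.countDown(); // release the workers so the pool can stop
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return discarded.get();
    }

    public static void main(String[] args) {
        System.out.println(countDiscarded(2, 5)); // prints 3
    }
}
```

With 2 threads and 5 messages, the first two messages occupy both threads and the other three are dropped. Swapping in a bounded queue would keep those three, but as described above they could be exactly the commits the blocked threads are waiting for.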
On the other hand, why isn't increasing the number of OOB threads a solution? With -Xss512k, you can get 2000 threads with only 1 GB of virtual memory (the actual used memory is probably even less, unless you're using huge pages). AFAIK the Linux kernel doesn't break a sweat with 100000 threads running, so having 2000 threads just hanging around, waiting for a response, shouldn't be such a problem.
During the team meeting in Palma I did chat with Bela (or was it a break-out session?) about moving Infinispan's request processing to another thread pool. That would leave the OOB thread pool free to receive response messages, FD heartbeats, credit requests/responses etc. The downside, I guess, is that each request would have to be passed to another thread, and the context switch may slow things down a bit. But since the new thread pool would be in Infinispan, we could even do tricks like executing a commit/rollback directly on the OOB thread.
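Very roughly, the hand-off could look like the sketch below. Everything in it is hypothetical (the RequestDispatcher class, the Kind enum, the boolean return value); it only shows the idea of running commit/rollback inline while passing other requests to a separate pool, and is not existing Infinispan or JGroups code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RequestDispatcher {
    // Hypothetical command kinds, just enough to show the routing decision.
    public enum Kind { PREPARE, COMMIT, ROLLBACK }

    private final ExecutorService requestPool;

    public RequestDispatcher(ExecutorService requestPool) {
        this.requestPool = requestPool;
    }

    // Meant to be called on the OOB thread that received the message.
    // Returns true if the work ran inline on the calling thread.
    public boolean dispatch(Kind kind, Runnable work) {
        if (kind == Kind.COMMIT || kind == Kind.ROLLBACK) {
            // Commits/rollbacks release locks and should not block,
            // so run them right away on the OOB thread.
            work.run();
            return true;
        }
        // Anything that may wait on a key goes to the separate pool,
        // leaving the OOB thread free for responses, heartbeats, credits.
        requestPool.execute(work);
        return false;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        RequestDispatcher dispatcher = new RequestDispatcher(pool);
        System.out.println(dispatcher.dispatch(Kind.COMMIT, () -> {}));  // prints true
        System.out.println(dispatcher.dispatch(Kind.PREPARE, () -> {})); // prints false
        pool.shutdown();
    }
}
```

The separate pool can have a queue, since the commits that unlock keys no longer sit behind the requests waiting for them.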
In the end, I just didn't feel that working on this was justified, considering the number of critical bugs we had. But maybe now's the time to start experimenting...