[infinispan-dev] Random SocketTimeoutExceptions demonstrate we're not using Netty the right way

Galder Zamarreño galder at redhat.com
Fri Mar 20 11:22:33 EDT 2015


Hi all,

Summary: I've been debugging [1] and found that an operation was unexpectedly throwing a SocketTimeoutException (SocketTE) because the worker thread pool was too small. The real problem, though, is that we can block within Netty's worker threads, something we're hoping to fix with Gustavo's work to implement [2].

The test in [1] was failing randomly as a result of this sequence of events:

1. The server is configured with a worker thread pool of size 2 (in code, this is 2 * Runtime.getRuntime().availableProcessors()).
2. We emulate a server-side operation hanging with a sleep injected by an interceptor.
3. Client gets SocketTimeoutException and retries the operation.
4. The retried operation hangs because it cannot acquire the lock.
5. Client gets SocketTimeoutException again, runs out of retries.
6. The test now executes an operation that should not throw a SocketTE, but it gets one because both worker threads are in use: the first is stuck in the sleep and the second (the retry) is waiting for the lock. The new operation cannot make any progress and ends up with a SocketTE.
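
The sequence above can be reproduced in miniature with plain java.util.concurrent, no Netty or Hot Rod involved. This is only a sketch under assumptions: the class and method names are made up for illustration, a fixed thread pool stands in for the worker pool, and a latch stands in for the sleep and the lock wait. It shows that once every worker is parked in a blocking call, even a trivial request times out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class, not Infinispan code.
class WorkerStarvationDemo {

    /**
     * Submits blockedOps operations that never finish on their own, then a
     * trivial one. Returns true if the trivial operation times out.
     */
    static boolean quickRequestTimesOut(int workers, int blockedOps, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedOps; i++) {
                pool.submit(() -> {
                    release.await(); // stands in for the injected sleep / lock wait
                    return null;
                });
            }
            // The operation that "should not" time out.
            Future<String> quick = pool.submit(() -> "ok");
            try {
                quick.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true; // the client-side equivalent would be a SocketTE
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("2 workers, 2 blocked ops, timed out: "
                + quickRequestTimesOut(2, 2, 300));
        System.out.println("3 workers, 2 blocked ops, timed out: "
                + quickRequestTimesOut(3, 2, 300));
    }
}
```

With as many blocked operations as workers, the trivial request is queued behind them and times out; with one spare worker it completes, which is what bumping the pool size for a single test relies on.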

You might think this can be solved by increasing the default worker thread pool size, but that just moves the problem around: you could still have N operations hanging at the same time, e.g. if there's a long GC pause.

So really, we need to stop blocking within the worker thread and implement [2].
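
As a rough illustration of what "not blocking within the worker thread" could look like, here is a plain-Java sketch. It makes no claim about how [2] is actually designed: the class name, the separate blocking pool and the latch are all assumptions for the example. The worker only hands the potentially blocking part to another executor and returns immediately, so it stays free for the next request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch class, not Infinispan code.
class OffloadSketch {

    /**
     * Same scenario as a starved worker pool, except the blocking part of each
     * operation runs on a separate executor. Returns true if a trivial
     * operation submitted afterwards completes within the timeout.
     */
    static boolean quickRequestCompletes(int workers, int blockedOps, long timeoutMillis) {
        ExecutorService workerPool = Executors.newFixedThreadPool(workers);
        // Assumption: an unbounded pool for blocking work, chosen for brevity.
        ExecutorService blockingPool = Executors.newCachedThreadPool();
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedOps; i++) {
                // The worker thread only schedules the blocking work and returns.
                workerPool.execute(() ->
                        CompletableFuture.runAsync(() -> awaitQuietly(release), blockingPool));
            }
            Future<String> quick = workerPool.submit(() -> "ok");
            try {
                return "ok".equals(quick.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                return false;
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        } finally {
            release.countDown();
            workerPool.shutdownNow();
            blockingPool.shutdownNow();
        }
    }

    // Stands in for the sleep / lock wait; restores the interrupt flag if interrupted.
    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note that the blocked operations themselves still hang here; the only thing the hand-off buys is that the worker threads keep serving requests that have nothing to wait on.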

I think we might have seen this issue already back in Infinispan 5, because we were already getting SocketTEs back then in another SocketTE test [3]. I wasn't able to replicate it locally at the time and solved it by ignoring it :|

As far as ISPN-5314 is concerned, it can easily be solved by increasing the worker thread pool size for that particular test, but we should revert that once [2] is implemented.

Cheers,

[1] https://issues.jboss.org/browse/ISPN-5314
[2] https://issues.jboss.org/browse/ISPN-5083
[3] https://issues.jboss.org/browse/ISPN-2110
--
Galder Zamarreño
galder at redhat.com






