On 20 Mar 2015, at 18:32, Dan Berindei <dan.berindei(a)gmail.com> wrote:
Hi Galder
Using cache.putAsync on the server will use a thread from the "async"
transport executor. While this executor has more threads by default (25),
it's still going to be exhausted at some point as the client keeps generating
more and more duplicate operations.
^ Actually, I discovered that I only had 2 worker threads because the testsuite was
overriding the default and setting it to 2.
Using async operations will be a better solution once the blocked
operations don't need to keep a thread busy in core. But ATM I believe
a better solution would be to increase the socket timeout on the
client so that we get a TimeoutException from the server before we get
the SocketTimeoutException on the client.
Indeed, the socket timeout currently needs to be aligned with the rest of the configuration,
and we often recommend increasing it in systems under load. But the point is that, in the
current setup, the number of server worker threads also needs to be sized correctly.
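A minimal JDK-only sketch of the timeout alignment Dan describes, assuming a single contended key whose lock stays held for the whole call. `lockTimeoutMs` and `socketTimeoutMs` are hypothetical stand-ins for the server's lock acquisition timeout and the client's socket timeout, and `TimeoutAlignment`/`outcome` are illustrative names, not Infinispan API. When the socket timeout is the larger of the two, the client gets the server's timeout error instead of giving up on its own:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical JDK-only model: lockTimeoutMs plays the server's lock acquisition
// timeout, socketTimeoutMs the client's socket timeout. Not Infinispan API.
final class TimeoutAlignment {

    /** Describes what the "client" observes when the key stays locked for the whole call. */
    static String outcome(long lockTimeoutMs, long socketTimeoutMs) throws Exception {
        ReentrantLock keyLock = new ReentrantLock();
        ExecutorService server = Executors.newSingleThreadExecutor();
        keyLock.lock(); // a "hung" operation (here, the calling thread) holds the key
        try {
            Future<String> reply = server.submit(() -> {
                // Server side: wait for the lock, but no longer than the lock acquisition timeout.
                if (!keyLock.tryLock(lockTimeoutMs, TimeUnit.MILLISECONDS)) {
                    throw new TimeoutException("lock acquisition timed out");
                }
                keyLock.unlock();
                return "ok";
            });
            try {
                return reply.get(socketTimeoutMs, TimeUnit.MILLISECONDS);
            } catch (ExecutionException e) {
                return "server error: " + e.getCause().getMessage(); // server's TimeoutException
            } catch (TimeoutException e) {
                return "client socket timeout";                      // client gave up first
            }
        } finally {
            keyLock.unlock();
            server.shutdownNow();
            server.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(outcome(100, 1000));
        System.out.println(outcome(1000, 100));
    }
}
```

Here `outcome(100, 1000)` reports the server-side error, while `outcome(1000, 100)` reports a client socket timeout. The sketch only models the ordering of the two timeouts; it does nothing about worker threads that stay blocked while the server waits.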
Cheers,
Cheers
Dan
On Fri, Mar 20, 2015 at 5:22 PM, Galder Zamarreño <galder(a)redhat.com> wrote:
> Hi all,
>
> Summary: I've been debugging [1] and found that an operation was unexpectedly throwing
> a SocketTimeoutException (SocketTE) because the worker thread pool was too small. The
> real problem, though, is that we can block within Netty's worker thread, something
> we're hoping to fix with Gustavo's work to implement [2].
>
> The test in [1] was failing randomly as a result of this sequence of events:
>
> 1. Server configured with a worker thread pool of 2 (the default in code is 2 * Runtime.getRuntime().availableProcessors())
> 2. We emulate a server-side operation hanging with a sleep injected by an interceptor.
> 3. Client gets SocketTimeoutException and retries the operation.
> 4. The retried operation hangs because it cannot acquire the lock, which the first (still sleeping) operation holds.
> 5. Client gets SocketTimeoutException again and runs out of retries.
> 6. The test now executes an operation that should not throw a SocketTE, but it gets one
> because both worker threads are busy: the first is in the sleep and the retried one is
> waiting for the lock. The new operation cannot make any progress and hence ends up
> getting a SocketTE.
>
> You might think this can be solved by increasing the default worker thread pool size,
> but that just moves the problem around: you could have N operations that hang at the
> same time, e.g. if there's a long GC pause.
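The failure sequence can be modelled with a plain JDK thread pool, as a rough sketch: the fixed pool stands in for the server's worker threads, a latch stands in for the sleep injected by the interceptor, and a `Future.get` timeout stands in for the client's socket timeout. `WorkerPoolExhaustion` and `thirdOperationTimesOut` are hypothetical names; none of this is Infinispan or Netty code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical JDK-only model of the failure sequence; not Infinispan or Netty code.
final class WorkerPoolExhaustion {

    /** Returns true if an unrelated 3rd operation hits the client-side timeout. */
    static boolean thirdOperationTimesOut(int workerThreads) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(workerThreads); // "worker threads"
        ReentrantLock keyLock = new ReentrantLock();          // lock on the contended key
        CountDownLatch firstHoldsLock = new CountDownLatch(1);
        CountDownLatch endOfSleep = new CountDownLatch(1);    // stands in for the injected sleep
        try {
            // 1st operation: takes the key lock, then blocks its worker thread in the "sleep".
            workers.submit(() -> {
                keyLock.lock();
                try {
                    firstHoldsLock.countDown();
                    endOfSleep.await();
                } finally {
                    keyLock.unlock();
                }
                return null;
            });
            firstHoldsLock.await();

            // Retried operation: blocks another worker thread waiting for the same lock.
            workers.submit(() -> {
                keyLock.lock();
                keyLock.unlock();
                return null;
            });

            // 3rd operation: needs no lock, but may find no free worker thread.
            Future<String> third = workers.submit(() -> "done");
            try {
                third.get(500, TimeUnit.MILLISECONDS); // client-side "socket timeout"
                return false;
            } catch (TimeoutException e) {
                return true;
            }
        } finally {
            endOfSleep.countDown(); // end the "sleep" so every task can finish
            workers.shutdown();
            workers.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("2 workers, timed out: " + thirdOperationTimesOut(2));
        System.out.println("3 workers, timed out: " + thirdOperationTimesOut(3));
    }
}
```

With 2 threads the unrelated third operation times out; with 3 it completes. Hanging one more operation per extra thread brings the timeout back, which is why enlarging the pool only moves the problem around.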
>
> So really, we need to stop blocking within the worker thread and implement [2].
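To illustrate the general idea of not blocking in the worker thread, here is a hypothetical sketch in which waiting for a lock is a pending `CompletableFuture` continuation rather than a parked thread. It is not the design of [2]: `AsyncKeyLock` is an illustrative FIFO per-key lock, `NonBlockingWorkers` is a made-up name, and the "sleep" is modelled as a future that completes later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the ISPN-5083 design: lock waits are CompletableFuture
// continuations, so no worker thread is parked while an operation waits.
final class NonBlockingWorkers {

    /** Illustrative FIFO per-key lock: each acquirer waits for the previous holder's release. */
    static final class AsyncKeyLock {
        private final ConcurrentHashMap<String, CompletableFuture<Void>> tails =
                new ConcurrentHashMap<>();

        /** Completes with a release action once the caller owns the lock on {@code key}. */
        CompletableFuture<Runnable> acquire(String key) {
            CompletableFuture<Void> released = new CompletableFuture<>();
            Runnable releaseAction = () -> released.complete(null);
            // Atomically make our release future the new tail of the key's queue.
            CompletableFuture<Void> previous = tails.put(key, released);
            // For brevity, completed entries are never removed from the map.
            return previous == null
                    ? CompletableFuture.completedFuture(releaseAction)
                    : previous.thenApply(v -> releaseAction);
        }
    }

    /** Returns true if an unrelated operation completes while the key "k" is stuck. */
    static boolean thirdOperationCompletes(int workerThreads) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
        AsyncKeyLock locks = new AsyncKeyLock();
        CompletableFuture<Void> endOfSleep = new CompletableFuture<>(); // stands in for the sleep
        try {
            // 1st operation: a worker takes the lock and returns; the "sleep" is a pending future.
            CompletableFuture<Void> first = CompletableFuture
                    .supplyAsync(() -> locks.acquire("k"), workers).join()
                    .thenCompose(release -> endOfSleep.thenRun(release));
            // Retried operation: queued behind the first as a continuation, not a blocked thread.
            CompletableFuture<Void> retried = CompletableFuture
                    .supplyAsync(() -> locks.acquire("k"), workers).join()
                    .thenAcceptAsync(Runnable::run, workers);
            // 3rd, unrelated operation: worker threads are still free to run it.
            String result = CompletableFuture.supplyAsync(() -> "done", workers)
                    .get(500, TimeUnit.MILLISECONDS);
            endOfSleep.complete(null); // the "sleep" ends: first releases, retried proceeds
            CompletableFuture.allOf(first, retried).get(5, TimeUnit.SECONDS);
            return "done".equals(result);
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("third operation completed: " + thirdOperationCompletes(2));
    }
}
```

Here the third operation completes with only 2 worker threads, because neither the hung operation nor the retried one occupies a thread while it waits.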
>
> I think we may have already seen this issue back in Infinispan 5, because we were
> getting SocketTEs back then in another SocketTE test [3]. I wasn't able to replicate
> it locally at the time and solved it by ignoring it :|
>
> As far as ISPN-5314 is concerned, it can easily be solved by increasing the worker
> thread pool for that particular test, but we should revert that once [2] is
> implemented.
>
> Cheers,
>
> [1] https://issues.jboss.org/browse/ISPN-5314
> [2] https://issues.jboss.org/browse/ISPN-5083
> [3] https://issues.jboss.org/browse/ISPN-2110
> --
> Galder Zamarreño
> galder(a)redhat.com
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev(a)lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
--
Galder Zamarreño
galder(a)redhat.com