Hi,
We are moving an internal thread discussion to the forum so everyone can contribute and
discuss.
Discovery:
While clearing up the remaining unit test failures and getting ready for the 2.0 GA release,
we noticed a transient StateTransferConcurrencyTest#testConcurrentUseSync failure. This
discovery has implications beyond this test: it concerns concurrent region activation
(state transfer) under high invocation load. testConcurrentUseSync does a bunch of
synchronous cache invocations on five cache instances while one of the caches activates
regions on itself.
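To make the scenario concrete, here is a rough sketch of its shape (hypothetical names,
not the actual test code): four caches hammer a region with synchronous puts while a
fifth activates the same region.

    import java.util.concurrent.*;

    public class ConcurrentActivationSketch {
        // stand-in for the cache API the test exercises (hypothetical)
        interface Cache {
            void put(String fqn, String key, Object value);
            void activateRegion(String fqn);
        }

        static void run(Cache a, Cache[] writers) throws Exception {
            ExecutorService pool = Executors.newCachedThreadPool();
            for (Cache c : writers) {
                pool.submit(() -> {
                    for (int i = 0; i < 1000; i++) {
                        // synchronous replication: blocks until all members respond
                        c.put("/region", "k" + i, i);
                    }
                });
            }
            a.activateRegion("/region"); // triggers state transfer, i.e. FLUSH
            pool.shutdown();
            pool.awaitTermination(60, TimeUnit.SECONDS);
        }
    }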
Problem description:
Let's say we have five cache instances A, B, C, D, and E. A is the instance that calls
region.activate() for regions under which B, C, D, and E do synchronous put invocations.
These invocations and region.activate() are concurrent. Here is what happens. At the
moment FLUSH is started at A (in order to do the state transfer triggered by
region.activate()), one of the members (B, C, D, and/or E) invokes cache.put. Let's say it
is D. This cache put invocation, call it CP, coming from D goes down D's stack before the
channel gets blocked at D. CP arrives at A and goes up the stack at A. By that moment
FLUSH has proceeded and is already blocking down() on all channels. The invocation
response for CP never returns from A since the channels are blocked. After careful
observation we have concluded that until CP returns from A (its response blocked going
down due to the flush), any subsequent mcast messages from D will not arrive at the FLUSH
protocol level at A. This is not a JGroups bug but a valid property related to FIFO
message delivery. Unfortunately this includes STOP_FLUSH_OK messages, and thus FLUSH
cannot complete gracefully at A until CP returns. Finally, we run into timeouts, the test
starts barfing TimeoutExceptions and fails.
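To see why FIFO delivery pins STOP_FLUSH_OK behind CP, here is a toy model (plain Java,
not JGroups code) where a single-threaded executor plays the role of A's per-sender FIFO
delivery of D's messages:

    import java.util.concurrent.*;

    public class FifoDeliverySketch {
        public static void main(String[] args) throws Exception {
            // one thread = strict FIFO processing of D's messages at A
            ExecutorService fromD = Executors.newSingleThreadExecutor();
            CountDownLatch flushOver = new CountDownLatch(1);

            fromD.submit(() -> {             // message 1: the cache.put (CP)
                System.out.println("A: processing CP from D");
                try {
                    flushOver.await();       // CP's response is blocked down by FLUSH
                } catch (InterruptedException ignored) { }
                System.out.println("A: CP response sent");
            });
            fromD.submit(() ->               // message 2: STOP_FLUSH_OK from D
                System.out.println("A: STOP_FLUSH_OK delivered"));

            // STOP_FLUSH_OK is stuck behind CP, so FLUSH never completes and
            // flushOver is never counted down: the deadlock we observe
            Thread.sleep(1000);
            System.out.println("still waiting... (this is the timeout)");
            fromD.shutdownNow();
        }
    }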
Solution:
We have to augment FLUSH slightly. Have a look at the semantics of FLUSH in the link
below. Currently we have one semaphore, B (depicted in the diagram as "wait on
FLUSH.down()"), that is activated once each channel receives FLUSH_OK messages from all
channels. We will introduce another semaphore, A, that will block down *only* non-JGroups
threads, i.e. application threads. Here are the details:
Semaphore A: when each channel gets the START_FLUSH message, do not allow
user/application threads to call channel.down(). Upon switching on semaphore A, the
JGroups thread that percolated up START_FLUSH travels up to the application level and
carries the BLOCK event/callback. This JGroups thread can then do any necessary cleanup
and can even send messages back down the stack, because semaphore B has not been
activated yet.
Semaphore B: keeps the current semantics. When each channel finishes the first round of
FLUSH (FLUSH_OK), do not allow any threads to call channel.down().
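A minimal sketch of the two gates, assuming the stack can tell application threads from
internal JGroups threads (names are hypothetical; this is not the actual FLUSH code):

    import java.util.concurrent.CountDownLatch;

    public class TwoGateFlushSketch {
        // semaphore A: closed on START_FLUSH, blocks application threads only
        private volatile CountDownLatch gateA = new CountDownLatch(0);
        // semaphore B: closed after all FLUSH_OKs are in, blocks every thread
        private volatile CountDownLatch gateB = new CountDownLatch(0);

        void onStartFlush() {
            gateA = new CountDownLatch(1); // app threads now block in down()
            // the JGroups thread carrying START_FLUSH travels up from here and
            // fires the BLOCK callback; it may still send, since B is open
        }

        void onAllFlushOksReceived() {
            gateB = new CountDownLatch(1); // now nobody may call down()
        }

        void onStopFlush() {
            gateB.countDown();             // flush is over, reopen both gates
            gateA.countDown();
        }

        void down(Object msg, boolean isJGroupsThread) throws InterruptedException {
            if (!isJGroupsThread) gateA.await(); // gate A: app threads only
            gateB.await();                       // gate B: everyone
            // ... pass msg further down the stack ...
        }
    }

The key point is the window between the two gates: the BLOCK callback runs there and can
still send messages down, while application threads are already held back.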
So how does this work for the problem above? The FLUSH_OK round is strictly guaranteed to
be sent after any application messages, so all synchronous calls have to unwind and
return, because FLUSH_OK will flush them out. In the above problem description, CP is
guaranteed to unwind. The solution we had before relied on the good will of the
application not to send any more messages after it receives the BLOCK event. If the
application disobeyed, as JBC did, we had a race condition between an application message
and FLUSH_OK. If the application message was sent after FLUSH_OK, we get the problem from
above.
As a side note, we could have solved this in JBC by having JBC follow the FLUSH BLOCK
semantics - however, that solution looked prohibitively expensive performance-wise.
Currently we have only semaphore B. We originally had this semaphore in the place of A
but have since moved it to B's position so we could have a BLOCK notification mechanism
that allows a solution of JBCACHE-315. This proposal with two semaphores will still leave
room to solve JBCACHE-315, because the BLOCK event travels up the stack on the JGroups
thread when START_FLUSH is received. So any work has to be done on that thread, and the
channel can still send messages down the stack before semaphore B kicks in.
Brian had concerns about the implications of semaphore A on a solution for JBCACHE-315,
and rightly so. The JBCACHE-315 solution involves the following (sketched in code after
the list):
1) The JGroups thread that percolated up START_FLUSH travels up
to the application level and invokes the block() callback.
2) The JBC block() impl involves:
a) setting some flag/latch to prevent new tx's accessing the cache and thus causing
unreplicated state changes. (We should deal with non-tx write calls as well.)
b) Monitoring existing tx's, giving them a chance to complete.
c) Rolling back tx's that don't complete.
2b and 2c involve letting application threads send messages (PREPARE/COMMIT/ROLLBACK
RPCs). If JGroups is going to be blocking those messages on semaphore A, we're stuck.
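For reference, here is a hedged sketch of what such a block() impl could look like
(TxRegistry and Tx are hypothetical helpers, not JBC API). Note that the tx's being
waited on in 2b send PREPARE/COMMIT RPCs from application threads, which is exactly what
semaphore A would block:

    import java.util.Set;
    import java.util.concurrent.TimeUnit;

    // hypothetical helper: tracks active transactions (not JBC API)
    interface TxRegistry {
        Set<Tx> activeTransactions();
        void blockNewTransactions();   // 2a: latch out new tx's and non-tx writes
    }

    interface Tx {
        boolean waitForCompletion(long timeout, TimeUnit unit)
            throws InterruptedException;
        void rollback();               // sends a ROLLBACK RPC via channel.down()
    }

    class BlockCallbackSketch {
        private final TxRegistry registry;
        private final long graceMillis;

        BlockCallbackSketch(TxRegistry registry, long graceMillis) {
            this.registry = registry;
            this.graceMillis = graceMillis;
        }

        // runs on the JGroups thread that carried BLOCK up the stack
        public void block() throws InterruptedException {
            registry.blockNewTransactions();                              // 2a
            for (Tx tx : registry.activeTransactions()) {
                boolean done = tx.waitForCompletion(graceMillis,
                                                    TimeUnit.MILLISECONDS); // 2b
                if (!done) tx.rollback();                                 // 2c
            }
        }
    }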
So we also have to think of a solution for JBCACHE-315 concurrently, so to speak. Maybe
the algorithm for JBCACHE-315 can be simplified somehow. What if we simply turn the latch
on to prevent new txs accessing the cache and send a rollback message on the JGroups
thread for all txs that are in progress? The cost is more rollbacks, but we get strict
consistency and simplicity.
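Sketched with the same hypothetical TxRegistry/Tx helpers as above, the simplified
variant just drops the grace period:

    // uses the hypothetical TxRegistry/Tx interfaces from the previous sketch
    class SimplifiedBlockCallbackSketch {
        private final TxRegistry registry;

        SimplifiedBlockCallbackSketch(TxRegistry registry) {
            this.registry = registry;
        }

        // runs on the JGroups thread, which semaphore A does not block
        public void block() {
            registry.blockNewTransactions();        // latch on: no new tx's
            for (Tx tx : registry.activeTransactions()) {
                tx.rollback(); // more rollbacks, but consistent and simple
            }
        }
    }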
Let's hear your suggestions!
http://wiki.jboss.org/wiki/attach?page=JGroupsFLUSH%2Fflushdiagram-state-...