[infinispan-dev] State transfer testing

Mircea Markus mircea.markus at jboss.com
Tue Jul 19 11:36:26 EDT 2011


On 19 Jul 2011, at 09:38, Galder Zamarreño wrote:

> Hi all,
> 
> I've noticed a problem with the way we test state transfer in our testsuite. For example, take https://github.com/infinispan/infinispan/blob/master/core/src/test/java/org/infinispan/statetransfer/StateTransferFunctionalTest.java#L159
> 
> This test checks that when a new node is started, state transfer happens. However, a merge can happen instead of a join, and when a merge happens no state transfer occurs.
That was my understanding as well, and I think it still holds for REPL.
I've just had a chat with Dan and also looked at the code [1]: for distributed caches, if a merge view happens, rehashing is triggered in exactly the same way as when a join happens. That worries me, as consistency is affected if a key is modified in one (or both) of the cluster's partitions after the split brain. Or am I missing something?
[1] http://bit.ly/o9Cx99
> 
> Now, the problem is that waiting for the view to be set happens in the main thread, and the callback to merge view listener happens in a different thread. So, in an unfortunate situation, the following can happen:
> 
> 1. [main-thread] waits for view to be set.
> 2. [main-thread] view is set due to a merge and main thread carries on.
> 3. [main-thread] checks the merge view listener and sees that 
> 4. [callback-thread] calls MergedViewListener.mergedView and sets merged=true.
> 
> I've seen this failure happen on my local machine when trying to replicate other random failures.
> 
> So, I'm solving this issue by having a listener that listens for both merge and view changes, and then having a latch that can wait for either one of the two callbacks to happen.
> Clearly, the countDown() would happen after either merged (boolean) or viewChanged (boolean) has been set, which guarantees that by the time the latch is released I know whether a merge happened or not, and then I can check the initial data if necessary. I'll send a pull req later on today with this.
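For illustration, the latch idea described above could look roughly like the sketch below. This is not the code from the pull request: the class and method names (ViewOrMergeLatch, onMerged, onViewChanged, awaitEither) are made up for this example, and in a real test the two callbacks would presumably be wired to the cache manager's merge and view-change notifications. The point is only the ordering: each callback sets its flag first and counts down second, so a thread released by the latch always sees the flag.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical test helper (names are assumptions, not Infinispan API):
 * lets the main thread wait until either a merge or a plain view change
 * has been reported by the callback thread.
 */
final class ViewOrMergeLatch {
    private final CountDownLatch latch = new CountDownLatch(1);
    private volatile boolean merged;
    private volatile boolean viewChanged;

    /** Assumed to be invoked by the merge callback. */
    void onMerged() {
        merged = true;      // set the flag first...
        latch.countDown();  // ...then release the waiting thread
    }

    /** Assumed to be invoked by the ordinary view-change callback. */
    void onViewChanged() {
        viewChanged = true;
        latch.countDown();
    }

    /** Blocks until one of the callbacks has fired; returns false on timeout. */
    boolean awaitEither(long timeout, TimeUnit unit) throws InterruptedException {
        return latch.await(timeout, unit);
    }

    boolean isMerged() {
        return merged;
    }

    boolean isViewChanged() {
        return viewChanged;
    }
}
```

A test would call awaitEither(...) instead of polling the view, and only then look at isMerged() to decide whether checking the initial data makes sense.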
> 
> Btw, you might be wondering how on earth a merge would happen with our new TEST_PING? I have that question too and seems like sometimes Discovery.sendGetMembersRequest does not get called and my TEST_PING implementation assumes that will definitely get called. I've sent an email to Bela and I'm gonna try to add some more debugging to find out how on earth this happens.
I think that even if you fix this in the unit tests it still might happen in a real-life situation, i.e. start two nodes and instead of forming a cluster they'd first form two clusters and then merge.
> 
> Now, what I'm wondering here is whether this is something the end users would be interested in as well, cos they might be running their own testing to check whether state transfer works as expected.
Thinking out loud about this issue: can't JGroups realise that this node wants to join and not to merge? E.g. if node B starts and wants to join cluster {A}: if B hasn't received any application-level messages, then can't JGroups just assume that it definitely wants to join, and never wants to merge?
> 
> It's at this point that I miss tuples in java cos I think it'd be handy to have a getMembers() call that returns not only a List<Address> but also an enum value indicating whether the last view change was a merge or view change, or more simply a boolean indicating whether the last view change was a merge or not:
> 
> (boolean, List<Members>) getMembers();
> 
> Unfortunately Java does not make it easy to return things like this. Having a separate method to find out if the last view change was a merge or not would be clunky cos you'd want a single method that can provide the guarantees with regards to the member list returned.
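As a rough sketch of what the "new type" mentioned above might look like: a small immutable holder that carries the member list together with the kind of the last view change, so both are read in a single call. Everything here is hypothetical (MembersSnapshot, ViewChangeType and the accessors are invented for the example), and the type parameter A stands in for the real address type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical return type for a combined getMembers() call:
 * the member list plus how the view that produced it was installed.
 */
final class MembersSnapshot<A> {
    /** Kind of the last view change (names are assumptions). */
    enum ViewChangeType { VIEW_CHANGE, MERGE }

    private final ViewChangeType type;
    private final List<A> members;

    MembersSnapshot(ViewChangeType type, List<A> members) {
        this.type = type;
        // Defensive copy so the snapshot cannot change after it is handed out.
        this.members = Collections.unmodifiableList(new ArrayList<>(members));
    }

    ViewChangeType type() {
        return type;
    }

    boolean isMerge() {
        return type == ViewChangeType.MERGE;
    }

    List<A> members() {
        return members;
    }
}
```

Because the flag and the list travel in one object, a caller cannot observe the members of one view paired with the merge status of another, which is the guarantee a separate "was it a merge?" method would not give.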
> 
> Any thoughts on this API enhancement? Would it be useful? In the Java world, it would require creating a new type which is a bit of a deterrent for me.
I'm not sure how useful this information is *after* the event (be it merge or join) took place. Besides this very specific use case, I cannot think of another in which a user wants to know the type of view change *after* it took place.
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
> 
> 



