On 10/10/2012 06:47 AM, Dan Berindei wrote:
Hi Scott
On Wed, Oct 10, 2012 at 6:20 AM, Scott Marlow <smarlow@redhat.com> wrote:
I'm trying to understand more about whether it makes sense for a
DefaultConsistentHash to be created with a non-local owner specified in
the DefaultConsistentHash constructor "segmentOwners" parameter.
It definitely makes sense for such a DefaultConsistentHash to exist
while the cache is starting. But by the time the cache has started (i.e.
getCache() has returned), it should have been replaced with a
DefaultConsistentHash that contains the local node as well.
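If you want to double-check whether that replacement has happened, something
along these lines should do it. This is only a rough sketch: the exact
accessor for the current consistent hash differs between 5.x releases, so
treat the getConsistentHash() call as a placeholder for whichever method your
version exposes.
"
import org.infinispan.Cache;
import org.infinispan.distribution.ch.ConsistentHash;
import org.infinispan.manager.EmbeddedCacheManager;
import org.infinispan.remoting.transport.Address;

// Rough sketch only: once getCache() has returned, the cache's consistent
// hash should already contain the local address.
final class ChSanityCheck {
   static boolean localNodeIsMember(EmbeddedCacheManager manager, String cacheName) {
      Cache<?, ?> cache = manager.getCache(cacheName);
      Address self = manager.getAddress();
      // getConsistentHash() is a placeholder; use whichever accessor your
      // DistributionManager version actually provides.
      ConsistentHash ch = cache.getAdvancedCache().getDistributionManager().getConsistentHash();
      return ch.getMembers().contains(self);   // expected to be true after startup
   }
}
"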
During some AS7 cluster testing that I'm running on my machine, I'm
seeing the test stall because we loop endlessly in
KeyAffinityServiceImpl.getKeyForAddress(). We loop because
KeyAffinityServiceImpl.generateKeys() doesn't add any keys.
We don't generate any keys because
DefaultConsistentHash.locatePrimaryOwnerForSegment() returns the address
"node-1/web", which never matches the local-node filter
(KeyAffinityServiceImpl.interestedInAddress() only accepts local
owners, i.e. "node-0/web").
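To make the failure mode concrete, here is a simplified sketch of what I
think is happening (my own code, not the actual KeyAffinityServiceImpl
source): a generated key is only kept when the primary owner of its segment
passes the local-address filter, so with node-1/web as the only primary owner
nothing is ever queued and getKeyForAddress() just keeps waiting.
"
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.infinispan.distribution.ch.DefaultConsistentHash;
import org.infinispan.remoting.transport.Address;

// Simplified sketch, not the real KeyAffinityServiceImpl code.
public class KeyGenerationSketch {
   public static List<Object> generate(DefaultConsistentHash ch,
                                       Collection<Address> localAddresses,
                                       int batchSize) {
      List<Object> queued = new ArrayList<Object>();
      for (int i = 0; i < batchSize; i++) {
         Object key = "key-" + i;                        // stand-in for the real key generator
         int segment = 0;                                 // numSegments=1 in my runs
         Address primary = ch.locatePrimaryOwnerForSegment(segment);
         if (localAddresses.contains(primary)) {          // interestedInAddress() equivalent
            queued.add(key);                              // never reached: primary is node-1/web
         }
      }
      return queued;                                      // stays empty, so the caller loops again
   }
}
"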
http://pastie.org/5027574 shows the call stack for the constructor of the
DefaultConsistentHash instance that is used above. If you look at the call
stack, it looks like the DefaultConsistentHash instance may have been
serialized on the other node and sent over (which would explain why its
owner is "node-1/web"), but I'm still not sure why/how it comes into play
with the local KeyAffinityServiceImpl.generateKeys().
My guess is you're able to access the cache before it has finished
starting, and the KeyAffinityService doesn't know how to deal with a
cache that doesn't have any local state yet. Again, this should not
I instrumented the DefaultConsistentHash constructor to call
Thread.dumpStack() only if the owner is "node-1/web" (so I could track
the origin of the wrong DefaultConsistentHash instance being used).
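For reference, the conditional dump is roughly the following helper (my own
names, hand-edited into the end of the DefaultConsistentHash constructor and
called with segmentOwners[0][0]):
"
import org.infinispan.remoting.transport.Address;

// Rough equivalent of my instrumentation, not part of Infinispan: only dump
// a stack trace when the primary owner of segment 0 is node-1/web, so the
// origin of the suspect instance shows up without a trace from every ctor call.
final class CtorTrace {
   private CtorTrace() {}

   static void dumpIfSuspect(Address primaryOwnerOfSegment0) {
      if ("node-1/web".equals(String.valueOf(primaryOwnerOfSegment0))) {
         Thread.dumpStack();
      }
   }
}
"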
Currently, I also have INFO level logging in the DefaultConsistentHash
ctor that always shows:
"
DefaultConsistentHash ctor this=DefaultConsistentHash{numSegments=1,
numOwners=2, members=[node-1/web, node-0/web], segmentOwners={0: 0
1}system identityHashCode=108706475,show segmentOwners[0 of 1] =
[node-1/web, node-0/web]
DefaultConsistentHash ctor numSegments=1, numOwners=2
DefaultConsistentHash ctor this.segmentOwners[0][0] = node-1/web
"
Since this testing involves multiple tests
(org.jboss.as.test.clustering.cluster.singleton.SingletonTestCase,
org.jboss.as.test.clustering.cluster.web.ReplicationWebFailoverTestCase,
org.jboss.as.test.clustering.cluster.web.GranularWebFailoverTestCase,
org.jboss.as.test.clustering.cluster.web.passivation.SessionBasedSessionPassivationTestCase,
org.jboss.as.test.clustering.cluster.web.passivation.AttributeBasedSessionPassivationTestCase,
org.jboss.as.test.clustering.cluster.web.DistributionWebFailoverTestCase),
it's not surprising that we reach the DefaultConsistentHash constructor
12 times. The segment owners for the 12 constructor calls are the
following:
1. [node-0/web]
2. [node-0/web]
3. [node-0/web, node-1/web]
4. [node-0/web, node-1/web]
5. [node-1/web]
6. [node-1/web]
7. [node-1/web]
8. [node-1/web, node-0/web]
9. [node-1/web, node-0/web]
10. [node-1/web, node-0/web] (we use this one when stuck in a loop)
11. [node-0/web]
12. [node-0/web]
We keep using the #10 DefaultConsistentHash instance from above for
several minutes to an hour (if I let the test run that long), while
KeyAffinityServiceImpl.getKeyForAddress() continues in the loop.
Could there be a problem with the ordering of the segment owners? Or is
it more likely a timing problem, in that we never switch to using the
#11/#12 instances?
happen - getCache() should not return that soon - but it could be
that
it does happen when multiple threads try to start the same cache in
parallel. Can you post logs with TRACE enabled for org.infinispan and/or
a link to your test code?
Sure, I can enable TRACE for Infinispan and attach the logs to the
ISPN-2376 jira. I'll add links to the test code there as well (as
comments).
Also, KeyAffinityServiceImpl.generateKeys() contains:
"
// if we had too many misses, just release the lock and try again
if (missCount < maxMisses) {
"
I tried changing the above to a ">=" check, and also tried removing the
check entirely (just doing the keyProducerStartLatch.close()); neither
had a direct impact on the current problem.
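In hindsight that makes sense to me: the missCount/maxMisses branch only
decides when the key producer is asked to run again, roughly as in the
sketch below (my own simplification, not the real KeyAffinityServiceImpl
code, which uses keyProducerStartLatch rather than a plain callback). If
generateKeys() never queues a key for the local address, the loop spins
regardless of how that comparison is written.
"
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Simplified sketch of the consumer side, not the real KeyAffinityServiceImpl code.
final class GetKeySketch {
   static Object getKeyForLocalAddress(BlockingQueue<Object> localKeyQueue,
                                       Runnable triggerKeyProducer) throws InterruptedException {
      while (true) {
         Object key = localKeyQueue.poll(100, TimeUnit.MILLISECONDS);
         if (key != null) {
            return key;             // never reached in the failing run
         }
         triggerKeyProducer.run();  // producer's generateKeys() adds nothing, so we loop again
      }
   }
}
"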
Cheers
Dan
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev