[infinispan-dev] Question about ISPN-2376 "KeyAffinityServiceImpl.getKeyForAddress() seems to loop forever when DefaultConsistentHash is created for the non-local node owner"
Scott Marlow
smarlow at redhat.com
Wed Oct 10 09:47:43 EDT 2012
On 10/10/2012 06:47 AM, Dan Berindei wrote:
> Hi Scott
>
> On Wed, Oct 10, 2012 at 6:20 AM, Scott Marlow <smarlow at redhat.com
> <mailto:smarlow at redhat.com>> wrote:
>
> I'm trying to understand more about whether it makes sense for a
> DefaultConsistentHash to be created with a non-local owner specified in
> the DefaultConsistentHash constructor "segmentOwners" parameter.
>
>
> It definitely makes sense for such a DefaultConsistentHash to exist
> while the cache is starting. But by the time the cache has started (i.e.
> getCache() has returned), it should have been replaced with a
> DefaultConsistentHash that contains the local node as well.
>
> During some AS7 cluster testing that I'm running on my machine, I'm
> seeing the test stall because we loop endlessly in
> KeyAffinityServiceImpl.getKeyForAddress(). We loop because
> KeyAffinityServiceImpl.generateKeys() doesn't add any keys.
>
> We don't generate any keys because
> DefaultConsistentHash.locatePrimaryOwnerForSegment() returns address
> "node-1/web" which never matches the local nodes filter
> (KeyAffinityServiceImpl.interestedInAddress() only filters for local
> owners via "node-0/web").
>
> http://pastie.org/5027574 shows the call stack for the
> DefaultConsistentHash constructor of the same instance that is used
> above. If you look at the call stack, it looks like the
> DefaultConsistentHash instance may have been serialized on the other
> node and sent over (which would explain why its owner is "node-1/web",
> but I'm still not sure why/how it comes into play with the local
> KeyAffinityServiceImpl.generateKeys()).
>
>
> My guess is you're able to access the cache before it has finished
> starting, and the KeyAffinityService doesn't know how to deal with a
> cache that doesn't have any local state yet. Again, this should not
I instrumented the DefaultConsistentHash constructor to call
Thread.dumpStack() only if the owner is "node-1/web" (so I could track
the origin of the wrong DefaultConsistentHash instance being used).
Currently, I also have INFO-level logging in the DefaultConsistentHash
ctor that always shows:
"
DefaultConsistentHash ctor this=DefaultConsistentHash{numSegments=1,
numOwners=2, members=[node-1/web, node-0/web], segmentOwners={0: 0
1}system identityHashCode=108706475,show segmentOwners[0 of 1] =
[node-1/web, node-0/web]
DefaultConsistentHash ctor numSegments=1, numOwners=2
DefaultConsistentHash ctor this.segmentOwners[0][0] = node-1/web
"
Since this testing involves multiple tests
(org.jboss.as.test.clustering.cluster.singleton.SingletonTestCase,
org.jboss.as.test.clustering.cluster.web.ReplicationWebFailoverTestCase,
org.jboss.as.test.clustering.cluster.web.GranularWebFailoverTestCase,
org.jboss.as.test.clustering.cluster.web.passivation.SessionBasedSessionPassivationTestCase,
org.jboss.as.test.clustering.cluster.web.passivation.AttributeBasedSessionPassivationTestCase,
org.jboss.as.test.clustering.cluster.web.DistributionWebFailoverTestCase),
it's not surprising that we reach the DefaultConsistentHash
constructor 12 times. The segment owners passed to the 12 constructor
calls are as follows:
1. [node-0/web]
2. [node-0/web]
3. [node-0/web, node-1/web]
4. [node-0/web, node-1/web]
5. [node-1/web]
6. [node-1/web]
7. [node-1/web]
8. [node-1/web, node-0/web]
9. [node-1/web, node-0/web]
10. [node-1/web, node-0/web] (we use this one when stuck in a loop)
11. [node-0/web]
12. [node-0/web]
We keep using the #10 DefaultConsistentHash instance from above for
several minutes to an hour (if I let the test run that long), while
KeyAffinityServiceImpl.getKeyForAddress() continues to loop.
Could there be a problem with the ordering of the segment owners? Or is
it more likely a timing problem, in that we never switch over to using
#11/#12?
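To make the ordering question concrete, my understanding is that the
primary owner of a segment is simply the first entry in that segment's
owner list, so with the #10 instance the key generation on node-0/web
behaves roughly like this (simplified illustration, not the real
Infinispan code; Strings stand in for Address):
"
// simplified illustration of why no keys are generated with CH instance #10
String localAddress = "node-0/web";
String[][] segmentOwners = { { "node-1/web", "node-0/web" } };    // instance #10

String primaryOwnerOfSegment0 = segmentOwners[0][0];              // "node-1/web"
boolean interested = primaryOwnerOfSegment0.equals(localAddress); // always false
// with numSegments=1 every generated key maps to segment 0, so no key ever
// passes the local-owner filter and getKeyForAddress() never receives a key
"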
> happen - getCache() should not return that soon - but it could be that
> it does happen when multiple threads try to start the same cache in
> parallel. Can you post logs with TRACE enabled for org.infinispan and/or
> a link to your test code?
>
Sure, I can enable TRACE for Infinispan and attach the logs to the
ISPN-2376 jira. I'll add links to the test code there as well (as
comments).
Also, KeyAffinityServiceImpl.generateKeys() contains:
"
// if we had too many misses, just release the lock and try again
if (missCount < maxMisses) {
"
I tried changing the above to a ">=" check, and also tried removing the
check entirely (just doing the keyProducerStartLatch.close()); neither
change had a direct impact on the current problem.
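In other words, as far as I can tell the relevant part of generateKeys()
behaves roughly like the sketch below (heavily simplified; helper names
like enoughKeysGenerated() and addKeyToQueue() are stand-ins, not the
real ones), which would explain why changing the missCount comparison
can't help - the local-owner filter rejects every candidate key no
matter what happens to the miss counter afterwards:
"
// heavily simplified sketch of the behaviour I'm describing
int missCount = 0;
while (!enoughKeysGenerated()) {
   Object key = keyGenerator.getKey();
   Address primaryOwner = hash.locatePrimaryOwner(key);  // always node-1/web with CH #10
   if (interestedInAddress(primaryOwner)) {               // never true on node-0/web
      addKeyToQueue(primaryOwner, key);                   // so this never runs
   } else {
      missCount++;
   }
}
// if we had too many misses, just release the lock and try again
if (missCount < maxMisses) {
   keyProducerStartLatch.close();
}
// flipping the comparison to ">=" or closing the latch unconditionally only
// changes when the producer thread pauses; it cannot add keys, because the
// filter above never matches
"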
> Cheers
> Dan
>
>
>