On 10/10/2012 06:47 AM, Dan Berindei wrote:
Hi Scott
On Wed, Oct 10, 2012 at 6:20 AM, Scott Marlow <smarlow@redhat.com> wrote:
I'm trying to understand more about whether it makes sense for a
DefaultConsistentHash to be created with a non-local owner specified in
the DefaultConsistentHash constructor "segmentOwners" parameter.
It definitely makes sense for such a DefaultConsistentHash to exist
while the cache is starting. But by the time the cache has started (i.e.
getCache() has returned), it should have been replaced with a
DefaultConsistentHash that contains the local node as well.
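If you want to double-check whether that replacement has happened, something
along these lines should do it. This is only a rough sketch: the exact
accessor for the current consistent hash differs between 5.x releases, so
treat the getConsistentHash() call as a placeholder for whichever method your
version exposes.
"
import org.infinispan.Cache;
import org.infinispan.distribution.ch.ConsistentHash;
import org.infinispan.manager.EmbeddedCacheManager;
import org.infinispan.remoting.transport.Address;

// Rough sketch only: once getCache() has returned, the cache's consistent
// hash should already contain the local address.
final class ChSanityCheck {
   static boolean localNodeIsMember(EmbeddedCacheManager manager, String cacheName) {
      Cache<?, ?> cache = manager.getCache(cacheName);
      Address self = manager.getAddress();
      // getConsistentHash() is a placeholder; use whichever accessor your
      // DistributionManager version actually provides.
      ConsistentHash ch = cache.getAdvancedCache().getDistributionManager().getConsistentHash();
      return ch.getMembers().contains(self);   // expected to be true after startup
   }
}
"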
During some AS7 cluster testing that I'm running on my machine, I'm
seeing the test stall because we loop endlessly in
KeyAffinityServiceImpl.getKeyForAddress(). We loop because
KeyAffinityServiceImpl.generateKeys() doesn't add any keys.
We don't generate any keys because
DefaultConsistentHash.locatePrimaryOwnerForSegment() returns the address
"node-1/web", which never matches the local-node filter
(KeyAffinityServiceImpl.interestedInAddress() only accepts local
owners, i.e. "node-0/web").
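To make the failure mode concrete, here is a simplified sketch of what I
think is happening (my own code, not the actual KeyAffinityServiceImpl
source): a generated key is only kept when the primary owner of its segment
passes the local-address filter, so with node-1/web as the only primary owner
nothing is ever queued and getKeyForAddress() just keeps waiting.
"
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.infinispan.distribution.ch.DefaultConsistentHash;
import org.infinispan.remoting.transport.Address;

// Simplified sketch, not the real KeyAffinityServiceImpl code.
public class KeyGenerationSketch {
   public static List<Object> generate(DefaultConsistentHash ch,
                                       Collection<Address> localAddresses,
                                       int batchSize) {
      List<Object> queued = new ArrayList<Object>();
      for (int i = 0; i < batchSize; i++) {
         Object key = "key-" + i;                        // stand-in for the real key generator
         int segment = 0;                                 // numSegments=1 in my runs
         Address primary = ch.locatePrimaryOwnerForSegment(segment);
         if (localAddresses.contains(primary)) {          // interestedInAddress() equivalent
            queued.add(key);                              // never reached: primary is node-1/web
         }
      }
      return queued;                                      // stays empty, so the caller loops again
   }
}
"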
http://pastie.org/5027574 shows the call stack for the constructor of the
DefaultConsistentHash instance that is used above. If you look at the call
stack, it looks like the DefaultConsistentHash instance may have been
serialized on the other node and sent over (which would explain why its
owner is "node-1/web"), but I'm still not sure why/how it comes into play
with the local KeyAffinityServiceImpl.generateKeys().
My guess is you're able to access the cache before it has finished
starting, and the KeyAffinityService doesn't know how to deal with a
cache that doesn't have any local state yet. Again, this should not
I instrumented the DefaultConsistentHash constructor to call
Thread.dumpStack() only if the owner is "node-1/web" (so I could track
the origin of the wrong DefaultConsistentHash instance being used).
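For reference, the conditional dump is roughly the following helper (my own
names, hand-edited into the end of the DefaultConsistentHash constructor and
called with segmentOwners[0][0]):
"
import org.infinispan.remoting.transport.Address;

// Rough equivalent of my instrumentation, not part of Infinispan: only dump
// a stack trace when the primary owner of segment 0 is node-1/web, so the
// origin of the suspect instance shows up without a trace from every ctor call.
final class CtorTrace {
   private CtorTrace() {}

   static void dumpIfSuspect(Address primaryOwnerOfSegment0) {
      if ("node-1/web".equals(String.valueOf(primaryOwnerOfSegment0))) {
         Thread.dumpStack();
      }
   }
}
"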
Currently, I also have INFO level logging in the DefaultConsistentHash
ctor that always shows:
"
DefaultConsistentHash ctor this=DefaultConsistentHash{numSegments=1,
numOwners=2, members=[node-1/web, node-0/web], segmentOwners={0: 0
1}system identityHashCode=108706475,show segmentOwners[0 of 1] =
[node-1/web, node-0/web]
DefaultConsistentHash ctor numSegments=1, numOwners=2
DefaultConsistentHash ctor this.segmentOwners[0][0] = node-1/web
"
Since this testing involves multiple tests
(org.jboss.as.test.clustering.cluster.singleton.SingletonTestCase,
org.jboss.as.test.clustering.cluster.web.ReplicationWebFailoverTestCase,
org.jboss.as.test.clustering.cluster.web.GranularWebFailoverTestCase,
org.jboss.as.test.clustering.cluster.web.passivation.SessionBasedSessionPassivationTestCase,
org.jboss.as.test.clustering.cluster.web.passivation.AttributeBasedSessionPassivationTestCase,
org.jboss.as.test.clustering.cluster.web.DistributionWebFailoverTestCase),
it's not surprising that we reach the DefaultConsistentHash constructor
12 times. The segment owners for the 12 constructor calls are the
following:
1. [node-0/web]
2. [node-0/web]
3. [node-0/web, node-1/web]
4. [node-0/web, node-1/web]
5. [node-1/web]
6. [node-1/web]
7. [node-1/web]
8. [node-1/web, node-0/web]
9. [node-1/web, node-0/web]
10. [node-1/web, node-0/web] (we use this one when stuck in a loop)
11. [node-0/web]
12. [node-0/web]
We keep using the #10 DefaultConsistentHash instance from above for
several minutes to an hour (if I let the test run that long), while
KeyAffinityServiceImpl.getKeyForAddress() continues in the loop.
Could there be a problem with the ordering of the segment owners? Or is
it more likely a timing problem, in that we never switch to using the
#11/#12 instances?
happen - getCache() should not return that soon - but it could be
that
it does happen when multiple threads try to start the same cache in
parallel. Can you post logs with TRACE enabled for org.infinispan and/or
a link to your test code?
Sure, I can enable TRACE for Infinispan and attach the logs to the
ISPN-2376 jira. I'll add links to the test code there as well (as
comments).
Also, KeyAffinityServiceImpl.generateKeys() contains:
"
// if we had too many misses, just release the lock and try again
if (missCount < maxMisses) {
"
I tried changing the above to a ">=" check, and also tried removing the
check entirely (just doing the keyProducerStartLatch.close()); neither
had a direct impact on the current problem.
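In hindsight that makes sense to me: the missCount/maxMisses branch only
decides when the key producer is asked to run again, roughly as in the
sketch below (my own simplification, not the real KeyAffinityServiceImpl
code, which uses keyProducerStartLatch rather than a plain callback). If
generateKeys() never queues a key for the local address, the loop spins
regardless of how that comparison is written.
"
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Simplified sketch of the consumer side, not the real KeyAffinityServiceImpl code.
final class GetKeySketch {
   static Object getKeyForLocalAddress(BlockingQueue<Object> localKeyQueue,
                                       Runnable triggerKeyProducer) throws InterruptedException {
      while (true) {
         Object key = localKeyQueue.poll(100, TimeUnit.MILLISECONDS);
         if (key != null) {
            return key;             // never reached in the failing run
         }
         triggerKeyProducer.run();  // producer's generateKeys() adds nothing, so we loop again
      }
   }
}
"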
Cheers
Dan
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev