Hi Scott

On Wed, Oct 10, 2012 at 6:20 AM, Scott Marlow <smarlow@redhat.com> wrote:
I'm trying to understand more about whether it makes sense for a
DefaultConsistentHash to be created with a non-local owner specified in
the DefaultConsistentHash constructor "segmentOwners" parameter.


It definitely makes sense for such a DefaultConsistentHash to exist while the cache is starting. But by the time the cache has started (i.e. getCache() has returned), it should have been replaced with a DefaultConsistentHash that contains the local node as well.

 
During some AS7 cluster testing that I'm running on my machine, I'm
seeing the test stall because we loop endlessly in
KeyAffinityServiceImpl.getKeyForAddress().  We loop because
KeyAffinityServiceImpl.generateKeys() doesn't add any keys.

We don't generate any keys because
DefaultConsistentHash.locatePrimaryOwnerForSegment() returns address
"node-1/web", which never matches the local node's filter
(KeyAffinityServiceImpl.interestedInAddress() only accepts the local
owner, "node-0/web").

http://pastie.org/5027574 shows the call stack for the constructor of
the DefaultConsistentHash instance that is used above.  If you look at
the call stack, it looks like the DefaultConsistentHash instance may
have been serialized on the other node and sent over (which would
explain why its owner is "node-1/web", but I'm still not sure why/how
it comes into play with the local
KeyAffinityServiceImpl.generateKeys()).


My guess is you're able to access the cache before it has finished starting, and the KeyAffinityService doesn't know how to deal with a cache that doesn't have any local state yet. Again, this should not happen - getCache() should not return that soon - but it could be that it does happen when multiple threads try to start the same cache in parallel. Can you post logs with TRACE enabled for org.infinispan and/or a link to your test code?
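
To make the stall concrete, here is a rough, self-contained sketch of what I think is going on. It is not the Infinispan code: the class AffinityStallSketch, the helper locatePrimaryOwner() and the maxAttempts bound are all made up for illustration, and I'm assuming every segment has the same primary owner, as in the consistent hash you describe. The real getKeyForAddress() has no such bound, so it would just keep looping.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; names and structure are hypothetical, not Infinispan's.
class AffinityStallSketch {

    // Stand-in for a consistent hash in which every segment has the same
    // primary owner (assumption: this mirrors the hash received during startup).
    static String locatePrimaryOwner(String key, String soleOwner) {
        return soleOwner;
    }

    // Try up to maxAttempts candidate keys and keep only those whose
    // primary owner equals the local address (the "interested in" filter).
    static List<String> generateKeys(String localAddress, String segmentOwner, int maxAttempts) {
        List<String> accepted = new ArrayList<>();
        for (int i = 0; i < maxAttempts; i++) {
            String candidate = "key-" + i;
            String owner = locatePrimaryOwner(candidate, segmentOwner);
            if (owner.equals(localAddress)) {
                accepted.add(candidate);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Local node is not an owner: no candidate ever passes the filter.
        System.out.println(generateKeys("node-0/web", "node-1/web", 1000).size());
        // Local node is an owner: every candidate passes in this toy model.
        System.out.println(generateKeys("node-0/web", "node-0/web", 1000).size());
    }
}
```

If the local node never shows up as an owner, no amount of retrying produces a key, which would match the endless loop you are seeing.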

Cheers
Dan