[infinispan-dev] Hashing generating recipient lists with same address

Galder Zamarreno galder at jboss.org
Fri May 7 09:14:23 EDT 2010


I will create one as part of https://jira.jboss.org/jira/browse/ISPN-428

----- "Manik Surtani" <manik at jboss.org> wrote:

> So TopologyChangeTest is a pretty complex test involving HotRod
> clients and servers, etc.  Can this be reproduced in a simpler setting
> - i.e., 2 p2p Infinispan instances, add a third, etc., without any
> HotRod components?
> 
> On 6 May 2010, at 17:51, galder at redhat.com wrote:
> 
> > Hi all,
> > 
> > As indicated on IRC, running
> org.infinispan.client.hotrod.TopologyChangeTest.testTwoMembers() fails
> randomly with replication timeout. It's very easy to replicate. When
> it fails, this is what happens:
> > 
> > 1. During rehashing, a new hash is installed:
> > 2010-05-06 17:54:11,960 4932  TRACE
> [org.infinispan.distribution.DistributionManagerImpl]
> (Rehasher-eq-985:) Installing new consistent hash
> DefaultConsistentHash{addresses ={109=eq-35426, 10032=eq-985,
> 10033=eq-985}, hash space =10240}
> > 
> > 2. Rehash finishes and the previous hash is still installed:
> > 2010-05-06 17:54:11,978 4950  INFO 
> [org.infinispan.distribution.JoinTask] (Rehasher-eq-985:) eq-985
> completed join in 30 milliseconds!
> > 
> > 3. A put comes in to eq-985 who decides recipients are [eq-985,
> eq-985]. Most likely, the hash falled somewhere between 109 and 10032
> and since owners are 2, it took the next 2:
> > 2010-05-06 17:54:12,307 5279  TRACE
> [org.infinispan.remoting.rpc.RpcManagerImpl] (HotRodServerWorker-2-1:)
> eq-985 broadcasting call
> PutKeyValueCommand{key=CacheKey{data=ByteArray{size=9,
> hashCode=d28dfa, array=[-84, -19, 0, 5, 116, 0, 2, 107, 48, ..]}},
> value=CacheValue{data=ByteArray{size=9, array=[-84, -19, 0, 5, 116, 0,
> 2, 118, 48, ..]}, version=281483566645249}, putIfAbsent=false,
> lifespanMillis=-1000, maxIdleTimeMillis=-1000} to recipient list
> [eq-985, eq-985]
> > 
> > Everything afterwards is a mess:
> > 
> > 4. JGroups removes the local address from the destination. The
> reason Infinispan does not do it it's because the number of recipients
> is 2 and the number of members in the cluster 2, so it thinks it's a
> broadcast:
> > 2010-05-06 17:54:12,308 5280  TRACE
> [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
> (HotRodServerWorker-2-1:) real_dests=[eq-985]
> > 
> > 5. JGroups still sends it as a broadcast:
> > 2010-05-06 17:54:12,308 5280  TRACE [org.jgroups.protocols.TCP]
> (HotRodServerWorker-2-1:) sending msg to null, src=eq-985, headers are
> RequestCorrelator: id=201, type=REQ, id=12, rsp_expected=true, NAKACK:
> [MSG, seqno=5], TCP: [channel_name=Infinispan-Cluster]
> > 
> > 6. Another node deals with this and replies:
> > 2010-05-06 17:54:12,310 5282  TRACE
> [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
> (OOB-1,Infinispan-Cluster,eq-35426:) Attempting to execute command:
> SingleRpcCommand{cacheName='___defaultcache',
> command=PutKeyValueCommand{key=CacheKey{data=ByteArray{size=9,
> hashCode=43487e, array=[-84, -19, 0, 5, 116, 0, 2, 107, 48, ..]}},
> value=CacheValue{data=ByteArray{size=9, array=[-84, -19, 0, 5, 116, 0,
> 2, 118, 48, ..]}, version=281483566645249}, putIfAbsent=false,
> lifespanMillis=-1000, maxIdleTimeMillis=-1000}} [sender=eq-985]
> > ...
> > 
> > 7. However, no replies yet from eq-985, so u get:
> > 2010-05-06 17:54:27,310 20282 TRACE
> [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher]
> (HotRodServerWorker-2-1:) responses: [sender=eq-985, retval=null,
> received=false, suspected=false]
> > 
> > 2010-05-06 17:54:27,313 20285 TRACE
> [org.infinispan.remoting.rpc.RpcManagerImpl] (HotRodServerWorker-2-1:)
> replication exception: 
> > org.infinispan.util.concurrent.TimeoutException: Replication timeout
> for eq-985
> > 
> > Now, I don't understand the reason for creating a hash 10032=eq-985,
> 10033=eq-985. Shouldn't keeping 10032=eq-985 be enough? Why add
> 10033=eq-985?
> > 
> > Assuming there was a valid case for it, a naive approach would be to
> discard a second node that points to the an address already in the
> recipient list. So, 10032=eq-985 would be accepted for the list but
> when encountering 10033=eq-985, this would be skipped.
> > 
> > Finally, I thought waiting for rehashing to finish would solve the
> issue but as u can see in 2., rehashing finished and the hash is still
> in the same shape. Also, I've attached a log file.
> > 
> > Cheers,
> > --
> > Galder Zamarreño
> > Sr. Software Engineer
> > Infinispan, JBoss Cache
> >
> <bad2_jgroups-infinispan.log.zip>_______________________________________________
> > infinispan-dev mailing list
> > infinispan-dev at lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
> 
> 
> 
> 
> 
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev



More information about the infinispan-dev mailing list