[infinispan-dev] Hashing generating recipient lists with same address

Mircea Markus mircea.markus at jboss.com
Fri May 7 08:01:10 EDT 2010


On 7 May 2010, at 14:56, Manik Surtani wrote:

> 
> On 7 May 2010, at 12:44, Mircea Markus wrote:
> 
>> I've tried the same operation sequence directly on the caches, but it works without a timeout. The HR server also defines a cache for its own purposes; I'll try to include that cache in the setup as well and check again.
> 
> Yes, but that is a replicated cache, right?
The actual put originating from the HR client is on a distributed cache. The HotRod server also defines a replicated cache for its own purposes; I've also tried to include that one, but still no timeouts.
> 
> 
>> 
>> On 7 May 2010, at 14:20, Manik Surtani wrote:
>> 
>>> So TopologyChangeTest is a pretty complex test involving HotRod clients and servers, etc.  Can this be reproduced in a simpler setting - i.e., 2 p2p Infinispan instances, add a third, etc., without any HotRod components?
>>> 
>>> On 6 May 2010, at 17:51, galder at redhat.com wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> As indicated on IRC, running org.infinispan.client.hotrod.TopologyChangeTest.testTwoMembers() fails randomly with a replication timeout. It's very easy to reproduce. When it fails, this is what happens:
>>>> 
>>>> 1. During rehashing, a new hash is installed:
>>>> 2010-05-06 17:54:11,960 4932  TRACE [org.infinispan.distribution.DistributionManagerImpl] (Rehasher-eq-985:) Installing new consistent hash DefaultConsistentHash{addresses ={109=eq-35426, 10032=eq-985, 10033=eq-985}, hash space =10240}
>>>> 
>>>> 2. Rehashing finishes, and the hash installed in 1. is still in place:
>>>> 2010-05-06 17:54:11,978 4950  INFO  [org.infinispan.distribution.JoinTask] (Rehasher-eq-985:) eq-985 completed join in 30 milliseconds!
>>>> 
>>>> 3. A put comes in to eq-985, which decides the recipients are [eq-985, eq-985]. Most likely the key's hash fell somewhere between 109 and 10032, and since numOwners is 2, it took the next 2 positions:
>>>> 2010-05-06 17:54:12,307 5279  TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (HotRodServerWorker-2-1:) eq-985 broadcasting call PutKeyValueCommand{key=CacheKey{data=ByteArray{size=9, hashCode=d28dfa, array=[-84, -19, 0, 5, 116, 0, 2, 107, 48, ..]}}, value=CacheValue{data=ByteArray{size=9, array=[-84, -19, 0, 5, 116, 0, 2, 118, 48, ..]}, version=281483566645249}, putIfAbsent=false, lifespanMillis=-1000, maxIdleTimeMillis=-1000} to recipient list [eq-985, eq-985]
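>>>> 
>>>> To make that concrete, here is a minimal sketch (not the actual DefaultConsistentHash code) of how an owner lookup that simply walks the wheel clockwise and takes the next numOwners positions, without checking for duplicate addresses, would produce [eq-985, eq-985] from the positions installed in 1.:
>>>> 
>>>> import java.util.*;
>>>> 
>>>> public class WheelWalkSketch {
>>>>     public static void main(String[] args) {
>>>>         // Positions taken from the TRACE line in 1.; hash space is 10240.
>>>>         SortedMap<Integer, String> wheel = new TreeMap<Integer, String>();
>>>>         wheel.put(109, "eq-35426");
>>>>         wheel.put(10032, "eq-985");
>>>>         wheel.put(10033, "eq-985");
>>>> 
>>>>         int numOwners = 2;
>>>>         int keyHash = 5000; // any hash falling between 109 and 10032
>>>> 
>>>>         // Walk clockwise from the key's hash and take the next numOwners
>>>>         // positions, wrapping around, with no de-duplication of addresses.
>>>>         List<String> owners = new ArrayList<String>();
>>>>         Iterator<String> it = wheel.tailMap(keyHash).values().iterator();
>>>>         while (owners.size() < numOwners) {
>>>>             if (!it.hasNext()) it = wheel.values().iterator(); // wrap
>>>>             owners.add(it.next());
>>>>         }
>>>>         System.out.println(owners); // prints [eq-985, eq-985]
>>>>     }
>>>> }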
>>>> 
>>>> Everything afterwards is a mess:
>>>> 
>>>> 4. JGroups removes the local address from the destination. The reason Infinispan does not do it itself is that the number of recipients is 2 and the number of members in the cluster is also 2, so it thinks it's a broadcast:
>>>> 2010-05-06 17:54:12,308 5280  TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (HotRodServerWorker-2-1:) real_dests=[eq-985]
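>>>> 
>>>> Roughly this kind of check (hypothetical, not the exact RpcManagerImpl/dispatcher code): because of the duplicate in the recipient list, its size matches the view size, so the call is treated as a broadcast and the local address is never stripped on the Infinispan side:
>>>> 
>>>> import java.util.Arrays;
>>>> import java.util.List;
>>>> 
>>>> public class BroadcastHeuristicSketch {
>>>>     public static void main(String[] args) {
>>>>         // Recipient list from 3. (with the duplicate) and the current view.
>>>>         List<String> recipients = Arrays.asList("eq-985", "eq-985");
>>>>         List<String> members = Arrays.asList("eq-35426", "eq-985");
>>>> 
>>>>         // Same size as the view is taken to mean "send to everyone", so the
>>>>         // destination stays null and no local-address filtering happens here.
>>>>         boolean broadcast = recipients.size() == members.size();
>>>>         System.out.println(broadcast ? "multicast (dest=null)"
>>>>                                      : "unicast to " + recipients);
>>>>     }
>>>> }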
>>>> 
>>>> 5. JGroups still sends it as a broadcast:
>>>> 2010-05-06 17:54:12,308 5280  TRACE [org.jgroups.protocols.TCP] (HotRodServerWorker-2-1:) sending msg to null, src=eq-985, headers are RequestCorrelator: id=201, type=REQ, id=12, rsp_expected=true, NAKACK: [MSG, seqno=5], TCP: [channel_name=Infinispan-Cluster]
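>>>> 
>>>> For reference, "sending msg to null" is just how a group message looks in JGroups: a Message with a null destination is multicast to every member, including the sender. A trivial illustration (assuming a plain default JChannel stack, nothing Infinispan-specific):
>>>> 
>>>> import org.jgroups.JChannel;
>>>> import org.jgroups.Message;
>>>> 
>>>> public class NullDestSketch {
>>>>     public static void main(String[] args) throws Exception {
>>>>         JChannel ch = new JChannel();   // default protocol stack
>>>>         ch.connect("Infinispan-Cluster");
>>>>         // dest == null -> group (multicast) message, delivered to all members
>>>>         ch.send(new Message(null, null, "hello"));
>>>>         ch.close();
>>>>     }
>>>> }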
>>>> 
>>>> 6. Another node deals with this and replies:
>>>> 2010-05-06 17:54:12,310 5282  TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-1,Infinispan-Cluster,eq-35426:) Attempting to execute command: SingleRpcCommand{cacheName='___defaultcache', command=PutKeyValueCommand{key=CacheKey{data=ByteArray{size=9, hashCode=43487e, array=[-84, -19, 0, 5, 116, 0, 2, 107, 48, ..]}}, value=CacheValue{data=ByteArray{size=9, array=[-84, -19, 0, 5, 116, 0, 2, 118, 48, ..]}, version=281483566645249}, putIfAbsent=false, lifespanMillis=-1000, maxIdleTimeMillis=-1000}} [sender=eq-985]
>>>> ...
>>>> 
>>>> 7. However, there are no replies yet from eq-985, so you get:
>>>> 2010-05-06 17:54:27,310 20282 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (HotRodServerWorker-2-1:) responses: [sender=eq-985, retval=null, received=false, suspected=false]
>>>> 
>>>> 2010-05-06 17:54:27,313 20285 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (HotRodServerWorker-2-1:) replication exception: 
>>>> org.infinispan.util.concurrent.TimeoutException: Replication timeout for eq-985
>>>> 
>>>> Now, I don't understand the reason for creating a hash with both 10032=eq-985 and 10033=eq-985. Shouldn't keeping just 10032=eq-985 be enough? Why add 10033=eq-985?
>>>> 
>>>> Assuming there is a valid case for it, a naive approach would be to discard a second position that points to an address already in the recipient list. So, 10032=eq-985 would be accepted for the list, but 10033=eq-985 would be skipped when encountered.
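>>>> 
>>>> Something along these lines (just a sketch of the de-duplication idea, not a patch against the actual owner-lookup code):
>>>> 
>>>> import java.util.*;
>>>> 
>>>> public class DedupSketch {
>>>>     // Keep the first occurrence of each address, preserving wheel order.
>>>>     static List<String> dedup(List<String> owners) {
>>>>         return new ArrayList<String>(new LinkedHashSet<String>(owners));
>>>>     }
>>>> 
>>>>     public static void main(String[] args) {
>>>>         // [eq-985, eq-985] collapses to [eq-985]
>>>>         System.out.println(dedup(Arrays.asList("eq-985", "eq-985")));
>>>>     }
>>>> }
>>>> 
>>>> Of course that leaves the key with fewer than numOwners owners, which is exactly why the question above about the second position matters.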
>>>> 
>>>> Finally, I thought waiting for rehashing to finish would solve the issue, but as you can see in 2., rehashing finished and the hash is still in the same shape. I've also attached a log file.
>>>> 
>>>> Cheers,
>>>> --
>>>> Galder Zamarreño
>>>> Sr. Software Engineer
>>>> Infinispan, JBoss Cache
>>>> <bad2_jgroups-infinispan.log.zip>
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev at lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> 
>>> --
>>> Manik Surtani
>>> manik at jboss.org
>>> Lead, Infinispan
>>> Lead, JBoss Cache
>>> http://www.infinispan.org
>>> http://www.jbosscache.org
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> --
> Manik Surtani
> manik at jboss.org
> Lead, Infinispan
> Lead, JBoss Cache
> http://www.infinispan.org
> http://www.jbosscache.org
> 
> 
> 
> 
> 



