[infinispan-issues] [JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache

Thu Jan 10 10:51:08 EST 2013

    [ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744890#comment-12744890 ] 

Radim Vansa commented on ISPN-2697:
-----------------------------------

Bela: I was looking at this code and just wanted to be sure I understand it correctly. The code you posted causes one gossip to be sent on average every (desired_avg_gossip * clusterSize) milliseconds. I don't want to compute the exact mean delay of the last gossip, but it is roughly proportional to the cluster size: the bigger the cluster, the longer the delay.
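
To make the arithmetic concrete, here is a rough sketch of the estimate above. The class and method names are hypothetical, and the 20000 ms value for desired_avg_gossip is only an assumed example setting, not taken from the configuration under discussion.

```java
// Sketch only: illustrates the estimate "one gossip on average every
// (desired_avg_gossip * clusterSize) milliseconds". All names are hypothetical.
final class GossipIntervalSketch {

    /** Mean time between gossips, in milliseconds, under the estimate above. */
    static long meanGossipIntervalMillis(long desiredAvgGossipMillis, int clusterSize) {
        if (desiredAvgGossipMillis <= 0 || clusterSize <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(desiredAvgGossipMillis, (long) clusterSize);
    }

    public static void main(String[] args) {
        long desiredAvgGossipMillis = 20_000L; // assumed example value
        for (int clusterSize : new int[] {2, 4, 16, 32}) {
            long interval = meanGossipIntervalMillis(desiredAvgGossipMillis, clusterSize);
            System.out.printf("clusterSize=%d -> mean interval %d ms (%.1f min)%n",
                    clusterSize, interval, interval / 60_000.0);
        }
    }
}
```

With these assumed numbers, the interval passes the five-minute mark somewhere between 4 and 16 nodes.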

I suppose there aren't many messages in the replicated __hotRodTopologyCache, as it is only filled by each node when it joins the cluster and the values do not change afterwards. So when the size-based stability does not kick in, the seqno-based one sends a stability message only every 5 minutes or more. I understand we want to save bandwidth, but this is a really long time; most timeouts are shorter than this (for example, those that expect some event after we broadcast the message, not exactly the response).

Is this really desired behaviour?
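
The concern can be illustrated with a simplified model of negative-ack gap detection. This is not the NAKACK2 implementation; the class and method names are hypothetical. It assumes a receiver can only notice a missing seqno once it learns of a higher or equal one, either from a later message or from a digest announcing the sender's highest seqno.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified receiver-side model, for illustration only. Seqnos start at 1.
final class GapDetectionSketch {
    private final Set<Long> received = new HashSet<>();
    private long highestKnown = 0; // highest seqno the receiver has heard about

    /** A message arrives; returns the seqnos the receiver now knows are missing. */
    List<Long> onMessage(long seqno) {
        received.add(seqno);
        return missingUpTo(seqno);
    }

    /** A digest announces the sender's highest seqno; returns the known gaps. */
    List<Long> onDigest(long highestSent) {
        return missingUpTo(highestSent);
    }

    private List<Long> missingUpTo(long seqno) {
        highestKnown = Math.max(highestKnown, seqno);
        List<Long> missing = new ArrayList<>();
        for (long s = 1; s <= highestKnown; s++) {
            if (!received.contains(s)) {
                missing.add(s); // candidate for a retransmission request
            }
        }
        return missing;
    }
}
```

In this model, if seqnos 1 and 2 arrive and 3 is lost, onMessage reports no gap; the receiver only learns that 3 is missing when a later message or a digest with highestSent = 3 arrives. If that digest comes minutes later, any shorter timeout expires first.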

> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
>                 Key: ISPN-2697
>                 URL: https://issues.jboss.org/browse/ISPN-2697
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Remote protocols
>    Affects Versions: 5.2.0.Beta6
>            Reporter: Radim Vansa
>            Assignee: Galder Zamarreño
>            Priority: Critical
>             Fix For: 5.2.0.CR2
>
>
> When the HotRodServer starts, it inserts its record into __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail: the command is broadcast using the NAKACK2 protocol, so if the message gets lost and no further broadcast message follows, the message will not be retransmitted and the put operation times out (Replication timeout). This fails the whole HotRodServer startup, all because of one lost UDP message.
