[infinispan-issues] [JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache

Thu Jan 24 06:27:47 EST 2013

    [ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750177#comment-12750177 ] 

Bela Ban commented on ISPN-2697:
--------------------------------

Re Dan's comment at 15/Jan/13 2:07 PM:

Why is tagging a message as RSVP hacky? Infinispan already tags certain RPCs with RSVP, so why is that not hacky, while marking this RPC *is*?
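To make the RSVP argument concrete, here is a toy model of the idea behind RSVP. The sender does not rely on a later message to expose a gap. It keeps retransmitting until every member has acknowledged. This is a hypothetical sketch, not the JGroups RSVP protocol or its API. The class `RsvpModel`, the `reaches` predicate that simulates loss, and `maxAttempts` are all invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiPredicate;

// Toy model of an RSVP-style send (hypothetical, not the JGroups implementation):
// the sender retransmits to members that have not acknowledged yet.
final class RsvpModel {

    /**
     * Sends one message to all members. reaches.test(member, attempt) simulates
     * whether that transmission got through. Returns the number of attempts until
     * every member acknowledged, or -1 if maxAttempts was exhausted.
     */
    static int sendWithRsvp(Set<String> members, BiPredicate<String, Integer> reaches, int maxAttempts) {
        Set<String> pending = new HashSet<>(members);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            final int current = attempt;
            // Every member reached by this transmission acknowledges it.
            pending.removeIf(member -> reaches.test(member, current));
            if (pending.isEmpty()) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Set<String> members = Set.of("A", "B", "C");
        // The first transmission to C is lost; the retransmission gets through.
        int attempts = sendWithRsvp(members, (m, a) -> !(m.equals("C") && a == 1), 5);
        System.out.println("acknowledged by all after " + attempts + " attempts");
    }
}
```

The cost is one round of acknowledgements per tagged message. That is why tagging only selected RPCs, such as the topology-cache put, is attractive.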

The suggestion to increase repl-timeout to more than desired_avg_gossip * cluster-size won't work IMO, because the resulting timeout could be very long.

With desired_avg_gossip * cluster-size, we get roughly the same number of STABLE messages in the cluster regardless of its size, at the expense of STABILITY messages being sent less frequently as the cluster grows.
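The arithmetic behind the two paragraphs above can be checked with a short sketch. It assumes, as the comment describes, that each member sends one STABLE message per desired_avg_gossip * cluster-size. The 20 s value of desired_avg_gossip and the class name `ScaledGossip` are hypothetical.

```java
// Sketch of the scaled gossip interval described above (hypothetical model):
// each member sends one STABLE message per (desiredAvgGossip * clusterSize).
final class ScaledGossip {

    /** Interval between STABLE messages sent by a single member, in ms. */
    static long perMemberIntervalMs(long desiredAvgGossipMs, int clusterSize) {
        return desiredAvgGossipMs * clusterSize;
    }

    /** Cluster-wide STABLE messages per second: N members, one each per interval. */
    static double clusterStablePerSec(long desiredAvgGossipMs, int clusterSize) {
        return clusterSize * 1000.0 / perMemberIntervalMs(desiredAvgGossipMs, clusterSize);
    }

    public static void main(String[] args) {
        long gossipMs = 20_000; // hypothetical desired_avg_gossip value
        for (int n : new int[] {4, 16, 64}) {
            // A repl-timeout that must outlast one gossip interval grows with n,
            // while the cluster-wide STABLE rate stays at 1 / desired_avg_gossip.
            System.out.printf("N=%d: per-member interval %d s, %.3f STABLE/s cluster-wide%n",
                    n, perMemberIntervalMs(gossipMs, n) / 1000, clusterStablePerSec(gossipMs, n));
        }
    }
}
```

With these numbers a 64-node cluster would need a repl-timeout above 1280 s (over 21 minutes), while the cluster-wide STABLE rate stays at 0.05 messages per second for every cluster size.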

I did it that way because the STABLE protocol is mainly used to garbage-collect messages that have been received by everyone. When many messages are sent, the size-based STABLE kicks in. When only a few messages are sent, I wanted to keep STABLE-induced messages to a minimum.
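The split between size-based and time-based STABLE rounds can be sketched as follows. A round starts when enough bytes have arrived since the last round, or when the gossip interval has elapsed, whichever comes first. The class `StableTrigger`, its thresholds, and the choice to reset both counters after any round are simplifying assumptions, not JGroups' actual code.

```java
// Hypothetical sketch of when a member starts a STABLE round: either a size
// threshold (busy cluster) or a timer (quiet cluster) triggers it.
final class StableTrigger {
    private final long maxBytes;
    private final long intervalMs;
    private long bytesSinceLastRound;
    private long lastRoundMs;

    StableTrigger(long maxBytes, long intervalMs, long nowMs) {
        this.maxBytes = maxBytes;
        this.intervalMs = intervalMs;
        this.lastRoundMs = nowMs;
    }

    /** Records a received message; returns true if a size-based round starts. */
    boolean onMessage(long sizeBytes, long nowMs) {
        bytesSinceLastRound += sizeBytes;
        return startRoundIfDue(nowMs);
    }

    /** Called periodically; returns true if a time-based round starts. */
    boolean onTick(long nowMs) {
        return startRoundIfDue(nowMs);
    }

    private boolean startRoundIfDue(long nowMs) {
        boolean due = bytesSinceLastRound >= maxBytes || nowMs - lastRoundMs >= intervalMs;
        if (due) {
            // Assumption: any round resets both the byte count and the timer.
            bytesSinceLastRound = 0;
            lastRoundMs = nowMs;
        }
        return due;
    }

    public static void main(String[] args) {
        // Hypothetical thresholds: 1000 bytes or 80 s, whichever comes first.
        StableTrigger trigger = new StableTrigger(1_000, 80_000, 0);
        System.out.println(trigger.onMessage(600, 10)); // false: below the size threshold
        System.out.println(trigger.onMessage(600, 20)); // true: size-based round
        System.out.println(trigger.onTick(30_000));     // false: quiet, timer not due yet
        System.out.println(trigger.onTick(80_020));     // true: time-based round
    }
}
```

In a quiet cluster only the timer fires. That is the case where a long gossip interval keeps STABLE traffic low.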

Using STABLE to learn the highest message seqnos received is functionality piggybacked on STABLE messages, but it is not their main purpose.
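A simplified model of both uses of a STABLE round is shown below, for messages from a single sender. The minimum of the seqnos reported by all members is what can be garbage-collected. The maximum is the piggybacked information: any member below it learns that it missed messages. The class `StabilityDigest` and the per-member map are hypothetical stand-ins for JGroups' digests.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified model (hypothetical): for one sender, each member reports the
// highest seqno it has delivered from that sender during a STABLE round.
final class StabilityDigest {

    /** Seqno up to which everyone has delivered: older messages can be purged. */
    static long stableSeqno(Map<String, Long> highestDelivered) {
        return highestDelivered.values().stream().mapToLong(Long::longValue).min().orElse(0L);
    }

    /** Highest seqno any member has delivered (the piggybacked information). */
    static long highestSeen(Map<String, Long> highestDelivered) {
        return highestDelivered.values().stream().mapToLong(Long::longValue).max().orElse(0L);
    }

    /** Members that now know they are missing messages and can ask for them. */
    static List<String> laggingMembers(Map<String, Long> highestDelivered) {
        long max = highestSeen(highestDelivered);
        return highestDelivered.entrySet().stream()
                .filter(e -> e.getValue() < max)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sender A's last message (seqno 7) never reached C.
        Map<String, Long> digest = Map.of("A", 7L, "B", 7L, "C", 6L);
        System.out.println("purge up to " + stableSeqno(digest)
                + ", highest seen " + highestSeen(digest)
                + ", lagging " + laggingMembers(digest));
    }
}
```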

If you guys think that the way desired_avg_gossip * cluster-size is computed is bad, I can always change it to enforce an upper bound on *STABILITY* rather than *STABLE* messages, but that means more STABLE-related messages as the cluster grows.
WDYT?
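The trade-off in the alternative can be quantified with a rough model. If every member gossips once per desired_avg_gossip regardless of cluster size, a STABLE round completes roughly once per interval, so STABILITY messages keep a steady cadence. However, cluster-wide STABLE traffic then grows linearly with the number of members. The class `FixedGossipPolicy` and the 20 s value are hypothetical, and the round-duration estimate is an approximation.

```java
// Rough model (hypothetical) of the alternative: each member gossips once per
// desiredAvgGossip, independent of cluster size.
final class FixedGossipPolicy {

    /** Cluster-wide STABLE messages per second: grows linearly with cluster size. */
    static double stablePerSec(double desiredAvgGossipSec, int clusterSize) {
        return clusterSize / desiredAvgGossipSec;
    }

    /** Approximate time between STABILITY messages: one round per gossip interval. */
    static double secondsBetweenStability(double desiredAvgGossipSec) {
        return desiredAvgGossipSec;
    }

    public static void main(String[] args) {
        double g = 20.0; // hypothetical desired_avg_gossip, in seconds
        for (int n : new int[] {4, 16, 64}) {
            System.out.printf("N=%d: %.2f STABLE/s, STABILITY roughly every %.0f s%n",
                    n, stablePerSec(g, n), secondsBetweenStability(g));
        }
    }
}
```

At 64 members this model gives about 3.2 STABLE messages per second, compared with 1/20 = 0.05 per second when the interval is scaled by cluster size.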

> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
>                 Key: ISPN-2697
>                 URL: https://issues.jboss.org/browse/ISPN-2697
>             Project: Infinispan
>          Issue Type: Bug
>          Components: Remote protocols
>    Affects Versions: 5.2.0.Beta6
>            Reporter: Radim Vansa
>            Assignee: Galder Zamarreño
>            Priority: Critical
>             Fix For: 5.2.0.Final
>
>
> When the HotRodServer starts, it inserts its record into __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put can fail very easily: the command is broadcast using the NAKACK2 protocol, so if the message gets lost and no further message is broadcast, the loss is never detected, the message is not retransmitted, and the put times out (Replication timeout). That fails the whole HotRodServer startup, all because of one lost UDP message.
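The failure mode in the issue description follows from how a negative-acknowledgement scheme detects loss. A receiver only notices a missing seqno when a higher seqno from the same sender arrives. The toy receiver below is a hypothetical sketch of that behaviour, not NAKACK2's actual code. `NakReceiver` and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy NAK-style receiver (hypothetical): a gap is only visible once a
// higher seqno from the same sender has arrived.
final class NakReceiver {
    private long highestDelivered = 0;                       // last seqno delivered in order
    private final TreeSet<Long> buffered = new TreeSet<>();  // received but not yet deliverable

    /** Receives a seqno; returns the seqnos this receiver would ask to have retransmitted. */
    List<Long> receive(long seqno) {
        if (seqno <= highestDelivered) {
            return List.of(); // duplicate
        }
        buffered.add(seqno);
        // Deliver any contiguous run that starts right after highestDelivered.
        while (buffered.remove(highestDelivered + 1)) {
            highestDelivered++;
        }
        List<Long> missing = new ArrayList<>();
        if (!buffered.isEmpty()) {
            for (long s = highestDelivered + 1; s < buffered.last(); s++) {
                if (!buffered.contains(s)) {
                    missing.add(s);
                }
            }
        }
        return missing;
    }

    long highestDelivered() {
        return highestDelivered;
    }

    public static void main(String[] args) {
        NakReceiver receiver = new NakReceiver();
        receiver.receive(1);
        receiver.receive(2);
        // Seqno 3 (say, the topology-cache put) is lost and nothing follows it,
        // so the receiver has no gap to report.
        System.out.println("after 1, 2: delivered up to " + receiver.highestDelivered());
        // Only a later message exposes the gap and triggers a retransmission request.
        System.out.println("on 4, request retransmission of " + receiver.receive(4));
    }
}
```

Until seqno 4 (or some other signal about the sender's highest seqno) shows up, the receiver simply waits, and the caller's replication timeout expires.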

--
This message is automatically generated by JIRA.