[infinispan-issues] [JBoss JIRA] (ISPN-6402) Default GMS.join_timeout is too long
Dan Berindei (JIRA)
issues at jboss.org
Fri Mar 18 09:26:01 EDT 2016
[ https://issues.jboss.org/browse/ISPN-6402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Berindei updated ISPN-6402:
-------------------------------
Status: Open (was: New)
> Default GMS.join_timeout is too long
> ------------------------------------
>
> Key: ISPN-6402
> URL: https://issues.jboss.org/browse/ISPN-6402
> Project: Infinispan
> Issue Type: Task
> Components: Core, Server, Test Suite - Server
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Minor
>
> {{GMS.join_timeout}} is used by JGroups for two purposes:
> # Wait for {{FIND_INITIAL_MBRS}} responses. If other nodes are running, but they don't answer within {{join_timeout}} ms, the node will start a new partition by itself.
> # If no other nodes are running when the request is sent, but another node starts and sends its own discovery request within {{join_timeout}}, the initial cluster view will contain both nodes, but this isn't really useful in Infinispan (we have {{gcb.transport().initialClusterSize()}} instead).
> # Once a coordinator is located, the node sends a join request and waits for a response for {{join_timeout}} ms. After a timeout, the node re-sends the join request (up to a maximum of {{max_join_attempts}}, which defaults to 10).
> The default {{GMS.join_timeout}} in Infinispan is 15000, vs. 2000 in JGroups (actually 3000 in {{GMS}} itself, but 2000 in the example configurations).
> The higher timeout will only help us when a node is running, but it's inaccessible (e.g. because of a long GC) at the exact time a node is joining. I'd argue that applications that can tolerate multi-second pauses would be better served by {{gcb.transport().initialClusterSize(2)}} and/or an external discovery mechanism (e.g. {{FILE_PING}}, or something based on the WildFly domain controller). For most applications, the current default means just a 15s delay every time the cluster is (re)started.
> In particular, because our integration tests use the default configuration, it means a delay of 15s for every test that starts a cluster.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
More information about the infinispan-issues
mailing list