[infinispan-issues] [JBoss JIRA] (ISPN-6402) Default GMS.join_timeout is too long

Dan Berindei (JIRA) issues at jboss.org
Fri Mar 18 09:23:00 EDT 2016


Dan Berindei created ISPN-6402:
----------------------------------

             Summary: Default GMS.join_timeout is too long
                 Key: ISPN-6402
                 URL: https://issues.jboss.org/browse/ISPN-6402
             Project: Infinispan
          Issue Type: Task
          Components: Core, Server, Test Suite - Server
            Reporter: Dan Berindei
            Assignee: Dan Berindei
            Priority: Minor


{{GMS.join_timeout}} is used by JGroups for three purposes:
# Wait for {{FIND_INITIAL_MBRS}} responses. If other nodes are running but don't answer within {{join_timeout}} ms, the node starts a new partition on its own.
# If no other nodes are running when the discovery request is sent, but another node starts and sends its own discovery request within {{join_timeout}}, the initial cluster view will contain both nodes. This isn't really useful in Infinispan, though (we have {{gcb.transport().initialClusterSize()}} instead).
# Once a coordinator is located, the node sends a join request and waits for a response for {{join_timeout}} ms. After a timeout, the node re-sends the join request (up to a maximum of {{max_join_attempts}}, which defaults to 10).
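To make the cost of these waits concrete, here is a deliberately simplified timing model (illustrative only, not JGroups code). It assumes zero network latency, ignores discovery retries and other protocol timeouts, and treats each join attempt as waiting the full {{join_timeout}} before being re-sent. The class and method names are hypothetical; {{max_join_attempts}} = 10 is the default quoted above.

```java
// Simplified, hypothetical timing model of GMS.join_timeout (not actual JGroups code).
// Assumes zero latency and that every timed-out wait lasts the full join_timeout.
public final class JoinTimingModel {

    private JoinTimingModel() {}

    /**
     * Time the first node spends in discovery when nobody answers:
     * it waits the full join_timeout, then forms a cluster by itself.
     */
    public static long aloneStartupMillis(long joinTimeoutMs) {
        return joinTimeoutMs;
    }

    /**
     * Upper bound on the join phase once a coordinator has been found:
     * each of the maxJoinAttempts requests may wait join_timeout before being re-sent.
     */
    public static long worstCaseJoinMillis(long joinTimeoutMs, int maxJoinAttempts) {
        return joinTimeoutMs * maxJoinAttempts;
    }

    public static void main(String[] args) {
        int maxJoinAttempts = 10; // default max_join_attempts, per the issue text
        for (long joinTimeout : new long[] {15_000L, 2_000L}) {
            System.out.printf(
                    "join_timeout=%d ms: lone node waits %d ms, join phase bounded by %d ms%n",
                    joinTimeout,
                    aloneStartupMillis(joinTimeout),
                    worstCaseJoinMillis(joinTimeout, maxJoinAttempts));
        }
    }
}
```

Under this model, the first node of every cluster always pays the full {{join_timeout}}, while the retry bound only matters when a coordinator exists but is slow to respond.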

The default {{GMS.join_timeout}} in Infinispan is 15000 ms, vs. 2000 ms in JGroups (actually 3000 ms in {{GMS}} itself, but 2000 ms in the example configurations).

The higher timeout only helps us when a node is running but unresponsive (e.g. because of a long GC pause) at the exact moment another node is joining. I'd argue that applications that can tolerate multi-second pauses would be better served by {{gcb.transport().initialClusterSize(2)}} and/or an external discovery mechanism (e.g. {{FILE_PING}}, or something based on the WildFly domain controller). For most applications, the current default just means a 15s delay every time the cluster is (re)started.

In particular, because our integration tests use the default configuration, it means a delay of 15s for every test that starts a cluster.
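A back-of-the-envelope estimate of what this costs the test suite, assuming (as described above) that each cluster-starting test pays one full {{join_timeout}} while its first node waits in discovery. The count of 100 tests is a hypothetical input, not a measured figure, and the class is purely illustrative.

```java
// Hypothetical estimate of discovery delay added to a test suite.
// Assumes one full join_timeout wait per test that starts a fresh cluster.
public final class SuiteDelayEstimate {

    private SuiteDelayEstimate() {}

    /** Extra wall-clock seconds spent waiting in discovery across all cluster-starting tests. */
    public static double addedSeconds(int clusterStartingTests, long joinTimeoutMs) {
        return clusterStartingTests * joinTimeoutMs / 1000.0;
    }

    /** Seconds saved per suite run by lowering join_timeout from currentMs to proposedMs. */
    public static double savedSeconds(int clusterStartingTests, long currentMs, long proposedMs) {
        return addedSeconds(clusterStartingTests, currentMs)
                - addedSeconds(clusterStartingTests, proposedMs);
    }

    public static void main(String[] args) {
        int tests = 100; // hypothetical number of tests that start a fresh cluster
        System.out.printf("15000 ms: %.0f s, 2000 ms: %.0f s, saved: %.0f s%n",
                addedSeconds(tests, 15_000L),
                addedSeconds(tests, 2_000L),
                savedSeconds(tests, 15_000L, 2_000L));
    }
}
```

With these assumptions, 100 cluster-starting tests spend 1500 s waiting in discovery at 15000 ms, versus 200 s at the JGroups example value of 2000 ms.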



--
This message was sent by Atlassian JIRA
(v6.4.11#64026)

