[jboss-jira] [JBoss JIRA] (JGRP-2335) Code for determining the coordinator hangs in certain conditions

Bela Ban (Jira) issues at jboss.org
Fri Aug 2 09:22:03 EDT 2019


     [ https://issues.jboss.org/browse/JGRP-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-2335:
---------------------------
    Fix Version/s: 4.1.3
                       (was: 4.1.2)


> Code for determining the coordinator hangs in certain conditions
> ----------------------------------------------------------------
>
>                 Key: JGRP-2335
>                 URL: https://issues.jboss.org/browse/JGRP-2335
>             Project: JGroups
>          Issue Type: Bug
>            Reporter: Aieksiei Illarionov
>            Assignee: Bela Ban
>            Priority: Major
>             Fix For: 4.1.3
>
>
> Affected version:
> {code:xml}
>         <dependency>
>             <groupId>org.jgroups</groupId>
>             <artifactId>jgroups</artifactId>
>             <version>4.0.0.Final</version>
>         </dependency>
> {code}
> ClientGmsImpl#joinInternal hangs because #firstOfAllClients always returns false when all of the following conditions are satisfied:
> - JDBC_PING is used as the discovery protocol
> - JGROUPSPING table contains data from previous sessions
> - all of the previous sessions were killed (kill -9)
> - AddressGenerator is not customized
> The sorted set
> {code:java}
> SortedSet<Address> clients=new TreeSet<>();
> {code}
> contains the dead servers discovered from JGROUPSPING. When the new server is added to the sorted set, it does not become the first element as long as a stale address sorts ahead of it, so #firstOfAllClients keeps returning false.
> Suggestions: either
> a) somehow involve MembershipChangePolicy in the ordering strategy, or
> b) make the new server (joiner) the first in the sorted set, or
> c) make UUID addresses sort by their time of creation.
> I've used the following config:
> {code:xml}
> <!--
>     TCP based stack, with flow control and message bundling. This is usually used when IP
>     multicasting cannot be used in a network, e.g. because it is disabled (routers discard multicast).
>     Note that TCP.bind_addr and TCPPING.initial_hosts should be set, possibly via system properties, e.g.
>     -Djgroups.bind_addr=192.168.5.2 and -Djgroups.tcpping.initial_hosts=192.168.5.2[7800]
>     author: Bela Ban
> -->
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns="urn:org:jgroups"
>         xmlns:fork="fork"
>         xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd fork http://www.jgroups.org/schema/fork-stacks.xsd">
>     <TCP bind_port="7800"
>          port_range="10"
>          bind_addr="<placeholder here>"
>          recv_buf_size="${tcp.recv_buf_size:130k}"
>          send_buf_size="${tcp.send_buf_size:130k}"
>          max_bundle_size="64K"
>          sock_conn_timeout="300"
>          thread_pool.min_threads="0"
>          thread_pool.max_threads="20"
>          thread_pool.keep_alive_time="30000"/>
>     <JDBC_PING
>             remove_all_data_on_view_change="true"
>             connection_driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>             connection_url="jdbc:sqlserver://localhost:1433;databaseName=mydatabase"
>             connection_username="user"
>             connection_password="password"
>     />
>     <MERGE3  min_interval="10000"
>              max_interval="30000"/>
>     <FD_SOCK/>
>     <FD timeout="3000" max_tries="3" />
>     <VERIFY_SUSPECT timeout="1500"  />
>     <BARRIER />
>     <pbcast.NAKACK2 use_mcast_xmit="false"
>                     discard_delivered_msgs="true"/>
>     <UNICAST3
>             conn_close_timeout="240000"
>             xmit_interval="5000"/>
>     <pbcast.STABLE desired_avg_gossip="50000"
>                    max_bytes="4M"/>
>     <pbcast.GMS print_local_addr="true" join_timeout="2000"
>                 view_bundling="true"
>                 membership_change_policy="ru.illar.AppMembershipChangePolicy"/>
>     <MFC max_credits="2M"
>          min_threshold="0.4"/>
>     <FRAG2 frag_size="60K"  />
>     <!--RSVP resend_interval="2000" timeout="10000"/-->
>     <pbcast.STATE_TRANSFER/>
> </config>
> {code}
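
The ordering problem described in the report can be reproduced in isolation. The following is only a minimal sketch, not the actual ClientGmsImpl code: it uses java.util.UUID as a stand-in for JGroups addresses, and the class FirstOfAllClientsSketch with its helpers candidates() and isFirst() are hypothetical names introduced for illustration. It assumes the joiner elects itself only when it is the first element of the sorted candidate set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.UUID;

// Hypothetical illustration; java.util.UUID stands in for a JGroups address.
class FirstOfAllClientsSketch {

    // Builds the candidate set as the report describes it: entries read from
    // the discovery table (possibly stale) plus the joiner itself.
    static SortedSet<UUID> candidates(Iterable<UUID> discovered, UUID joiner) {
        SortedSet<UUID> clients = new TreeSet<>();
        for (UUID address : discovered) {
            clients.add(address);
        }
        clients.add(joiner);
        return clients;
    }

    // Assumed check: the joiner may become coordinator only if it sorts first.
    static boolean isFirst(SortedSet<UUID> clients, UUID joiner) {
        return !clients.isEmpty() && clients.first().equals(joiner);
    }

    public static void main(String[] args) {
        // Made-up addresses of two killed members still listed in JGROUPSPING.
        List<UUID> dead = Arrays.asList(new UUID(1L, 0L), new UUID(2L, 0L));
        UUID joiner = new UUID(5L, 0L);

        // Stale entries sort ahead of the joiner, so the check fails on every attempt.
        System.out.println(isFirst(candidates(dead, joiner), joiner));

        // With an empty table the joiner is trivially first.
        System.out.println(isFirst(candidates(new ArrayList<UUID>(), joiner), joiner));
    }
}
```

Because the stale entries are never removed from the table by a live coordinator, repeating the discovery round yields the same set and the same result, which would match the hang described above.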



--
This message was sent by Atlassian Jira
(v7.12.1#712002)

