[jboss-jira] [JBoss JIRA] (JGRP-2199) JDBC_PING cluster doesn't handle shutdown members

Tue Jul 25 09:55:00 EDT 2017

    [ https://issues.jboss.org/browse/JGRP-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439909#comment-13439909 ] 

Bela Ban commented on JGRP-2199:
--------------------------------

OK, I installed PostgreSQL and the JDBC driver. The configuration needed a bit of tweaking, e.g. VARBINARY is not supported, so ping_data is declared as bytea:
{code:xml}
<JDBC_PING
     ...

    initialize_sql="CREATE TABLE jgroups_discover (own_addr varchar(200) NOT NULL, cluster_name
        varchar(200) NOT NULL, ping_data bytea DEFAULT NULL, PRIMARY KEY (own_addr, cluster_name) );"
    insert_single_sql="INSERT INTO jgroups_discover (own_addr, cluster_name, ping_data) VALUES (?, ?, ?)"
    delete_single_sql="DELETE FROM jgroups_discover WHERE own_addr = ? AND cluster_name = ?"
    clear_sql="DELETE FROM jgroups_discover WHERE cluster_name = ?"
    select_all_pingdata_sql="SELECT ping_data, own_addr, cluster_name FROM jgroups_discover WHERE cluster_name = ?"
    contains_sql="SELECT count(*) AS RECORDCOUNT FROM jgroups_discover WHERE cluster_name = ? AND own_addr = ?"
/>
{code}

> JDBC_PING cluster doesn't handle shutdown members
> -------------------------------------------------
>
>                 Key: JGRP-2199
>                 URL: https://issues.jboss.org/browse/JGRP-2199
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Douglas Adams
>            Assignee: Bela Ban
>             Fix For: 4.0.5
>
>         Attachments: file-ping.xml, jdbc-ping.xml, stuck_starting_up.log
>
>
> FILE_PING and JDBC_PING behave differently when a cluster's coordinator stops.
> With FILE_PING, the coordinator deletes the whole cluster's file when it shuts down.
> JDBC_PING does not do this, which reveals a flaw in how nodes are handled on shutdown.
> When I added my own logging to the source of both protocols, I observed that each continuously writes all of the members to the database/file, because {{write()}} is called very frequently.
> ---
> Current behavior:
> GIVEN a cluster of JDBC_PING registered nodes
> WHEN a node shuts down 
> THEN it removes itself from the database table AND the coordinator almost immediately re-adds the shut-down member to the table because of the {{List<PingData>}} sent to {{write()}}
> GIVEN a cluster of JDBC_PING registered nodes has only the coordinator left
> WHEN the coordinator shuts down
> THEN the coordinator removes itself from the database and, because there's no coordinator left, the database shows a list of only the 'members' with no coordinator
> GIVEN a cluster of JDBC_PING registered nodes
> WHEN the coordinator shuts down or crashes and does not have time to remove itself from the database
> THEN the next node to start will never finish negotiating membership with the cluster because a phantom coordinator still exists (see attachment: stuck_starting_up.log)
> ---
> I expected the behavior of JDBC_PING and FILE_PING to be consistent.
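
For anyone following along, the first two scenarios in the description can be reproduced with a toy model of the table. This is only a sketch and not JGroups code: the class DiscoveryTableSketch, its methods, and the member names A, B, C are hypothetical. It assumes the coordinator (A) keeps writing a member list that still contains members which have already left, as the report describes, and it models the FILE_PING behavior from the report as a plain clear().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the discovery table for one cluster; hypothetical, not JGroups code. */
class DiscoveryTableSketch {
    // Stands in for the own_addr column of jgroups_discover.
    private final Set<String> rows = new LinkedHashSet<>();

    // Models the coordinator writing every member it knows about (one insert per member).
    void write(List<String> members) {
        rows.addAll(members);
    }

    // Models delete_single_sql: a member removes only its own row.
    void remove(String addr) {
        rows.remove(addr);
    }

    // Models clear_sql: all rows of the cluster are deleted.
    void clear() {
        rows.clear();
    }

    List<String> snapshot() {
        return new ArrayList<>(rows);
    }

    // Assumption: the coordinator A still holds a member list that includes everyone.
    private static final List<String> STALE_VIEW = Arrays.asList("A", "B", "C");

    /** Scenario 1: C shuts down, then the coordinator writes its stale list again. */
    static List<String> memberShutdownWithStaleWrite() {
        DiscoveryTableSketch table = new DiscoveryTableSketch();
        table.write(STALE_VIEW);
        table.remove("C");       // C deletes its own row on shutdown
        table.write(STALE_VIEW); // the stale list puts C back
        return table.snapshot();
    }

    /** Scenario 2: B and C are gone but were re-added; the coordinator removes only itself. */
    static List<String> lastCoordinatorShutdown() {
        DiscoveryTableSketch table = new DiscoveryTableSketch();
        table.write(STALE_VIEW);
        table.remove("B");
        table.remove("C");
        table.write(STALE_VIEW); // stale rows for B and C come back
        table.remove("A");       // coordinator deletes only its own row
        return table.snapshot();
    }

    /** Contrast: the coordinator wipes the whole cluster entry on shutdown. */
    static List<String> coordinatorShutdownWithClear() {
        DiscoveryTableSketch table = new DiscoveryTableSketch();
        table.write(STALE_VIEW);
        table.clear();
        return table.snapshot();
    }
}
```

In this model memberShutdownWithStaleWrite() still lists C, lastCoordinatorShutdown() leaves rows for B and C with no coordinator, and coordinatorShutdownWithClear() leaves the table empty. The crash case (third scenario) is not modeled here.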

