[jboss-jira] [JBoss JIRA] (JGRP-1658) GMS: Node re-joining the cluster during shutdown

Bela Ban (JIRA) jira-events at lists.jboss.org
Tue Jul 16 10:38:26 EDT 2013


    [ https://issues.jboss.org/browse/JGRP-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789998#comment-12789998 ] 

Bela Ban commented on JGRP-1658:
--------------------------------

What are the steps to reproduce this?

What triggers C to send the JOIN_REQ to A after it has already sent the LEAVE_REQ to A? Do you stop C and then immediately start it again for this to happen?
                
> GMS: Node re-joining the cluster during shutdown
> ------------------------------------------------
>
>                 Key: JGRP-1658
>                 URL: https://issues.jboss.org/browse/JGRP-1658
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.3.1
>            Reporter: Dan Berindei
>            Assignee: Bela Ban
>             Fix For: 3.4
>
>         Attachments: pfapt.log.gz
>
>
> We have RSVP in the stack, with ack_on_delivery=true.
> It seems that node C receives an RSVP-flagged message just after it sends the LEAVE_REQ to A, and immediately after sending the RSVP ACK it sends a JOIN_REQ as well.
> {noformat}
> 11:55:54,524 DEBUG (testng:) [DefaultCacheManager] Stopping cache manager ISPN on C
> 11:55:54,525 DEBUG (testng:) [GMS] C: sending LEAVE request to A
> 11:55:54,525 TRACE (testng:) [TCP] C: sending msg to A, src=C, headers are GMS: GmsHeader[LEAVE_REQ]: mbr=C, UNICAST2: DATA, seqno=16, TCP: [channel_name=ISPN]
> 11:55:54,526 TRACE (ViewHandler,A:) [GMS] A: new members=[], suspected=[], leaving=[C], new view: [A|4] [A, B, D]
> 11:55:54,528 TRACE (OOB-3,C:) [TCP] C: received [dst: <null>, src: A (4 headers), size=7469 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|RSVP], headers are RequestCorrelator: id=200, type=REQ, id=93579, rsp_expected=false, exclusion_list=[A], RSVP: REQ(7), NAKACK2: [MSG, seqno=9], TCP: [channel_name=ISPN]
> 11:55:54,528 TRACE (OOB-3,C:) [TCP] C: sending msg to A, src=C, headers are RSVP: RSP(7), UNICAST2: DATA, seqno=17, TCP: [channel_name=ISPN]
> 11:55:54,529 TRACE (OOB-3,C:) [TCP] C: sending msg to A, src=C, headers are GMS: GmsHeader[JOIN_REQ]: mbr=C, UNICAST2: DATA, seqno=1, first, TCP: [channel_name=ISPN]
> 11:55:54,613 TRACE (ViewHandler,A:) [GMS] A: new members=[C], suspected=[], leaving=[], new view: [A|5] [A, B, D, C]
> 11:55:54,613 TRACE (ViewHandler,A:) [GMS] A: mcasting view [A|5] [A, B, D, C] (4 mbrs)
> 11:55:54,841 DEBUG (testng:) [TEST_PING] Stop discovery for C
> 11:55:54,841 DEBUG (testng:) [TCP] closing sockets and stopping threads
> 11:55:55,683 TRACE (Timer-5,A:) [TCP] A: sending msg to C, src=A, headers are GMS: GmsHeader[JOIN_RSP]: join_rsp=view: [A|5] [A, B, D, C], digest: B: [0 (0)], D: [0 (0)], A: [11 (11)], C: [0 (0)], UNICAST2: DATA, seqno=1, conn_id=4, first, TCP: [channel_name=ISPN]
> {noformat}
> A adds C back to the view, but C is already shutting down and never receives the JOIN_RSP message. Instead, the remaining members keep logging this connection failure until they are shut down 3 minutes later:
> {noformat}
> 11:59:01,346 TRACE (TransferQueueBundler,D:) [TCP] 127.0.0.1:8003: failed connecting to 127.0.0.1:8002: java.net.ConnectException: Connection refused
> {noformat}
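The sequence in the log above can be replayed with a small, self-contained model. This is a hypothetical sketch, not JGroups code: `RejoinRaceSketch`, `nextView()`, `Member` and its `leaving` flag are illustrative names, and the initial view order [A, B, C, D] is assumed. It shows how the coordinator ends up with C back in the view once a JOIN_REQ follows the LEAVE_REQ, and how a member-side guard that suppresses the JOIN_REQ after a LEAVE_REQ has been sent would avoid that. Whether such a guard is the right fix for GMS is an open question, since a deliberate stop-and-restart of C must still be able to rejoin.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the JGRP-1658 race; NOT the JGroups implementation.
public class RejoinRaceSketch {

    // Coordinator side: mirrors the "new members=..., leaving=..." trace lines,
    // computing the next view as current members minus leavers plus joiners.
    static List<String> nextView(List<String> current, Set<String> joining, Set<String> leaving) {
        Set<String> view = new LinkedHashSet<>(current); // keeps join order
        view.removeAll(leaving);
        view.addAll(joining);                            // joiners are appended at the end
        return new ArrayList<>(view);
    }

    // Member side: hypothetical flag that is set once LEAVE_REQ has been sent.
    static final class Member {
        private volatile boolean leaving;

        void sendLeaveReq() {
            leaving = true;
        }

        // Returns true if a JOIN_REQ would be sent. With the guard enabled, a
        // member that already sent LEAVE_REQ refuses to rejoin on this channel.
        boolean maybeSendJoinReq(boolean guardEnabled) {
            return !(guardEnabled && leaving);
        }
    }

    // Replays the logged order: LEAVE_REQ, RSVP RSP, then (possibly) JOIN_REQ.
    static List<String> replay(boolean guardEnabled) {
        List<String> view = List.of("A", "B", "C", "D"); // assumed initial view order
        Member c = new Member();

        c.sendLeaveReq();
        view = nextView(view, Set.of(), Set.of("C"));    // like [A|4] [A, B, D]

        // C's OOB thread handles the RSVP-flagged message and sends the RSVP ACK,
        // then the unguarded model lets it send a JOIN_REQ as in the log.
        if (c.maybeSendJoinReq(guardEnabled)) {
            view = nextView(view, Set.of("C"), Set.of()); // like [A|5] [A, B, D, C]
        }
        return view;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + replay(false));
        System.out.println("with guard:    " + replay(true));
    }
}
```

Running `main` prints `without guard: [A, B, D, C]` and `with guard:    [A, B, D]`, i.e. the unguarded replay reproduces the stale [A|5] view from the log, in which A, B and D keep trying to reach the departed C.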


