[jboss-jira] [JBoss JIRA] (JGRP-1658) GMS: Node re-joining the cluster during shutdown
Bela Ban (JIRA)
jira-events at lists.jboss.org
Thu Jul 18 05:47:26 EDT 2013
[ https://issues.jboss.org/browse/JGRP-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790640#comment-12790640 ]
Bela Ban commented on JGRP-1658:
--------------------------------
OK, so this would not have occurred with UNICAST3:
- C leaves
- Everybody installs the new view [A,B,D] excluding C
- At the same time, C sends a spurious message to A
- A did not remove C's connection entry (removing it would have caused a SEND_FIRST_SEQNO to be sent to C), but instead marked it as CLOSING
- When A got the unicast message from C, it simply changed the state of C's connection entry from CLOSING to OPEN and delivered the message
If the unit test confirms this, then we could backport the connection entry handling from UNICAST3 to UNICAST2.
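A minimal sketch of the difference described in the list above, assuming a simplified receiver-side connection table on A. It contrasts a UNICAST2-like table, which drops the entry when a member leaves the view, with a UNICAST3-like table, which marks it CLOSING. All names (ConnTableSketch, ReceiverTable, memberLeft, onData) are hypothetical and are not the JGroups UNICAST2/UNICAST3 API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a receiver-side connection table; NOT the JGroups UNICAST2/UNICAST3 code.
public class ConnTableSketch {
    enum State { OPEN, CLOSING }

    // What the receiver does with an incoming (non-first) unicast data message.
    enum Action { DELIVER, SEND_FIRST_SEQNO }

    static final class ReceiverTable {
        private final Map<String, State> entries = new HashMap<>();
        private final boolean useClosingState; // true: UNICAST3-like, false: UNICAST2-like

        ReceiverTable(boolean useClosingState) { this.useClosingState = useClosingState; }

        void open(String member) { entries.put(member, State.OPEN); }

        // Called when a newly installed view no longer contains 'member'.
        void memberLeft(String member) {
            if (useClosingState)
                entries.computeIfPresent(member, (m, s) -> State.CLOSING); // keep entry, mark CLOSING
            else
                entries.remove(member);                                    // drop entry entirely
        }

        // Called for a data message from 'sender' that is not flagged as first.
        Action onData(String sender) {
            State s = entries.get(sender);
            if (s == null)
                return Action.SEND_FIRST_SEQNO;  // no entry: ask the sender for its first seqno
            if (s == State.CLOSING)
                entries.put(sender, State.OPEN); // reopen the existing entry
            return Action.DELIVER;
        }

        State state(String member) { return entries.get(member); }
    }

    public static void main(String[] args) {
        for (boolean closing : new boolean[] {false, true}) {
            ReceiverTable a = new ReceiverTable(closing);
            a.open("C");
            a.memberLeft("C");          // view [A,B,D] installed on A
            Action act = a.onData("C"); // spurious unicast from C arrives
            System.out.println((closing ? "UNICAST3-like" : "UNICAST2-like")
                    + ": " + act + ", C=" + a.state("C"));
        }
    }
}
```

Running main prints `UNICAST2-like: SEND_FIRST_SEQNO, C=null` followed by `UNICAST3-like: DELIVER, C=OPEN`, i.e. only the table without a CLOSING state goes back to C for a first seqno.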
> GMS: Node re-joining the cluster during shutdown
> ------------------------------------------------
>
> Key: JGRP-1658
> URL: https://issues.jboss.org/browse/JGRP-1658
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.3.1
> Reporter: Dan Berindei
> Assignee: Bela Ban
> Fix For: 3.4
>
> Attachments: pfapt.log.gz
>
>
> We have RSVP in the stack, with ack_on_delivery=true.
> It seems that node C receives an RSVP-flagged message just after it has sent the LEAVE_REQ to A, and immediately after sending the RSVP ACK it sends a JOIN_REQ as well.
> {noformat}
> 11:55:54,524 DEBUG (testng:) [DefaultCacheManager] Stopping cache manager ISPN on C
> 11:55:54,525 DEBUG (testng:) [GMS] C: sending LEAVE request to A
> 11:55:54,525 TRACE (testng:) [TCP] C: sending msg to A, src=C, headers are GMS: GmsHeader[LEAVE_REQ]: mbr=C, UNICAST2: DATA, seqno=16, TCP: [channel_name=ISPN]
> 11:55:54,526 TRACE (ViewHandler,A:) [GMS] A: new members=[], suspected=[], leaving=[C], new view: [A|4] [A, B, D]
> 11:55:54,528 TRACE (OOB-3,C:) [TCP] C: received [dst: <null>, src: A (4 headers), size=7469 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER|RSVP], headers are RequestCorrelator: id=200, type=REQ, id=93579, rsp_expected=false, exclusion_list=[A], RSVP: REQ(7), NAKACK2: [MSG, seqno=9], TCP: [channel_name=ISPN]
> 11:55:54,528 TRACE (OOB-3,C:) [TCP] C: sending msg to A, src=C, headers are RSVP: RSP(7), UNICAST2: DATA, seqno=17, TCP: [channel_name=ISPN]
> 11:55:54,529 TRACE (OOB-3,C:) [TCP] C: sending msg to A, src=C, headers are GMS: GmsHeader[JOIN_REQ]: mbr=C, UNICAST2: DATA, seqno=1, first, TCP: [channel_name=ISPN]
> 11:55:54,613 TRACE (ViewHandler,A:) [GMS] A: new members=[C], suspected=[], leaving=[], new view: [A|5] [A, B, D, C]
> 11:55:54,613 TRACE (ViewHandler,A:) [GMS] A: mcasting view [A|5] [A, B, D, C] (4 mbrs)
> 11:55:54,841 DEBUG (testng:) [TEST_PING] Stop discovery for C
> 11:55:54,841 DEBUG (testng:) [TCP] closing sockets and stopping threads
> 11:55:55,683 TRACE (Timer-5,A:) [TCP] A: sending msg to C, src=A, headers are GMS: GmsHeader[JOIN_RSP]: join_rsp=view: [A|5] [A, B, D, C], digest: B: [0 (0)], D: [0 (0)], A: [11 (11)], C: [0 (0)], UNICAST2: DATA, seqno=1, conn_id=4, first, TCP: [channel_name=ISPN]
> {noformat}
> A adds C back to the view, but C shuts down and will never receive the JOIN_RSP message. Instead, the remaining members keep logging this error message until they are shut down 3 minutes later:
> {noformat}
> 11:59:01,346 TRACE (TransferQueueBundler,D:) [TCP] 127.0.0.1:8003: failed connecting to 127.0.0.1:8002: java.net.ConnectException: Connection refused
> {noformat}
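The issue notes that RSVP runs with ack_on_delivery=true. Assuming the documented meaning of that flag (the receiver passes the message up to the application first and only then sends the ack; with false it acks first), the receiver-side ordering on C can be sketched as below. RsvpReceiverSketch, receive and sendAck are hypothetical names, not the JGroups RSVP class, and the sketch only models ordering: it does not explain why C follows the ack with a JOIN_REQ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of RSVP's receiver side; names are illustrative, NOT JGroups classes.
public class RsvpReceiverSketch {
    private final boolean ackOnDelivery;
    private final List<String> events = new ArrayList<>();

    RsvpReceiverSketch(boolean ackOnDelivery) { this.ackOnDelivery = ackOnDelivery; }

    // Handle a message carrying an RSVP REQ header with the given id.
    void receive(short rsvpId, String payload, Consumer<String> application) {
        if (!ackOnDelivery)
            sendAck(rsvpId);             // ack first, then deliver
        application.accept(payload);     // pass the message up to the application
        events.add("deliver " + payload);
        if (ackOnDelivery)
            sendAck(rsvpId);             // deliver first, then ack
    }

    private void sendAck(short id) {
        // Corresponds to the unicast "RSVP: RSP(7)" that C sends back to A in the log.
        events.add("send RSP(" + id + ")");
    }

    List<String> events() { return events; }

    public static void main(String[] args) {
        RsvpReceiverSketch c = new RsvpReceiverSketch(true);
        c.receive((short) 7, "cache-command", p -> {});
        System.out.println(c.events()); // [deliver cache-command, send RSP(7)]
    }
}
```

In either mode the ack is an ordinary unicast back to the sender, which is why it travels over C's UNICAST2 connection to A even after C has sent its LEAVE_REQ.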