[jboss-jira] [JBoss JIRA] (JGRP-1449) SEQUENCER race leads to lost messages

Bela Ban (JIRA) jira-events at lists.jboss.org
Fri Apr 13 10:04:48 EDT 2012


    [ https://issues.jboss.org/browse/JGRP-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683918#comment-12683918 ] 

Bela Ban commented on JGRP-1449:
--------------------------------

I'll have a look. This shouldn't happen with the logic being:
- A sender S sends a message M to the current coord C
- A buffers M in a queue
- When A receives M, it removes M from the queue
- Otherwise, S re-submits M on view changes until S receives M.
--> So even if a new coordinator D discarded M because D received the view change after S received it, S should resubmit M. Hmm, if S got the new view before D, then S will only re-submit M *once*, until the next view change occurs. Perhaps we need to start a timer on a non-empty queue that tries to resubmit until the queue is empty...
Or perhaps the coordinator needs to buffer messages forwarded to it for some time...

I'll investigate this as soon as my todo list gets a bit smaller (I'm currently busy doing perf work for JBossWorld) ...
                
> SEQUENCER race leads to lost messages
> -------------------------------------
>
>                 Key: JGRP-1449
>                 URL: https://issues.jboss.org/browse/JGRP-1449
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 3.0.9
>            Reporter: David Hotham
>            Assignee: Bela Ban
>             Fix For: 3.1
>
>
> I'm seeing an issue where SEQUENCER is dropping messages, with logs like this:
> 2012-04-13 10:05:21.363 [Incoming-1,pisces,CFS-B-pisces-cfs02] ERROR org.jgroups.protocols.SEQUENCER - CFS-B-pisces-cfs02: non-coord; dropping FORWARD request from CFS-A-pisces-cfs03
> It looks as though there's some sort of race where:
> -  a member ("CFS-B-pisces-cfs02") is becoming coordinator, but SEQUENCER hasn't yet seen the view change
> -  meanwhile some other member ("CFS-A-pisces-cfs03") has been told of the view change, and so SEQUENCER there forwards messages to the new coordinator
> -  but because the new coordinator doesn't yet know that it is coordinator, we hit the problem above.
> The messages don't ever get retransmitted; so they're simply lost.
> Here's some trace from the member who drops the message, with the line from my application showing that he does indeed become coordinator a few milliseconds later:
> 2012-04-13 10:05:21.363 [Incoming-1,pisces,CFS-B-pisces-cfs02] ERROR org.jgroups.protocols.SEQUENCER - CFS-B-pisces-cfs02: non-coord; dropping FORWARD request from CFS-A-pisces-cfs03
> 2012-04-13 10:05:21.393 [Incoming-2,pisces,CFS-B-pisces-cfs02] INFO  c.m.c.CommunicatorComponent$Communicator - New view: MergeView::[CFS-B-pisces-cfs02|86] [CFS-B-pisces-cfs02, CFS-B-pisces-cfs03, CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], subgroups=[CFS-A-pisces-cfs03|84] [CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], [CFS-B-pisces-cfs03|85] [CFS-B-pisces-cfs03, CFS-A-pisces-cfs02, CFS-B-pisces-cfs02]
> And here's trace from the other end, showing that message being broadcast in the new view.
> 2012-04-13 10:05:21.359 [Incoming-1,pisces,CFS-A-pisces-cfs03] INFO  c.m.c.CommunicatorComponent$Communicator - New view: MergeView::[CFS-B-pisces-cfs02|86] [CFS-B-pisces-cfs02, CFS-B-pisces-cfs03, CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], subgroups=[CFS-A-pisces-cfs03|84] [CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], [CFS-B-pisces-cfs03|85] [CFS-B-pisces-cfs03, CFS-A-pisces-cfs02, CFS-B-pisces-cfs02]
> 2012-04-13 10:05:21.361 [ForkJoinPool-1-worker-0] INFO  c.m.c.CommunicatorComponent$Communicator - Broadcasting ClusterMgmtMsg
> I can try to get lower level trace from JGroups if that would help.  
> I'm using the same stack as in JGRP-1443.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jboss.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


More information about the jboss-jira mailing list