[jboss-jira] [JBoss JIRA] (JGRP-1449) SEQUENCER race leads to lost messages
David Hotham (JIRA)
jira-events at lists.jboss.org
Tue Jun 5 12:16:18 EDT 2012
[ https://issues.jboss.org/browse/JGRP-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699027#comment-12699027 ]
David Hotham commented on JGRP-1449:
------------------------------------
I'm also concerned by the change you've made to canDeliver() (per your comments in JGRP-1461, I think).
Suppose:
- application sends messages 3 and 4 (from the same thread; so from its point of view, strictly in that order)
- as in the scenario from my last comment, B drops message 3 (it has not yet seen the new view)
- then B gets the new view before message 4 arrives
- B accepts and re-broadcasts message 4
- now everyone receives and delivers message 4 even though they haven't yet received message 3
- (message 3 will be re-broadcast in due course, once this issue is fixed, but by then it can only ever arrive after message 4)
So SEQUENCER now guarantees that everyone receives messages in the same order, but not that this is the order in which the application sent them!
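To make the sequence concrete, here is a toy model in plain Python. It is not JGroups code: the `Coordinator` class, `on_forward()` and `run_scenario()` are names made up for illustration, and the final retransmission loop merely stands in for whatever re-send mechanism the eventual fix provides.

```python
class Coordinator:
    """Hypothetical stand-in for member B's sequencing logic."""

    def __init__(self):
        self.is_coord = False      # B has not yet installed the new view
        self.broadcast_log = []    # order in which B re-broadcasts messages

    def on_forward(self, msg_id):
        """Handle a forwarded message; return True if it was broadcast."""
        if not self.is_coord:
            return False           # "non-coord; dropping FORWARD request"
        self.broadcast_log.append(msg_id)
        return True


def run_scenario():
    b = Coordinator()
    pending = []                   # sender side: forwarded but never broadcast

    # Message 3 reaches B before B has seen the new view, so it is dropped.
    if not b.on_forward(3):
        pending.append(3)

    # B installs the new view and becomes coordinator.
    b.is_coord = True

    # Message 4 arrives afterwards and is accepted and re-broadcast.
    if not b.on_forward(4):
        pending.append(4)

    # Assumed fix: the sender eventually re-sends what was never broadcast.
    for msg_id in list(pending):
        if b.on_forward(msg_id):
            pending.remove(msg_id)

    # Every member delivers in exactly this order.
    return b.broadcast_log


print(run_scenario())
```

In this model every member sees the same order, but message 4 comes before message 3, which is the reverse of the order in which the application sent them.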
What do you think?
> SEQUENCER race leads to lost messages
> -------------------------------------
>
> Key: JGRP-1449
> URL: https://issues.jboss.org/browse/JGRP-1449
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.0.9
> Reporter: David Hotham
> Assignee: Bela Ban
> Fix For: 3.1
>
>
> I'm seeing an issue where SEQUENCER is dropping messages, with logs like this:
> 2012-04-13 10:05:21.363 [Incoming-1,pisces,CFS-B-pisces-cfs02] ERROR org.jgroups.protocols.SEQUENCER - CFS-B-pisces-cfs02: non-coord; dropping FORWARD request from CFS-A-pisces-cfs03
> It looks as though there's some sort of race where:
> - a member ("CFS-B-pisces-cfs02") is becoming coordinator, but SEQUENCER hasn't yet seen the view change
> - meanwhile some other member ("CFS-A-pisces-cfs03") has been told of the view change, and so SEQUENCER there forwards messages to the new coordinator
> - but because the new coordinator doesn't yet know that it is coordinator, we hit the problem above.
> The messages don't ever get retransmitted; so they're simply lost.
> Here's some trace from the member who drops the message, with the line from my application showing that he does indeed become coordinator a few milliseconds later:
> 2012-04-13 10:05:21.363 [Incoming-1,pisces,CFS-B-pisces-cfs02] ERROR org.jgroups.protocols.SEQUENCER - CFS-B-pisces-cfs02: non-coord; dropping FORWARD request from CFS-A-pisces-cfs03
> 2012-04-13 10:05:21.393 [Incoming-2,pisces,CFS-B-pisces-cfs02] INFO c.m.c.CommunicatorComponent$Communicator - New view: MergeView::[CFS-B-pisces-cfs02|86] [CFS-B-pisces-cfs02, CFS-B-pisces-cfs03, CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], subgroups=[CFS-A-pisces-cfs03|84] [CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], [CFS-B-pisces-cfs03|85] [CFS-B-pisces-cfs03, CFS-A-pisces-cfs02, CFS-B-pisces-cfs02]
> And here's trace from the other end, showing that message being broadcast in the new view.
> 2012-04-13 10:05:21.359 [Incoming-1,pisces,CFS-A-pisces-cfs03] INFO c.m.c.CommunicatorComponent$Communicator - New view: MergeView::[CFS-B-pisces-cfs02|86] [CFS-B-pisces-cfs02, CFS-B-pisces-cfs03, CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], subgroups=[CFS-A-pisces-cfs03|84] [CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], [CFS-B-pisces-cfs03|85] [CFS-B-pisces-cfs03, CFS-A-pisces-cfs02, CFS-B-pisces-cfs02]
> 2012-04-13 10:05:21.361 [ForkJoinPool-1-worker-0] INFO c.m.c.CommunicatorComponent$Communicator - Broadcasting ClusterMgmtMsg
> I can try to get lower level trace from JGroups if that would help.
> I'm using the same stack as in JGRP-1443.
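For reference, the race in the quoted description can be replayed with a small toy model. This is plain Python rather than JGroups code: `Member`, `on_forward()` and `replay()` are hypothetical names, "A" and "B" abbreviate CFS-A-pisces-cfs03 and CFS-B-pisces-cfs02, and the event times are the millisecond parts of the log timestamps above (10:05:21.359 and so on). The value `"old-coord"` is a placeholder for whoever B believed was coordinator before the merge.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    coord: str                                  # who this member believes is coordinator
    broadcast: list = field(default_factory=list)
    dropped: list = field(default_factory=list)

    def install_view(self, new_coord):
        self.coord = new_coord

    def on_forward(self, msg):
        # A member that does not think it is coordinator drops the request.
        if self.coord != self.name:
            self.dropped.append(msg)
        else:
            self.broadcast.append(msg)


def replay():
    a = Member("A", coord="A")
    b = Member("B", coord="old-coord")
    members = {"A": a, "B": b}
    wire = []                                   # (destination name, message) in flight

    def forward_from_a():
        # A forwards to whoever it currently believes is coordinator.
        wire.append((a.coord, "ClusterMgmtMsg"))

    def receive():
        dest, msg = wire.pop(0)
        members[dest].on_forward(msg)

    events = [
        (359, lambda: a.install_view("B")),     # A sees the merge view first
        (361, forward_from_a),                  # application broadcasts on A
        (363, receive),                         # B gets the FORWARD too early
        (393, lambda: b.install_view("B")),     # B finally sees the view
    ]
    for _, action in sorted(events, key=lambda e: e[0]):
        action()
    return b


b = replay()
print(b.dropped, b.broadcast)
```

With no retransmission in the model, the message ends up in `dropped` and never in `broadcast`, even though B is coordinator by the end of the run.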