[jboss-jira] [JBoss JIRA] (JGRP-1449) SEQUENCER race leads to lost messages
David Hotham (JIRA)
jira-events at lists.jboss.org
Wed Jun 6 10:08:20 EDT 2012
[ https://issues.jboss.org/browse/JGRP-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699416#comment-12699416 ]
David Hotham commented on JGRP-1449:
------------------------------------
Thanks. I appreciate your time, and I'm glad that all this effort is going to be useful for other people too.
I agree that unit tests would be a good thing for the bugs that we're finding. Honestly, I don't know that I'm likely to find the time anytime soon, but Byteman is new to me and I do like finding new toys to play with, so never say never. Certainly if I hit any further problems I'll give this some serious thought.
Contrariwise... I've been finding all these bugs (not just in SEQUENCER) through a form of testing that I guess isn't part of your regular suite. Per the attachment to JGRP-1451, I'm setting up fully fledged (albeit simplistic) applications and doing nasty things to them at random for many, many hours. I hope you'll agree that this has been quite a productive line of attack. Maybe you'd find it worthwhile to do something similar yourself?
> SEQUENCER race leads to lost messages
> -------------------------------------
>
> Key: JGRP-1449
> URL: https://issues.jboss.org/browse/JGRP-1449
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.0.9
> Reporter: David Hotham
> Assignee: Bela Ban
> Fix For: 3.1
>
>
> I'm seeing an issue where SEQUENCER is dropping messages, with logs like this:
> 2012-04-13 10:05:21.363 [Incoming-1,pisces,CFS-B-pisces-cfs02] ERROR org.jgroups.protocols.SEQUENCER - CFS-B-pisces-cfs02: non-coord; dropping FORWARD request from CFS-A-pisces-cfs03
> It looks as though there's some sort of race where:
> - a member ("CFS-B-pisces-cfs02") is becoming coordinator, but SEQUENCER hasn't yet seen the view change
> - meanwhile some other member ("CFS-A-pisces-cfs03") has been told of the view change, and so SEQUENCER there forwards messages to the new coordinator
> - but because the new coordinator doesn't yet know that it is coordinator, we hit the problem above.
> The messages are never retransmitted, so they're simply lost.
> Here's some trace from the member who drops the message, with the line from my application showing that he does indeed become coordinator a few milliseconds later:
> 2012-04-13 10:05:21.363 [Incoming-1,pisces,CFS-B-pisces-cfs02] ERROR org.jgroups.protocols.SEQUENCER - CFS-B-pisces-cfs02: non-coord; dropping FORWARD request from CFS-A-pisces-cfs03
> 2012-04-13 10:05:21.393 [Incoming-2,pisces,CFS-B-pisces-cfs02] INFO c.m.c.CommunicatorComponent$Communicator - New view: MergeView::[CFS-B-pisces-cfs02|86] [CFS-B-pisces-cfs02, CFS-B-pisces-cfs03, CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], subgroups=[CFS-A-pisces-cfs03|84] [CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], [CFS-B-pisces-cfs03|85] [CFS-B-pisces-cfs03, CFS-A-pisces-cfs02, CFS-B-pisces-cfs02]
> And here's trace from the other end, showing that message being broadcast in the new view.
> 2012-04-13 10:05:21.359 [Incoming-1,pisces,CFS-A-pisces-cfs03] INFO c.m.c.CommunicatorComponent$Communicator - New view: MergeView::[CFS-B-pisces-cfs02|86] [CFS-B-pisces-cfs02, CFS-B-pisces-cfs03, CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], subgroups=[CFS-A-pisces-cfs03|84] [CFS-A-pisces-cfs03, CFS-A-pisces-cfs02], [CFS-B-pisces-cfs03|85] [CFS-B-pisces-cfs03, CFS-A-pisces-cfs02, CFS-B-pisces-cfs02]
> 2012-04-13 10:05:21.361 [ForkJoinPool-1-worker-0] INFO c.m.c.CommunicatorComponent$Communicator - Broadcasting ClusterMgmtMsg
> I can try to get lower level trace from JGroups if that would help.
> I'm using the same stack as in JGRP-1443.
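The ordering described in the quoted report (a FORWARD request reaching the new coordinator before it has installed the view) can be replayed with a toy model. This is a minimal sketch and not JGroups code: the class `SequencerRaceSketch`, the `Member` type and the `parkEarlyForwards` switch are all hypothetical names made up for illustration. The "park and replay" branch is only one conceivable way to avoid the loss, and is not claimed to be the fix that went into 3.1.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the reported race. Hypothetical names throughout; not JGroups code. */
class SequencerRaceSketch {

    /** A member that only tracks whether it currently believes it is coordinator. */
    static final class Member {
        final String name;
        final boolean parkEarlyForwards;   // hypothetical mitigation switch
        boolean isCoord;                   // becomes true once the new view is installed
        final List<String> delivered = new ArrayList<>();
        final List<String> dropped = new ArrayList<>();
        final List<String> parked = new ArrayList<>();

        Member(String name, boolean parkEarlyForwards) {
            this.name = name;
            this.parkEarlyForwards = parkEarlyForwards;
        }

        /** Handle a FORWARD request sent by another member. */
        void onForward(String msg) {
            if (isCoord) {
                delivered.add(msg);        // coordinator: sequence and deliver
            } else if (parkEarlyForwards) {
                parked.add(msg);           // assumption: hold until the view arrives
            } else {
                dropped.add(msg);          // mirrors "non-coord; dropping FORWARD request"
            }
        }

        /** Install the view that makes this member coordinator, then replay parked requests. */
        void installViewAsCoord() {
            isCoord = true;
            delivered.addAll(parked);
            parked.clear();
        }
    }

    /** Replays the ordering from the report: the FORWARD arrives before the view change. */
    static Member replay(boolean parkEarlyForwards) {
        Member newCoord = new Member("CFS-B-pisces-cfs02", parkEarlyForwards);
        newCoord.onForward("ClusterMgmtMsg");   // sender already saw the new view
        newCoord.installViewAsCoord();          // receiver sees it slightly later
        return newCoord;
    }

    public static void main(String[] args) {
        Member lossy = replay(false);
        System.out.println("drop: delivered=" + lossy.delivered + " dropped=" + lossy.dropped);
        Member parking = replay(true);
        System.out.println("park: delivered=" + parking.delivered + " dropped=" + parking.dropped);
    }
}
```

With `replay(false)` the message ends up in `dropped` and is never delivered, which matches the behaviour in the logs above. With `replay(true)` the same message is delivered once the view is installed.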