[jboss-jira] [JBoss JIRA] Commented: (JGRP-830) viewAccepted is not always reported to all group members
cbowditch (JIRA)
jira-events at lists.jboss.org
Tue Oct 7 06:06:21 EDT 2008
[ https://jira.jboss.org/jira/browse/JGRP-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12432855#action_12432855 ]
cbowditch commented on JGRP-830:
--------------------------------
The reproduction steps are:
1) Start Member 1 (group Test) on machine A
2) Wait a few seconds until ViewAccepted event has been received
3) Start Member 2 (group Test) on machine B
4) Wait a few seconds until ViewAccepted event has been received
5) Start Member 3 (group Test) on machine A
6) Wait a few seconds until ViewAccepted event has been received
7) Start Member 1 (group Test2) on machine A (I'm not sure this step is strictly necessary, but I have observed that other users on the company network run JGroups-related applications using the same mcast address and port)
8) Wait a few seconds until all members report warnings about members outside their group.
9) In the Network Connections window on Windows XP, disable the network connection on machine B
10) Wait 2 full minutes to give FD a chance to work its magic.
11) Re-enable the network connection on machine B
12) Wait 2 full minutes to allow views to merge back together.
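One way to make the outcome of steps 9 to 12 easy to check is to record the last membership each member reported and compare it against the expected group. The following is only a minimal sketch using plain Java collections: ViewTracker, record and staleMembers are hypothetical names, not part of JGroups, and the member names are assumed to be plain strings supplied by the test harness from its own viewAccepted callback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper (not a JGroups class): remembers the last view each member reported. */
final class ViewTracker {
    // member name -> sorted set of member names in the last view that member reported
    private final Map<String, TreeSet<String>> lastView = new HashMap<String, TreeSet<String>>();

    /** Intended to be called from a member's viewAccepted callback with the names in the new view. */
    synchronized void record(String member, List<String> viewMembers) {
        lastView.put(member, new TreeSet<String>(viewMembers));
    }

    /** Returns, in sorted order, the expected members whose last reported view is missing or differs. */
    synchronized List<String> staleMembers(List<String> expected) {
        TreeSet<String> want = new TreeSet<String>(expected);
        List<String> stale = new ArrayList<String>();
        for (String member : want) {
            // A member that never reported has no entry; equals(null) is false, so it counts as stale.
            if (!want.equals(lastView.get(member))) {
                stale.add(member);
            }
        }
        return stale;
    }
}
```

After step 12, calling staleMembers with the three expected member names would list any member that never reported the merged view, which is the symptom described in this issue.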
What I find is that members 1 and 3 usually receive a suspect event and a new view once the FD timeout and retries are exhausted. Member 2 (on machine B), however, does not receive a new view; instead it logs a whole heap of errors like:
07-Oct-2008 09:56:44 org.jgroups.protocols.TP down
SEVERE: failed sending message to 192.168.3.45:1422 (43 bytes)
java.lang.Exception: dest=/192.168.3.45:1422 (46 bytes)
    at org.jgroups.protocols.UDP._send(UDP.java:345)
    at org.jgroups.protocols.UDP.sendToSingleMember(UDP.java:300)
    at org.jgroups.protocols.TP.doSend(TP.java:1469)
    at org.jgroups.protocols.TP.send(TP.java:1456)
    at org.jgroups.protocols.TP.down(TP.java:1177)
    at org.jgroups.protocols.Discovery.down(Discovery.java:349)
    at org.jgroups.protocols.MERGE2.down(MERGE2.java:175)
    at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:367)
    at org.jgroups.protocols.FD$Monitor.run(FD.java:527)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
    at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.BindException: Cannot assign requested address: Datagram send failed
    at java.net.PlainDatagramSocketImpl.send(Native Method)
    at java.net.DatagramSocket.send(Unknown Source)
    at org.jgroups.protocols.UDP._send(UDP.java:341)
    ... 17 more
These errors are not a problem in themselves, but why doesn't member 2 get suspect events for members 1 and 3 and then receive a new view? If I repeat the above steps 3 or 4 times, members 1 and 3 eventually stop receiving viewAccepted events.
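To separate the expected send failures (the interface is disabled, so the socket cannot bind) from anything unexpected, the cause chain of a logged exception can be inspected for java.net.BindException. This is only a sketch under assumptions: SendFailures and causedByBindException are hypothetical names, and it assumes the application can get hold of the Throwable (for example through its own log handler), which the trace above does not show.

```java
import java.net.BindException;

/** Hypothetical helper (not a JGroups class) for classifying logged send failures. */
final class SendFailures {
    private SendFailures() {}

    /** Walks the cause chain and reports whether any link is a java.net.BindException. */
    static boolean causedByBindException(Throwable t) {
        int depth = 0;
        // The depth limit guards against a cyclic cause chain.
        while (t != null && depth++ < 32) {
            if (t instanceof BindException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}
```

For the trace above, the outer java.lang.Exception wraps a BindException, so such a check would classify it as a consequence of the disabled network connection rather than a separate fault.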
Testing with the ViewDemo example, I have not been able to recreate the issue to the same severity (yet; tests are still continuing). What I mean is that member 2 doesn't receive a new view while the network is disabled, but after re-enabling the network connection on machine B, all members receive new view events after about 30 seconds.
So should I change my code to extend ReceiverAdapter instead of using MessageDispatcher? Have you tried running my test app? The problem is easily re-creatable there.
If the issue is only re-creatable in my application, then I'm happy to reduce the severity of the bug and change my app to work as ViewDemo does. The key difference between my app and ViewDemo is the use of getState/setState. So the minor bug may be that views don't always get propagated when getState/setState is also used?
> viewAccepted is not always reported to all group members
> --------------------------------------------------------
>
> Key: JGRP-830
> URL: https://jira.jboss.org/jira/browse/JGRP-830
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 2.6.4
> Environment: Windows XP
> Reporter: cbowditch
> Assignee: Vladimir Blagojevic
> Fix For: 2.6.5, 2.7
>
> Attachments: JGroups-ViewMergeBug.zip
>
>
> I have written a test harness application that demonstrates the issue. When there are 3 members in the group, 2 on one machine and the third on a second machine, not all members of the group receive a viewAccepted call every time a network outage occurs or after a network outage has been recovered from. I am doing all my testing on Windows XP.
> Receiving the viewAccepted method after a network outage or after recovering from a network outage is vital for all members to be able to keep track of which members are still present in the group.
> I have created a thread on the forum regarding this issue:
> http://sourceforge.net/mailarchive/forum.php?thread_name=BAY117-DAV32E55658B51A7027F6CCDFB400%40phx.gbl&forum_name=javagroups-users
> However, I have only seen the error reported at the start of that thread once, so it may be a red herring. It may be necessary to break the network connection between the 2 machines in this test 3 or 4 times, as it sometimes works.
> If you need any more information on how to recreate this issue, please let me know and I will try to assist.