[jboss-jira] [JBoss JIRA] Commented: (JGRP-1303) Multiple Gossip Routers not working with TUNNEL

Bela Ban (JIRA) jira-events at lists.jboss.org
Tue Mar 22 10:37:46 EDT 2011


    [ https://issues.jboss.org/browse/JGRP-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589670#comment-12589670 ] 

Bela Ban commented on JGRP-1303:
--------------------------------

OK, so I tested this and it worked 5 out of 5 times. Here's what I did to reproduce:
- Use the attached jgroups-2.12.1.beta1 JAR (should also work on 2.12.0, as there weren't any changes wrt GR)
- Use the attached tunnel.xml, change the IP addresses

- Start GR1: JGroups/bin/gossiprouter.sh -port 4000
- Do *not* start GR2 !
- Start A: draw -props ./tunnel.xml -name A
- Start B: draw -props ./tunnel.xml -name B
- A and B form a cluster

#1 Kill GR1
#2 A and B become singleton members
#3 Start GR1 again
#4 A and B form a cluster

I repeated steps #1 - #4 multiple times and it always worked!

> Multiple Gossip Routers not working with TUNNEL
> -----------------------------------------------
>
>                 Key: JGRP-1303
>                 URL: https://issues.jboss.org/browse/JGRP-1303
>             Project: JGroups
>          Issue Type: Bug
>    Affects Versions: 2.10
>         Environment: Linux, Windows
>            Reporter: vivek v
>            Assignee: Vladimir Blagojevic
>             Fix For: 2.12.1
>
>         Attachments: jgroups-2.12.1.Beta1.jar, tunnel.xml
>
>
> We are using the TUNNEL protocol in JGroups 2.10 GA. I noticed that nodes sometimes don't rejoin after a GR has gone down and come back up. Here is our
> scenario:
> 1) Two nodes: A, B
> 2) Two Gossip Routers: GR1 (port 4575), GR2 (port 4574) - only GR1 is up
> 3) Initially A and B are talking
> 4) Bring down GR1
> 5) Both A and B become singleton nodes
> 6) Bring back up GR1
> 7) Node A keeps getting GR2 in its stubs list:
> {code}
> 2011-03-07 16:30:00,263 WARN  [Timer-2,vivek,manager_172.16.4.29:3010]
> TUNNEL - failed sending a message to all members, GR used
> lt-vivek01.us.packetmotion.com/172.16.4.29:4574
> 2011-03-07 16:30:01,103 WARN  [Timer-2,vivek,manager_172.16.4.29:3010]
> TUNNEL - failed sending a message to all members, GR used
> lt-vivek01.us.packetmotion.com/172.16.4.29:4574
> 2011-03-07 16:30:01,103 ERROR [Timer-2,vivek,manager_172.16.4.29:3010]
> TUNNEL - failed sending message to null (99 bytes):
> java.lang.Exception: None of the available stubs
> [RouterStub[localsocket=0.0.0.0/0.0.0.0:55732,router_host=lt-vivek01.us.packetmotion.com::4574,connected=false],
> RouterStub[localsocket=0.0.0.0/0.0.0.0:55732,router_host=lt-vivek01.us.packetmotion.com::4574,connected=false]]
> accepted a multicast message
> {code}
> Note that in the log above, node A has GR2 twice in its stubs list. This causes node A to keep trying to send messages through the GR that is down, so we never
> get a new view with both nodes in it. I ran this test 5 times: 3 runs failed and 2 passed (got a new view after GR1 was up). In
> the failed cases, nodes A and B remained singletons even after GR1 was back up.
> I'm not sure where this is happening; the only place in TUNNEL where we register a GR is in "handleDownEvent(..)":
> {code}
> for (InetSocketAddress gr : gossip_router_hosts) {
>     RouterStub stub = stubManager.createAndRegisterStub(gr.getHostName(), gr.getPort(), bind_addr);
>     stub.setTcpNoDelay(tcp_nodelay);
> }
> {code}
> I looked at the RouterStubManager code, but I don't see how we would get two GR stubs for the same address. It looks like the RouterStub object itself might be getting changed at run time. I don't see where, but that seems to be the most obvious conclusion from this behavior.
> It looks like this will cause failover not to work in a clustered GR setup. I would expect that if one GR goes down, all communication should start flowing through the second one, but for some reason this does not happen in certain scenarios. This becomes even more critical when using the TUNNEL protocol,
> as all traffic then needs to pass through the GRs.
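
The symptom in the report (the same router address appearing twice in the stub list while the other router is missing) can be checked mechanically. The following is only a minimal sketch under stated assumptions: it does not use any JGroups classes, the class and method names (DuplicateStubCheck, findDuplicates, findMissing) are hypothetical, and the host name gr.example.test is a placeholder. It compares the configured router addresses with the addresses actually held by the stubs.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of JGroups: inspects a list of router
// addresses (one per stub) for the symptom described in the report.
public class DuplicateStubCheck {

    /** Returns the addresses that occur more than once, in first-seen order. */
    static Set<InetSocketAddress> findDuplicates(List<InetSocketAddress> stubAddrs) {
        Set<InetSocketAddress> seen = new LinkedHashSet<>();
        Set<InetSocketAddress> duplicates = new LinkedHashSet<>();
        for (InetSocketAddress addr : stubAddrs) {
            if (!seen.add(addr)) {      // add() returns false if already present
                duplicates.add(addr);
            }
        }
        return duplicates;
    }

    /** Returns the configured addresses that no stub points to. */
    static Set<InetSocketAddress> findMissing(List<InetSocketAddress> configured,
                                              List<InetSocketAddress> stubAddrs) {
        Set<InetSocketAddress> missing = new LinkedHashSet<>(configured);
        missing.removeAll(stubAddrs);
        return missing;
    }

    public static void main(String[] args) {
        // createUnresolved() avoids DNS lookups; the host name is a placeholder.
        InetSocketAddress gr1 = InetSocketAddress.createUnresolved("gr.example.test", 4575);
        InetSocketAddress gr2 = InetSocketAddress.createUnresolved("gr.example.test", 4574);

        List<InetSocketAddress> configured = new ArrayList<>();
        configured.add(gr1);
        configured.add(gr2);

        // Mirrors the failing log: both stubs point to port 4574.
        List<InetSocketAddress> stubAddrs = new ArrayList<>();
        stubAddrs.add(gr2);
        stubAddrs.add(gr2);

        System.out.println("duplicates: " + findDuplicates(stubAddrs));
        System.out.println("missing:    " + findMissing(configured, stubAddrs));
    }
}
```

With the addresses from the failing log, the check would flag port 4574 as duplicated and port 4575 as missing, which matches the state the reporter describes for node A.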


