[jboss-jira] [JBoss JIRA] (JGRP-1927) RELAY2: Delays during shutdown

Bela Ban (JIRA) issues@jboss.org
Sun May 24 10:17:19 EDT 2015


     [ https://issues.jboss.org/browse/JGRP-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bela Ban updated JGRP-1927:
---------------------------
    Fix Version/s: 3.6.4


> RELAY2: Delays during shutdown
> ------------------------------
>
>                 Key: JGRP-1927
>                 URL: https://issues.jboss.org/browse/JGRP-1927
>             Project: JGroups
>          Issue Type: Enhancement
>    Affects Versions: 3.6.2
>            Reporter: Dan Berindei
>            Assignee: Bela Ban
>             Fix For: 3.6.4
>
>
> Say we have two local clusters, AB and CD, connected via RELAY2. A is the site master of AB, C is the site master of CD, and A is the coordinator of the bridge cluster.
> When node A is stopped, it first leaves the AB cluster and then the bridge cluster. When B receives the new AB view, it becomes the site master of AB and tries to join the bridge cluster. But because RELAY2 on A hasn't yet finished stopping its relay channel, B still sees A as the coordinator of the bridge cluster and sends its JOIN_REQ message to A.
> A's bridge channel stops before A sends a join response, so B never receives one and eventually times out. Meanwhile, the thread doing the join holds the lock on B's bridge JChannel, so if the user tries to stop node B, the stop blocks waiting for that lock:
> {noformat}
> "testng@1" prio=5 tid=0x1 nid=NA waiting for monitor entry
>   java.lang.Thread.State: BLOCKED
> 	 waiting for Incoming-1,NodeB@2894 to release lock on <0xd26> (a org.jgroups.JChannel)
> 	  at org.jgroups.JChannel.close(JChannel.java:385)
> 	  at org.jgroups.util.Util.close(Util.java:408)
> 	  at org.jgroups.protocols.relay.Relayer$Bridge.stop(Relayer.java:256)
> 	  at org.jgroups.protocols.relay.Relayer.stop(Relayer.java:109)
> 	  at org.jgroups.protocols.relay.RELAY2.stop(RELAY2.java:280)
> 	  at org.jgroups.stack.ProtocolStack.stopStack(ProtocolStack.java:1015)
> 	  at org.jgroups.JChannel.stopStack(JChannel.java:1003)
> 	  at org.jgroups.JChannel.disconnect(JChannel.java:373)
> 	  - locked <0xd54> (a org.jgroups.JChannel)
> 	  at org.infinispan.remoting.transport.jgroups.JGroupsTransport.stop(JGroupsTransport.java:262)
> 	  at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
> 	  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	  at java.lang.reflect.Method.invoke(Method.java:497)
> "Incoming-1,B@2894" prio=5 tid=0x2f nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
> 	 blocks testng@1
> 	  at org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:114)
> 	  at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:44)
> 	  at org.jgroups.protocols.pbcast.GMS.down(GMS.java:1084)
> 	  at org.jgroups.protocols.FlowControl.down(FlowControl.java:353)
> 	  at org.jgroups.protocols.FRAG2.down(FRAG2.java:136)
> 	  at org.jgroups.protocols.RSVP.down(RSVP.java:153)
> 	  at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1038)
> 	  at org.jgroups.JChannel.down(JChannel.java:791)
> 	  at org.jgroups.JChannel._connect(JChannel.java:564)
> 	  at org.jgroups.JChannel.connect(JChannel.java:294)
> 	  - locked <0xd26> (a org.jgroups.JChannel)
> 	  at org.jgroups.JChannel.connect(JChannel.java:279)
> 	  at org.jgroups.protocols.relay.Relayer$Bridge.start(Relayer.java:250)
> 	  at org.jgroups.protocols.relay.Relayer.start(Relayer.java:86)
> 	  at org.jgroups.protocols.relay.RELAY2.startRelayer(RELAY2.java:681)
> 	  at org.jgroups.protocols.relay.RELAY2.handleView(RELAY2.java:663)
> 	  at org.jgroups.protocols.relay.RELAY2.up(RELAY2.java:416)
> 	  at org.jgroups.protocols.RSVP.up(RSVP.java:201)
> 	  at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
> 	  at org.jgroups.protocols.FlowControl.up(FlowControl.java:394)
> 	  at org.jgroups.protocols.tom.TOA.up(TOA.java:121)
> 	  at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:732)
> 	  - locked <0xd2e> (a org.jgroups.protocols.pbcast.GMS)
> 	  at org.jgroups.protocols.pbcast.ParticipantGmsImpl.handleViewChange(ParticipantGmsImpl.java:146)
> {noformat}
>  
> Ideally, I'd like B to realize that the coordinator of the bridge cluster belongs to its own site (i.e. it is the departing site master A), and delay sending the join request until C becomes the coordinator. Failing that, maybe {{JChannel.stop}} could interrupt the thread doing the join?
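
The first suggestion in the description, deferring the join while the advertised bridge coordinator belongs to B's own site, can be sketched as a pure decision function. This is a toy model, not the RELAY2 or GMS API: DiscoveryResponse, Decision and decide() are hypothetical names, and the sketch assumes one site master per site, so a bridge coordinator from B's own site can only be the departing A.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of "defer the bridge join"; NOT JGroups code. */
public class BridgeJoinDeferSketch {

    /** Hypothetical summary of one discovery response from a bridge member. */
    static final class DiscoveryResponse {
        final String member;   // logical name, e.g. "A"
        final String site;     // site the member belongs to, e.g. "AB"
        final boolean isCoord; // responder claims to be the bridge coordinator

        DiscoveryResponse(String member, String site, boolean isCoord) {
            this.member = member;
            this.site = site;
            this.isCoord = isCoord;
        }
    }

    enum Decision { JOIN_FOREIGN_COORD, DEFER, DEFAULT_JOIN }

    /**
     * JOIN_FOREIGN_COORD: a member of another site is coordinator; send JOIN_REQ to it.
     * DEFER: the only advertised coordinator is from our own site. Assuming a single
     *        site master per site, that is the previous (departing) site master, so
     *        retry discovery later instead of joining it.
     * DEFAULT_JOIN: no coordinator advertised; keep the existing join logic unchanged.
     */
    static Decision decide(List<DiscoveryResponse> responses, String localSite) {
        boolean ownSiteCoord = false;
        for (DiscoveryResponse r : responses) {
            if (!r.isCoord)
                continue;
            if (!r.site.equals(localSite))
                return Decision.JOIN_FOREIGN_COORD;
            ownSiteCoord = true;
        }
        return ownSiteCoord ? Decision.DEFER : Decision.DEFAULT_JOIN;
    }

    public static void main(String[] args) {
        // B (site AB) during A's shutdown: A still answers as bridge coordinator.
        List<DiscoveryResponse> duringShutdown = Arrays.asList(
                new DiscoveryResponse("A", "AB", true),
                new DiscoveryResponse("C", "CD", false));
        // After A has left the bridge cluster, C is the coordinator.
        List<DiscoveryResponse> afterShutdown = Arrays.asList(
                new DiscoveryResponse("C", "CD", true));
        System.out.println(decide(duringShutdown, "AB")); // DEFER
        System.out.println(decide(afterShutdown, "AB"));  // JOIN_FOREIGN_COORD
    }
}
```

A real fix would also need to bound how long B keeps deferring (for example, a limited number of discovery retries), in case A's bridge channel never finishes stopping.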

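The second suggestion, having the stop path wake the thread doing the join, can be illustrated with a toy model of the lock contention in the thread dump. It is not JGroups code: ToyChannel, connect(), close() and closeDuringJoin() are hypothetical. As in the dump, connect() holds the channel's monitor while waiting for a join response that never arrives, and close() needs the same monitor. Instead of a raw Thread.interrupt(), this sketch swaps in cancellation of the pending join future, which wakes the joiner the same way but cannot leave a stray interrupt flag on a thread that has already moved on.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model of "close() aborts a pending join"; NOT the JGroups JChannel API. */
public class CancellableJoinSketch {

    static class ToyChannel {
        // Never completed by a peer: models coordinator A leaving without replying.
        private final CompletableFuture<Void> joinResponse = new CompletableFuture<>();
        private boolean connected;

        /** Holds the channel monitor for the whole join, like JChannel.connect() in the dump. */
        synchronized boolean connect(long joinTimeoutMs) {
            try {
                joinResponse.get(joinTimeoutMs, TimeUnit.MILLISECONDS);
                connected = true;
            } catch (CancellationException | ExecutionException | TimeoutException e) {
                connected = false; // join aborted by close() or timed out
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                connected = false;
            }
            return connected;
        }

        /** Wakes a pending join first, so the monitor is released promptly. */
        void close() {
            joinResponse.cancel(true); // the joiner's get() now throws CancellationException
            synchronized (this) {
                connected = false;
            }
        }
    }

    /** Returns how many ms close() took while another thread was stuck in connect(). */
    static long closeDuringJoin(long joinTimeoutMs) {
        ToyChannel ch = new ToyChannel();
        Thread joiner = new Thread(() -> ch.connect(joinTimeoutMs), "Incoming-1");
        joiner.start();
        try {
            Thread.sleep(100); // give the joiner time to enter connect() and take the monitor
            long start = System.nanoTime();
            ch.close();
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            joiner.join();
            return elapsedMs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("close() took ~" + closeDuringJoin(10_000) + " ms");
    }
}
```

Without the cancel(true) call, close() in this toy would block for the full join timeout (10 seconds in main), which mirrors the hang described in the report.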


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

