[infinispan-dev] Hung testsuite

Dan Berindei dan.berindei at gmail.com
Tue Oct 2 12:02:34 EDT 2012


On Tue, Oct 2, 2012 at 6:29 PM, Bela Ban <bban at redhat.com> wrote:

>
>
> On 10/2/12 1:43 PM, Dan Berindei wrote:
>
> >>> test threads are stuck in JGroupsTransport.waitForChannelToConnect so
> it
> >>> looks like a JGroups thing.
> >>
> >>
> >> JGroupsTransport.waitForChannelToConnect() is Infinispan code, not
> >> JGroups code. IMO unneeded code, too, as I don't understand why we're
> >> waiting on a latch until we get a view after calling JChannel.connect().
> >> When JChannel.connect() returns, you're guaranteed to have a view
> >> installed, so this code can (should !) be removed, as it makes things
> >> unnecessarily complex (and error prone ?).
> >>
> >
> > True, it's Infinispan code - my code, actually ;)
>
>
> ouch :-)
>
>
> > The reason behind the latch was probably to avoid concurrent updates of
> the
> > members list from the main thread and from the JGroups thread that calls
> viewAccepted().
>
>
> The members should be updated in viewAccepted(). When you connect a
> channel, then viewAccepted() will be called *before* JChannel.connect()
> returns, so the main thread won't have a chance of accessing members
> before that.
>
>
Ok, if viewAccepted() is guaranteed to be called before connect() returns
then you're right, the latch is superfluous. We only need to call
viewAccepted() ourselves if the channel was injected (which we already do).

In this particular case, however, we don't get a viewAccepted() call or an
exception from connect(), so I'm not sure it would have been better (easier
to debug) without the latch.


> > I agree it's probably unnecessary, as we already use synchronization in
> our viewAccepted() implementation, but I don't see why it shouldn't work.
>
> It's (unnecessarily) complex code that could be simplified; I'm not
> suggesting it's incorrect. More complexity == more bugs.
>
>
True. I just wasn't sure whether viewAccepted() is guaranteed to be called
during connect(), so I added a "safety net".

I know there was a NPE in JGroupsTransport that I thought at the time was
caused by a missing viewAccepted() call, but I can't find the stack trace
any more so I'm not sure if that was really the case.


>
> > We have attached the JGroupsTransport as a membership listener to the
> > MessageDispatcher (CommandAwareRpcDispatcher, actually) and we have
> > attached the MessageDispatcher as a channel listener to the channel
> before
> > calling connect(), so JGroups should call viewAccepted() for the initial
> > view just as it does for every view.
>
>
> You mean MembershipListener ? Yes, if you create the channel, then the
> MessageDispatcher, then connect(), then you'll get the viewAccepted()
> callback *before* JChannel.connect() returns.
>
> Unless you deliver the view in a separate thread, but I don't think you
> do this, right ?
>
>
I'm interested here strictly in the viewAccepted() callback in
JGroupsTransport - I don't care how our other components that listen for
our JGroups view events get their notifications, they usually all their
version of viewAccepted() on startup.


> Which version is the stacktrace below with ? 3.0.x ? Taking a look now...
>
>
3.2.0.Alpha2


>
> > I just got the same test to hang on my machine, and I found this
> exception
> > in the log:
> >
> > 12:26:25,098 DEBUG (CacheStarter-Cache4:nbst) [GMS]
> > exception=java.lang.IllegalStateException: channel is not connected,
> > retrying
> > java.lang.IllegalStateException: channel is not connected
> >      at
> >
> org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:621)
> >      at
> >
> org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:535)
> >      at
> >
> org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390)
> >      at
> > org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248)
> >      at
> >
> org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:604)
> >      at org.jgroups.JChannel.up(JChannel.java:715)
> >      at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020)
> >      at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
> >      at org.jgroups.protocols.FC.up(FC.java:479)
> >      at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896)
> >      at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
> >      at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:432)
> >      at
> org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:722)
> >      at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:570)
> >      at
> >
> org.jgroups.protocols.pbcast.NAKACK2.flushBecomeServerQueue(NAKACK2.java:841)
> >      at org.jgroups.protocols.pbcast.NAKACK2.down(NAKACK2.java:490)
> >      at org.jgroups.protocols.UNICAST2.down(UNICAST2.java:523)
> >      at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:307)
> >      at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:637)
> >      at
> >
> org.jgroups.protocols.pbcast.ClientGmsImpl.installView(ClientGmsImpl.java:248)
> >      at
> >
> org.jgroups.protocols.pbcast.ClientGmsImpl.joinInternal(ClientGmsImpl.java:182)
> >      at
> > org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:37)
> >      at org.jgroups.protocols.pbcast.GMS.down(GMS.java:938)
> >      at org.jgroups.protocols.FC.down(FC.java:435)
> >      at org.jgroups.protocols.FRAG2.down(FRAG2.java:147)
> >      at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025)
> >      at org.jgroups.JChannel.down(JChannel.java:729)
> >      at org.jgroups.JChannel.connect(JChannel.java:291)
> >      at org.jgroups.JChannel.connect(JChannel.java:262)
> >      at
> >
> org.infinispan.remoting.transport.jgroups.JGroupsTransport.startJGroupsChannelIfNeeded(JGroupsTransport.java:206)
> >      at
> >
> org.infinispan.remoting.transport.jgroups.JGroupsTransport.start(JGroupsTransport.java:197)
> >      at sun.reflect.GeneratedMethodAccessor113.invoke(Unknown Source)
> >      at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >      at java.lang.reflect.Method.invoke(Method.java:601)
> >      at
> >
> org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:236)
> >      at
> >
> org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:900)
> >      at
> >
> org.infinispan.factories.AbstractComponentRegistry.invokeStartMethods(AbstractComponentRegistry.java:650)
> >      at
> >
> org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:639)
> >      at
> >
> org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:542)
> >      at
> >
> org.infinispan.factories.GlobalComponentRegistry.start(GlobalComponentRegistry.java:218)
> >      at
> >
> org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:680)
> >      at
> >
> org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:652)
> >      at
> >
> org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:548)
> >      at
> > org.infinispan.statetransfer.JoiningNode.getCache(JoiningNode.java:54)
> >      at
> >
> org.infinispan.statetransfer.StateTransferFunctionalTest$2.run(StateTransferFunctionalTest.java:234)
> >      at java.lang.Thread.run(Thread.java:722)
>
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev at lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.jboss.org/pipermail/infinispan-dev/attachments/20121002/37a5386f/attachment.html 


More information about the infinispan-dev mailing list