[JBoss JIRA] (WFLY-8617) Distributed workmanager fails to obtain view

Friday, 21 April 2017

     [ https://issues.jboss.org/browse/WFLY-8617?page=com.atlassian.jira.plugin.... ]

Kabir Khan resolved WFLY-8617.
------------------------------
    Fix Version/s: 11.0.0.Beta1
       Resolution: Done

...
 Distributed workmanager fails to obtain view
 --------------------------------------------

                 Key: WFLY-8617
                 URL: https://issues.jboss.org/browse/WFLY-8617
             Project: WildFly
          Issue Type: Bug
          Components: JCA
            Reporter: Stefano Maestri
            Assignee: Stefano Maestri
            Priority: Critical
              Labels: KK-DR17
             Fix For: 11.0.0.Beta1

 When starting two EAP instances with a distributed workmanager configured, the following
exception is logged on the first instance ~6 seconds after the second instance starts:
 {code}
 16:11:24,204 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: JBoss EAP 7.1.0.Alpha1 (WildFly Core 3.0.0.Beta6-redhat-1) started in 5905ms - Started 467 of 700 services (472 services are lazy, passive or on-demand)
 16:11:42,066 ERROR [org.jboss.jca.core.workmanager.transport.remote.jgroups.JGroupsTransport] (thread-2) ViewAccepted: org.jgroups.TimeoutException: timeout waiting for response from node2, request: UnicastRequest, mode=GET_ALL, target=node2: javax.resource.spi.work.WorkException: org.jgroups.TimeoutException: timeout waiting for response from node2, request: UnicastRequest, mode=GET_ALL, target=node2
 	at org.jboss.jca.core.workmanager.transport.remote.jgroups.JGroupsTransport.sendMessage(JGroupsTransport.java:589)
 	at org.jboss.jca.core.workmanager.transport.remote.jgroups.JGroupsTransport.viewAccepted(JGroupsTransport.java:943)
 	at org.jgroups.blocks.MessageDispatcher.handleUpEvent(MessageDispatcher.java:618)
 	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:666)
 	at org.jgroups.JChannel.up(JChannel.java:738)
 	at org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:124)
 	at org.jgroups.stack.Protocol.up(Protocol.java:380)
 	at org.jgroups.protocols.FORK.up(FORK.java:118)
 	at org.jgroups.protocols.FRAG2.up(FRAG2.java:165)
 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
 	at org.jgroups.protocols.FlowControl.up(FlowControl.java:390)
 	at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:727)
 	at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange(CoordGmsImpl.java:225)
 	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:917)
 	at org.jgroups.stack.Protocol.up(Protocol.java:418)
 	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:294)
 	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:487)
 	at org.jgroups.protocols.pbcast.NAKACK2.deliverBatch(NAKACK2.java:989)
 	at org.jgroups.protocols.pbcast.NAKACK2.removeAndPassUp(NAKACK2.java:919)
 	at org.jgroups.protocols.pbcast.NAKACK2.handleMessage(NAKACK2.java:851)
 	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:611)
 	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:155)
 	at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:200)
 	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:325)
 	at org.jgroups.protocols.MERGE3.up(MERGE3.java:292)
 	at org.jgroups.protocols.Discovery.up(Discovery.java:296)
 	at org.jgroups.protocols.TP.passMessageUp(TP.java:1657)
 	at org.jgroups.protocols.TP$3.run(TP.java:1591)
 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 	at org.jboss.as.clustering.jgroups.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:52)
 	at java.lang.Thread.run(Thread.java:745)
 Caused by: org.jgroups.TimeoutException: timeout waiting for response from node2, request: UnicastRequest, mode=GET_ALL, target=node2
 	at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:442)
 	at org.jgroups.blocks.RpcDispatcher.callRemoteMethod(RpcDispatcher.java:241)
 	at org.jboss.jca.core.workmanager.transport.remote.jgroups.JGroupsTransport.sendMessage(JGroupsTransport.java:579)
 	... 31 more
 {code}
 Judging by the stack trace, a view containing both workmanagers is never established, so
the two workmanagers never manage to communicate with each other. That would also explain
why work never appears to run on a node other than the one where it is scheduled
(even though proper selector and policy settings are in place). I'll file a separate issue
for that, with a reproducing test.
 Since it's a timeout exception, I tried to find out whether the error is on
my end (network issues), but that doesn't appear to be the case: regular clustered sessions,
which also use the JGroups stack, are replicated properly.
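
 For triage, the timeout lines in the log above can be picked out mechanically. The
following is only a rough sketch, not part of WildFly, IronJacamar or JGroups: the class
name TimeoutTriage and the method unresponsiveNode are hypothetical, and it assumes each
log record sits on a single line, as in the server log file (not hard-wrapped as in a mail
client).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading server logs; not an API of any project named above.
final class TimeoutTriage {
    // Matches the message text of the org.jgroups.TimeoutException reported in this issue
    // and captures the node name that precedes the first comma.
    private static final Pattern TIMEOUT = Pattern.compile(
        "org\\.jgroups\\.TimeoutException: timeout waiting for response from (\\S+?),");

    // Returns the node named in the first timeout message on the line, if there is one.
    static Optional<String> unresponsiveNode(String logLine) {
        Matcher m = TIMEOUT.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

 Applied to the "Caused by" line of the trace, unresponsiveNode yields node2; lines without
the timeout message yield an empty Optional.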

--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
