[JBoss JIRA] (WFCORE-3713) HttpClientURLConnection breaks the TCK
by Stuart Douglas (JIRA)
[ https://issues.jboss.org/browse/WFCORE-3713?page=com.atlassian.jira.plugi... ]
Stuart Douglas commented on WFCORE-3713:
----------------------------------------
I am going to revert this for now, as it did not go through the feature review process.
> HttpClientURLConnection breaks the TCK
> --------------------------------------
>
> Key: WFCORE-3713
> URL: https://issues.jboss.org/browse/WFCORE-3713
> Project: WildFly Core
> Issue Type: Bug
> Reporter: Stuart Douglas
> Assignee: Jan Kalina
> Priority: Blocker
>
> As this class does not match the behaviour of the JDK HttpURLConnection, some tests in TCK7 no longer work. The particular behaviour in this case is that GET requests are not automatically changed to POST requests when doInput is called; looking at the class more broadly, a large amount of its behaviour appears inconsistent with the JDK implementation.
> This is a big problem, as any customer relying on the behaviour of the JDK client would have to modify their application.
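For reference, the JDK-side behaviour the TCK exercises can be demonstrated with the stock {{java.net.HttpURLConnection}}: with {{doOutput}} set, requesting the output stream silently switches a GET into a POST. This is only an illustrative sketch (class name, port, and payload are arbitrary), using the JDK's built-in {{com.sun.net.httpserver}} as a throwaway local server:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class JdkPostSwitchDemo {
    public static void main(String[] args) throws Exception {
        // Throwaway local server that records the HTTP method it receives.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        final String[] seen = new String[1];
        server.createContext("/", exchange -> {
            seen[0] = exchange.getRequestMethod();
            exchange.sendResponseHeaders(200, -1); // 200, no response body
            exchange.close();
        });
        server.start();

        URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println("before: " + conn.getRequestMethod()); // still GET
        conn.setDoOutput(true);
        // Asking for the output stream is what flips the method to POST.
        try (OutputStream out = conn.getOutputStream()) {
            out.write("payload".getBytes());
        }
        conn.getResponseCode(); // complete the exchange
        System.out.println("after: " + conn.getRequestMethod());
        System.out.println("server saw: " + seen[0]);
        server.stop(0);
    }
}
```

Any replacement client that skips this method switch will send a GET with a body where the JDK sends a POST, which is the class of mismatch the TCK catches.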
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
8 years, 1 month
[JBoss JIRA] (JGRP-2261) NPE in FD_ALL2
by Rich DiCroce (JIRA)
Rich DiCroce created JGRP-2261:
----------------------------------
Summary: NPE in FD_ALL2
Key: JGRP-2261
URL: https://issues.jboss.org/browse/JGRP-2261
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.10
Environment: WildFly 12.0.0.Final
Reporter: Rich DiCroce
Assignee: Bela Ban
I'm seeing an NPE in FD_ALL2 from time to time. It's not consistent, but the reason isn't hard to see. Stack trace:
{code}
16:08:06,244 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) Exception in thread "thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)" java.lang.NullPointerException
16:08:06,244 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.protocols.FD_ALL2.lambda$new$0(FD_ALL2.java:83)
16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.util.MessageBatch.replaceIf(MessageBatch.java:220)
16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.protocols.FD_ALL2.up(FD_ALL2.java:186)
16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.stack.Protocol.up(Protocol.java:372)
16:08:06,245 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.stack.Protocol.up(Protocol.java:372)
16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.stack.Protocol.up(Protocol.java:372)
16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.protocols.TP.passBatchUp(TP.java:1274)
16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.util.SubmitToThreadPool$BatchHandler.passBatchUp(SubmitToThreadPool.java:140)
16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:136)
16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
16:08:06,246 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
16:08:06,247 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at org.jboss.as.clustering.jgroups.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:52)
16:08:06,247 ERROR [stderr] (thread-10,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) at java.lang.Thread.run(Thread.java:748)
{code}
HAS_HEADER assumes msg is non-null, but MessageBatch makes it clear that it's valid for elements of the batch to be null, and replaceIf() doesn't perform a null check.
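The shape of the bug is easy to reproduce in isolation. The sketch below is a hypothetical, heavily simplified stand-in for MessageBatch (not JGroups code): with a null slot in the batch, the guard in the iteration is what keeps the predicate from dereferencing null.

```java
import java.util.function.Predicate;

public class ReplaceIfDemo {
    // Simplified stand-in for MessageBatch: a fixed array whose slots may be null.
    static final class Batch<T> {
        final T[] messages;
        Batch(T[] messages) { this.messages = messages; }

        // replaceIf with the null check the report asks for: null slots are
        // skipped instead of being handed to the predicate.
        int replaceIf(Predicate<T> filter, T replacement) {
            int matched = 0;
            for (int i = 0; i < messages.length; i++) {
                if (messages[i] != null && filter.test(messages[i])) {
                    messages[i] = replacement;
                    matched++;
                }
            }
            return matched;
        }
    }

    public static void main(String[] args) {
        // A batch with a null "hole", like the one that triggered the NPE.
        String[] msgs = {"heartbeat", null, "data"};
        Batch<String> batch = new Batch<>(msgs);
        // Without the guard above, the null slot would throw here.
        int n = batch.replaceIf(m -> m.startsWith("heart"), null);
        System.out.println(n);
        System.out.println(msgs[0]);
        System.out.println(msgs[2]);
    }
}
```

The alternative fix is to make the predicate itself defensive (m -> m != null && …); either side works, but only one of them needs to check.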
[JBoss JIRA] (JGRP-2260) UNICAST3 doesn't remove dead nodes from its tables
by Rich DiCroce (JIRA)
[ https://issues.jboss.org/browse/JGRP-2260?page=com.atlassian.jira.plugin.... ]
Rich DiCroce commented on JGRP-2260:
------------------------------------
According to the source, max_retransmit_time defaults to 1 minute. Since my configuration doesn't set it, it should have the default value. So why is it still trying to retransmit to a dead node 7 minutes later? And the log spam goes on forever, so retransmission attempts aren't stopping after 1 minute anyway.
[JBoss JIRA] (JGRP-2260) UNICAST3 doesn't remove dead nodes from its tables
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2260?page=com.atlassian.jira.plugin.... ]
Bela Ban edited comment on JGRP-2260 at 3/27/18 2:16 PM:
---------------------------------------------------------
You can restrict the time of retransmission to a dead node by setting {{max_retransmit_time}}. This is not a bug but a design choice: it preserves the ability to send messages to a non-member (e.g. when joining).
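For illustration, setting that property in the WildFly subsystem XML could look like the fragment below. The 60000 ms value is an arbitrary example, not a recommendation; the property's exact semantics are defined by the JGroups UNICAST3 documentation.
{code:xml}
<protocol type="UNICAST3">
    <property name="max_retransmit_time">60000</property>
</protocol>
{code}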
[JBoss JIRA] (JGRP-2260) UNICAST3 doesn't remove dead nodes from its tables
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2260?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2260:
--------------------------------
You can restrict the time of retransmission to a dead node by setting {{max_retransmit_time}}. This is not a bug, but a design choice.
[JBoss JIRA] (JGRP-2260) UNICAST3 doesn't remove dead nodes from its tables
by Rich DiCroce (JIRA)
Rich DiCroce created JGRP-2260:
----------------------------------
Summary: UNICAST3 doesn't remove dead nodes from its tables
Key: JGRP-2260
URL: https://issues.jboss.org/browse/JGRP-2260
Project: JGroups
Issue Type: Bug
Affects Versions: 4.0.10
Environment: WildFly 12.0.0.Final
Reporter: Rich DiCroce
Assignee: Bela Ban
Scenario: 2 WildFly instances clustered together. A ForkChannel is defined, with a MessageDispatcher on top. I start both nodes, then stop the second one. 6-7 minutes after stopping the second node, I start getting log spam on the first node:
{quote}
12:47:04,519 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) JGRP000032: RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null): no physical address for RCD_NMS (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null), dropping message
12:47:06,522 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) JGRP000032: RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null): no physical address for RCD_NMS (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null), dropping message
12:47:08,524 WARN [org.jgroups.protocols.UDP] (TQ-Bundler-4,ee,RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null)) JGRP000032: RCD_GP (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null): no physical address for RCD_NMS (flags=0), site-id=DEFAULT, rack-id=null, machine-id=null), dropping message
{quote}
After some debugging, I discovered that the reason is that UNICAST3 is still trying to retransmit to the dead node. Its send_table still contains an entry for the dead node with state OPEN.
After looking at the source code for UNICAST3, I have a theory about what's happening.
* When a node leaves the cluster, down(Event) gets invoked with a view change, which calls closeConnection(Address) for each node that left. That sets the connection state to CLOSING.
* Suppose that immediately after the view change is handled, a message with the dead node as its destination gets passed to down(Message). That invokes getSenderEntry(Address), which finds the connection... and sets the state back to OPEN.
Consequently, the connection is never closed or removed from the table, so retransmit attempts continue forever even though they will never succeed.
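The race in this theory can be sketched deterministically. The following is a hypothetical, heavily simplified stand-in for UNICAST3's send_table (not actual JGroups code; names mirror the methods mentioned above): the view-change path marks the entry CLOSING, and the send path unconditionally flips it back to OPEN.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SendTableRaceDemo {
    enum State { OPEN, CLOSING, CLOSED }

    static final class Entry { volatile State state = State.OPEN; }

    static final Map<String, Entry> sendTable = new ConcurrentHashMap<>();

    // View change handler: mark connections to departed members as CLOSING.
    static void closeConnection(String member) {
        Entry e = sendTable.get(member);
        if (e != null) e.state = State.CLOSING;
    }

    // Send path: look up (or create) the entry and unconditionally reopen it.
    static Entry getSenderEntry(String dest) {
        Entry e = sendTable.computeIfAbsent(dest, k -> new Entry());
        e.state = State.OPEN; // <-- undoes the CLOSING set by the view change
        return e;
    }

    public static void main(String[] args) {
        sendTable.put("nodeB", new Entry());
        closeConnection("nodeB");                         // view change: B left
        System.out.println(sendTable.get("nodeB").state);
        getSenderEntry("nodeB");                          // late message to B
        System.out.println(sendTable.get("nodeB").state); // reopened
    }
}
```

If the reopen in getSenderEntry were made conditional on the state not being CLOSING/CLOSED, the entry would stay eligible for removal after the view change.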
This issue is easily reproducible for me, although unfortunately I can't give you the application in question. But if you have fixes you want to try, I'm happy to drop in a patched JAR and see if the issue still happens.
This is my JGroups subsystem configuration:
{code:xml}
<subsystem xmlns="urn:jboss:domain:jgroups:6.0">
    <channels default="ee">
        <channel name="ee" stack="main">
            <fork name="shared-dispatcher"/>
            <fork name="group-topology"/>
        </channel>
    </channels>
    <stacks>
        <stack name="main">
            <transport type="UDP" socket-binding="jgroups" site="${gp.site:DEFAULT}"/>
            <protocol type="PING"/>
            <protocol type="MERGE3">
                <property name="min_interval">1000</property>
                <property name="max_interval">5000</property>
            </protocol>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL2">
                <property name="interval">3000</property>
                <property name="timeout">8000</property>
            </protocol>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS">
                <property name="join_timeout">100</property>
            </protocol>
            <protocol type="UFC"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
{code}