[JBoss JIRA] (DROOLS-1059) Drools can't find rules under stress
by Sebastian Schaer (JIRA)
[ https://issues.jboss.org/browse/DROOLS-1059?page=com.atlassian.jira.plugi... ]
Sebastian Schaer commented on DROOLS-1059:
------------------------------------------
Hi,
I'm experiencing the same issue under a stress test situation:
seemingly random failures to find queries (even though they all exist).
We typically run only about 10-20 rules, though.
I will try to add more information or work on a reproducer (if I find the time).
Here are my observations and an example stack trace of the error we encounter:
{code}
INFO | jvm 1 | main | 2016/03/25 18:34:14.387 | java.lang.RuntimeException: Unable to find query 'rule_2f6dc65eb814487eb37acde7c99c2a16_query'
INFO | jvm 1 | main | 2016/03/25 18:34:14.387 | at org.drools.core.phreak.SegmentUtilities.getQueryLiaNode(SegmentUtilities.java:518) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.387 | at org.drools.core.phreak.SegmentUtilities.getQuerySegmentMemory(SegmentUtilities.java:208) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.387 | at org.drools.core.reteoo.SegmentMemory$QueryMemoryPrototype.populateMemory(SegmentMemory.java:505) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.387 | at org.drools.core.reteoo.SegmentMemory$Prototype.newSegmentMemory(SegmentMemory.java:400) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.387 | at org.drools.core.impl.KnowledgeBaseImpl.createSegmentFromPrototype(KnowledgeBaseImpl.java:1424) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.387 | at org.drools.core.phreak.SegmentUtilities.restoreSegmentFromPrototype(SegmentUtilities.java:186) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.387 | at org.drools.core.phreak.SegmentUtilities.createSegmentMemory(SegmentUtilities.java:83) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.phreak.SegmentUtilities.createChildSegment(SegmentUtilities.java:321) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.phreak.SegmentUtilities.createChildSegments(SegmentUtilities.java:313) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.reteoo.LeftInputAdapterNode.doInsertObject(LeftInputAdapterNode.java:186) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.reteoo.LeftInputAdapterNode.assertObject(LeftInputAdapterNode.java:170) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.reteoo.SingleObjectSinkAdapter.propagateAssertObject(SingleObjectSinkAdapter.java:60) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.reteoo.CompositeObjectSinkAdapter.propagateAssertObject(CompositeObjectSinkAdapter.java:366) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.reteoo.ObjectTypeNode.propagateAssert(ObjectTypeNode.java:298) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.phreak.PropagationEntry$Insert.execute(PropagationEntry.java:93) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.phreak.SynchronizedPropagationList.flush(SynchronizedPropagationList.java:96) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.phreak.SynchronizedPropagationList.flush(SynchronizedPropagationList.java:69) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.impl.StatefulKnowledgeSessionImpl.flushPropagations(StatefulKnowledgeSessionImpl.java:1993) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.common.DefaultAgenda.fireAllRules(DefaultAgenda.java:1289) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.impl.StatefulKnowledgeSessionImpl.internalFireAllRules(StatefulKnowledgeSessionImpl.java:1294) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1281) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.413 | at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1270) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.414 | at org.drools.core.command.runtime.rule.FireAllRulesCommand.execute(FireAllRulesCommand.java:107) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.414 | at org.drools.core.command.runtime.rule.FireAllRulesCommand.execute(FireAllRulesCommand.java:34) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.414 | at org.drools.core.command.runtime.BatchExecutionCommandImpl.execute(BatchExecutionCommandImpl.java:153) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.414 | at org.drools.core.command.runtime.BatchExecutionCommandImpl.execute(BatchExecutionCommandImpl.java:69) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
INFO | jvm 1 | main | 2016/03/25 18:34:14.414 | at org.drools.core.impl.StatelessKnowledgeSessionImpl.execute(StatelessKnowledgeSessionImpl.java:292) ~[drools-core-6.3.0.Final.jar:6.3.0.Final]
{code}
Unlike [~massinissa], we use stateless sessions, and the exception happens when we execute the FireAllRulesCommand.
I also implemented a very simple retry mechanism (limited to 5 retries) around this specific exception: we catch it and reuse the existing stateless session to execute the same command again. So far I have not encountered any case where the exception is thrown a second time, and as far as I can tell the rule evaluation on the second execution of the FireAllRulesCommand is correct (we create and insert facts, and all of that seems to work fine).
(So my preliminary workaround is simply to re-execute the same command on the same session, and it seems OK so far. This, of course, might not work with stateful sessions.)
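A minimal sketch of that retry workaround, assuming the failure can be recognized by the `Unable to find query` message prefix from the stack trace above (Drools 6.3.0 throws a plain `RuntimeException` here, so there is no dedicated exception type to catch). The class and method names are hypothetical; in real code the `Supplier` would wrap the existing call, e.g. `statelessSession.execute(command)`. Like the workaround itself, it hides the symptom rather than fixing the underlying bug, and it has only been tried with stateless sessions.

```java
import java.util.function.Supplier;

public class QueryLookupRetry {

    // Message prefix taken from the stack trace; matching on text is an
    // assumption, since no more specific exception type is thrown.
    static final String MISSING_QUERY_PREFIX = "Unable to find query";

    /**
     * Runs the action, retrying up to maxRetries extra times when it fails with
     * the "Unable to find query" RuntimeException. Any other exception, or the
     * last failure once the retries are used up, is rethrown unchanged.
     */
    static <T> T executeWithRetry(Supplier<T> action, int maxRetries) {
        int attempt = 0;
        while (true) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                String msg = e.getMessage();
                boolean missingQuery = msg != null && msg.startsWith(MISSING_QUERY_PREFIX);
                if (!missingQuery || attempt >= maxRetries) {
                    throw e;
                }
                attempt++;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated command that fails once and then succeeds; it stands in
        // for statelessSession.execute(command).
        int[] calls = {0};
        String result = executeWithRetry(() -> {
            if (calls[0]++ == 0) {
                throw new RuntimeException("Unable to find query 'rule_x_query'");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls"); // ok after 2 calls
    }
}
```

Other exceptions are rethrown immediately, so the retry cannot mask unrelated failures.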
> Drools can't find rules under stress
> ------------------------------------
>
> Key: DROOLS-1059
> URL: https://issues.jboss.org/browse/DROOLS-1059
> Project: Drools
> Issue Type: Bug
> Components: core engine
> Affects Versions: 6.3.0.Final
> Reporter: Massinissa BOUZIAD
> Assignee: Mario Fusco
> Priority: Blocker
> Attachments: DroolsBugReproducerTest.java, reproducerRule.drl
>
>
> In my knowledge base, I have many rules.
> All of them work very well in production with Drools 6.0.1.Final, even under stress conditions with high traffic (around 40 hits per second).
> This bug appeared when we upgraded to Drools 6.3.0.Final, which is compatible with JDK 8 (mandatory in my case).
> So now, when I put my rules under a stress test (benchmarking), I get this random error.
> Drools is unable to find a query (not always the same one).
> I get this error for 0.6% of my requests.
> *+Following is the stack trace:+*
> Unable to find query 'checkAndBindBasket'
> at org.drools.core.phreak.SegmentUtilities.getQueryLiaNode(SegmentUtilities.java:518) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.phreak.SegmentUtilities.getQuerySegmentMemory(SegmentUtilities.java:208) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.SegmentMemory$QueryMemoryPrototype.populateMemory(SegmentMemory.java:505) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.SegmentMemory$Prototype.newSegmentMemory(SegmentMemory.java:400) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.impl.KnowledgeBaseImpl.createSegmentFromPrototype(KnowledgeBaseImpl.java:1424) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.phreak.SegmentUtilities.restoreSegmentFromPrototype(SegmentUtilities.java:186) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.phreak.SegmentUtilities.createSegmentMemory(SegmentUtilities.java:83) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.LeftInputAdapterNode.assertObject(LeftInputAdapterNode.java:167) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.SingleObjectSinkAdapter.propagateAssertObject(SingleObjectSinkAdapter.java:60) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.AlphaNode.assertObject(AlphaNode.java:145) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.doPropagateAssertObject(CompositeObjectSinkAdapter.java:494) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.propagateAssertObject(CompositeObjectSinkAdapter.java:384) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.AlphaNode.assertObject(AlphaNode.java:145) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.SingleObjectSinkAdapter.propagateAssertObject(SingleObjectSinkAdapter.java:60) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.AlphaNode.assertObject(AlphaNode.java:145) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.doPropagateAssertObject(CompositeObjectSinkAdapter.java:494) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.propagateAssertObject(CompositeObjectSinkAdapter.java:384) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.AlphaNode.assertObject(AlphaNode.java:145) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.doPropagateAssertObject(CompositeObjectSinkAdapter.java:494) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.propagateAssertObject(CompositeObjectSinkAdapter.java:384) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.AlphaNode.assertObject(AlphaNode.java:145) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.doPropagateAssertObject(CompositeObjectSinkAdapter.java:494) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.propagateAssertObject(CompositeObjectSinkAdapter.java:384) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.AlphaNode.assertObject(AlphaNode.java:145) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.doPropagateAssertObject(CompositeObjectSinkAdapter.java:494) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.CompositeObjectSinkAdapter.propagateAssertObject(CompositeObjectSinkAdapter.java:384) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.reteoo.ObjectTypeNode.propagateAssert(ObjectTypeNode.java:298) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.phreak.PropagationEntry$Insert.execute(PropagationEntry.java:93) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.phreak.SynchronizedPropagationList.flush(SynchronizedPropagationList.java:96) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.phreak.SynchronizedPropagationList.flush(SynchronizedPropagationList.java:69) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.impl.StatefulKnowledgeSessionImpl.flushPropagations(StatefulKnowledgeSessionImpl.java:1993) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.common.DefaultAgenda.fireAllRules(DefaultAgenda.java:1289) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.impl.StatefulKnowledgeSessionImpl.internalFireAllRules(StatefulKnowledgeSessionImpl.java:1294) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1281) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
> at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1260) ~[darty-pricing-ws-2.0.2.jar:2.0.2]
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (WFCORE-1458) add-user.sh scripts change layout of properties files
by Chao Wang (JIRA)
[ https://issues.jboss.org/browse/WFCORE-1458?page=com.atlassian.jira.plugi... ]
Chao Wang moved JBEAP-4008 to WFCORE-1458:
------------------------------------------
Project: WildFly Core (was: JBoss Enterprise Application Platform)
Key: WFCORE-1458 (was: JBEAP-4008)
Workflow: GIT Pull Request workflow (was: CDW with loose statuses v1)
Component/s: Scripts
(was: Scripts)
Target Release: (was: 7.backlog.GA)
Affects Version/s: 2.1.0.Final
(was: 7.0.0.ER7)
> add-user.sh scripts change layout of properties files
> -----------------------------------------------------
>
> Key: WFCORE-1458
> URL: https://issues.jboss.org/browse/WFCORE-1458
> Project: WildFly Core
> Issue Type: Bug
> Components: Scripts
> Affects Versions: 2.1.0.Final
> Reporter: Chao Wang
> Assignee: Chao Wang
> Priority: Minor
>
> *Description of problem:*
> The add-user.sh script changes the layout of the properties files:
> * application-users.properties
> * mgmt-users.properties
> *How reproducible:*
> Always
> *Steps to Reproduce:*
> # Get a fresh EAP installation
> # ./add-user.sh -u admin2 -p Admin1** -s
> # ./add-user.sh -a -u admin3 -p Admin1** -s
> *Actual results:*
> These comments are moved to the end of the application-users.properties file:
> {noformat}
> #
> #$REALM_NAME=ApplicationRealm$ This line is used by the add-user utility to identify the realm name already used in this file.
> #
> {noformat}
> These comments are moved to the end of the mgmt-users.properties file:
> {noformat}
> #
> #$REALM_NAME=ManagementRealm$ This line is used by the add-user utility to identify the realm name already used in this file.
> #
> {noformat}
> *Expected results:*
> Comments keep the same order after add-user.sh is executed.
> CC: [~ctomc]
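One way to get the expected result is to edit the files line by line instead of loading and re-serializing them. The sketch below is hypothetical (it is not the add-user implementation) and assumes simple `key=value` lines without escapes or `:` separators: an existing entry is replaced in place and a new one is appended, so comment lines such as the `#$REALM_NAME=...$` marker never move.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderPreservingPropertiesUpdate {

    /**
     * Returns a copy of the file's lines with key=value set. An existing
     * entry for the key is replaced in place; otherwise the entry is appended.
     * Comment lines and other entries keep their original positions.
     */
    static List<String> setEntry(List<String> lines, String key, String value) {
        List<String> result = new ArrayList<>(lines);
        String prefix = key + "=";
        for (int i = 0; i < result.size(); i++) {
            // Commented-out samples such as "#admin=..." start with '#',
            // so they never match the prefix and are left untouched.
            if (result.get(i).trim().startsWith(prefix)) {
                result.set(i, prefix + value);
                return result;
            }
        }
        result.add(prefix + value);
        return result;
    }
}
```

Line-based editing is used because round-tripping through `java.util.Properties` does not keep the original comment lines at all.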
--
[JBoss JIRA] (WFLY-1066) Automatic configuration of 'Initial_hosts' for a cluster using JGroups TCP-stack in domain mode (aka DOMAIN_PING)
by Sebastian Łaskawiec (JIRA)
[ https://issues.jboss.org/browse/WFLY-1066?page=com.atlassian.jira.plugin.... ]
Sebastian Łaskawiec updated WFLY-1066:
--------------------------------------
Priority: Minor (was: Major)
> Automatic configuration of 'Initial_hosts' for a cluster using JGroups TCP-stack in domain mode (aka DOMAIN_PING)
> -----------------------------------------------------------------------------------------------------------------
>
> Key: WFLY-1066
> URL: https://issues.jboss.org/browse/WFLY-1066
> Project: WildFly
> Issue Type: Feature Request
> Components: Domain Management
> Environment: Server running in Domain mode
> Reporter: Wolf-Dieter Fink
> Assignee: Sebastian Łaskawiec
> Priority: Minor
> Labels: clustering, domain, jgroups
>
> It is complicated to keep the JGroups subsystem in sync when the TCP stack is used in domain mode.
> All servers that join or leave a clustered server group (configuration) must be added to or removed from the JGroups configuration by hand.
> The domain server receives this information when a host controller enrolls and registers a server in a clustered server group.
> So initial_hosts could be configured automatically, avoiding stale entries that cause unnecessary checks and ensuring that all active servers are known.
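For reference, the JGroups TCPPING protocol takes `initial_hosts` as a comma-separated list of `host[port]` entries. A hypothetical sketch of rebuilding that value from the servers currently registered in a clustered server group; the `Member` type and where the list comes from are assumptions, not WildFly APIs.

```java
import java.util.Collection;
import java.util.stream.Collectors;

public class InitialHostsBuilder {

    /** Hypothetical view of one running server in a clustered server group. */
    static final class Member {
        final String host;
        final int port;

        Member(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /**
     * Builds a TCPPING-style initial_hosts value ("host1[port1],host2[port2]")
     * from the currently active members, dropping duplicates and sorting so
     * the result does not depend on registration order.
     */
    static String initialHosts(Collection<Member> activeMembers) {
        return activeMembers.stream()
                .map(m -> m.host + "[" + m.port + "]")
                .distinct()
                .sorted()
                .collect(Collectors.joining(","));
    }
}
```

Sorting and de-duplicating keep the generated value stable, so it only changes when group membership actually changes.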
--
[JBoss JIRA] (WFCORE-1457) Embedded HC started with host-slave.xml uses incorrect hostname
by Brian Stansberry (JIRA)
Brian Stansberry created WFCORE-1457:
----------------------------------------
Summary: Embedded HC started with host-slave.xml uses incorrect hostname
Key: WFCORE-1457
URL: https://issues.jboss.org/browse/WFCORE-1457
Project: WildFly Core
Issue Type: Bug
Reporter: Ken Wills
Assignee: Ken Wills
Fix For: 3.0.0.Alpha1
At EmbeddedHostControllerFactory:282, authCode is used as the hostname. It is reset if a name is present in the XML config, but since the default host-slave.xml has no name, the hostname is displayed as the contents of authCode.
We should probably extract the hostname code from HostControllerEnvironment:300 into its own method and use that.
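A rough sketch of the suggested refactoring: a single hypothetical `resolveHostName` method that both code paths could call. The fallback chain shown (XML name, then `jboss.host.name`, then `jboss.qualified.host.name`, the `HOSTNAME`/`COMPUTERNAME` environment variables, and finally the local host name with the domain part stripped) is an assumption about what `HostControllerEnvironment` does and should be checked against the actual source.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.Properties;

public class HostNameResolver {

    /**
     * Hypothetical extracted helper: resolves the host name to use when the
     * host XML may not define one. It never falls back to unrelated values
     * such as the auth code.
     */
    static String resolveHostName(String nameFromXml, Properties props, Map<String, String> env) {
        if (nameFromXml != null && !nameFromXml.isEmpty()) {
            return nameFromXml;
        }
        String explicit = props.getProperty("jboss.host.name");
        if (explicit != null) {
            return explicit;
        }
        String qualified = props.getProperty("jboss.qualified.host.name");
        if (qualified == null) qualified = env.get("HOSTNAME");     // Unix-style
        if (qualified == null) qualified = env.get("COMPUTERNAME"); // Windows
        if (qualified == null) qualified = localHostName();
        // Keep IP literals whole; otherwise use only the first DNS label.
        boolean ipLiteral = qualified.contains(":") || qualified.matches("[0-9.]+");
        int dot = qualified.indexOf('.');
        return (!ipLiteral && dot > 0) ? qualified.substring(0, dot) : qualified;
    }

    private static String localHostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "localhost"; // last-resort default (assumption)
        }
    }
}
```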
--
[JBoss JIRA] (WFCORE-1456) org.jboss.as.cli.scriptsupport.CLI prevents use of embed-server and embed-host-controller
by Brian Stansberry (JIRA)
Brian Stansberry created WFCORE-1456:
----------------------------------------
Summary: org.jboss.as.cli.scriptsupport.CLI prevents use of embed-server and embed-host-controller
Key: WFCORE-1456
URL: https://issues.jboss.org/browse/WFCORE-1456
Project: WildFly Core
Issue Type: Bug
Components: CLI
Affects Versions: 2.1.0.Final
Reporter: Brian Stansberry
Assignee: Alexey Loubyansky
The org.jboss.as.cli.scriptsupport.CLI class throws an exception if the user invokes the cmd method before invoking one of the connect methods, but the connect/disconnect methods are oriented toward remote connections. As a result, you can't really use this class with the offline CLI.
I'm not sure what the best fix is here. One option is new method(s), analogous to connect/disconnect, that establish a CommandContext but don't call connectController(). Another is to drop the checkNotConnected() call in the cmd method and create an unconnected command context instead. But then the existing connect/disconnect methods would need to be reworked to deal with the fact that an unconnected command context may already be present.
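A rough model of the second option, using hypothetical stand-in classes (`ScriptCliSketch`, `Context`) rather than the real `CLI` and `CommandContext` types: `cmd()` lazily creates an unconnected context, and `connect()` has to accept that a context may already exist, which is the rework mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model only: none of these names are the real
 * org.jboss.as.cli.scriptsupport.CLI API.
 */
public class ScriptCliSketch {

    /** Stand-in for a CommandContext; records the commands it handled. */
    static final class Context {
        boolean connected;
        final List<String> executed = new ArrayList<>();

        void connectController() { connected = true; }
        void disconnectController() { connected = false; }
        void handle(String line) { executed.add(line); }
    }

    private Context ctx;

    /** Runs a command, creating an unconnected context on first use (offline CLI). */
    public void cmd(String line) {
        if (ctx == null) {
            ctx = new Context();
        }
        ctx.handle(line);
    }

    /** Connects the existing context if there is one, instead of failing. */
    public void connect() {
        if (ctx == null) {
            ctx = new Context();
        }
        if (!ctx.connected) {
            ctx.connectController();
        }
    }

    /** Drops the controller connection but keeps the context for offline use. */
    public void disconnect() {
        if (ctx != null && ctx.connected) {
            ctx.disconnectController();
        }
    }

    Context context() { return ctx; }
}
```

Keeping the context alive across `disconnect()` is a design choice in this sketch; the alternative of discarding it would match the current remote-only behavior more closely.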
--
[JBoss JIRA] (WFLY-6413) Range headers do not seem to be handled correctly and prevents video delivery in Chrome and Safari
by Stuart Douglas (JIRA)
[ https://issues.jboss.org/browse/WFLY-6413?page=com.atlassian.jira.plugin.... ]
Stuart Douglas commented on WFLY-6413:
--------------------------------------
This is fixed in Undertow 1.3.20.Final, which was released yesterday.
> Range headers do not seem to be handled correctly and prevents video delivery in Chrome and Safari
> --------------------------------------------------------------------------------------------------
>
> Key: WFLY-6413
> URL: https://issues.jboss.org/browse/WFLY-6413
> Project: WildFly
> Issue Type: Bug
> Components: Web (Undertow)
> Affects Versions: 10.0.0.Final
> Environment: WildFly 10.0.0.Final
> Windows 7 or Mac 10.11.3
> Java 8
> Chrome, Firefox, Safari
> Reporter: Jason Holmberg
> Assignee: Stuart Douglas
> Priority: Blocker
>
> Safari on iOS requires range headers to play video content via HTML5, so enabling range headers in WildFly should make this work. It does not. Enabling the range headers actually prevents Chrome from playing the video content, which previously worked when the range headers were NOT enabled.
> After enabling range headers as described here:
> https://developer.jboss.org/message/953058#953058
> I made some range requests via `curl` to see what is being returned:
> This is the result of a request to *WildFly* with the Range headers enabled:
> {noformat}
> $ curl -I --range 0- http://localhost:8880/vidtest/vidtest.mp4
> HTTP/1.1 200 OK
> Connection: keep-alive
> Last-Modified: Thu, 17 Mar 2016 19:15:42 GMT
> X-Powered-By: Undertow/1
> Server: WildFly/10
> Content-Type: video/mp4
> Content-Length: 8200890
> Date: Fri, 18 Mar 2016 16:59:55 GMT
> {noformat}
> This is the result of a request to the same content being served from Tomcat 8, no special config required. *All the browsers can play the content when served from Tomcat 8*:
> {noformat}
> $ curl -I --range 0- http://localhost:8080/vidtest/vidtest.mp4
> HTTP/1.1 206 Partial Content
> Server: Apache-Coyote/1.1
> Accept-Ranges: bytes
> ETag: W/"8200890-1458232627000"
> Last-Modified: Thu, 17 Mar 2016 16:37:07 GMT
> Content-Range: bytes 0-8200889/8200890
> Content-Type: video/mp4
> Content-Length: 8200890
> Date: Fri, 18 Mar 2016 17:00:08 GMT
> {noformat}
> I have created a small project that I have been using to troubleshoot this issue: https://github.com/slowtrailrunner/html5-vidtest
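For comparison, `curl --range 0-` sends `Range: bytes=0-`. A server that honors it for this 8200890-byte file answers `206 Partial Content` with `Content-Range: bytes 0-8200889/8200890`, which is what Tomcat returns above. Below is a minimal, hypothetical sketch of that arithmetic for a single range, following RFC 7233 (multiple ranges and `If-Range` are left out); it is not Undertow's implementation.

```java
/**
 * Minimal single-range handling: turns a Range header such as "bytes=0-"
 * into the byte positions and Content-Range value of a 206 response.
 */
public class ByteRange {

    final long first;
    final long last;

    ByteRange(long first, long last) {
        this.first = first;
        this.last = last;
    }

    /** Returns null if the header is absent, malformed, multi-range, or unsatisfiable. */
    static ByteRange parse(String header, long length) {
        if (header == null || !header.startsWith("bytes=") || header.contains(",")) {
            return null;
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) {
            return null;
        }
        String start = spec.substring(0, dash).trim();
        String end = spec.substring(dash + 1).trim();
        try {
            if (start.isEmpty()) {
                // Suffix range "bytes=-N": the last N bytes of the file.
                long suffix = Long.parseLong(end);
                if (suffix <= 0 || length == 0) return null;
                return new ByteRange(Math.max(0, length - suffix), length - 1);
            }
            long first = Long.parseLong(start);
            // An open-ended or oversized end position is clamped to the last byte.
            long last = end.isEmpty() ? length - 1 : Math.min(Long.parseLong(end), length - 1);
            if (first >= length || first > last) return null;
            return new ByteRange(first, last);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    long contentLength() { return last - first + 1; }

    String contentRange(long length) {
        return "bytes " + first + "-" + last + "/" + length;
    }
}
```

A `null` result means the header should either be ignored (serve a plain `200`) or, for a range starting past the end, answered with `416 Range Not Satisfiable`; the sketch does not distinguish the two.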
--
[JBoss JIRA] (JGRP-2028) GMS sometimes ignores view bundling timeout
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2028?page=com.atlassian.jira.plugin.... ]
Bela Ban closed JGRP-2028.
--------------------------
> GMS sometimes ignores view bundling timeout
> -------------------------------------------
>
> Key: JGRP-2028
> URL: https://issues.jboss.org/browse/JGRP-2028
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.8
> Reporter: Dan Berindei
> Assignee: Bela Ban
> Priority: Minor
> Fix For: 3.6.9, 4.0
>
>
> {{GMS.ViewHandler.run()}} has this code:
> {code}
> wait_time=timeout - (System.nanoTime() - start_time);
> if(wait_time > 0 && firstRequest.canBeProcessedTogether(firstRequest)) { // JGRP-1438
> long wait_time_ms=TimeUnit.MILLISECONDS.convert(wait_time, TimeUnit.NANOSECONDS);
> queue.waitUntilClosed(wait_time_ms); // misnomer: waits until element has been added or q closed
> }
> {code}
> The problem is that {{wait_time_ms}} will be {{0}} if {{0 < wait_time < 1_000_000}}, i.e. when less than 1 ms of the timeout remains. {{queue.waitUntilClosed(wait_time_ms)}} then calls {{Object.wait(0)}}, which has no timeout, so the view handler blocks until another request is added or the queue is closed.
> Fortunately, the joiners re-send the {{JOIN_REQ}} message after {{GMS.join_timeout}} (3s by default), so all that happens is that the view is delayed by 3s. It does, however, cause some random failures in tests that expect the view to form within a precise amount of time.
> {noformat}
> 17:04:53,034 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending JOIN(NodeF-49399) to NodeE-37644
> 17:04:53,034 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
> 17:04:53,035 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending JOIN(NodeG-49307) to NodeE-37644
> 17:04:53,035 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
> 17:04:53,060 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
> 17:04:53,060 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
> 17:04:53,060 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeF-49399: #1, conn_id=0, first)
> 17:04:53,060 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeG-49307: #1, conn_id=0, first)
> 17:04:53,061 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#1
> 17:04:53,061 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#1
> 17:04:56,035 WARN (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: JOIN(NodeF-49399) sent to NodeE-37644 timed out (after 3000 ms), on try 1
> 17:04:56,035 WARN (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: JOIN(NodeG-49307) sent to NodeE-37644 timed out (after 3000 ms), on try 1
> 17:04:56,036 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending JOIN(NodeF-49399) to NodeE-37644
> 17:04:56,036 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
> 17:04:56,036 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending JOIN(NodeG-49307) to NodeE-37644
> 17:04:56,036 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
> 17:04:56,057 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
> 17:04:56,057 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
> 17:04:56,057 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#2
> 17:04:56,057 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#2
> # finally
> 17:04:56,057 TRACE (ViewHandler,NodeE-37644:) [GMS] NodeE-37644: joiners=[NodeG-49307, NodeF-49399], suspected=[], leaving=[], new view: [NodeE-37644|1] (3) [NodeE-37644, NodeG-49307, NodeF-49399]
> {noformat}
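The truncation is easy to reproduce in isolation. The sketch below contrasts the conversion used in `GMS.ViewHandler.run()` with a round-up variant; the class name is hypothetical, and rounding up is only one possible remedy, not necessarily the fix shipped in 3.6.9/4.0.

```java
import java.util.concurrent.TimeUnit;

public class WaitTimeConversion {

    /** The conversion from the snippet above: truncates, so sub-millisecond waits become 0. */
    static long truncatingMillis(long waitNanos) {
        return TimeUnit.MILLISECONDS.convert(waitNanos, TimeUnit.NANOSECONDS);
    }

    /**
     * Possible remedy (assumption): round up, so any positive remaining wait
     * maps to at least 1 ms and Object.wait(0), which means "no timeout",
     * is never reached.
     */
    static long roundedUpMillis(long waitNanos) {
        if (waitNanos <= 0) {
            return 0; // nothing left to wait for; the caller should not wait at all
        }
        // Ceiling division without overflow: 1..1_000_000 ns -> 1 ms, 1_000_001 ns -> 2 ms.
        return TimeUnit.NANOSECONDS.toMillis(waitNanos - 1) + 1;
    }

    public static void main(String[] args) {
        long waitNanos = 500_000L; // 0.5 ms left of the bundling timeout
        System.out.println(truncatingMillis(waitNanos)); // 0, i.e. wait(0) waits until notified
        System.out.println(roundedUpMillis(waitNanos));  // 1
    }
}
```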
--