[Red Hat JIRA] (JGRP-2520) CENTRAL_LOCK2: locks not released on kill
by Bela Ban (Jira)
Bela Ban created JGRP-2520:
------------------------------
Summary: CENTRAL_LOCK2: locks not released on kill
Key: JGRP-2520
URL: https://issues.redhat.com/browse/JGRP-2520
Project: JGroups
Issue Type: Bug
Reporter: Bela Ban
Assignee: Bela Ban
Fix For: 5.2, 4.2.11
2 emails from D. White:
When a worker thread is killed, the JChannel/LockService remains active because the node's JVM is not killed. A new worker thread is created to replace the one that was killed. In this case the cluster view has not changed, and therefore the locks remain.
When the node's JVM process is killed, that action triggers a cluster view change, which is received by the Coordinator. In this case the server lock state is rebuilt and the locks are released.
I think the following will help:
Setup: 3 node cluster, each node with two worker threads. Each set of worker threads has access to the parent node JChannel.
dwhite-jgroups-node1 (Coordinator)
dwhite-jgroups-node2
dwhite-jgroups-node3
dwhite-jgroups-node2 thread1 acquires lock on resource ENV:ISA_IEA:1
dwhite-jgroups-node2 thread1 acquires lock on resource ENV:GS_GE:1
dwhite-jgroups-node2 thread2 requests lock on resource ENV:ISA_IEA:1
dwhite-jgroups-node1 thread1 requests lock on resource ENV:ISA_IEA:1
dwhite-jgroups-node1 thread2 requests lock on resource ENV:ISA_IEA:1
dwhite-jgroups-node3 thread1 requests lock on resource ENV:ISA_IEA:1
dwhite-jgroups-node3 thread2 requests lock on resource ENV:ISA_IEA:1
Scenario #1:
dwhite-jgroups-node2 thread1 runs too long, does not respond to soft shutdown, and the node's JVM process is killed by the watchdog service.
[SPEChannelAdapter] viewAccepted received by Coordinator dwhite-jgroups-node1.
Both locks are released.
Scenario #2:
dwhite-jgroups-node2 thread1 runs too long, and the soft shutdown kills thread1, leaving the server locks in place and the node2 JVM process running.
The watchdog detects that the locks on ENV:ISA_IEA:1 and ENV:GS_GE:1 have been held too long, and issues RELEASE_LOCK messages from the Coordinator with the proper Owner.
ENV:GS_GE:1 is released.
ENV:ISA_IEA:1 remains locked, seemingly due to the presence of a GRANT_LOCK request from dwhite-jgroups-node2 thread2.
Scenario #3 (slight variation on #2):
dwhite-jgroups-node2 thread1 runs too long, and the soft shutdown kills thread1, leaving the server locks in place and the node2 JVM process running.
The watchdog detects that the locks on ENV:ISA_IEA:1 and ENV:GS_GE:1 have been held too long, and issues RELEASE_LOCK messages from the Coordinator with the proper Owner.
The watchdog also removes the GRANT_LOCK request for ENV:ISA_IEA:1 from dwhite-jgroups-node2 thread2.
Now both locks are released.
The presence of GRANT_LOCK requests from node1 and node3 does not prevent the release of the lock for ENV:ISA_IEA:1 held by node2.
Email 2:
Yes, we acquire a lock within a try/catch block and release with finally.
In production, each JVM has two worker threads. If either thread runs too long, a monitor task force-kills the JVM process. Any acquired locks then do not get released by the unlock call in the finally block. Usually a JVM is killed because a bad customer map runs too long, and the other thread, with its acquired locks, becomes "collateral damage". Not every business scenario uses locks, so the "orphan lock" scenario doesn't happen every time a JVM process is killed. Also, both threads are not always active.
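The acquire-in-try, release-in-finally pattern described above can be sketched as follows. With JGroups, the lock would come from LockService.getLock(), which returns a java.util.concurrent.locks.Lock; a local ReentrantLock stands in here so the sketch runs without the library. This is a minimal illustration of the pattern, not the reporter's actual code:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockPattern {

    // Acquire-in-try / release-in-finally, as described in the email.
    // With JGroups this would be: Lock lock = new LockService(channel).getLock("ENV:ISA_IEA:1");
    static void withLock(Lock lock, Runnable task) {
        lock.lock();           // blocks until the lock is granted (cluster-wide with JGroups)
        try {
            task.run();
        } finally {
            lock.unlock();     // runs on normal or exceptional exit -- but never
                               // when the JVM is force-killed mid-task, which is
                               // how "orphan" locks end up in the server lock table
        }
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        withLock(lock, () -> System.out.println("working under lock"));
        System.out.println("lock still held: " + lock.isLocked());
    }
}
```

The finally block covers exceptions thrown by the task, but not a killed process: that is exactly the gap the rest of the email is about.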
We use the CENTRAL_LOCK2 protocol. For some reason the locks acquired by the killed process may remain in the server locks table. On occasion, the existing Coordinator doesn't detect the "orphan" locks and revoke them.
Does a view change where the Coordinator has not changed cause that Coordinator to rebuild the lock state? In a view change where the Coordinator does change, that seems to fix the problem because the new Coordinator rebuilds the lock state table.
In the case where a new Coordinator is assigned, do the state transfer protocols need to be in the configuration (e.g. BARRIER, pbcast.STATE_TRANSFER) in order for the new Coordinator to correctly re-establish the lock state? I don't think so because CENTRAL_LOCK2 does not use state-transfer; the Coordinator rebuilds the lock state.
To alleviate this problem, we have a lock monitor thread which runs on the Coordinator node and keeps track of how long each lock has been held. Since no flow can run for more than an hour, any lock held longer is definitely an orphan. The lock monitor task issues RELEASE_LOCK requests using the owner address of the orphan lock. The RELEASE_LOCK message works in all cases except where there are pending GRANT_LOCK requests in the queue from the same owner address as the held lock. If the GRANT_LOCK requests are from other addresses, the RELEASE_LOCK request works.
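The bookkeeping side of such a monitor can be sketched as below. All names here (LockMonitor, Held, orphans) are illustrative, not JGroups API; the real monitor would additionally send the RELEASE_LOCK on behalf of the recorded owner, which this sketch leaves out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Coordinator-side bookkeeping for detecting "orphan" locks: record who
// acquired which lock and when, and report locks held past a threshold.
public class LockMonitor {

    static final class Held {
        final String lockId;
        final String owner;          // address of the node/thread holding the lock
        final long acquiredMillis;
        Held(String lockId, String owner, long acquiredMillis) {
            this.lockId = lockId;
            this.owner = owner;
            this.acquiredMillis = acquiredMillis;
        }
    }

    private final Map<String, Held> held = new ConcurrentHashMap<>();

    // Record the acquired timestamp, owner, and lock ID on every successful lock.
    void onAcquired(String lockId, String owner, long nowMillis) {
        held.put(lockId, new Held(lockId, owner, nowMillis));
    }

    void onReleased(String lockId) {
        held.remove(lockId);
    }

    // Locks held longer than maxHoldMillis are presumed orphans; the real
    // monitor then issues RELEASE_LOCK using the recorded owner address.
    List<Held> orphans(long nowMillis, long maxHoldMillis) {
        List<Held> out = new ArrayList<>();
        for (Held h : held.values())
            if (nowMillis - h.acquiredMillis > maxHoldMillis)
                out.add(h);
        return out;
    }
}
```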
In order to simulate the problem, a test application ignores the unlock operation in the finally block purposefully creating the "orphan" in the server locks table. Other instances of the test application are running with normal lock/unlock operations. The lock monitor thread on the Coordinator subsequently detects the "lock held too long" orphan condition and issues the RELEASE_LOCK request on behalf of the orphan lock owner. Whenever a lock is successfully acquired, the lock monitor task internally keeps track of the acquired timestamp, owner, and lock ID.
I'd love to get rid of the complex lock monitor and ensure lock revoke operations are initiated by the Coordinator via the CENTRAL_LOCK2 protocol.
Another enhancement that would completely solve this problem: Allow a timeout to be specified for holding a lock. The JGroups protocol would then revoke the lock if the timeout threshold were reached.
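As a sketch, the proposed enhancement could take the shape of a protocol attribute along these lines. The lock_hold_timeout attribute is hypothetical and does not exist in CENTRAL_LOCK2 today:

```xml
<!-- Hypothetical attribute: a hold timeout (ms) after which the Coordinator
     revokes the lock itself. Not supported by CENTRAL_LOCK2 as of this issue. -->
<CENTRAL_LOCK2 lock_hold_timeout="3600000"/>
```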
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
5 years, 5 months
[Red Hat JIRA] (WFLY-14148) Move microprofile-opentracing from wildfly-ee to wildfly
by Emmanuel Hugonnet (Jira)
[ https://issues.redhat.com/browse/WFLY-14148?page=com.atlassian.jira.plugi... ]
Emmanuel Hugonnet moved EAP7-1589 to WFLY-14148:
------------------------------------------------
Project: WildFly (was: JBoss EAP Planning for version 7 and above)
Key: WFLY-14148 (was: EAP7-1589)
Issue Type: Feature Request (was: Requirement)
Workflow: GIT Pull Request workflow (was: EAP Agile Workflow 2.0)
Component/s: MP OpenTracing (was: MicroProfile)
EAP PT Feature Implementation (FI): (was: TODO)
Fix Version/s: 22.0.0.Beta1 (was: 7.4.0.Beta)
EAP PT Pre-Checked (PC): (was: TODO)
Target Release: (was: 7.4.0.GA)
EAP PT Community Docs (CD): (was: TODO)
EAP PT Product Docs (PD): (was: New)
EAP PT Test Dev (TD): (was: TODO)
EAP PT Docs Analysis (DA): (was: TODO)
EAP PT Test Plan (TP): (was: TODO)
EAP PT Analysis Document (AD): (was: TODO)
> Move microprofile-opentracing from wildfly-ee to wildfly
> --------------------------------------------------------
>
> Key: WFLY-14148
> URL: https://issues.redhat.com/browse/WFLY-14148
> Project: WildFly
> Issue Type: Feature Request
> Components: MP OpenTracing
> Reporter: Emmanuel Hugonnet
> Assignee: Emmanuel Hugonnet
> Priority: Blocker
> Labels: EAP-CD22, observability
> Fix For: 22.0.0.Beta1
>
>
> Move the MP OpenTracing subsystem from the wildfly-ee feature pack to the wildfly feature pack
[Red Hat JIRA] (DROOLS-5779) Rule evaluation optimization : alpha node between range index
by Kris Verlaenen (Jira)
[ https://issues.redhat.com/browse/DROOLS-5779?page=com.atlassian.jira.plug... ]
Kris Verlaenen updated DROOLS-5779:
-----------------------------------
Sprint: 2020 Week 46-48 (from Nov 9), 2020 Week 49-51 (from Nov 30) (was: 2020 Week 46-48 (from Nov 9))
> Rule evaluation optimization : alpha node between range index
> -------------------------------------------------------------
>
> Key: DROOLS-5779
> URL: https://issues.redhat.com/browse/DROOLS-5779
> Project: Drools
> Issue Type: Story
> Components: core engine
> Affects Versions: 7.45.0.Final
> Reporter: Toshiya Kobayashi
> Assignee: Toshiya Kobayashi
> Priority: Major
>
> Motivation: This Story adds range indexing for alpha nodes with "between" constraints, which improves the performance of between comparisons (e.g. age >= 20 && age < 30).
> Goals: Better rule execution performance. For each improvement, add a benchmark and measure the performance gain.
> Impact: Better rule execution performance. It might add some overhead at rule build time; this will be confirmed with a benchmark.
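One common way to implement the range indexing described above is to key the alpha nodes by the lower bound of their interval and match a fact with a single floor lookup. The sketch below uses a TreeMap over non-overlapping half-open ranges; it illustrates the idea only and is not Drools' actual implementation:

```java
import java.util.Map;
import java.util.TreeMap;

// Indexes non-overlapping half-open ranges [lo, hi), e.g. "age >= 20 && age < 30",
// by lower bound, so matching a fact is one O(log n) floor lookup instead of
// evaluating every between-constraint linearly.
public class RangeIndex {

    static final class Range {
        final int lo, hi;
        final String node;     // stand-in for the alpha node covering this range
        Range(int lo, int hi, String node) { this.lo = lo; this.hi = hi; this.node = node; }
    }

    private final TreeMap<Integer, Range> byLowerBound = new TreeMap<>();

    void add(int lo, int hi, String node) {
        byLowerBound.put(lo, new Range(lo, hi, node));
    }

    // Returns the node whose range contains `value`, or null if none does.
    String match(int value) {
        Map.Entry<Integer, Range> e = byLowerBound.floorEntry(value);
        if (e == null) return null;
        Range r = e.getValue();
        return value < r.hi ? r.node : null;
    }
}
```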
[Red Hat JIRA] (DROOLS-5282) Client side FEEL
by Kris Verlaenen (Jira)
[ https://issues.redhat.com/browse/DROOLS-5282?page=com.atlassian.jira.plug... ]
Kris Verlaenen updated DROOLS-5282:
-----------------------------------
Sprint: 2020 Week 16-18 (from Apr 13), 2020 Week 19-21 (from May 4), 2020 Week 22-24 (from May 25), 2020 Week 25-27 (from Jun 15), 2020 Week 28-30 (from Jul 6), 2020 Week 31-33 (from Jul 27), 2020 Week 34-36 (from Aug 17), 2020 Week 37-39 (from Sep 7), 2020 Week 40-42 (from Sep 28), 2020 Week 43-45 (from Oct 19), 2020 Week 46-48 (from Nov 9), 2020 Week 49-51 (from Nov 30) (was: 2020 Week 16-18 (from Apr 13), 2020 Week 19-21 (from May 4), 2020 Week 22-24 (from May 25), 2020 Week 25-27 (from Jun 15), 2020 Week 28-30 (from Jul 6), 2020 Week 31-33 (from Jul 27), 2020 Week 34-36 (from Aug 17), 2020 Week 37-39 (from Sep 7), 2020 Week 40-42 (from Sep 28), 2020 Week 43-45 (from Oct 19), 2020 Week 46-48 (from Nov 9))
> Client side FEEL
> ----------------
>
> Key: DROOLS-5282
> URL: https://issues.redhat.com/browse/DROOLS-5282
> Project: Drools
> Issue Type: Epic
> Components: DMN Editor
> Reporter: Toni Rikkola
> Assignee: Toni Rikkola
> Priority: Major
> Labels: drools-tools
>
> Make kie-dmn-feel usable on the client side.
> Notes:
> * FEEL functions use reflection, but many if not all of them can be hard-coded
[Red Hat JIRA] (WFLY-14147) Near-cache should be disabled when max-active-sessions is 0
by Paul Ferraro (Jira)
Paul Ferraro created WFLY-14147:
-----------------------------------
Summary: Near-cache should be disabled when max-active-sessions is 0
Key: WFLY-14147
URL: https://issues.redhat.com/browse/WFLY-14147
Project: WildFly
Issue Type: Bug
Components: Clustering
Affects Versions: 22.0.0.Alpha1
Reporter: Paul Ferraro
Assignee: Paul Ferraro
When a distributed web application uses <max-active-sessions>0</max-active-sessions>, no sessions should be retained in memory on the local node; thus the near-cache of the RemoteCache for this application should be disabled.
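For context, the setting in question is configured per application in the deployment's WEB-INF/jboss-web.xml, along these lines (a minimal sketch of the relevant fragment):

```xml
<jboss-web>
    <!-- 0: retain no sessions in local memory; every request fetches the
         session from the remote store, so a RemoteCache near-cache would
         only hold stale entries -- hence this issue. -->
    <max-active-sessions>0</max-active-sessions>
</jboss-web>
```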